ProtCID provides several types of coordinate files in PDB format:
chain interface clusters, domain interface clusters, peptide interface clusters and Pfam-ligand interactions.
It also provides sequence files of interface clusters in FASTA format and cluster data in text format.
Download coordinates of cluster interface files
The coordinates of interface clusters can be downloaded from the cluster page.
The coordinate file for the interfaces of a single cluster is downloaded by clicking the Cluster ID.
The interface files of all clusters are downloaded by clicking the "Download Interface Files" button at the top right of each interface clusters page.
Chain Interface clusters
The coordinate file of a chain group is named group name + "_" + group ID + ".tar" and includes the coordinates of all clusters.
The coordinate file of a chain cluster is named group name + "_" + group ID + "_" + cluster ID + ".tar.gz".
For example, (V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)_7151.tar contains the coordinates of all clusters in a heterodimer chain group.
The Pfam architecture of one chain is (V_ATPase_I), the Pfam architecture of the other chain is (V-ATPase_H_N)_(V-ATPase_H_C),
and the group ID is 7151. A chain group file name is at most 128 characters long.
The first cluster is named (V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)_7151_1.tar.gz: "_1" follows the group ID and the extension is ".tar.gz".
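The naming rules for chain groups and chain clusters can be sketched in Python as follows. This is a minimal illustration, not ProtCID code: the function names and the MAX_GROUP_FILE_NAME constant are hypothetical, and because the text does not say how over-long names are shortened, the sketch only reports them instead of guessing.

```python
MAX_GROUP_FILE_NAME = 128  # length limit stated for chain group file names


def chain_group_archive(group_name: str, group_id: int) -> str:
    """Name of the archive holding all clusters of a chain group."""
    name = f"{group_name}_{group_id}.tar"
    if len(name) > MAX_GROUP_FILE_NAME:
        # How long names are shortened is not described, so fail loudly.
        raise ValueError(f"name exceeds {MAX_GROUP_FILE_NAME} characters: {name}")
    return name


def chain_cluster_archive(group_name: str, group_id: int, cluster_id: int) -> str:
    """Name of the archive holding a single cluster of a chain group."""
    return f"{group_name}_{group_id}_{cluster_id}.tar.gz"


group = "(V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)"
print(chain_cluster_archive(group, 7151, 1))
# (V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)_7151_1.tar.gz
```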
Domain Interface clusters
The coordinate file of a domain group is named group name + "_" + group ID + ".tar" and includes the coordinates of all clusters.
The coordinate file of a domain cluster is named group name + "_" + group ID + "_" + cluster ID + ".tar.gz".
For example, (1-cysPrx_C)(AhpC-TSA)_2.tar is the file of a diff-Pfam group (its two domains belong to different Pfams).
The Pfam ID of one domain is (1-cysPrx_C), the Pfam ID of the other domain is (AhpC-TSA),
and the group ID is 2. The first cluster is named (1-cysPrx_C)(AhpC-TSA)_2_1.tar.gz: "_1" follows the group ID and the extension is ".tar.gz".
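Going the other way, a downloaded file name can be split back into its parts. The sketch below assumes only the naming rules stated above (group files end in ".tar", cluster files in ".tar.gz"); the ArchiveName type and parse_archive_name function are hypothetical helpers. Because group names themselves contain underscores, the name is split from the right.

```python
from typing import NamedTuple, Optional


class ArchiveName(NamedTuple):
    group_name: str
    group_id: int
    cluster_id: Optional[int]  # None for a whole-group archive


def parse_archive_name(file_name: str) -> ArchiveName:
    """Split a group (.tar) or cluster (.tar.gz) archive name into its parts."""
    if file_name.endswith(".tar.gz"):
        stem = file_name[: -len(".tar.gz")]
        # Split from the right: the group name may contain "_" itself.
        group_name, group_id, cluster_id = stem.rsplit("_", 2)
        return ArchiveName(group_name, int(group_id), int(cluster_id))
    if file_name.endswith(".tar"):
        stem = file_name[: -len(".tar")]
        group_name, group_id = stem.rsplit("_", 1)
        return ArchiveName(group_name, int(group_id), None)
    raise ValueError(f"not a group or cluster archive name: {file_name}")


print(parse_archive_name("(1-cysPrx_C)(AhpC-TSA)_2_1.tar.gz"))
# ArchiveName(group_name='(1-cysPrx_C)(AhpC-TSA)', group_id=2, cluster_id=1)
```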
Peptide Interface clusters
The coordinate file of a peptide-binding Pfam is named Pfam ID + ".tar" and includes the coordinates of all clusters,
e.g. "CLP_protease.tar" for Pfam "CLP_protease", which contains three clusters.
The coordinate file of a peptide cluster is named Pfam ID + "_" + cluster ID + ".tar.gz",
e.g. "CLP_protease_1.tar.gz" for the coordinates of the first cluster of Pfam "CLP_protease".
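Once an archive has been downloaded, its contents can be listed without unpacking it. The following sketch uses Python's standard tarfile module; the helper names are hypothetical, and nothing is assumed about how the files inside the archive are named.

```python
import tarfile


def peptide_cluster_archive(pfam_id: str, cluster_id: int) -> str:
    """Name of the archive holding one peptide interface cluster."""
    return f"{pfam_id}_{cluster_id}.tar.gz"


def list_archive_members(path: str) -> list[str]:
    """Return the member names of a .tar or .tar.gz archive without unpacking it."""
    # Mode "r:*" lets tarfile detect the compression (none or gzip) by itself.
    with tarfile.open(path, "r:*") as archive:
        return archive.getnames()
```

For instance, list_archive_members(peptide_cluster_archive("CLP_protease", 1)) would list the files of the first CLP_protease cluster, assuming that archive has been saved in the current directory.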
Chain Interfaces
Interfaces that belong to a chain or domain group but not to any cluster can be viewed through the link "Interface Files Not In Clusters".
An individual interface file in PDB format can be downloaded by clicking the unique interface ID in the column "Interface ID".
Download Sequence Files
Sequence files are provided for chain interface clusters, domain interface clusters and peptide interface clusters.
There is only one file for each group, named "Seq_" + group name + group ID + ".tar.gz".
The file contains one sequence file for each cluster if the cluster is a homodimer or same-Pfam dimer,
named "Cluster" + group ID + "A_" + ClusterID + ".fasta", and two sequence files for each heterodimer cluster,
with the additional sequence file named "Cluster" + group ID + "B_" + ClusterID + ".fasta".
One sequence file for a homodimer group is also provided, named "Group" + group ID + "A.fasta".
For a heterodimer group, two sequence files are provided, one for each chain.
The names for peptide interface clusters are formatted as "Cluster" + Pfam ID + "A_" + ClusterID + ".fasta" for protein domains
and "Cluster" + Pfam ID + "B_" + ClusterID + ".fasta" for peptides,
and the names for a peptide interface group are formatted as "Group" + Pfam ID + "A.fasta" for protein domains
and "Group" + Pfam ID + "B.fasta" for peptides.
The sequences in FASTA format for each cluster and the group can be downloaded by clicking the button "Download Sequence Files"
at the top right of each interface clusters page.
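The FASTA names expected inside a sequence archive can be derived from these rules. The sketch below is illustrative only, with hypothetical function names. For peptide interfaces the Pfam ID takes the place of the group ID and two files (A for the protein domain, B for the peptide) are always expected. The "B" group file name for chain and domain heterodimers is an assumption made by analogy with the peptide rule, since the text gives only the "A" name explicitly.

```python
def cluster_fasta_names(key, cluster_id, two_partners: bool) -> list[str]:
    """FASTA names for one cluster; key is the group ID, or the Pfam ID for peptides."""
    names = [f"Cluster{key}A_{cluster_id}.fasta"]
    if two_partners:
        # Heterodimer clusters and peptide clusters have a second, "B" file.
        names.append(f"Cluster{key}B_{cluster_id}.fasta")
    return names


def group_fasta_names(key, two_partners: bool) -> list[str]:
    """FASTA names for a whole group; key is the group ID, or the Pfam ID for peptides."""
    names = [f"Group{key}A.fasta"]
    if two_partners:
        # Assumed by analogy with the peptide naming rule.
        names.append(f"Group{key}B.fasta")
    return names


print(cluster_fasta_names(7151, 1, two_partners=True))
# ['Cluster7151A_1.fasta', 'Cluster7151B_1.fasta']
print(group_fasta_names("CLP_protease", two_partners=True))
# ['GroupCLP_proteaseA.fasta', 'GroupCLP_proteaseB.fasta']
```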
Download Cluster Data
A text file that includes the summary table and all expandable tables of interface details on the cluster page can be downloaded
by clicking the "Download Cluster Data" button on each interface clusters page.
The text file is tab-delimited and can be opened in Excel or other spreadsheet programs.
The first line of the cluster text file is the header line. For each cluster, there is a summary line.
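Besides a spreadsheet program, the file can be read with a few lines of Python. This is a minimal sketch using the standard csv module; read_cluster_rows is a hypothetical helper. It treats every non-empty line after the header as a row keyed by the header names, and makes no attempt to tell summary lines from detail lines, whose exact layout is not described here. Fields beyond the header length are dropped.

```python
import csv
from typing import Iterable


def read_cluster_rows(lines: Iterable[str]) -> list[dict]:
    """Parse tab-delimited cluster data into one dict per line, keyed by the header."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)  # the first line is the header line
    if header is None:
        return []
    rows = []
    for fields in reader:
        if not any(field.strip() for field in fields):
            continue  # skip blank lines
        rows.append(dict(zip(header, fields)))
    return rows


# Usage with a downloaded file (the file name here is only a placeholder):
# with open("cluster_data.txt", newline="") as handle:
#     rows = read_cluster_rows(handle)
```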
The table below gives the description of the columns:
Group ID | The sequential number of groups |
Cluster ID | The sequential number of clusters |
CrystForm_ID | The sequential number of crystal forms |
Space Group | The space group of the PDB structure; NMR for NMR structures |
Crystal Form | The asymmetric unit (ASU) content. The number next to ASU distinguishes crystal forms with the same space group and ASU size but unit cell dimensions and angles that are at least 1% different from others. |
InterfaceUnit | The letters correspond to the order of the chain Pfam architectures of the group. For instance, the interface unit of the group "(G-alpha);(RGS)" is AB: A refers to the (G-alpha) chain, while B refers to (RGS). |
NumOfInterfaces | The number of unique interfaces in the crystal |
InPDB | Does the common interface exist in the PDB biological unit? |
PDBBA | The PQS format for the PDB biological unit |
InPISA | Does the common interface exist in the PISA biological unit? |
PISABA | The PQS format for the PISA biological unit |
#CFs/Cluster | The number of crystal forms (CFs) in the cluster |
#Entry/Cluster | The number of entries in the cluster |
#CFs/Group | The number of crystal forms (CFs) in the group |
#Entry/Group | The number of entries in the group |
MinSeqIdentity | The minimum pairwise sequence identity in the cluster |
Q(MinSeqId) | The interface similarity score (Q) for the pair with the minimum sequence identity |
InterfaceType | The interface is a homodimer (S = "same") or heterodimer (D = "different") |
Name | Protein names. For a heterodimer, names are separated by ";" if they are different. |
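As one example of using these columns, the share of a group's crystal forms that contain a given cluster can be computed from #CFs/Cluster and #CFs/Group. The sketch below is hypothetical: it assumes each row is a dict keyed by the column names above and that both columns hold plain integers. Rows where that does not hold yield None rather than a guessed value.

```python
from typing import Optional


def crystal_form_coverage(row: dict) -> Optional[float]:
    """Fraction of the group's crystal forms in which the cluster is observed."""
    try:
        in_cluster = int(row["#CFs/Cluster"])
        in_group = int(row["#CFs/Group"])
    except (KeyError, TypeError, ValueError):
        return None  # column missing or not a plain integer
    if in_group <= 0:
        return None
    return in_cluster / in_group


row = {"Group ID": "7151", "Cluster ID": "1", "#CFs/Cluster": "3", "#CFs/Group": "4"}
print(crystal_form_coverage(row))
# 0.75
```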