ProtCID provides several types of coordinate files in PDB format, including chain interface clusters, domain interface clusters, peptide interface clusters and Pfam-ligand interactions. It also provides sequence files of interface clusters in FASTA format and cluster data in text format.

Download coordinates of cluster interface files

The coordinates of interface clusters can be downloaded from the cluster page. To download the coordinate file of the interfaces in a single cluster, click its Cluster ID. To download the interface files of all clusters, click the "Download Interface Files" button at the top right of each interface clusters page.

Chain Interface clusters

The coordinate file of a chain group is named group name + "_" + group ID + ".tar" and includes the coordinates of all clusters. The coordinate file of a chain cluster is named group name + "_" + group ID + "_" + cluster ID + ".tar.gz".

For example, (V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)_7151.tar contains the coordinates of all clusters in a heterodimer chain group. The Pfam architecture of one chain is (V_ATPase_I), the Pfam architecture of the other chain is (V-ATPase_H_N)_(V-ATPase_H_C), and the group ID is 7151. A chain group file name is at most 128 characters long. The first cluster is named (V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)_7151_1.tar.gz: the cluster ID "_1" follows the group ID, and the extension is ".tar.gz".
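The naming rule above can be sketched in a few lines of Python. This is only an illustration of the stated pattern; the function names are hypothetical and are not part of ProtCID. How ProtCID shortens names that would exceed 128 characters is not described here, so the sketch only checks the length.

```python
def group_archive_name(group_name, group_id):
    """Archive holding all clusters of a group: name + "_" + ID + ".tar"."""
    return f"{group_name}_{group_id}.tar"


def cluster_archive_name(group_name, group_id, cluster_id):
    """Archive of a single cluster: name + "_" + ID + "_" + cluster + ".tar.gz"."""
    return f"{group_name}_{group_id}_{cluster_id}.tar.gz"


def fits_length_limit(file_name, limit=128):
    """True if the file name is within the stated 128-character maximum."""
    return len(file_name) <= limit


group = "(V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)"
print(group_archive_name(group, 7151))
# (V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)_7151.tar
print(cluster_archive_name(group, 7151, 1))
# (V_ATPase_I)(V-ATPase_H_N)_(V-ATPase_H_C)_7151_1.tar.gz
```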

Domain Interface clusters

The coordinate file of a domain group is named group name + "_" + group ID + ".tar" and includes the coordinates of all clusters. The coordinate file of a domain cluster is named group name + "_" + group ID + "_" + cluster ID + ".tar.gz".

For example, (1-cysPrx_C)(AhpC-TSA)_2.tar is a diff-Pfam group. The Pfam ID of one domain is (1-cysPrx_C), the Pfam ID of the other domain is (AhpC-TSA), and the group ID is 2. The first cluster is named (1-cysPrx_C)(AhpC-TSA)_2_1.tar.gz: the cluster ID "_1" follows the group ID, and the extension is ".tar.gz".
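Going the other way, a downloaded chain or domain archive name can be split back into its parts. The sketch below assumes the two patterns stated above hold exactly (group archives end in "_" + group ID + ".tar", cluster archives in "_" + group ID + "_" + cluster ID + ".tar.gz"); the function name `parse_archive_name` is hypothetical. Because group names themselves contain underscores, the numeric fields are matched from the end of the name.

```python
import re

# Cluster archive: <group name>_<group ID>_<cluster ID>.tar.gz
CLUSTER_RE = re.compile(
    r"^(?P<group>.+)_(?P<group_id>\d+)_(?P<cluster_id>\d+)\.tar\.gz$"
)
# Group archive: <group name>_<group ID>.tar
GROUP_RE = re.compile(r"^(?P<group>.+)_(?P<group_id>\d+)\.tar$")


def parse_archive_name(file_name):
    """Return (group name, group ID, cluster ID or None) for an archive name."""
    match = CLUSTER_RE.match(file_name)
    if match:
        return (match["group"], int(match["group_id"]), int(match["cluster_id"]))
    match = GROUP_RE.match(file_name)
    if match:
        return (match["group"], int(match["group_id"]), None)
    raise ValueError(f"not a recognised archive name: {file_name!r}")


print(parse_archive_name("(1-cysPrx_C)(AhpC-TSA)_2.tar"))
# ('(1-cysPrx_C)(AhpC-TSA)', 2, None)
print(parse_archive_name("(1-cysPrx_C)(AhpC-TSA)_2_1.tar.gz"))
# ('(1-cysPrx_C)(AhpC-TSA)', 2, 1)
```

Peptide archives follow a different pattern (Pfam ID instead of group name and group ID), so they are not covered by this sketch.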

Peptide Interface clusters

The coordinate file of a peptide-binding Pfam is named Pfam ID + ".tar" and includes the coordinates of all clusters, e.g. "CLP_protease.tar" for Pfam "CLP_protease", which contains three clusters. The coordinate file of a peptide cluster is named Pfam ID + "_" + cluster ID + ".tar.gz", e.g. "CLP_protease_1.tar.gz" for the coordinates of the first cluster of Pfam "CLP_protease".
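Once one of the archives described above has been downloaded, its contents can be inspected with Python's standard `tarfile` module. The internal layout of the archives is not described here, so this sketch only lists the regular files it finds; the function name `list_archive_members` is hypothetical.

```python
import tarfile


def list_archive_members(path):
    """Return the names of the regular files stored in a .tar or .tar.gz archive."""
    # Mode "r:*" lets tarfile detect the compression (none, gzip, ...) itself.
    with tarfile.open(path, mode="r:*") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]
```

For instance, `list_archive_members("CLP_protease_1.tar.gz")` would return the file names stored in the first peptide cluster archive of Pfam "CLP_protease", assuming that file has been saved in the current directory.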

Chain Interfaces

Interfaces that are in a chain or domain group but not in any cluster can be viewed by following the link "Interface Files Not In Clusters". An individual interface file in PDB format can be downloaded by clicking the unique interface ID in the column "Interface ID".



Download Sequence Files

Sequence files are provided for chain interface clusters, domain interface clusters and peptide interface clusters. There is only one file for each group, named "Seq_" + group name + group ID + ".tar.gz".

For each cluster that is a homodimer or same-Pfam dimer, the file contains one sequence file, named "Cluster" + group ID + "A_" + ClusterID + ".fasta". For each heterodimer cluster it contains two sequence files; the additional one is named "Cluster" + group ID + "B_" + ClusterID + ".fasta". One sequence file for a homodimer group is also provided, named "Group" + group ID + "A.fasta". For a heterodimer group, two sequence files are provided, one for each chain.

For peptide interface clusters, the names have the form "Cluster" + Pfam ID + "A_" + ClusterID + ".fasta" for protein domains and "Cluster" + Pfam ID + "B_" + ClusterID + ".fasta" for peptides. For a peptide interface group, the names have the form "Group" + Pfam ID + "A.fasta" for protein domains and "Group" + Pfam ID + "B.fasta" for peptides.
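The sequence file names inside a "Seq_" archive can be predicted from the same rules. The following Python sketch covers only the patterns spelled out above; the function names are hypothetical, and `key` stands for the group ID (chain and domain groups) or the Pfam ID (peptide groups).

```python
def cluster_fasta_names(key, cluster_id, two_sided):
    """Sequence file names for one cluster.

    two_sided is True for heterodimer clusters and for peptide clusters,
    which have a "B" file in addition to the "A" file.
    """
    names = [f"Cluster{key}A_{cluster_id}.fasta"]
    if two_sided:
        names.append(f"Cluster{key}B_{cluster_id}.fasta")
    return names


def peptide_group_fasta_names(pfam_id):
    """Group-level files for a peptide group: A = protein domains, B = peptides."""
    return [f"Group{pfam_id}A.fasta", f"Group{pfam_id}B.fasta"]


print(cluster_fasta_names(7151, 1, two_sided=True))
# ['Cluster7151A_1.fasta', 'Cluster7151B_1.fasta']
print(cluster_fasta_names("CLP_protease", 1, two_sided=True))
# ['ClusterCLP_proteaseA_1.fasta', 'ClusterCLP_proteaseB_1.fasta']
print(peptide_group_fasta_names("CLP_protease"))
# ['GroupCLP_proteaseA.fasta', 'GroupCLP_proteaseB.fasta']
```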

The sequences in FASTA format for each cluster and for the group can be downloaded by clicking the "Download Sequence Files" button at the top right of each interface clusters page.



Download Cluster Data

A tab-delimited text file containing the summary table and all expandable tables of interface details from the cluster page can be downloaded by clicking the "Download Cluster Data" button on each interface clusters page. The file can be opened in Excel or other spreadsheet programs.

The first line of the cluster text file is the header line. For each cluster, there is a summary line. The table below gives the description of the columns:

Group ID          The sequential number of groups
Cluster ID        The sequential number of clusters
CrystForm_ID      The sequential number of crystal forms
Space Group       The space group of the PDB structure; NMR for NMR structures
Crystal Form      The asymmetric unit (ASU) content. The number next to ASU distinguishes crystal forms with the same space group and ASU size but unit cell dimensions and angles that differ by at least 1% from the others.
InterfaceUnit     The letters correspond to the order of the chain Pfam architectures of the group. For instance, the interface unit of the group "(G-alpha);(RGS)" is AB: A refers to the (G-alpha) chain, and B to the (RGS) chain.
NumOfInterfaces   The number of unique interfaces in the crystal
InPDB             Does the common interface exist in the PDB biological unit?
PDBBA             The PQS format for the PDB biological unit
InPISA            Does the common interface exist in the PISA biological unit?
PISABA            The PQS format for the PISA biological unit
#CFs/Cluster      The number of crystal forms (CFs) in the cluster
#Entry/Cluster    The number of entries in the cluster
#CFs/Group        The number of crystal forms (CFs) in the group
#Entry/Group      The number of entries in the group
MinSeqIdentity    The minimum pairwise sequence identity in the cluster
Q(MinSeqId)       The interface similarity score (Q) for the pair with the minimum sequence identity
InterfaceType     Whether the interface is a homodimer (S = "same") or a heterodimer (D = "different")
Name              Protein names. For a heterodimer, the names are separated by ";" if they are different.
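Because the cluster data file is tab-delimited with a header line, it can also be read with Python's standard `csv` module instead of a spreadsheet program. The sketch below assumes the header uses the column names listed above; the function names are hypothetical, and the two-row sample is made up purely for illustration (it is not real ProtCID data). Rows without a numeric "#CFs/Cluster" value, such as rows from the expandable detail tables, are skipped by the filter.

```python
import csv
import io


def read_cluster_table(handle):
    """Read a tab-delimited table whose first line is the header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def clusters_with_min_crystal_forms(rows, minimum):
    """Keep rows whose "#CFs/Cluster" value is a number >= minimum."""
    selected = []
    for row in rows:
        value = (row.get("#CFs/Cluster") or "").strip()
        if value.isdigit() and int(value) >= minimum:
            selected.append(row)
    return selected


# Made-up sample with only four of the columns, for illustration.
SAMPLE = (
    "Group ID\tCluster ID\t#CFs/Cluster\t#Entry/Cluster\n"
    "7151\t1\t5\t9\n"
    "7151\t2\t1\t1\n"
)

rows = read_cluster_table(io.StringIO(SAMPLE))
kept = clusters_with_min_crystal_forms(rows, minimum=2)
print([(row["Group ID"], row["Cluster ID"]) for row in kept])
# [('7151', '1')]
```

To read a downloaded file rather than the sample, pass an open file object instead, for example `open(path, newline="")`, where `path` is wherever the file was saved.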
