Protein Common Interface Database

Please refer to the Help section for instructions on using the ProtCID web site.

(Based on Pfam v34.)

ProtCID contains comprehensive, PDB-wide structural information on the interactions of proteins and individual protein domains with other molecules. It covers four types of interactions: chain interfaces, Pfam domain interfaces, Pfam-peptide interfaces, and Pfam-ligand/nucleic acid interactions. A common interaction here means a chain-chain or Pfam domain-domain interface that occurs in different crystal forms, or a Pfam-peptide or Pfam-ligand interaction that occurs in multiple homologous proteins.

Its main goal is to identify and cluster homodimeric and heterodimeric interfaces observed in multiple crystal forms of homologous proteins, and interactions of peptides and ligands in homologous proteins. Such interfaces and interactions, especially of non-identical proteins or protein complexes, have been associated with biologically relevant interactions [1]. For more details about the algorithm and benchmarking, please refer to our paper "Statistical Analysis of Interface Similarity in Crystals of Homologous Proteins" and the Help File.

Chain-chain interfaces [2]. All protein sequences in the PDB are assigned a "Chain Pfam architecture", which denotes the ordered Pfam assignments for that sequence, e.g. (Pkinase)_(SH2) or (Cyclin_N)_(Cyclin_C). Then we compare homodimeric interfaces in all crystals that contain a particular architecture, for instance (Pkinase)_(SH2), regardless of whether there are other protein types in the crystals. We also compare all interfaces between two different Pfam architectures in all PDB entries that contain them, e.g. (Pkinase) and (Cyclin_N)_(Cyclin_C). For both homodimers and heterodimers, the interfaces are clustered by a distance-weighted Jaccard-index metric (similarity score, Q score) based on shared residue-residue contacts across each interface.
Pfam domain-domain interfaces [3]. We use Pfam domains defined in our PDBfam database. PDB entries are grouped into Pfam pairs of individual Pfam domains from single and multi-domain proteins. This includes both same-Pfam pairs and different-Pfam pairs. One entry may belong to several Pfam pairs if it contains more than one Pfam domain. Domain-domain interfaces are also clustered based on Q scores.

Pfam-peptide interfaces [3] are clustered based on shared Pfam hidden Markov models and the RMSD of the peptides. We also define a set of Professional Peptide Binding Domains (PPBDs): domains that are not enzymes or repeats and that have at least 3 human proteins containing the domain in a cluster.
Pfam-ligand interactions [3] are the interactions between Pfam domains and all non-polymer molecules in the PDB except water, as well as DNA and RNA.
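The Q score used to cluster chain and domain interfaces is described above as a distance-weighted Jaccard index over shared residue-residue contacts. The Python sketch below illustrates that general idea only; it is not ProtCID's actual formula, which is given in reference [1]. The function names, the linear weighting, and the 12 Å cutoff are all assumptions made for this example, and it assumes the residues of the two homologous interfaces have already been mapped to a common numbering.

```python
from typing import Dict, Tuple

# A contact is a pair of residue labels, one from each side of the interface.
# Each interface maps its contacts to an inter-residue distance in angstroms.
Contact = Tuple[str, str]
Interface = Dict[Contact, float]


def contact_weight(distance: float, cutoff: float = 12.0) -> float:
    """Hypothetical weight: 1 at zero distance, falling linearly to 0 at the cutoff."""
    return max(0.0, 1.0 - distance / cutoff)


def q_score(iface1: Interface, iface2: Interface, cutoff: float = 12.0) -> float:
    """Weighted Jaccard similarity: sum of min weights over sum of max weights.

    A contact missing from one interface has weight 0 there, so it adds to the
    denominator but not to the numerator.
    """
    shared = 0.0
    total = 0.0
    for contact in set(iface1) | set(iface2):
        w1 = contact_weight(iface1[contact], cutoff) if contact in iface1 else 0.0
        w2 = contact_weight(iface2[contact], cutoff) if contact in iface2 else 0.0
        shared += min(w1, w2)
        total += max(w1, w2)
    return shared / total if total > 0 else 0.0
```

With this definition, two interfaces with identical contacts and distances score 1, two interfaces with no contacts in common score 0, and close contacts count more than distant ones.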

We report the number of crystal forms that contain a common interface, the number of PDB entries, the number of PDB and PISA biological assembly annotations that contain the same interface, the average surface area, and the minimum sequence identity of proteins that contain the interface. We find that PDB and PISA are not always consistent in their biological assemblies in a homologous family, even when an interface is present in all crystal forms. Therefore, our data provide an independent check on publicly available annotations of biological interactions for PDB entries.
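As a rough illustration of the per-cluster statistics listed above, the following Python sketch computes them from a list of interface records. The record layout, field names, and function name are hypothetical and do not reflect ProtCID's database schema; the pairwise sequence identities are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InterfaceRecord:
    # Hypothetical fields describing one interface that belongs to a cluster.
    pdb_id: str
    crystal_form: str
    surface_area: float      # in square angstroms
    in_pdb_assembly: bool    # interface present in the PDB biological assembly
    in_pisa_assembly: bool   # interface present in the PISA biological assembly


def summarize_cluster(records: List[InterfaceRecord],
                      pairwise_identities: List[float]) -> dict:
    """Summarize one interface cluster with the statistics named in the text."""
    if not records:
        raise ValueError("a cluster needs at least one interface record")
    return {
        "crystal_forms": len({r.crystal_form for r in records}),
        "entries": len({r.pdb_id for r in records}),
        # Count each PDB entry once, even if it contributes several interfaces.
        "pdb_assemblies": len({r.pdb_id for r in records if r.in_pdb_assembly}),
        "pisa_assemblies": len({r.pdb_id for r in records if r.in_pisa_assembly}),
        "avg_surface_area": sum(r.surface_area for r in records) / len(records),
        # None when no pairwise identities are available (single-protein cluster).
        "min_seq_identity": min(pairwise_identities) if pairwise_identities else None,
    }
```

A cluster seen in many crystal forms but annotated in few PDB or PISA assemblies is the kind of inconsistency the paragraph above refers to.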

The clusters of chain interfaces and domain interfaces can be used to identify biological protein complexes, especially those with weak interactions, such as the asymmetric homodimers of the EGFR kinase, the homodimer of cytosolic sulfotransferases, and the homodimer common to both H-RAS and K-RAS proteins. If a cluster contains at least one protein that has been well validated as a homodimer, but also contains other proteins in the same family that have not been described as such, the hypothesis can be generated that these proteins function as the same oligomer.

The clusters of Pfam-peptide and Pfam-ligand interactions can be used to develop hypotheses for the structures of other protein families within the same superfamilies (Clans).

You can search ProtCID with different inputs:

PDB Code. Searching by PDB code returns a list of Pfam architectures for each sequence of the entry. Selecting one or two Pfam architectures returns the interface clusters.
One Pfam ID or Pfam Accession Code. A list of PDB entries that contain the query Pfam is returned. Selecting one PDB ID from this list is similar to inputting a PDB code.
Two Pfam IDs or Pfam Accession Codes. Searching by a Pfam pair returns the common Pfam-Pfam domain interactions. A list of PDB entries that contain the query Pfams is returned. Selecting one PDB ID from this list is similar to inputting a PDB code.
One sequence. Input one sequence to find the interactions between the input sequence and any sequences in the PDB. ProtCID assigns Pfams to the input sequence and returns the interactions of its Pfam architectures in the PDB.
Two sequences. Input two sequences to find the interactions between them. ProtCID assigns Pfams to the input sequences and returns the interactions of Pfam architectures between the input sequences.
UniProt IDs. Input one or more UniProt IDs to find the interactions and common interfaces among them. Two types of interactions are provided: Interfaces on Pfams and Interfaces on Structures. Structure-based interactions contain only the interfaces of the input proteins, while Pfam-based interactions return the interfaces of the Pfams in the input proteins and also include the interfaces of homologous proteins in the same Pfams.

Or you can browse Pfams, Clans, Pfam-Pfams, peptide-Pfams and Ligands in the PDB.

Citing ProtCID

If you find ProtCID useful, please cite the references that describe the work:

ProtCID: a data resource for structural information on protein interactions: Q. Xu and R. Dunbrack. Nat Commun 11, 711 (2020). https://doi.org/10.1038/s41467-020-14301-4

The protein common interface database (ProtCID) - a comprehensive database of interactions of homologous proteins in multiple crystal forms: Q. Xu and R. Dunbrack. Nucleic Acids Research, Volume 39, Issue suppl_1, 1 January 2011, Pages D761-D770.

References

1. Xu, Q, et al, Statistical Analysis of Interface Similarity in Crystals of Homologous Proteins. J. Mol. Biol. (2008) 381: 487-507.

2. Q. Xu and R. Dunbrack, The protein common interface database (ProtCID) - a comprehensive database of interactions of homologous proteins in multiple crystal forms. Nucleic Acids Research, Volume 39, Issue suppl_1, 1 January 2011, Pages D761-D770.

3. Q. Xu and R. Dunbrack, ProtCID: a data resource for structural information on protein interactions. Nat Commun 11, 711 (2020). https://doi.org/10.1038/s41467-020-14301-4