Summary

ProtCID contains a large set of data compiled from several sources: protein structure data from the Protein Data Bank (PDB); protein family architectures from Pfam; biological units from the PDB and from Protein Interfaces, Surfaces and Assemblies (PISA); alignment data generated from PISCES and the programs FATCAT and HMMER; and sequence information from the Universal Protein Resource (UniProt).

ProtCID can be queried with a PDB code or with one or two protein sequences. It returns the Pfam architectures of the chains in the PDB entry or of the input sequences. The user then selects one or two Pfam architectures, and the database server returns a list of interfaces observed in two or more crystal forms. For each interface, the server reports how many crystal forms are available and in how many of them the interface is observed, along with the surface area, the minimum sequence identity in the cluster, and whether the interface is present in the PDB and PISA biological units. We found that PDB and PISA are not always consistent in their biological units within a homologous family, even when an interface is present in all crystal forms. Our data therefore provide an independent check on publicly available annotations of biological interactions for PDB entries. Coordinates for all interfaces in a cluster, in PDB format, and the cluster data, in a text file, can be downloaded for further analysis and modeling.
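As an illustration of how the downloaded cluster data might be used, the following minimal Python sketch selects interfaces observed in two or more crystal forms and flags those that are present in every crystal form but are not annotated in all PDB or PISA biological units. The record layout, the field names, the function names and the example values are all hypothetical; they are not the actual ProtCID file format or server interface, and the flagging rule is only one possible reading of "inconsistent" annotation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InterfaceCluster:
    """Hypothetical record for one interface cluster (not the real ProtCID schema)."""
    cluster_id: int
    crystal_forms_total: int           # crystal forms available for the Pfam architecture(s)
    crystal_forms_with_interface: int  # crystal forms in which the interface is observed
    surface_area: float                # interface surface area
    min_seq_identity: float            # minimum sequence identity in the cluster, in percent
    entries_total: int                 # PDB entries in the cluster
    entries_in_pdb_bu: int             # entries whose PDB biological unit contains the interface
    entries_in_pisa_bu: int            # entries whose PISA biological unit contains the interface


def recurring_clusters(clusters: List[InterfaceCluster],
                       min_forms: int = 2) -> List[InterfaceCluster]:
    """Keep clusters whose interface is observed in at least `min_forms` crystal forms."""
    return [c for c in clusters if c.crystal_forms_with_interface >= min_forms]


def incompletely_annotated(c: InterfaceCluster) -> bool:
    """Assumed rule: the interface occurs in every crystal form, yet at least
    one entry lacks it in its PDB or PISA biological unit."""
    in_all_forms = c.crystal_forms_with_interface == c.crystal_forms_total
    missing_in_pdb = c.entries_in_pdb_bu < c.entries_total
    missing_in_pisa = c.entries_in_pisa_bu < c.entries_total
    return in_all_forms and (missing_in_pdb or missing_in_pisa)


# Made-up example values, for illustration only.
clusters = [
    InterfaceCluster(1, 5, 5, 1200.0, 35.0, 12, 12, 9),
    InterfaceCluster(2, 5, 1, 400.0, 90.0, 12, 0, 0),
    InterfaceCluster(3, 5, 3, 850.0, 42.0, 12, 7, 7),
]

recurring = recurring_clusters(clusters)
flagged = [c for c in recurring if incompletely_annotated(c)]

print([c.cluster_id for c in recurring])
print([c.cluster_id for c in flagged])
```

In this sketch the threshold of two crystal forms mirrors the criterion stated above; a stricter analysis could raise `min_forms` or add a cutoff on surface area or sequence identity.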