Glossary

Assembly

ProtCAD uses assemblies detected in PDB crystals by EPPIC, plus those PDB biological assemblies that EPPIC does not define, such as asymmetric EGFR dimers. EPPIC assemblies are topologically valid, exhibit point group symmetry, and are isomorphous across the unit cell (EPPIC Help).


Pfam architecture

Assemblies with the same chain Pfam architecture components belong to the same Pfam architecture. A chain Pfam architecture is composed of the chain's Pfams in the order they occur in the protein sequence, connected by "_" if there is more than one Pfam (e.g. (GST_N)_(GST_C_3)). Please refer to the glossary help page in ProtCID for more details. A Pfam architecture is defined by its distinct chain Pfam architectures, that is, without copy numbers. We give two examples here.

  • Same-Pfam architecture. The (Globin) Pfam architecture group contains assemblies with 15 different copy numbers of Pfam Globin: (Globin), (Globin)2, (Globin)3, (Globin)4, (Globin)5, (Globin)6, (Globin)7, (Globin)8, (Globin)9, (Globin)10, (Globin)11, (Globin)12, (Globin)19, (Globin)21 and (Globin)24.
  • Diff-Pfam architecture. The (Urease_alpha)_(Amidohydro_1)(Urease_beta)(Urease_gamma) Pfam architecture group contains assemblies with 28 different copy numbers of Pfam Urease_alpha, Pfam Urease_beta and Pfam Urease_gamma, including (Urease_alpha)_(Amidohydro_1)(Urease_beta)(Urease_gamma), (Urease_alpha)_(Amidohydro_1)3(Urease_beta)3(Urease_gamma)3, and (Urease_alpha)_(Amidohydro_1)6(Urease_beta)6(Urease_gamma)6.
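The naming convention above can be illustrated with a minimal Python sketch. The function names are hypothetical and this is not ProtCAD's own code. It assumes that a copy number is always a run of digits written directly after a closing parenthesis and before the next opening parenthesis or the end of the label, as in the examples above.

```python
import re


def chain_architecture(pfams):
    """Join Pfam names in sequence order, e.g. (GST_N)_(GST_C_3)."""
    return "_".join(f"({name})" for name in pfams)


# Assumption: a copy number is a run of digits placed right after ")" and
# followed by "(" or the end of the label. Digits inside the parentheses
# (e.g. the "3" in GST_C_3) belong to the Pfam name and are left alone.
_COPY_NUMBER = re.compile(r"\)\d+(?=\(|$)")


def strip_copy_numbers(assembly_label):
    """Reduce an assembly label to its Pfam architecture (no copy numbers)."""
    return _COPY_NUMBER.sub(")", assembly_label)


if __name__ == "__main__":
    print(chain_architecture(["GST_N", "GST_C_3"]))
    print(strip_copy_numbers("(Globin)24"))
    print(strip_copy_numbers(
        "(Urease_alpha)_(Amidohydro_1)6(Urease_beta)6(Urease_gamma)6"))
```

Under this assumption, every Globin label in the first example reduces to (Globin), and every urease label in the second reduces to the same architecture string, which is why they fall into one Pfam architecture group.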

Assembly cluster

An assembly cluster contains assemblies with the same symmetry, the same stoichiometry, and similar interfaces. Each cluster is described by a set of numbers, listed in the table below.

Column          Annotation
Group ID        The integer identifier of a Pfam architecture
Cluster ID      The integer identifier of a cluster
CF_clus         The number of distinct crystal forms in a cluster (#CFs)
CF_arch         The number of distinct crystal forms in a Pfam architecture (#CFs_arch)
R_CF            The ratio of CF_clus to CF_arch (CF_clus / CF_arch)
UNP_clus        The number of distinct UniProt codes in a cluster (#UNPs)
UNP_arch        The number of distinct UniProt codes in a Pfam architecture (#UNPs_arch)
R_UNP           The ratio of UNP_clus to UNP_arch
ENT_clus        The number of PDB entries in a cluster (#Entries)
ENT_arch        The number of PDB entries in a Pfam architecture (#Entries_arch)
R_ENT           The ratio of ENT_clus to ENT_arch
CF_UNPclus      The number of crystal forms of the UniProts in a cluster (#CFs_UNPclus)
CF_UNParch      The number of crystal forms of those same UniProts in the Pfam architecture (#CFs_UNParch)
R_CF_UNPclus    The ratio of CF_UNPclus to CF_UNParch
PDBBA           The number of PDB entries that have at least one PDB biological assembly in a cluster (#PdbBAs)
R_PDB           The ratio of PDBBA to ENT_clus
PISABA          The number of PDB entries that have PISA biological assemblies in a cluster (#PisaBAs)
R_PISA          The ratio of PISABA to ENT_clus
EPPICBA         The number of PDB entries that have EPPIC biological assemblies in a cluster (#EppicBAs)
R_EPPIC         The ratio of EPPICBA to ENT_clus
PDBeBA          The number of PDB entries that have the PDBe preferred assemblies in a cluster (#PDBeBAs)
R_PDBe          The ratio of PDBeBA to ENT_clus
MinSeqIdentity  The minimum sequence identity of assembly sequences in a cluster
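
As an illustration of how the ratio columns follow from the count columns, here is a minimal Python sketch. The function names and the example counts are hypothetical and are not taken from ProtCAD. It also assumes that a ratio with a zero denominator is reported as None, which the table does not specify.

```python
# (ratio column, numerator column, denominator column), as defined in the table
RATIO_COLUMNS = [
    ("R_CF", "CF_clus", "CF_arch"),
    ("R_UNP", "UNP_clus", "UNP_arch"),
    ("R_ENT", "ENT_clus", "ENT_arch"),
    ("R_CF_UNPclus", "CF_UNPclus", "CF_UNParch"),
    ("R_PDB", "PDBBA", "ENT_clus"),
    ("R_PISA", "PISABA", "ENT_clus"),
    ("R_EPPIC", "EPPICBA", "ENT_clus"),
    ("R_PDBe", "PDBeBA", "ENT_clus"),
]


def cluster_ratios(counts):
    """Compute each ratio column from a dict of count columns."""
    ratios = {}
    for name, num, den in RATIO_COLUMNS:
        # Assumption: an undefined ratio (zero denominator) becomes None.
        ratios[name] = counts[num] / counts[den] if counts[den] else None
    return ratios


if __name__ == "__main__":
    # Made-up counts for a single cluster, for illustration only.
    example = {
        "CF_clus": 4, "CF_arch": 8,
        "UNP_clus": 3, "UNP_arch": 12,
        "ENT_clus": 10, "ENT_arch": 40,
        "CF_UNPclus": 4, "CF_UNParch": 5,
        "PDBBA": 5, "PISABA": 10, "EPPICBA": 0, "PDBeBA": 2,
    }
    for column, value in cluster_ratios(example).items():
        print(column, value)
```

Read this way, R_CF, R_UNP and R_ENT describe how much of the Pfam architecture a cluster covers, while R_PDB, R_PISA, R_EPPIC and R_PDBe describe what fraction of the cluster's PDB entries each source annotates with a matching biological assembly.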