PyIgClassify2 (Data last updated on September 3, 2022)
PyIgClassify is a server that provides canonical cluster assignments and
associated information of the complementarity determining regions (CDRs)
of antibody structures in the Protein Data Bank (PDB). The current database
contains 6,486 PDB antibody entries.
The list of PDB entries can be browsed here.
The database contains an updated clustering compared to
the North-Lehmann-Dunbrack clustering of 2011 [1][2].
The new clustering was performed by Simon Kelow, Bulat Faezov, and Mitchell Parker in the Dunbrack lab [3].
In short, in our 2011 clustering (updated over the years on this website),
we had 72 non-H3 clusters (covering CDRs H1, H2, L1, L2, and L3) and 43 H3 clusters
for a total of 115 clusters.
In the new clustering, we have retired 36 of the non-H3 clusters and their names,
and added 16 new ones, for a new total of 52 non-H3 clusters.
For example, the common CDR length, H1-13, now has 4 clusters compared to the 12 we had previously.
For H3, we kept 11 of the 43 North clusters, and retired the rest because they had too few sequences.
We added two new ones, for a total of 13 H3 clusters.
For H4 and L4 (the loops that connect the D and E strands in each variable domain),
we followed the work of Kelow et al. [4] and have kept 3 clusters
for H4 and 4 clusters for L4.
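The cluster counts above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch: the variable names are ours, and the overall total at the end is a sum computed here from the quoted numbers, not a figure stated on this page.

```python
# Counts quoted in the text above.
north_non_h3 = 72   # 2011 non-H3 clusters (H1, H2, L1, L2, L3)
north_h3 = 43       # 2011 H3 clusters
north_total = north_non_h3 + north_h3            # 115

retired_non_h3 = 36
added_non_h3 = 16
new_non_h3 = north_non_h3 - retired_non_h3 + added_non_h3   # 52

kept_h3 = 11
added_h3 = 2
new_h3 = kept_h3 + added_h3                      # 13

h4_clusters = 3     # DE loop of the heavy chain, from Kelow et al. [4]
l4_clusters = 4     # DE loop of the light chain, from Kelow et al. [4]

# Derived here (assumption: the DE-loop clusters are counted alongside the CDRs).
new_total = new_non_h3 + new_h3 + h4_clusters + l4_clusters

print(north_total, new_non_h3, new_h3, new_total)
```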
The new clustering was performed for several reasons:
- The original clustering was performed on approximately 300
non-redundant antibody structures. Today there are approximately
3000 unique antibodies in the PDB.
- Many of the North clusters have remained small (fewer than 5 unique sequences)
when updated to the 2022 PDB. These clusters are unlikely to be "canonical",
that is, observed for germline sequences and ordinary somatic mutations of
germline sequences from an in vivo immune response. In addition, some of the CDR lengths
that were clustered consist entirely of sequences with somatic insertions or deletions
relative to their original germlines (e.g., H1-10, H1-12, H1-16, H2-15, and L2-6).
- Some of the North clusters have low electron density for specific residues within the CDR,
often as the result of a mismodeled "peptide flip." This occurs when the backbone coordinates
do not fit the electron density such that psi of one residue is shifted by 180 degrees and
the phi of the next residue is shifted by 180 degrees, compared to more common conformations.
These flips are often observed in small clusters, such that when modeled correctly,
the structures would fall into the largest cluster for the given CDR length.
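The peptide-flip signature described in the last point (psi of residue i and phi of residue i+1 both shifted by about 180 degrees) can be screened for with a short script. The sketch below is illustrative only and is not the procedure used to build the database: the function names, the input layout (one `(phi, psi)` pair per residue), and the 60-degree tolerance are all assumptions made for the example.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def find_peptide_flips(dihedrals, reference, tolerance=60.0):
    """Return indices i where psi(i) and phi(i+1) both differ from the
    reference conformation by roughly 180 degrees.

    dihedrals, reference: equal-length lists of (phi, psi) tuples in degrees.
    tolerance: hypothetical slack around 180 degrees (an assumption).
    """
    flips = []
    for i in range(len(dihedrals) - 1):
        d_psi = angle_diff(dihedrals[i][1], reference[i][1])
        d_phi = angle_diff(dihedrals[i + 1][0], reference[i + 1][0])
        if d_psi >= 180.0 - tolerance and d_phi >= 180.0 - tolerance:
            flips.append(i)
    return flips


# Made-up dihedrals: the peptide bond between residues 1 and 2 is flipped.
reference = [(-60, -45), (-120, 130), (-70, 150), (-90, 0)]
candidate = [(-60, -45), (-120, -50), (110, 150), (-90, 0)]
print(find_peptide_flips(candidate, reference))
```

A loop flagged this way would still need to be checked against the electron density before concluding that it is mismodeled.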
The protocol for the new clustering was as follows:
- 1. We produced new hidden Markov models (HMMs) to identify antibodies in the PDB
and to accurately identify the CDR sequences in each structure.
Previously some CDRs were misaligned (e.g., long bovine H3 CDRs, and some CDRs and
frameworks with somatic insertions).
- 2. With the new HMMs, we renumbered antibody variable domains and
produced structure files in the mmCIF format (instead of the legacy PDB format) for further processing.
We use the same Honegger-Plueckthun numbering system that we used previously.
- 3. We used the EDIA program of Meyder et al. [5] to calculate the
electron density support for each atom of the backbone of each CDR
and constructed data sets with different minimum EDIA cutoffs (no cutoff, 0.1, 0.2, 0.3, ..., 0.9).
Atoms with EDIA above 0.8 are considered well placed in the electron density.
Structures with a resolution of 2.6 Å or worse tend to have EDIA values averaging 0.8 or below.
- 4. At each EDIA cutoff, we performed density-based clustering with
the DBSCAN program over a grid of the two DBSCAN parameters, Eps and
MinPts. We removed clusters from individual DBSCAN runs that were
caused by the merger of two or more clusters (as identified by
points more than 150 degrees apart in phi or psi). We then merged
clusters across DBSCAN runs if they overlapped substantially (using a
metric called the Simpson score, equal to the number of points two
clusters have in common divided by the size of the smaller
cluster).
- 5. The resulting clusters across the EDIA levels were grouped together
by hierarchical clustering with the Simpson metric for comparing clusters.
So for example, a large cluster that appears at the EDIA=0.0 level (no cutoff)
will also appear at EDIA=0.1, 0.2, etc. As the EDIA cutoff increases,
some clusters disappear, indicating that they have weak electron density support.
- 6. The final clusters were selected as those that passed the following criteria:
a) having a 0.7 EDIA cutoff cluster;
b) having a 0.0 EDIA cluster with at least 10 unique sequences;
c) having 0.0 and 0.7 EDIA clusters representing at least 1.0% of the chains
for that CDR length data set;
d) exceptions were made if no clusters resulted from the first three criteria,
provided there were at least 5 sequences in the EDIA 0.0 cluster.
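Parts of the protocol above can be sketched in a few lines of Python: the Simpson score and the 150-degree merger check from step 4, and the selection criteria from step 6. This is a minimal illustration and not the code used to build the database. The `ClusterSummary` record and all function names are hypothetical. We also assume that criterion (c) requires the 1.0% threshold at both EDIA levels, and that the exception in (d) is applied per CDR length and needs only the 5-sequence minimum. No merge threshold for the Simpson score is given in the text, so none is assumed here.

```python
from dataclasses import dataclass
from itertools import combinations


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def simpson_score(cluster_a, cluster_b):
    """Number of shared points divided by the size of the smaller cluster."""
    a, b = set(cluster_a), set(cluster_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def is_merged_cluster(members, max_apart=150.0):
    """True if any two members are more than max_apart degrees apart in phi
    or psi at the same residue position.

    members: list of conformations, each a list of (phi, psi) tuples.
    """
    for m1, m2 in combinations(members, 2):
        for (phi1, psi1), (phi2, psi2) in zip(m1, m2):
            if (angle_diff(phi1, phi2) > max_apart
                    or angle_diff(psi1, psi2) > max_apart):
                return True
    return False


@dataclass
class ClusterSummary:
    """Hypothetical per-cluster record for one CDR length."""
    name: str
    has_edia07_cluster: bool      # criterion (a)
    unique_seqs_edia00: int       # criteria (b) and (d)
    frac_chains_edia00: float     # criterion (c), fraction of chains at EDIA 0.0
    frac_chains_edia07: float     # criterion (c), fraction of chains at EDIA 0.7


def select_final(clusters):
    """Apply criteria (a) to (c); fall back to (d) if nothing passes."""
    passed = [
        c for c in clusters
        if c.has_edia07_cluster
        and c.unique_seqs_edia00 >= 10
        and c.frac_chains_edia00 >= 0.01
        and c.frac_chains_edia07 >= 0.01
    ]
    if passed:
        return passed
    # Exception (d): assumed to need only 5 sequences in the EDIA 0.0 cluster.
    return [c for c in clusters if c.unique_seqs_edia00 >= 5]


# Made-up example data for one CDR length.
candidates = [
    ClusterSummary("A", True, 40, 0.30, 0.25),
    ClusterSummary("B", False, 12, 0.02, 0.00),
    ClusterSummary("C", True, 6, 0.005, 0.004),
]
print([c.name for c in select_final(candidates)])
print(simpson_score({1, 2, 3, 4}, {3, 4, 5}))
```

Because the Simpson score is normalized by the smaller cluster, a small cluster fully contained in a large one scores 1.0, which is what makes it suitable for tracking the same cluster across DBSCAN runs and EDIA levels where the cluster size changes.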
Ramachandran maps for new clusters
Example: L1 Ramachandran maps
Sequence logos (redundant data) and Ramachandran map logos (A,B,L,E)
Example: L1-11 sequence logo and Ramachandran map logo
EDIA plots for each CDR cluster (all data)
Example: H1-13-1 EDIA plot
Please browse the
statistics page for more information pertaining to each of
these clusters. We believe these clusters and their associated
structure and sequence data will lead to field-wide improvements
in antibody structure prediction and design.
Citing PyIgClassify2
A penultimate classification of canonical antibody CDR conformations
Simon Kelow; Bulat Faezov; Qifang Xu; Mitchell Parker; Jared Adolf-Bryfogle; Roland L. Dunbrack Jr. bioRxiv 2022.
References
1. North, B.; Lehmann, A.; Dunbrack, R.L. A new clustering of antibody CDR
loop conformations. J. Mol. Biol. (2011), 406:228-256.
2. Adolf-Bryfogle, J.; Xu, Q.; North, B.; Lehmann, A.; Dunbrack, R.L.
PyIgClassify: a database of antibody CDR structural classifications.
Nucleic Acids Research (2014); doi: 10.1093/nar/gku1106.
3. Kelow, S.; Faezov, B.; Xu, Q.; Parker, M.; Adolf-Bryfogle, J.; Dunbrack, R.L.
A penultimate classification of canonical antibody CDR conformations. bioRxiv (2022).
4. Kelow, S.; Adolf-Bryfogle, J.A.; Dunbrack, R.L. Hiding
in plain sight: structure and sequence analysis reveals the
importance of the antibody DE loop for antibody-antigen binding. mAbs (2020), 12:1.
5. Meyder, A. et al. Estimating Electron Density Support for
Individual Atoms and Molecular Fragments in X-ray
Structures. J. Chem. Inf. Model. (2017), 57:2437–2447.