FCCC    PyIgClassify2: Classification of Antibody CDR Conformations     Dunbrack Lab
 

PyIgClassify2 (Data last updated on September 3, 2022)

PyIgClassify is a server that provides canonical cluster assignments and associated information for the complementarity-determining regions (CDRs) of antibody structures in the Protein Data Bank (PDB). The current database contains 6,486 PDB antibody entries. The list of PDB entries can be browsed here.

The database contains an updated clustering compared to the North-Lehmann-Dunbrack clustering of 2011 [1][2]. The new clustering was performed by Simon Kelow, Bulat Faezov, and Mitchell Parker in the Dunbrack lab [3].

In short, in our 2011 clustering (updated over the years on this website), we had 72 non-H3 clusters (covering CDRs H1, H2, L1, L2, and L3) and 43 H3 clusters for a total of 115 clusters.

In the new clustering, we have retired 36 of the non-H3 clusters and their names, and added 16 new ones, for a new total of 52 non-H3 clusters. For example, the common CDR length, H1-13, now has 4 clusters compared to the 12 we had previously.

For H3, we kept 11 of the 43 North clusters, and retired the rest because they had too few sequences. We added two new ones, for a total of 13 H3 clusters.
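The cluster counts quoted above can be cross-checked with a few lines of arithmetic. This is only a bookkeeping sketch: the variable names are illustrative, and the number of retired H3 clusters is derived here from the quoted figures rather than stated in the text.

```python
# Cluster counts quoted in the text (2011 North clustering vs. the new clustering).
north_non_h3 = 72
north_h3 = 43

# Non-H3: 36 clusters retired, 16 new ones added.
new_non_h3 = north_non_h3 - 36 + 16

# H3: 11 clusters kept, 2 new ones added; the remainder of the 43 were retired.
h3_kept = 11
new_h3 = h3_kept + 2
h3_retired = north_h3 - h3_kept  # derived value, not quoted in the text

print(north_non_h3 + north_h3)  # 115
print(new_non_h3)               # 52
print(new_h3)                   # 13
print(h3_retired)               # 32
```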

For H4 and L4 (the loops that connect the D and E strands in each variable domain), we followed the work of Kelow et al. [4] and have kept 3 clusters for H4 and 4 clusters for L4.

The new clustering was performed for several reasons:

  • The original clustering was performed on approximately 300 non-redundant antibody structures. Today there are approximately 3000 unique antibodies in the PDB.
  • Many of the North clusters have remained small (fewer than 5 unique sequences) when updated to the 2022 PDB. These clusters are unlikely to be "canonical," that is, observed for germline sequences and ordinary somatic mutations of germline sequences from an in vivo immune response. Some of the CDR lengths that were clustered consist entirely of sequences with somatic insertions or deletions relative to their original germlines (e.g., H1-10, H1-12, H1-16, H2-15, and L2-6).
  • Some of the North clusters have low electron density for specific residues within the CDR, often as the result of a mismodeled "peptide flip." This occurs when the backbone coordinates do not fit the electron density such that psi of one residue is shifted by 180 degrees and the phi of the next residue is shifted by 180 degrees, compared to more common conformations. These flips are often observed in small clusters, such that when modeled correctly, the structures would fall into the largest cluster for the given CDR length.
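The "peptide flip" described in the last point can be illustrated with a small check on backbone dihedrals. This is a minimal sketch, not the procedure used for the database: it assumes each conformation is given as a list of (phi, psi) pairs in degrees, and the function names and the tolerance `tol` are hypothetical choices made for illustration.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def find_peptide_flips(ref, model, tol=45.0):
    """Return indices i where psi(i) and phi(i+1) both differ by roughly 180 degrees.

    ref and model are equal-length lists of (phi, psi) tuples in degrees.
    tol is a hypothetical tolerance around 180 degrees, not a value from the source.
    """
    if len(ref) != len(model):
        raise ValueError("conformations must have the same number of residues")
    flips = []
    for i in range(len(ref) - 1):
        dpsi = angle_diff(ref[i][1], model[i][1])
        dphi_next = angle_diff(ref[i + 1][0], model[i + 1][0])
        if dpsi >= 180.0 - tol and dphi_next >= 180.0 - tol:
            flips.append(i)
    return flips


# Made-up dihedrals for a 4-residue loop; the flip sits between residues 1 and 2.
reference = [(-60.0, -45.0), (-70.0, 140.0), (-90.0, 0.0), (-65.0, -40.0)]
flipped = [(-60.0, -45.0), (-70.0, -40.0), (90.0, 0.0), (-65.0, -40.0)]
print(find_peptide_flips(reference, flipped))  # [1]
```

A structure flagged this way would still need to be checked against its electron density before concluding that the flip is a modeling error.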

The protocol for the new clustering was as follows:

  1. We produced new hidden Markov models (HMMs) to identify antibodies in the PDB and to accurately identify the CDR sequences in each structure. Previously some CDRs were misaligned (e.g., long bovine H3 CDRs, and some CDRs and frameworks with somatic insertions).
  2. With the new HMMs, we renumbered antibody variable domains and produced PDB files in the mmCIF format (instead of legacy-PDB format) for further processing. We use the Honegger-Plueckthun numbering system that we used previously.
  3. We used the EDIA program of Meyder et al. [5] to calculate the electron density support for each atom of the backbone of each CDR and constructed data sets with different minimum EDIA cutoffs (no cutoff, 0.1, 0.2, 0.3, ..., 0.9). Atoms with EDIA above 0.8 are considered well placed in the electron density. Structures with resolution of 2.6 Ångstroms or worse tend to have EDIA values averaging 0.8 or below.
  4. At each EDIA cutoff, we performed density-based clustering with the DBSCAN program over a grid of the two DBSCAN parameters, Eps and MinPts. We removed clusters from individual DBSCAN runs that were caused by the merger of two or more clusters (as identified by points more than 150 degrees apart in phi or psi). We then merged clusters across DBSCAN runs if they overlapped substantially (using a metric called the Simpson score, equal to the number of points two clusters have in common divided by the size of the smaller cluster).
  5. The resulting clusters across the EDIA levels were grouped together by hierarchical clustering with the Simpson metric for comparing clusters. So for example, a large cluster that appears at the EDIA=0.0 level (no cutoff) will also appear at EDIA=0.1, 0.2, etc. As the EDIA cutoff increases, some clusters disappear, indicating that they have weak electron density support.
  6. The final clusters were selected as those that passed the following criteria: a) having a cluster at the 0.7 EDIA cutoff; b) having a 0.0 EDIA cluster with at least 10 unique sequences; c) having 0.0 and 0.7 EDIA clusters representing at least 1.0% of the chains for that CDR length data set; d) as an exception, if no clusters resulted from the first three criteria, clusters with at least 5 sequences in the EDIA 0.0 cluster were accepted.
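The construction of data sets at increasing EDIA cutoffs (step 3) could look roughly like the sketch below. It assumes that a CDR passes a cutoff when the lowest EDIA among its backbone atoms is at or above that cutoff, which is one plausible reading of "minimum EDIA cutoff" and not a confirmed detail of the protocol. The function name, the input layout, and the CDR identifiers are hypothetical.

```python
def build_edia_datasets(cdr_edia, cutoffs=None):
    """Group CDR ids into one data set per minimum-EDIA cutoff.

    cdr_edia maps a CDR id to the EDIA values of its backbone atoms.
    A CDR passes a cutoff when its lowest backbone EDIA is >= the cutoff
    (an assumed reading of "minimum EDIA cutoff").
    """
    if cutoffs is None:
        # 0.0 stands for "no cutoff", followed by 0.1 ... 0.9.
        cutoffs = [round(0.1 * k, 1) for k in range(10)]
    datasets = {}
    for cutoff in cutoffs:
        datasets[cutoff] = sorted(
            cdr_id
            for cdr_id, values in cdr_edia.items()
            if cutoff == 0.0 or min(values) >= cutoff
        )
    return datasets


# Made-up EDIA values for three hypothetical CDRs.
example = {
    "cdr_a": [0.95, 0.91, 0.88, 0.90],
    "cdr_b": [0.85, 0.42, 0.80, 0.79],
    "cdr_c": [0.75, 0.72, 0.81, 0.78],
}
sets = build_edia_datasets(example)
print(sets[0.0])  # ['cdr_a', 'cdr_b', 'cdr_c']
print(sets[0.7])  # ['cdr_a', 'cdr_c']
print(sets[0.9])  # []
```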
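The test for merged clusters in step 4 (points more than 150 degrees apart in phi or psi) can be sketched as a check on each dihedral position. The example assumes every conformation is a flat list of dihedrals in degrees (phi1, psi1, phi2, psi2, ...) and compares angles on the circle; the function names are hypothetical, and the pairwise comparison is written for clarity rather than speed.

```python
from itertools import combinations


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def looks_merged(cluster, max_span=150.0):
    """Return True if any dihedral position has two members more than max_span apart.

    cluster is a list of conformations, each a flat list of dihedral angles
    in degrees, all of the same length.
    """
    if not cluster:
        return False
    for k in range(len(cluster[0])):
        column = [conf[k] for conf in cluster]
        for a, b in combinations(column, 2):
            if angle_diff(a, b) > max_span:
                return True
    return False


# Made-up two-residue conformations (phi1, psi1, phi2, psi2).
tight = [
    [-60.0, -45.0, -70.0, 140.0],
    [-65.0, -40.0, -75.0, 150.0],
    [-58.0, -50.0, -68.0, 135.0],
]
merged = tight + [[-60.0, -45.0, 100.0, 140.0]]
print(looks_merged(tight))   # False
print(looks_merged(merged))  # True
```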
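The Simpson score defined in step 4 is simple to compute when clusters are represented as sets of point identifiers. The sketch below implements the score as defined in the text, plus an illustrative merge loop. The merge threshold of 0.8 is a hypothetical value, since the text says only that clusters are merged if they overlap substantially, and the merge strategy shown is not necessarily the one used for the database.

```python
def simpson_score(cluster_a, cluster_b):
    """Number of shared points divided by the size of the smaller cluster."""
    a, b = set(cluster_a), set(cluster_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_overlapping(clusters, threshold=0.8):
    """Union clusters pairwise until no pair has a Simpson score >= threshold.

    threshold is a hypothetical value chosen for illustration.
    """
    groups = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if simpson_score(groups[i], groups[j]) >= threshold:
                    groups[i] |= groups[j]
                    del groups[j]
                    changed = True
                    break
            if changed:
                break
    return groups


# Made-up point ids from three DBSCAN runs.
run1 = set(range(1, 11))      # points 1..10
run2 = set(range(2, 13))      # points 2..12, shares 9 points with run1
run3 = {20, 21, 22}           # unrelated cluster
result = merge_overlapping([run1, run2, run3])
print(simpson_score(run1, run2))  # 0.9
print(len(result))                # 2
```

Dividing by the size of the smaller cluster means that a small cluster fully contained in a larger one scores 1.0, which suits the comparison of the same cluster seen under different DBSCAN parameters.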
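The selection criteria in step 6 can be expressed as a filter with a fallback. This sketch rests on assumptions: the `Candidate` record and its field names are hypothetical, criterion (c) is read as requiring the 1.0% threshold at both the 0.0 and 0.7 EDIA levels, and the exception in (d) is applied to all candidates for one CDR length at a time.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    has_edia07: bool            # a cluster exists at the 0.7 EDIA cutoff
    unique_seqs_edia00: int     # unique sequences in the EDIA 0.0 cluster
    frac_chains_edia00: float   # fraction of chains in the EDIA 0.0 data set
    frac_chains_edia07: float   # fraction of chains in the EDIA 0.7 data set


def select_clusters(candidates):
    """Apply criteria (a)-(c); fall back to (d) if nothing passes.

    candidates should all belong to the same CDR length data set.
    """
    passed = [
        c for c in candidates
        if c.has_edia07                      # (a)
        and c.unique_seqs_edia00 >= 10       # (b)
        and c.frac_chains_edia00 >= 0.01     # (c), assumed to apply at both levels
        and c.frac_chains_edia07 >= 0.01
    ]
    if passed:
        return passed
    # (d) exception: accept clusters with at least 5 sequences at EDIA 0.0.
    return [c for c in candidates if c.unique_seqs_edia00 >= 5]


# Made-up candidates for one hypothetical CDR length.
cand_a = Candidate("cand-a", True, 120, 0.62, 0.70)
cand_b = Candidate("cand-b", True, 8, 0.03, 0.02)
cand_c = Candidate("cand-c", True, 15, 0.004, 0.02)
cand_d = Candidate("cand-d", False, 3, 0.01, 0.0)

print([c.name for c in select_clusters([cand_a, cand_b, cand_c])])  # ['cand-a']
print([c.name for c in select_clusters([cand_b, cand_d])])          # ['cand-b']
```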

Ramachandran maps for new clusters

Example: L1 Ramachandran maps

L1 Ramachandran maps

Sequence logos (redundant data) and Ramachandran map logos (A,B,L,E)

Example: L1-11 sequence logo and Ramachandran map logo

L1-11 sequence and Ramachandran logos

EDIA plots for each CDR cluster (all data)

Example: H1-13-1 EDIA plot

H1-13-1 EDIA plot


Please browse the statistics page for more information pertaining to each of these clusters. We believe these clusters and their associated structure and sequence data will lead to field-wide improvements in antibody structure prediction and design.

Citing PyIgClassify2

A penultimate classification of canonical antibody CDR conformations. Simon Kelow; Bulat Faezov; Qifang Xu; Mitchell Parker; Jared Adolf-Bryfogle; Roland L. Dunbrack Jr. bioRxiv 2022.

References

1. North, B.; Lehmann, A. and Dunbrack, R.L. A new clustering of antibody CDR loop conformations. J. Mol. Biol. (2011), 406:228-256.

2. Adolf-Bryfogle, J.; Xu, Q.; North, B.; Lehmann, A. and Dunbrack, R.L. PyIgClassify: a database of antibody CDR structural classifications. Nucleic Acids Research (2014); doi: 10.1093/nar/gku1106.

3. Kelow, S.; Faezov, B.; Xu, Q.; Parker, M.; Adolf-Bryfogle, J. and Dunbrack, R.L. A penultimate classification of canonical antibody CDR conformations. bioRxiv (2022).

4. Kelow, S.; Adolf-Bryfogle, J.A. and Dunbrack, R.L. Hiding in plain sight: structure and sequence analysis reveals the importance of the antibody DE loop for antibody-antigen binding. mAbs (2020), 12:1.

5. Meyder, A. et al. Estimating Electron Density Support for Individual Atoms and Molecular Fragments in X-ray Structures. J. Chem. Inf. Model. (2017), 57:2437–2447.