Both HMMER3 and HHsearch produce many alignments to PDB sequences that have weak E-values, that are shorter than the Pfam model definition, or both. We investigated whether comparing structures could confirm some of the weak hits and extend the short alignments. We define exemplars as structure/Pfam pairs with a good HMMER E-value (≤10⁻⁵) or HHsearch E-value (≤10⁻⁴) to the Pfam and HMM coverage of at least 80%. Of the 7,145 Pfams in the PDB, 5,793 (81%) have exemplars. The structure of each weak Pfam hit was aligned to the exemplar structures of the Pfams in the same clan, including the Pfam of the weak hit itself. A total of 7,381 structure assignments were added to our Pfam assignments by replacing the original alignment to the HMM with a transitive alignment via the structure alignment. The number of PDB residues aligned to Pfam HMMs for these sequences rises by 36%.
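The exemplar criterion above can be written as a simple filter. The following Python sketch is illustrative only: the `PfamHit` record, its field names, and the function names are assumptions made for this example and are not taken from our pipeline; only the two E-value cutoffs and the 80% coverage threshold come from the definition in the text.

```python
from dataclasses import dataclass

# Thresholds taken from the exemplar definition in the text.
HMMER_EVALUE_MAX = 1e-5
HHSEARCH_EVALUE_MAX = 1e-4
MIN_HMM_COVERAGE = 0.80


@dataclass
class PfamHit:
    """Hypothetical record for one structure/Pfam alignment."""
    pdb_id: str
    pfam: str
    method: str      # "hmmer" or "hhsearch" (assumed labels)
    evalue: float
    hmm_start: int   # first aligned HMM position, 1-based inclusive
    hmm_end: int     # last aligned HMM position, 1-based inclusive
    hmm_length: int  # total number of match states in the Pfam HMM


def hmm_coverage(hit: PfamHit) -> float:
    """Fraction of the HMM spanned by the alignment."""
    return (hit.hmm_end - hit.hmm_start + 1) / hit.hmm_length


def is_exemplar(hit: PfamHit) -> bool:
    """True if the hit meets the E-value cutoff for its method
    and spans at least 80% of the HMM."""
    if hit.method == "hmmer":
        cutoff = HMMER_EVALUE_MAX
    elif hit.method == "hhsearch":
        cutoff = HHSEARCH_EVALUE_MAX
    else:
        raise ValueError(f"unknown method: {hit.method!r}")
    return hit.evalue <= cutoff and hmm_coverage(hit) >= MIN_HMM_COVERAGE


# Illustrative values only; these are not real database entries.
full_hit = PfamHit("demo1", "PF_demo", "hmmer", 1e-20, 5, 95, 100)
short_hit = PfamHit("demo2", "PF_demo", "hmmer", 1e-20, 40, 70, 100)
print(is_exemplar(full_hit), is_exemplar(short_hit))
```

In this sketch coverage is measured as the span between the first and last aligned HMM positions; counting only aligned match states would be an equally plausible reading of the definition.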

Structure alignment can verify and greatly extend the alignments to the Pfam models.

For instance, in the upper row of the figure, the original sequence of PDB entry 2EAB does not align to any Pfam model. The consensus sequences give only very short alignments: sequence region [149, 216] aligns to Pfam Glyco_hydro_65N model positions [90, 167] with E-value = 0.022 (top-left, magenta), and sequence region [515, 544] aligns to Pfam DUF608 model positions [108, 137] with E-value = 0.15 (top-left, yellow). The exemplar for both Glyco_hydro_65N (magenta) and Glyco_hydro_65m (yellow) is 1H54, which also contains a Glyco_hydro_65C domain (blue) (top-right). After structure alignment, 2EAB contains a complete Glyco_hydro_65N domain, with sequence regions [19, 91] and [125, 285] aligned to HMM positions [1, 59] and [60, 254] (top-middle, magenta). A 33-residue alpha-helix in the PDB sequence is an insertion and is not assigned to any Pfam. The FATCAT p-value of this structure alignment is 4.35E-6. Because DUF608 is in the same Pfam clan as Glyco_hydro_65m, the middle section [376, 784] becomes a complete Glyco_hydro_65m domain (top-middle, yellow). The p-value of this structure alignment is 3.88E-11.

Another example is shown in the bottom row of the figure: 3DAM has only a short assignment, [294, 435], to Pfam p450 (bottom-left, magenta) with E-value = 4E-35. After structure alignment to the exemplar 3NA0 (bottom-right), the p450 domain in 3DAM is complete (bottom-middle).
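The transitive alignment used in both examples amounts to composing two residue mappings: the structure alignment (weak-hit residue to exemplar residue) and the exemplar's own sequence alignment (exemplar residue to HMM position). The Python sketch below shows that composition under simplifying assumptions: both alignments are represented as plain dictionaries of 1-based positions, the function names are hypothetical, and the toy numbers are invented for illustration rather than taken from the figure.

```python
def transitive_alignment(query_to_exemplar, exemplar_to_hmm):
    """Compose query->exemplar and exemplar->HMM mappings.

    Query residues whose structural partner in the exemplar is not
    aligned to the HMM are left unassigned.
    """
    return {
        q: exemplar_to_hmm[e]
        for q, e in query_to_exemplar.items()
        if e in exemplar_to_hmm
    }


def to_segments(positions):
    """Collapse positions into contiguous [start, end] segments."""
    segments = []
    for p in sorted(positions):
        if segments and p == segments[-1][1] + 1:
            segments[-1][1] = p          # extend the current segment
        else:
            segments.append([p, p])      # start a new segment
    return [tuple(s) for s in segments]


# Toy data (invented): query residues 13-19 have no structural
# partner in the exemplar, i.e. they form an insertion in the query.
query_to_exemplar = {10: 100, 11: 101, 12: 102, 20: 103}
exemplar_to_hmm = {100: 1, 101: 2, 102: 3, 103: 4, 104: 5}

query_to_hmm = transitive_alignment(query_to_exemplar, exemplar_to_hmm)
print(to_segments(query_to_hmm))           # sequence segments of the query
print(to_segments(query_to_hmm.values()))  # HMM segments they cover
```

An insertion in the query, such as the alpha-helix in 2EAB, appears here as a break between sequence segments while the HMM positions on either side remain contiguous.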

