Pfam Assignments in the PDB

The first three lines of pdbpfam.txt (tab-delimited): the header line followed by two data rows.

PdbID DomainID AsymChain AuthChain EntityID Pfam_Acc Pfam_ID Description SeqStart SeqEnd AlignStart AlignEnd HmmStart HmmEnd PdbSeqStart PdbSeqEnd PdbAlignStart PdbAlignEnd BitScore Evalue SeqAlignment HmmAlignment Source IsStructUpdated IsWeak Clan_ID Clan_Acc

101m 1 A A 1 PF00042 Globin Globin 7 113 7 113 1 108 6 112 6 112 128 1.4E-37 EWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRVKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK--G-HHEAELKPLAQSHATKHKIPIKYLEFISEAII qkalvkaswekvkanaeeigaeilkrlfkaypdtkklFkkfgdls.aedlksspkfkahakkvlaaldeavknldnddnlkaalkklgarHakrg.vdpanfklfgeall unppssm 0 0 Globin CL0090

102l 1 A A 1 PF00959 Phage_lysozyme Phage lysozyme 24 156 24 153 1 107 24 155 24 152 95 2.9E-27 YYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITT ywTigiG.................qkgkdvsphkritkseaagryqkdldtaerkikqyikg....kdleperqdalvslafnvGkgGkrkastllralnagqwkkacsai..wkslksgGkeynglkrr unppssm 0 0 Lysozyme CL0037

...
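
Because fields such as Description can contain spaces (for example "Phage lysozyme"), rows should be split on tabs rather than on arbitrary whitespace. The following is a minimal Python sketch, assuming the file begins with the header line shown above. The function name read_pdbpfam and the choice of which columns to convert to numbers are illustrative assumptions, not part of the file specification, and the two alignment strings in the demo row are shortened to their first seven characters to keep the example compact.

```python
import csv
import io

# Columns converted to numbers in this sketch (an assumption about what is
# convenient). The PDB-numbered columns are left as text on the assumption
# that PDB numbering may not always be a plain integer.
INT_FIELDS = ("SeqStart", "SeqEnd", "AlignStart", "AlignEnd",
              "HmmStart", "HmmEnd", "IsStructUpdated", "IsWeak")
FLOAT_FIELDS = ("BitScore", "Evalue")


def read_pdbpfam(handle):
    """Yield one dict per data row, keyed by the names in the header line."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for name in INT_FIELDS:
            row[name] = int(row[name])
        for name in FLOAT_FIELDS:
            row[name] = float(row[name])
        yield row


# Header names copied from the first line of pdbpfam.txt.
HEADER = (
    "PdbID DomainID AsymChain AuthChain EntityID Pfam_Acc Pfam_ID Description "
    "SeqStart SeqEnd AlignStart AlignEnd HmmStart HmmEnd "
    "PdbSeqStart PdbSeqEnd PdbAlignStart PdbAlignEnd "
    "BitScore Evalue SeqAlignment HmmAlignment Source "
    "IsStructUpdated IsWeak Clan_ID Clan_Acc"
).split()

# The 102l row from the sample above; alignments shortened for brevity.
ROW = [
    "102l", "1", "A", "A", "1", "PF00959", "Phage_lysozyme", "Phage lysozyme",
    "24", "156", "24", "153", "1", "107",
    "24", "155", "24", "152",
    "95", "2.9E-27", "YYTIGIG", "ywTigiG", "unppssm",
    "0", "0", "Lysozyme", "CL0037",
]

text = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"
records = list(read_pdbpfam(io.StringIO(text)))
print(records[0]["Pfam_ID"], records[0]["Description"],
      records[0]["SeqStart"], records[0]["SeqEnd"])
```

For the real file, the same function can be given an open file handle, for example open("pdbpfam.txt", newline="").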

 

Column Description Example
PdbID 4-character PDB code 101m
DomainID The domain ID, unique within each PDB structure. A Pfam domain may be a split domain that occupies multiple lines, so collect all lines with the same domain ID to get the complete domain. 1
AsymChain The asymmetric chain ID, which is the unique chain ID in a PDB file. A
AuthChain The author-assigned chain ID, which may be the same as that of waters, ligands, or other polymers in a PDB file. A
EntityID The entity ID in the PDB XML file 1
Pfam_Acc Pfam accession code PF00042
Pfam_ID Pfam ID Globin
Description Annotation of the Pfam family Globin
SeqStart The domain start position in the sequence. For domains from Consensus + HMMER3, it is the envelope start position. For domains from HHsearch, it is the same as the alignment start position. This is the SEQRES residue number, which starts from 1, not the author/PDB number. 7
SeqEnd The domain end position in the sequence. For domains from Consensus + HMMER3, it is the envelope end position. For domains from HHsearch, it is the same as the alignment end position. This is the SEQRES residue number, not the author/PDB number. 113
AlignStart The start position of the region of the sequence that is actually aligned to the Pfam model. This is the SEQRES residue number, not the author/PDB number. 7
AlignEnd The end position of the region of the sequence that is actually aligned to the Pfam model. This is the SEQRES residue number, not the author/PDB number. 113
HmmStart The start position of the alignment in the Pfam HMM 1
HmmEnd The end position of the alignment in the Pfam HMM 108
PdbSeqStart The domain start position in PDB sequence numbering, which does not always start from 1. 6
PdbSeqEnd The domain end position in PDB sequence numbering. 112
PdbAlignStart The start position of the region of the sequence that is actually aligned to the Pfam model, in PDB sequence numbering. 6
PdbAlignEnd The end position of the region of the sequence that is actually aligned to the Pfam model, in PDB sequence numbering. 112
BitScore The bit score from HMMER or HHsearch. 128
Evalue The E-value from HMMER or HHsearch. 1.4E-37
SeqAlignment The aligned region of the PDB sequence EWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRVKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK--G-HHEAELKPLAQSHATKHKIPIKYLEFISEAII
HmmAlignment The aligned region of the Pfam model qkalvkaswekvkanaeeigaeilkrlfkaypdtkklFkkfgdls.aedlksspkfkahakkvlaaldeavknldnddnlkaalkklgarHakrg.vdpanfklfgeall
Source The source of the hit: pdb (original PDB sequence), pdbpssm (the consensus sequence with the best PSSM score), pdbpercent (the consensus sequence with the highest percentage), unp (original UniProt sequence), unppssm (the consensus sequence with the best PSSM score), unppercent (the consensus sequence with the highest percentage). unppssm
IsStructUpdated Whether this is a structure hit: 1 means the domain was updated from a structure alignment. 0
IsWeak Whether this is a weak hit: 1 means this Pfam domain was added because it is either statistically significant or there are strong assignments of the same Pfam family in the same sequence. 0
Clan_ID Clan ID. '-' means this Pfam family has no clan annotation. Globin
Clan_Acc Clan accession code. '-' means this Pfam family has no clan annotation. CL0090
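
As the DomainID entry notes, a split domain occupies several lines, so the lines have to be grouped before a domain can be treated as a whole. The sketch below shows one way to do that in Python. It assumes rows are dicts keyed by the column names above. The names group_domains and covered_residues are illustrative, and AsymChain is included in the grouping key only as a precaution, since the description does not say whether several chains can share a DomainID. The "0xyz" rows are hypothetical and exist only to demonstrate a split domain.

```python
from collections import defaultdict


def group_domains(rows):
    """Collect the SEQRES segments of each domain, sorted by start position."""
    domains = defaultdict(list)
    for row in rows:
        # Assumption: chain is part of the key as a precaution (see lead-in).
        key = (row["PdbID"], row["AsymChain"], row["DomainID"])
        domains[key].append((int(row["SeqStart"]), int(row["SeqEnd"])))
    return {key: sorted(segments) for key, segments in domains.items()}


def covered_residues(segments):
    """Count residues covered by a domain; start and end are inclusive."""
    return sum(end - start + 1 for start, end in segments)


rows = [
    # Taken from the 101m example row.
    {"PdbID": "101m", "AsymChain": "A", "DomainID": "1",
     "SeqStart": "7", "SeqEnd": "113"},
    # Hypothetical split domain: two lines sharing one DomainID.
    {"PdbID": "0xyz", "AsymChain": "A", "DomainID": "1",
     "SeqStart": "120", "SeqEnd": "180"},
    {"PdbID": "0xyz", "AsymChain": "A", "DomainID": "1",
     "SeqStart": "5", "SeqEnd": "60"},
]

domains = group_domains(rows)
for key, segments in domains.items():
    print(key, segments, covered_residues(segments))
```

The same grouping works with PdbSeqStart and PdbSeqEnd if PDB numbering is needed, provided those values are handled as text wherever they are not plain integers.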