Protein sequences not assigned to Pfams

The first three lines of UnassignedUniqueSequences.txt (tab-delimited):

1a0n 1 A A P85A_HUMAN P2L PPRPLPVAPGSSKT 14 T 1.7 SKI pdbhh F Eukaryota T

1a1m 3 C C PEPTIDE TPYDINQML TPYDINQML 9 T 1.2 TAFII55_N pdbhh F T

1a1n 3 C C PEPTIDE VPLRPMTY VPLRPMTY 8 T 5.9E-05 F-protein pdbhh F T

...
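A minimal sketch of reading the file, assuming it has no header row (the first three lines shown above are data) and that each line holds the 15 tab-separated fields described in this document, in that order. The names COLUMNS, parse_line and read_records are hypothetical helpers introduced for illustration; they are not part of any tool that ships with the file.

```python
from typing import Dict, Iterable, Iterator

# Column names in file order, as listed in the header annotations of this document.
COLUMNS = [
    "PDBID", "EntityID", "AsymChainIDs", "AuthorChainIDs", "UnpCode",
    "Name", "Sequence", "SeqLength", "HasWeakHits", "BestWeakEvalue",
    "BestWeakPfamID", "Source", "IsVirus", "Category", "IsValid",
]


def parse_line(line: str) -> Dict[str, str]:
    """Split one tab-delimited line into a dict keyed by column name."""
    # Strip only the line ending so that empty trailing fields are preserved.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} fields, found {len(fields)}: {line!r}"
        )
    return dict(zip(COLUMNS, fields))


def read_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one record per non-blank line of an open file or list of lines."""
    for line in lines:
        if line.strip():
            yield parse_line(line)


# The first example row from this document, rebuilt with explicit tabs.
example = "\t".join([
    "1a0n", "1", "A", "A", "P85A_HUMAN", "P2L", "PPRPLPVAPGSSKT", "14",
    "T", "1.7", "SKI", "pdbhh", "F", "Eukaryota", "T",
])
record = parse_line(example)
print(record["PDBID"], record["Sequence"], record["BestWeakPfamID"])
```

All values are kept as strings here. Splitting on the tab character, rather than on arbitrary whitespace, keeps an empty field as an empty string instead of shifting the later columns to the left.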

Annotations of table headers

Column          Description    Example
PDBID           4-character PDB code    1a0n
EntityID        The entity ID    1
AsymChainIDs    The unique asymmetric chain IDs    A
AuthorChainIDs  The author-assigned chain IDs, which may be the same as those of water, ligands, or other polymers in a PDB file    A
UnpCode         UniProt code    P85A_HUMAN
Name            The name of the protein as defined in the PDB    P2L
Sequence        Amino acid sequence of the entity    PPRPLPVAPGSSKT
SeqLength       The length of the sequence    14
HasWeakHits     Whether there are weak Pfam hits from HMMER or HH-suite (T for true, F for false)    T
BestWeakEvalue  The best E-value among the weak hits, if there are any    1.7
BestWeakPfamID  The Pfam ID of the weak hit with the best E-value    SKI
Source          Which input produced the best Pfam hit: the original PDB sequence, PSI-BLAST consensus sequences based on percentage or PSSM (pdbpercent or pdbpssm), or hidden Markov models (HMMs) from HH-suite (pdbhh)    pdbhh (the best weak hit was generated by HHsearch from HH-suite)
IsVirus         Whether the protein is a viral protein (T for true, F for false)    F
Category        Whether the protein is from a prokaryote or a eukaryote    Eukaryota
IsValid         Whether the sequence is valid (T for true, F for false). A sequence with 5 or fewer distinct amino acid types is not valid; for example, a sequence consisting entirely of Xs is not valid    T
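The T/F flags, the E-value column, and the IsValid rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the code that produced the file: the helper names are hypothetical, an empty BestWeakEvalue is assumed to mean "no weak hit", and X is assumed to count as one residue type like any other letter.

```python
from typing import Optional


def parse_flag(value: str) -> bool:
    """Convert the file's T/F flags (HasWeakHits, IsVirus, IsValid) to bool."""
    if value == "T":
        return True
    if value == "F":
        return False
    raise ValueError(f"expected 'T' or 'F', got {value!r}")


def parse_evalue(value: str) -> Optional[float]:
    """Convert BestWeakEvalue to float; an empty field is assumed to mean no hit."""
    return float(value) if value else None


def has_enough_residue_types(sequence: str, max_invalid_types: int = 5) -> bool:
    """Apply the IsValid rule as described above.

    A sequence with max_invalid_types or fewer distinct residue letters is
    treated as not valid. Assumption: every distinct character, including X,
    counts as one type.
    """
    return len(set(sequence.upper())) > max_invalid_types


# The three example sequences from this document each have more than 5 types.
for seq in ("PPRPLPVAPGSSKT", "TPYDINQML", "VPLRPMTY"):
    print(seq, has_enough_residue_types(seq))
```

Because the IsValid column is already present in the file, a check like this is mainly useful for confirming that a parsed row is consistent with the documented rule.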