We investigated the properties of the residues that our method does not assign to any Pfam family. These make up 18% of the nearly 13 million residues in unique protein sequences in the PDB.

The histogram figure shows the length distributions of unassigned N- and C-terminal regions (a), unassigned internal regions (b), and completely unassigned sequences (c). More than 90% of unassigned regions and entities are short peptides of fewer than 50 residues.

Unassigned Histograms
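The three categories in the histograms can be illustrated with a minimal sketch. It assumes that Pfam assignments for a sequence are available as 1-based, inclusive residue intervals; the function names `unassigned_regions` and `fraction_shorter_than` are hypothetical and are not taken from the method described here.

```python
from typing import List, Tuple

def unassigned_regions(seq_len: int,
                       assigned: List[Tuple[int, int]]) -> List[Tuple[str, int, int]]:
    """Return (kind, start, end) for each unassigned stretch of a sequence.

    `assigned` holds 1-based, inclusive (start, end) intervals covered by
    Pfam assignments (an assumed input format). `kind` is one of
    "complete", "N-terminal", "C-terminal", or "internal".
    """
    if not assigned:
        # No Pfam assignment at all: the whole sequence is unassigned.
        return [("complete", 1, seq_len)]
    regions = []
    pos = 1  # first residue not yet covered by an assignment
    for start, end in sorted(assigned):
        if start > pos:
            kind = "N-terminal" if pos == 1 else "internal"
            regions.append((kind, pos, start - 1))
        pos = max(pos, end + 1)  # max() tolerates overlapping intervals
    if pos <= seq_len:
        regions.append(("C-terminal", pos, seq_len))
    return regions

def fraction_shorter_than(regions, cutoff: int = 50) -> float:
    """Fraction of regions whose length is below `cutoff` residues."""
    lengths = [end - start + 1 for _, start, end in regions]
    if not lengths:
        return 0.0
    return sum(length < cutoff for length in lengths) / len(lengths)
```

For example, `unassigned_regions(100, [(11, 40), (61, 90)])` yields one N-terminal, one internal, and one C-terminal region, each of which would contribute its length to the corresponding histogram.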

The barplot figure shows the secondary structure composition of the unassigned regions. The last bar in each plot is for proteins and protein regions with Pfam assignments. The percentage of disordered residues in N- and C-terminal unassigned regions is higher than that for Pfam-assigned regions (10%). The secondary structure composition of the internal unassigned residues is closer to that of Pfam-assigned regions. For completely unassigned sequences, shorter sequences have somewhat higher percentages of coil and disordered residues than Pfam-assigned regions, and the difference is most pronounced for disorder.

Unassigned SC
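The per-category percentages in the barplot could be computed along the following lines. This is only a sketch: it assumes each residue carries a single-letter state, with `H` (helix), `E` (strand), `C` (coil), and `D` (disordered) as hypothetical codes. The source does not say how secondary structure or disorder was assigned.

```python
from collections import Counter

def ss_percentages(states: str, classes: str = "HECD") -> dict:
    """Percentage of residues in each state of `classes`.

    `states` is a per-residue string of single-letter codes; the
    H/E/C/D alphabet is an assumption made for illustration only.
    """
    counts = Counter(states)
    total = len(states)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in classes}
```

Applying this to the concatenated states of all regions in one category (for example, all N-terminal unassigned regions) and to the Pfam-assigned regions gives bars that can be compared directly.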