Protein N-terminal segment datasets used in TermiNator
Datasets of N-terminal transmembrane helix protein sequences used in the study were generated from two sources: the 188 well-annotated membrane proteins (Moller et al. 2000) and Swiss-Prot 40.0 (Boeckmann et al. 2003).
Redundancy reduction was carried out on all protein datasets obtained from
these two sources. Pairwise identities between protein segments were
calculated using ClustalW (Thompson et al. 1994), and the largest
representative dataset was then determined using the algorithm developed by
Hobohm et al. (1992). All protein segments within a dataset share less than
25% pairwise identity.
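The selection step can be illustrated with a minimal sketch. It assumes the greedy variant of the Hobohm procedure (repeatedly discard the segment that has the most neighbours at or above the identity cutoff until no such pairs remain); the text above does not state which variant was used. The function name, the example segment names, and the identity values are hypothetical, and the pairwise identities are assumed to have been computed beforehand (for example with ClustalW).

```python
from itertools import combinations


def select_representatives(names, identity, threshold=25.0):
    """Greedy Hobohm-style selection (assumed variant, illustration only).

    names     -- list of segment identifiers
    identity  -- dict mapping frozenset({a, b}) to percent identity
    threshold -- pairs at or above this value count as redundant
    """
    # Build the neighbour sets: segments that are too similar to each other.
    neighbours = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if identity[frozenset((a, b))] >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Repeatedly drop the segment with the most redundant partners.
    while neighbours:
        worst = max(neighbours, key=lambda name: len(neighbours[name]))
        if not neighbours[worst]:
            break  # no redundant pairs remain
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)

    # Report the survivors in their original order.
    return [name for name in names if name in neighbours]


# Hypothetical identities (percent) for four made-up segments.
example_identity = {
    frozenset(("A", "B")): 40.0,
    frozenset(("A", "C")): 30.0,
    frozenset(("A", "D")): 5.0,
    frozenset(("B", "C")): 10.0,
    frozenset(("B", "D")): 5.0,
    frozenset(("C", "D")): 5.0,
}
kept = select_representatives(["A", "B", "C", "D"], example_identity)
print(kept)
```

In this made-up example, segment A is redundant with both B and C, so removing A alone leaves three segments whose pairwise identities are all below 25%.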
Two datasets were extracted from the 188 well-annotated membrane proteins:
M27.seg, comprising 27 eukaryotic protein sequences, and M70.seg, comprising
70 Gram-negative bacterial sequences. The dataset anchor247.seg, comprising
247 signal anchor type II membrane proteins, was derived from Swiss-Prot 40.0.
The dataset com272.seg, comprising 272 eukaryotic protein sequences, was
generated by merging M27.seg and anchor247.seg with redundancy reduction,
while com89.seg, comprising 89 Gram-negative bacterial sequences, was
generated from Swiss-Prot 40.0 and M70.seg.
Four datasets were derived from the SignalP server (Nielsen et al. 1999):
gram-sig232.seg, comprising 232 Gram-negative bacterial signal peptides;
gram-non186.seg, comprising 186 non-secretory soluble bacterial proteins;
eu-sig943.seg, comprising 943 eukaryotic signal peptides; and
eu-non820.seg, comprising 820 non-secretory soluble eukaryotic proteins.
Click on the following links to download the
protein segment datasets.