Variations on probabilistic suffix trees: statistical modeling and prediction of protein families

G Bejerano¹, G Yona

PMID: 11222260
DOI: 10.1093/bioinformatics/17.1.23

Comparative Study

Bioinformatics. 2001 Jan;17(1):23-43.

Affiliation

¹ School of Computer Science and Engineering, Hebrew University, Jerusalem 91904, Israel. jill@cs.huji.ac.il

Abstract

Motivation: We present a method for modeling protein families by means of probabilistic suffix trees (PSTs). The method is based on identifying significant patterns in a set of related protein sequences. The patterns can be of arbitrary length, and the input sequences need not be aligned, nor is delineation of domain boundaries required. The method is automatic and can be applied with surprising success without assuming any preliminary biological information. Basic biological considerations, such as amino acid background probabilities and amino acid substitution probabilities, can be incorporated to improve performance.

Results: The PST can serve as a predictive tool for protein sequence classification and for detecting conserved patterns (possibly functionally or structurally important) within protein sequences. The method was tested on the Pfam database of protein families with more than satisfactory performance. Exhaustive evaluations show that the PST model detects many more related sequences than pairwise methods such as Gapped-BLAST, and is almost as sensitive as a hidden Markov model trained from a multiple alignment of the input sequences, while being much faster.
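
To make the idea in the abstract concrete, the following is a minimal Python sketch of a variable-length context model in the spirit of a PST. It is not the authors' algorithm: it stores contexts in a plain dictionary rather than a tree, keeps a context only if it occurs often enough, and smooths with a fixed pseudocount. The function names, the parameters `max_depth`, `min_count`, and `pseudocount`, the uniform background, and the per-residue log-odds score are all assumptions made for illustration. Sequences are assumed to contain only the 20 standard amino acid letters.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train(sequences, max_depth=3, min_count=2, pseudocount=0.5):
    """Estimate next-residue probabilities for contexts of length 0..max_depth.

    Input sequences are unaligned; only short preceding contexts are counted.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(0, min(max_depth, i) + 1):
                counts[seq[i - k:i]][symbol] += 1

    model = {}
    for context, nxt in counts.items():
        total = sum(nxt.values())
        # Always keep the empty context; drop rare longer contexts (assumed rule).
        if context and total < min_count:
            continue
        denom = total + pseudocount * len(ALPHABET)
        model[context] = {a: (nxt.get(a, 0) + pseudocount) / denom for a in ALPHABET}
    return model


def longest_context(model, history, max_depth):
    """Return the longest suffix of `history` that the model has stored."""
    for k in range(min(max_depth, len(history)), -1, -1):
        context = history[len(history) - k:]
        if context in model:
            return context
    return ""


def log_odds_per_residue(model, query, max_depth=3):
    """Average log2 odds of the query under the model versus a uniform background."""
    total = 0.0
    for i, symbol in enumerate(query):
        context = longest_context(model, query[:i], max_depth)
        total += math.log2(model[context][symbol] * len(ALPHABET))
    return total / len(query)


if __name__ == "__main__":
    # Toy, made-up sequences sharing a short motif; not real family data.
    family = ["MKVLAAGIVG", "MKVLSAGIVG", "AMKVLAAGIV"]
    model = train(family)
    for query in ("MKVLAAGIVG", "WWPPHHCCYY"):
        print(query, round(log_odds_per_residue(model, query), 3))
```

A query that shares the family's short patterns should score above zero, while an unrelated one should score below it; a threshold on such a score is one simple way to turn the model into a classifier. Background and substitution probabilities, which the abstract mentions as optional refinements, are deliberately left out of this sketch.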

Cited by

- Adar R, Benenson Y, Linshiz G, Rosner A, Tishby N, Shapiro E. Stochastic computing with biomolecular automata. Proc Natl Acad Sci U S A. 2004 Jul 6;101(27):9960-5. doi: 10.1073/pnas.0400731101. Epub 2004 Jun 23. PMID: 15215499. Free PMC article.
- Choi SC, Redelings BD, Thorne JL. Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences. Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3931-9. doi: 10.1098/rstb.2008.0167. PMID: 18852105. Free PMC article.
- Emery K, Studer M, Berchtold A. Comparison of imputation methods for univariate categorical longitudinal data. Qual Quant. 2025;59(2):1767-1791. doi: 10.1007/s11135-024-02028-z. Epub 2024 Dec 26. PMID: 40433560. Free PMC article.
- Li H, Benedito VA, Udvardi MK, Zhao PX. TransportTP: a two-phase classification approach for membrane transporter prediction and characterization. BMC Bioinformatics. 2009 Dec 14;10:418. doi: 10.1186/1471-2105-10-418. PMID: 20003433. Free PMC article.
- Moritz RL, Bernt M, Middendorf M. Local similarity search to find gene indicators in mitochondrial genomes. Biology (Basel). 2014 Mar 11;3(1):220-42. doi: 10.3390/biology3010220. PMID: 24833343. Free PMC article.
