Review
Curr Opin Struct Biol. 2005 Jun;15(3):254-60.
doi: 10.1016/j.sbi.2005.05.005.

The limits of protein sequence comparison?

William R Pearson et al. Curr Opin Struct Biol. 2005 Jun.

Abstract

Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized.
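The "statistical estimates" referred to above are usually reported as expectation values, E(): the number of alignments scoring at least as well as the observed one that would be expected by chance in a search of this size. For local alignment scores, Karlin-Altschul theory gives E = K·m·n·e^(-λS), where S is the raw score, m and n are the query and database lengths, and λ and K depend on the scoring system. The sketch below is illustrative only: the values λ = 0.267 and K = 0.041 are assumed (they are the parameters commonly reported by BLAST for BLOSUM62 with gap penalties 11/1), the lengths are hypothetical, and no effective-length correction is applied.

```python
import math


def karlin_altschul_evalue(raw_score, m, n, lam, k):
    """Expected number of chance local alignments scoring >= raw_score.

    Uses E = K * m * n * exp(-lambda * S) with no edge-effect correction.
    """
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized (bit) score S' = (lambda*S - ln K) / ln 2, so E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Assumed parameters (BLOSUM62, gap open 11 / extend 1) and hypothetical sizes.
LAMBDA, K = 0.267, 0.041
query_len, db_residues = 300, 50_000_000

e_small_db = karlin_altschul_evalue(100, query_len, db_residues, LAMBDA, K)
e_large_db = karlin_altschul_evalue(100, query_len, 2 * db_residues, LAMBDA, K)
print(f"E() for raw score 100: {e_small_db:.2e}; with a database twice as large: {e_large_db:.2e}")
```

The same raw score (here giving E() of roughly 0.0016) becomes half as significant when the database doubles, which is one reason accurate statistics matter more as databases grow.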


Figures

Figure 1
Homologs, analogs(?) and convergent evolution. Three-dimensional structures of five serine proteases: (a) bovine trypsin (PDB code 5PTP), (b) Streptomyces griseus trypsin (PDB code 1SGT), (c) S. griseus protease A (PDB code 2SGA), (d) viral serine protease (PDB code 1BEF) and (e) subtilisin (PDB code 1SBT). The CATH structure classification places 5PTP, 1SGT and 2SGA in the same homology category, whereas 1BEF has the same topology, but is classified as non-homologous to 5PTP. SCOP places 1BEF in the same superfamily as 5PTP. Subtilisin (1SBT) has a very different structure to the trypsin-like serine proteases and is clearly non-homologous. However, the active sites of subtilisin and trypsin are examples of convergent evolution.
Figure 2
Accuracy of statistical estimates. The expected Poisson probability of seeing the reported E()-value versus the observed probability of seeing a domain with a different fold according to CATH (i.e. the domains have different CATH topology classifications) for SSEARCH, PSI-BLAST, COMPASS, DALI and VAST. The E()-values for the highest-scoring false positive (different topology) for each of 86 queries from different CATH homologous superfamilies are shown. The Z-scores reported by DALI were converted into E()-values assuming an extreme value distribution (see [51••] for details). The numbers in parentheses show the number of non-homologs with reported E()<0.001.
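Two conversions are implied by this caption. If chance hits arrive as a Poisson process with mean E(), the probability of at least one hit that good is P = 1 - e^(-E), which is close to E() itself when E() is small. The Z-score conversion below is a generic sketch and not necessarily the procedure used in [51••]: it assumes the Z-score standardizes a Gumbel (extreme value) distribution, so that a Gumbel with mean μ and standard deviation σ has scale β = σ√6/π and location u = μ - γβ, where γ is the Euler-Mascheroni constant. The `db_size` parameter (number of comparisons) is a hypothetical input.

```python
import math

EULER_GAMMA = 0.5772156649015329


def poisson_p(evalue):
    """Probability of at least one chance hit when E() such hits are expected."""
    return -math.expm1(-evalue)  # 1 - exp(-E), accurate for small E


def z_to_evalue(z, db_size):
    """Approximate E() for a Z-score, assuming scores follow a Gumbel distribution.

    With x = mu + z*sigma, the reduced variate is (x - u)/beta = z*pi/sqrt(6) + gamma.
    """
    y = z * math.pi / math.sqrt(6) + EULER_GAMMA
    p_single = -math.expm1(-math.exp(-y))  # P(score >= x) for one comparison
    return p_single * db_size               # expected chance hits over db_size comparisons


for z in (0, 5, 10):
    e = z_to_evalue(z, db_size=10_000)
    print(f"Z={z:>2}: E()={e:.3g}, Poisson P={poisson_p(e):.3g}")
```

For small E() the two statistics are nearly interchangeable, so the diagonal in such a plot is a direct check of whether the reported E() values are calibrated.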
Figure 3
Homologs found by different search methods. Box plot of the CATH homolog coverage achieved by 86 query domains from different CATH homologous superfamilies under different error criteria for SSEARCH [54], PSI-BLAST [4], COMPASS [19•], DALI [8] and VAST [9]. The upper and lower edges of the boxes are at the 75th and 25th percentile, respectively, with the upper and lower whiskers at the 90th and 10th percentile. The middle line is the median amount of coverage and the circles are the outliers. The fractions of CATH homologs identified at four thresholds are shown: reported E()>0.01 (gray boxes); E()>1 (blue); the first non-homolog according to CATH (red); the first non-topolog (different fold) according to CATH (green).
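One possible reading of the four criteria in this caption is as stopping rules applied to each query's ranked hit list: count the CATH homologs ranked above the first hit that triggers the rule, and divide by the number of homologs the query has in the database. The sketch below assumes that reading; the `Hit` record, the `coverage` helper and the example hits are hypothetical, and the box plot would then be drawn from the 86 per-query coverage values for each method and rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    evalue: float        # E() reported by the search program
    homolog: bool        # same CATH homologous superfamily as the query
    same_topology: bool  # same CATH topology (fold) as the query


def coverage(hits: List[Hit], n_homologs: int, stop: Callable[[Hit], bool]) -> float:
    """Fraction of the query's n_homologs ranked above the first hit meeting `stop`."""
    found = 0
    for hit in sorted(hits, key=lambda h: h.evalue):  # best (lowest E()) first
        if stop(hit):
            break
        if hit.homolog:
            found += 1
    return found / n_homologs


# Hypothetical stopping rules mirroring the four thresholds in the caption.
CRITERIA: Dict[str, Callable[[Hit], bool]] = {
    "E()>0.01": lambda h: h.evalue > 0.01,
    "E()>1": lambda h: h.evalue > 1,
    "first non-homolog": lambda h: not h.homolog,
    "first non-topolog": lambda h: not h.same_topology,
}

# Made-up hit list for one query that has 5 CATH homologs in the database.
example_hits = [
    Hit(1e-30, True, True),
    Hit(1e-5, True, True),
    Hit(0.005, False, True),  # non-homolog with the same fold
    Hit(0.5, True, True),
    Hit(3.0, False, False),   # different fold
    Hit(8.0, True, True),
]

for name, rule in CRITERIA.items():
    print(f"{name:>18}: {coverage(example_hits, 5, rule):.2f}")
```

With these made-up hits the first non-homolog appears at E() = 0.005, so the "first non-homolog" rule stops as early as the E()>0.01 rule (coverage 0.40), while the "first non-topolog" rule passes over that same-fold non-homolog and reaches 0.60.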

References

1. Wilbur WJ, Lipman DJ. Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci USA. 1983;80:726–730.
2. Lipman DJ, Pearson WR. Rapid and sensitive protein similarity searches. Science. 1985;227:1435–1441.
3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.
4. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
5. Apweiler R, Bairoch A, Wu CH. Protein sequence databases. Curr Opin Chem Biol. 2004;8:76–80.
