Proc Natl Acad Sci U S A. 2000 Mar 28;97(7):3288-91.
doi: 10.1073/pnas.97.7.3288.

Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap

K R Wollenberg et al. Proc Natl Acad Sci U S A. 2000.

Abstract

Quantitative analyses of biological sequences generally proceed under the assumption that individual DNA or protein sequence elements vary independently. However, this assumption is not biologically realistic because sequence elements often vary in a concerted manner resulting from common ancestry and structural or functional constraints. We calculated intersite associations among aligned protein sequences by using mutual information. To discriminate associations resulting from common ancestry from those resulting from structural or functional constraints, we used a parametric bootstrap algorithm to construct replicate data sets. These data are expected to have intersite associations resulting solely from phylogeny. By comparing the distribution of our association statistic for the replicate data against that calculated for empirical data, we were able to assign a probability that two sites covaried resulting from structural or functional constraint rather than phylogeny. We tested our method by using an alignment of 237 basic helix-loop-helix (bHLH) protein domains. Comparison of our results against a solved three-dimensional structure confirmed the identification of several sites important to function and structure of the bHLH domain. This analytical procedure has broad utility as a first step in the identification of sites that are important to biological macromolecular structure and function when a solved structure is unavailable.
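The association statistic named in the abstract, mutual information (MI) between two alignment columns, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `mutual_information` and its arguments are hypothetical, and the formula is the standard plug-in MI estimate with logarithms taken to base 20 (one symbol per amino acid), which is assumed to match the paper's Eq. 1 because that scaling bounds the value at 1.

```python
from collections import Counter
from math import log


def mutual_information(col_x, col_y, base=20):
    """Plug-in mutual information between two alignment columns.

    col_x and col_y hold the residues observed at two sites, one entry
    per sequence, in the same sequence order. With base=20 the result
    lies between 0 (independent sites) and 1 (perfectly covarying sites
    that use all 20 residues equally often).
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_x)
    count_x = Counter(col_x)               # residue counts at site x
    count_y = Counter(col_y)               # residue counts at site y
    count_xy = Counter(zip(col_x, col_y))  # joint residue-pair counts

    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # p(a,b) / (p(a) * p(b)), written in terms of raw counts
        ratio = c * n / (count_x[a] * count_y[b])
        mi += p_ab * log(ratio, base)
    return mi


# Two sites that vary independently give MI = 0.
print(mutual_information("AABB", "ABAB"))
```

Observed MI values computed this way mix phylogenetic and functional signal, which is why the abstract compares them against values from replicate data sets simulated under phylogeny alone.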

Figures

Figure 1
Inverse cumulative frequency distribution of MI values for the alignment of 237 bHLH protein sequences and 1,000 parametric bootstrap replicates using either the JTT substitution matrix or the rind substitution matrix. MI values were calculated by using Eq. 1 with n = 20 so that the maximum possible value is unity. Line A is the P < 0.01 threshold for the JTT replicates at MI = 0.188. Line B is the P < 0.001 threshold for the JTT replicates at MI = 0.250. Line A′ is the P < 0.01 threshold for the rind replicates at MI = 0.359. Line B′ is the P < 0.001 threshold for the rind replicates at MI = 0.408. These were the MI values that exceeded 99% (for P < 0.01) or 99.9% (for P < 0.001) of the MI values calculated in the parametric bootstrap replicates. Because MI is a pairwise measure, x(x − 1)/2 values were calculated in each replicate, where x is the number of nongapped sites in the alignment. For the alignment of 237 bHLH sequences, there were 32 sites without gaps, resulting in 496 MI values per replicate.
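The arithmetic in the caption can be checked with a short sketch. The helper names `n_site_pairs` and `empirical_p_value` are hypothetical, and the P value here is simply the fraction of pooled bootstrap MI values at or above an observed value, which is an assumption about how the thresholds were read off the distribution rather than a procedure stated in this text.

```python
def n_site_pairs(x):
    """Number of unordered site pairs among x nongapped sites: x(x - 1)/2."""
    return x * (x - 1) // 2


def empirical_p_value(observed_mi, null_mi_values):
    """Fraction of null (bootstrap) MI values at or above the observed MI.

    null_mi_values is assumed to pool the MI values from all parametric
    bootstrap replicates, which reflect phylogeny alone.
    """
    if not null_mi_values:
        raise ValueError("need at least one null MI value")
    at_or_above = sum(1 for v in null_mi_values if v >= observed_mi)
    return at_or_above / len(null_mi_values)


# 32 nongapped sites give the 496 pairwise MI values per replicate.
print(n_site_pairs(32))
```

An observed MI whose empirical P value falls below 0.01 or 0.001 corresponds to a point to the right of the matching threshold line in the figure.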

