PLoS Genet. 2010 May 27;6(5):e1000968.
doi: 10.1371/journal.pgen.1000968.

The use of orthologous sequences to predict the impact of amino acid substitutions on protein function

Nicholas J Marini et al. PLoS Genet. 2010.

Abstract

Computational predictions of the functional impact of genetic variation play a critical role in human genetics research. For nonsynonymous coding variants, most prediction algorithms make use of patterns of amino acid substitutions observed among homologous proteins at a given site. In particular, substitutions observed in orthologous proteins from other species are often assumed to be tolerated in the human protein as well. We examined this assumption by evaluating a panel of nonsynonymous mutants of a prototypical human enzyme, methylenetetrahydrofolate reductase (MTHFR), in a yeast cell-based functional assay. As expected, substitutions in human MTHFR at sites that are well-conserved across distant orthologs result in an impaired enzyme, while substitutions present in recently diverged sequences (including a 9-site mutant that "resurrects" the human-macaque ancestor) result in a functional enzyme. We also interrogated 30 sites with varying degrees of conservation by creating substitutions in the human enzyme that are accepted in at least one ortholog of MTHFR. Quite surprisingly, most of these substitutions were deleterious to the human enzyme. The results suggest that selective constraints vary between phylogenetic lineages such that inclusion of distant orthologs to infer selective pressures on the human enzyme may be misleading. We propose that homologous proteins are best used to reconstruct ancestral sequences and infer amino acid conservation among only direct lineal ancestors of a particular protein. We show that such an "ancestral site preservation" measure outperforms other prediction methods, not only in our selected set for MTHFR, but also in an exhaustive set of E. coli LacI mutants.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Example growth curves from which rate metrics were calculated.
Shown are two examples (major MTHFR allele (open triangle); Y189H substitution variant (closed circle)) where growth in liquid culture was tracked over time, according to Methods. The upper panel shows absorbance (OD595) values and the lower panel shows the log10 transformation of the same absorbance reads. Log10-transformed data were used to calculate maximum slopes that served as growth-rate metrics.
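The growth-rate metric described in this caption can be illustrated with a minimal sketch. The caption states only that maximum slopes were taken from log10-transformed absorbance; the sliding-window least-squares fit, the window size, and the function name `max_log_slope` are assumptions made here for illustration and are not taken from the paper's Methods.

```python
import math


def max_log_slope(times, od, window=4):
    """Return the largest least-squares slope of log10(OD) versus time
    found in any run of `window` consecutive reads.

    Assumptions (not from the paper): a fixed-size sliding window,
    strictly increasing time points, and strictly positive OD values
    (math.log10 raises ValueError otherwise).
    """
    if len(times) != len(od):
        raise ValueError("times and od must have the same length")
    if window < 2 or len(times) < window:
        raise ValueError("need at least `window` >= 2 points")

    logs = [math.log10(v) for v in od]
    best = float("-inf")
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        y = logs[start:start + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        # Ordinary least-squares slope for this window.
        sxx = sum((a - t_mean) ** 2 for a in t)
        sxy = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
        best = max(best, sxy / sxx)
    return best


# Toy curve: a lag phase, one tenfold rise per time unit, then a plateau.
toy_times = [0, 1, 2, 3, 4, 5, 6]
toy_od = [0.01, 0.01, 0.01, 0.1, 1.0, 1.0, 1.0]
toy_rate = max_log_slope(toy_times, toy_od, window=2)
```

Taking the maximum over windows picks out the steepest part of the log-phase, so lag and plateau reads do not dilute the metric. The toy values above are invented and do not correspond to any curve in Figure 1.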
Figure 2. Activities of MTHFR mutants.
The average maximum slope (growth-rate metric) and standard deviation for each of the 36 MTHFR variants tested as in Methods. Replicate sets (N = 5) were compared against a positive control (major MTHFR allele) and a negative control (A222V allele) using 2 different statistical criteria as described in Methods. Green circles indicate changes not significantly different from the positive control and significantly better than the A222V control and indicate functionality. Red squares indicate changes significantly less active than the positive control and not significantly better than the A222V control and indicate impaired alleles. Pink triangles are classified as equivocal due to disagreement in the statistical methods. The raw replicate data and statistical metrics are in Table S1.
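The colour coding in this caption can be read as a decision rule over two comparison outcomes. The sketch below is a simplified reading, not the paper's procedure: how each comparison is tested is described in the paper's Methods and is not reproduced here, and the function name `classify_variant` and its boolean inputs are hypothetical.

```python
def classify_variant(differs_from_positive, better_than_negative):
    """Map two comparison outcomes onto the Figure 2 categories.

    differs_from_positive: True if the variant is significantly less
        active than the positive control (major MTHFR allele).
    better_than_negative: True if the variant is significantly better
        than the negative control (A222V allele).

    Simplifying assumption: any combination that matches neither the
    "functional" nor the "impaired" pattern is treated as equivocal.
    """
    if not differs_from_positive and better_than_negative:
        return "functional"   # green circles
    if differs_from_positive and not better_than_negative:
        return "impaired"     # red squares
    return "equivocal"        # pink triangles


labels = [
    classify_variant(False, True),
    classify_variant(True, False),
    classify_variant(True, True),
]
```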
Figure 3. Phylogenetic tree and ancestral allele determination from orthologs of human MTHFR.
Tree: MTHFR sequences from modern-day species are indicated. Database identifiers for these entries are listed in Table S2. Gene duplication events are shown with orange circles, and speciation events with green circles. Nodes numbered in red correspond to ancestral branch points in the human MTHFR lineage. Longer branch lengths indicate faster evolutionary rate. The chicken sequence was given an arbitrary, long branch length because it is a sequence fragment and the actual branch length could not be accurately determined. Ancestral allele determinations: The right columns show the amino acids found in the modern-day sequences corresponding to positions 134, 240 and 294 in human MTHFR. These are shown to illustrate how ancestral sites are determined and, consequently, how long the identity of the site in the human enzyme has been preserved in the human lineage (see text for details).
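The idea of measuring how long a site has been preserved in the human lineage can be sketched as a walk back through the inferred ancestors. This is an illustration only: the function name `preservation_depth`, the list-based input, and the toy residues are assumptions, and the sketch covers the basic measure rather than the extended variant described in Figure 4D.

```python
def preservation_depth(human_residue, ancestral_residues):
    """Count how many consecutive ancestors, starting from the most
    recent one, carry the same residue as the human protein.

    ancestral_residues: inferred residues at one site, ordered from the
        most recent ancestor (node 1 in a tree like Figure 3) to the
        most ancient. The ordering convention is an assumption.
    Returns 0 if even the most recent ancestor differs.
    """
    depth = 0
    for residue in ancestral_residues:
        if residue != human_residue:
            break
        depth += 1
    return depth


# Invented residues, not taken from the MTHFR alignment.
toy_depth = preservation_depth("A", ["A", "A", "S", "A"])
```

In the toy call the human residue matches the two most recent ancestors and then differs, so the walk stops at depth 2 even though a more ancient ancestor matches again.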
Figure 4. Accuracy of discrimination between functional and impaired variants by different methods.
Growth-rate metrics for the 30 variants in Table 1 plotted against scores/classifications from various methods that estimate functional impact. The accuracy of each method was determined by calculating the number of mutations correctly called as functional or impaired divided by the number of mutations unambiguously classified by experimental data. Binning of mutations was determined by using a threshold empirically defined by the functional alleles (dashed vertical line in each panel) to define the functional (left of line) and impaired (right of line) bins. (A) SIFT score; note that the graph plots (1–score) to facilitate comparison with the other methods. All functional variants have a SIFT score >0.09 which, when used as a threshold results in a classification accuracy of 62%. The recommended threshold of 0.05 (solid vertical line) results in a lower classification accuracy (42%). (B) Grantham scale of amino acid dissimilarity between wild-type and substituted amino acid. (C) Ancestral Site Preservation (ASP) measure, using inferred ancestral sequences of human MTHFR. Numbers on the x-axis correspond to increasingly ancient ancestors of human MTHFR as defined by the nodes in Figure 3. (D) Ancestral Site Preservation Extended (ASPext) measure. If a site was preserved in the ancestral lineage for a long period before being substituted by the current-day amino acid, the more ancient ancestor is used to define preservation at this site. The preservation measure for 5 variants is shifted by this criterion (see Table 1).
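The accuracy calculation described in this caption can be sketched as follows. The caption says the threshold is defined empirically by the functional alleles and that variants left of the line are binned as functional; the sketch interprets this as taking the highest score among functional variants as the threshold, which is an assumption. The function name and the toy scores are also hypothetical.

```python
def accuracy_at_empirical_threshold(variants):
    """Return (threshold, accuracy) for one prediction method.

    variants: list of (score, label) pairs, where higher scores mean a
        stronger prediction of impairment and label is "functional",
        "impaired", or "equivocal" from the experimental assay.

    Equivocal variants are excluded, matching the caption's denominator
    of mutations unambiguously classified by experimental data.
    """
    scored = [(s, lab) for s, lab in variants
              if lab in ("functional", "impaired")]
    functional_scores = [s for s, lab in scored if lab == "functional"]
    if not functional_scores:
        raise ValueError("need at least one functional variant")

    # Assumed reading: the line sits at the highest functional score,
    # so every functional variant falls in the functional bin.
    threshold = max(functional_scores)
    correct = sum(
        1 for s, lab in scored
        if (s <= threshold) == (lab == "functional")
    )
    return threshold, correct / len(scored)


# Invented scores for illustration only.
toy_variants = [
    (1, "functional"), (2, "functional"),
    (2, "impaired"), (5, "impaired"), (6, "impaired"),
    (3, "equivocal"),
]
toy_threshold, toy_accuracy = accuracy_at_empirical_threshold(toy_variants)
```

Under this reading every functional variant is called correctly by construction, so the accuracy reflects how many impaired variants fall at or below the threshold.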
