BMC Evol Biol. 2006 Mar 20;6:25.
doi: 10.1186/1471-2148-6-25.

Analysis of transitions at two-fold redundant sites in mammalian genomes. Transition redundant approach-to-equilibrium (TREx) distance metrics

Tang Li et al. BMC Evol Biol.

Abstract

Background: The exchange of nucleotides at synonymous sites in a gene encoding a protein is believed to have little impact on the fitness of a host organism. This should be especially true for synonymous transitions, where a pyrimidine nucleotide is replaced by another pyrimidine, or a purine is replaced by another purine. This suggests that transition redundant exchange (TREx) processes at the third position of conserved two-fold codon systems might offer the best approximation for a neutral molecular clock, serving to examine, within coding regions, theories that require neutrality, determine whether transition rate constants differ within genes in a single lineage, and correlate dates of events recorded in genomes with dates in the geological and paleontological records. To date, TREx analysis of the yeast genome has recognized correlated duplications that established new metabolic strategies in fungi, and supported analyses of functional change in aromatases in pigs. TREx dating has limitations, however. Multiple transitions at synonymous sites may cause equilibration and loss of information. Further, to be useful for correlating events in the genomic record, different genes within a genome must suffer transitions at similar rates.

Results: A formalism to analyze divergence at two-fold redundant codon systems is presented. This formalism exploits two-state approach-to-equilibrium kinetics from chemistry. It captures, in a single equation, the possibility of multiple substitutions at individual sites, avoiding any need to "correct" for these. The formalism also connects specific rate constants for transitions to specific approximations in an underlying evolutionary model, including assumptions that transition rate constants are invariant at different sites, in different genes, in different lineages, and at different times. Therefore, the formalism supports analyses that evaluate these approximations. Transitions at synonymous sites within two-fold redundant coding systems were examined in the mouse, rat, and human genomes. The key metric (f2), the fraction of those sites that hold the same nucleotide, was measured for putative ortholog pairs. A transition redundant exchange (TREx) distance was calculated from f2 for these pairs. Pyrimidine-pyrimidine transitions at these sites occur approximately 14% faster than purine-purine transitions in various lineages. Transition rate constants were similar in different genes within the same lineages; within a set of orthologs, the f2 distribution is only modestly overdispersed. No correlation between disparity and overdispersion is observed. In rodents, however, evidence was found for greater conservation of TREx sites in genes on the X chromosome, accounting for a small part of the overdispersion.
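The f2 metric and the distance derived from it can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: only amino acids encoded by exactly two codons in the standard genetic code are counted, the two sequences are already aligned in frame without gaps, the equilibrium value of f2 is 0.53 (the end point quoted in the Figure 1 caption), and the distance is taken as the exponent of a simple two-state relaxation. The function names `f2_fraction` and `trex_distance` are hypothetical.

```python
import math

# Assumption: strict two-fold codon systems of the standard genetic code,
# keyed by the first two codon positions. The paper's exact set may differ.
SYSTEMS = {
    "Y": ({"TT", "TA", "CA", "AA", "GA", "TG"}, "CT"),  # Phe Tyr His Asn Asp Cys
    "R": ({"CA", "AA", "GA"}, "AG"),                    # Gln Lys Glu
}

def f2_fraction(seq_a, seq_b, kind):
    """Return (f2, n) for pyrimidine ("Y") or purine ("R") two-fold sites.

    A site is counted when both codons share the same first two bases and
    both third bases belong to the same two-fold system, so the encoded
    amino acid is conserved.
    """
    prefixes, bases = SYSTEMS[kind]
    same = total = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if (codon_a[:2] == codon_b[:2] and codon_a[:2] in prefixes
                and codon_a[2] in bases and codon_b[2] in bases):
            total += 1
            same += codon_a[2] == codon_b[2]
    if total == 0:
        raise ValueError("no conserved two-fold sites found")
    return same / total, total

def trex_distance(f2, f_eq=0.53):
    """Invert f2 = f_eq + (1 - f_eq) * exp(-kt) and return kt.

    The functional form and f_eq = 0.53 are assumptions of this sketch.
    """
    if not f_eq < f2 <= 1.0:
        raise ValueError("f2 must lie in (f_eq, 1]; the site has equilibrated")
    return -math.log((f2 - f_eq) / (1.0 - f_eq))
```

Under these assumptions, an f2 of 0.90 corresponds to a distance of roughly 0.24 in units of kt, and values approaching 0.53 carry no dating information because the logarithm diverges.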

Conclusion: The TREx metric is useful to analyze the history of transition rate constants within these mammals over the past 100 million years. The TREx metric estimates the extent to which silent nucleotide substitutions accumulate in different genes, on different chromosomes, with different compositions, in different lineages, and at different times.

Figures

Figure 1
A first-order exponential describes the fraction of two-fold sites that are identical (f2) versus the number of changes per site. Schematic showing the fraction of residues at two-fold redundant sites conserved after a time t, with an end point of 0.53. Note that in this plot, if we assume that the rate constant for transition is time-invariant, the x axis corresponds to time.
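The first-order curve in this caption is what a generic two-state approach to equilibrium would produce. The derivation below is a sketch under stated assumptions, not the paper's own equation: the symbols k_f (identical to different), k_r (different to identical), and f_eq are introduced here for illustration, and only the end point of 0.53 comes from the caption.

```latex
% Two states per site pair: identical (fraction f_2) and different (1 - f_2).
% k_f, k_r and f_eq are illustrative symbols, not taken from the source.
\begin{align}
\frac{d f_2}{dt} &= -k_f\, f_2 + k_r\,(1 - f_2), \qquad f_2(0) = 1, \\
f_2(t) &= f_{\mathrm{eq}} + (1 - f_{\mathrm{eq}})\, e^{-(k_f + k_r)\,t},
\qquad f_{\mathrm{eq}} = \frac{k_r}{k_f + k_r} \approx 0.53, \\
(k_f + k_r)\,t &= -\ln\!\left(\frac{f_2 - f_{\mathrm{eq}}}{1 - f_{\mathrm{eq}}}\right).
\end{align}
```

Because multiple hits at a site are already part of the relaxation, no separate correction term appears. If the combined rate constant is time-invariant, the last line is proportional to elapsed time.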
Figure 2
Schematic showing possible intertaxa relationships for a hypothetical gene family that is found in two taxa, T and U, that shared a last common ancestor (LCA) in which two paralogs of the gene, A and B, were already present as a consequence of a gene duplication that predates the speciation, after which sequences within lineages T and U diverged independently. AT and AU represent true orthologs. The pair BT1 and BT2 represents paralogs. Other pairs of modern proteins are neither orthologs nor paralogs.
Figure 3
Histogram showing the f2Y (a) and f2R (b) values of all mouse:rat intertaxa homolog pairs containing 50 or more characters. The peaks centered at ca. 0.88 (a) and ca. 0.90 (b) reflect true orthologs. Pairs with f2 values near 0.53 diverged so long ago that the silent sites have equilibrated.
Figure 4
Histogram showing the frequency of n, the number of characters used to calculate the f2Y (a) and f2R (b) values, in the mouse:rat intertaxa orthologs. The mean (λ) of the Poisson distribution for f2Y is 136.6 (95% ci 130.1.7–141.2), while that for f2R is 138.2 (95% ci 134.3–140.5). ci: confidence interval. The bin size is 25 sites.
Figure 5
Histograms showing the frequency of f2Y and f2R values of mouse:rat intertaxa ortholog pairs. (a). The histogram of observed data (f2Y) from all ortholog pairs (n>50), with the best fit Gaussian superimposed. μ = 0.88 (95% ci 0.877–0.884), σ = 0.040 (95% ci 0.039–0.042). (c). The theoretical histogram from the simulated data that is based on the null hypothesis for f2Y of mouse:rat intertaxa ortholog pairs. μ = 0.88 (95% ci 0.878–0.882), σ = 0.030 (95% ci 0.028–0.031). (b). The histogram of observed data (f2R) from all ortholog pairs (n>50), with the best fit Gaussian superimposed. μ = 0.90 (95% ci 0.880–0.903), σ = 0.034 (95% ci 0.033–0.035). (d). The theoretical histogram from simulated data that is based on the null hypothesis for f2R of mouse:rat intertaxa ortholog pairs. μ = 0.90 (95% ci 0.888–0.903), σ = 0.028 (95% ci 0.027–0.029). ci: confidence interval.
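The comparison between observed and simulated histograms can be sketched in Python. This is an illustration under assumptions, not the authors' simulation: each two-fold site is treated as an independent Bernoulli trial with the same probability p of remaining identical (one reading of the null hypothesis), and the function names are hypothetical. Real use would pass the actual per-gene site counts rather than a constant.

```python
import math
import random

def binomial_sd(p, n):
    """Standard deviation of f2 expected from sampling n sites alone."""
    return math.sqrt(p * (1.0 - p) / n)

def simulate_null_f2(p, site_counts, seed=0):
    """Draw one simulated f2 per gene, given each gene's number of sites.

    Assumption: every site stays identical with the same probability p,
    independently of all other sites.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for n in site_counts]

def overdispersion_ratio(observed_sd, null_sd):
    """Ratio of observed to null variance; 1.0 means no overdispersion."""
    return (observed_sd / null_sd) ** 2
```

With p = 0.88 and about 137 sites per gene, `binomial_sd` gives roughly 0.028, close to the simulated σ of 0.030 quoted for panel (c). The observed σ of 0.040 then corresponds to a variance ratio of about 1.8, which is the sense in which the distribution is modestly overdispersed.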
Figure 6
Histogram showing the frequency of f2 values of mouse:rat intertaxa ortholog pairs. (a). Observed data from all ortholog pairs (n>100), with the best fit Gaussian superimposed. μ = 0.89 (95% ci 0.886–0.893), σ = 0.029 (95% ci 0.028–0.031). (b). Theoretical histogram that assumes the null hypothesis that all sites diverge with equal rate constants, based on a simulation with the same distribution of characters. μ = 0.89 (95% ci 0.888–0.891), σ = 0.022 (95% ci 0.021–0.024). ci: confidence interval.
Figure 7
Histogram showing the frequency of f2 values for intertaxa ortholog pairs (n>100) between humans and rodents. (a) Human:mouse ortholog pairs. (b) Human:rat ortholog pairs. (c) Human:mouse ortholog pairs; for pairs from families that had more than one intertaxon pair, the pair with the highest f2 value is taken, to preferentially extract orthologs. (d) Human:rat ortholog pairs; for pairs from families that had more than one intertaxon pair, the pair with the highest f2 value is taken.
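The selection rule used for panels (c) and (d), keeping only the highest-f2 pair in each family, can be sketched as follows. The tuple layout `(family, pair_id, f2)` and the function name are hypothetical; the caption does not describe how families or pairs were stored.

```python
def best_pair_per_family(pairs):
    """Keep, for each family, the intertaxon pair with the highest f2.

    `pairs` is an iterable of (family, pair_id, f2) tuples; this layout is
    an assumption made for illustration.
    """
    best = {}
    for family, pair_id, f2_value in pairs:
        if family not in best or f2_value > best[family][1]:
            best[family] = (pair_id, f2_value)
    return best

# Hypothetical input: family "fam1" has one likely ortholog pair and one
# older pair whose silent sites are close to equilibrium.
example = [
    ("fam1", "A_T:A_U", 0.74),
    ("fam1", "A_T:B_U", 0.55),
    ("fam2", "C_T:C_U", 0.71),
]
selected = best_pair_per_family(example)
```

The rationale is that the most recently diverged pair in a family has had the least time to accumulate transitions, so the highest f2 value preferentially marks the orthologs.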
Figure 8
Histogram showing the frequency of f2 values in chicken:mouse intertaxa gene pairs (n>100). (a) All intertaxon pairs. (b) For pairs from families that had more than one intertaxon pair, the pair with the highest f2 value.
Figure 9
Histogram showing the frequency of f2 values in Takifugu:human intertaxa gene pairs (n > 100). (a) All intertaxon pairs. (b) For pairs from families that had more than one intertaxon pair, the pair with the highest f2 value.
Figure 10
Histogram showing the orthologs of mouse:rat intertaxon pairs using the f4 metric, the fraction identical for four-fold redundant codon systems (n > 100). While the separation of orthologs from paralogs is larger, the distribution is wider. We do not reject f4 as a dating tool; we ask only that its use recognize its particular advantages (broader sample size) and limitations (greater heterogeneity in microscopic rate constants).
Figure 11
Histogram showing the frequency of orthologs in sister genome pairs, with the best fit Gamma curve superimposed, using TREx and maximum likelihood dS (mldS) metrics: (a) TREx of human:mouse, (b) mldS of human:mouse, (c) TREx of human:rat, (d) mldS of human:rat, (e) TREx of mouse:rat, (f) mldS of mouse:rat.
Figure 12
For individual rat-mouse ortholog pairs, a plot of the likelihood that the null hypothesis is rejected under the disparity metric of Kumar and Gadagkar (x axis) versus the f2 value. There is no obvious correlation between disparity and the f2 value.
Figure 13
The f2 values for putative ortholog pairs in rat and mouse are higher if they lie on the X chromosome (panel (b), mean f2 ≈ 0.93) than for pairs on autosomes (panel (a), mean f2 ≈ 0.90), implying that the X chromosome genes have accumulated fewer silent transitions at two-fold redundant sites than the typical pair of orthologs. Since fewer than 5% of the genes lie on the X chromosome, this can account for only some of the overdispersion in the f2 values for rat-mouse orthologs. Interestingly, an analogous phenomenon was not observed in human-canine ortholog pairs (data not shown).

References

    1. Thomson JM, Gaucher EA, Burgan MF, De Kee DW, Li T, Aris JP, Benner SA. Resurrecting ancestral alcohol dehydrogenases from yeast. Nat Genet. 2005;37:630–635. doi: 10.1038/ng1553.
    2. Gaucher EA, Graddy LG, Li T, Simmen RC, Simmen FA, Schreiber DR, Liberles DA, Janis CM, Benner SA. The planetary biology of cytochrome P450 aromatases. BMC Biol. 2004;2:19. doi: 10.1186/1741-7007-2-19.
    3. Kumar S, Gadagkar SR. Disparity index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics. 2001;158:1321–1327.
    4. Li WH, Wu CI, Luo CC. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol Biol Evol. 1985;2:150–174.
    5. Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986;3:418–426.