BMC Evol Biol. 2006 Mar 20:6:25.

doi: 10.1186/1471-2148-6-25.

Analysis of transitions at two-fold redundant sites in mammalian genomes. Transition redundant approach-to-equilibrium (TREx) distance metrics

Tang Li¹, Stephen G Chamberlin, M Daniel Caraco, David A Liberles, Eric A Gaucher, Steven A Benner

PMID: 16545144
PMCID: PMC1435776
DOI: 10.1186/1471-2148-6-25

Tang Li et al. BMC Evol Biol. 2006.

Affiliation

¹ Foundation for Applied Molecular Evolution, Gainesville, FL 32604, USA. tli@ffame.org

Abstract

Background: The exchange of nucleotides at synonymous sites in a gene encoding a protein is believed to have little impact on the fitness of a host organism. This should be especially true for synonymous transitions, where a pyrimidine nucleotide is replaced by another pyrimidine, or a purine is replaced by another purine. This suggests that transition redundant exchange (TREx) processes at the third position of conserved two-fold codon systems might offer the best approximation for a neutral molecular clock, serving to examine, within coding regions, theories that require neutrality, determine whether transition rate constants differ within genes in a single lineage, and correlate dates of events recorded in genomes with dates in the geological and paleontological records. To date, TREx analysis of the yeast genome has recognized correlated duplications that established new metabolic strategies in fungi, and supported analyses of functional change in aromatases in pigs. TREx dating has limitations, however. Multiple transitions at synonymous sites may cause equilibration and loss of information. Further, for TREx dating to be useful in correlating events in the genomic record, different genes within a genome must suffer transitions at similar rates.

Results: A formalism to analyze divergence at two-fold redundant codon systems is presented. This formalism exploits two-state approach-to-equilibrium kinetics from chemistry. It captures, in a single equation, the possibility of multiple substitutions at individual sites, avoiding any need to "correct" for these. The formalism also connects specific rate constants for transitions to specific approximations in an underlying evolutionary model, including assumptions that transition rate constants are invariant at different sites, in different genes, in different lineages, and at different times. Therefore, the formalism supports analyses that evaluate these approximations. Transitions at synonymous sites within two-fold redundant coding systems were examined in the mouse, rat, and human genomes. The key metric (f2), the fraction of those sites that holds the same nucleotide, was measured for putative ortholog pairs. A transition redundant exchange (TREx) distance was calculated from f2 for these pairs. Pyrimidine-pyrimidine transitions at these sites occur approximately 14% faster than purine-purine transitions in various lineages. Transition rate constants were similar in different genes within the same lineages; within a set of orthologs, the f2 distribution is only modestly overdispersed. No correlation between disparity and overdispersion is observed. In rodents, however, evidence was found for greater conservation of TREx sites in genes on the X chromosome, which accounts for a small part of the overdispersion.

Conclusion: The TREx metric is useful to analyze the history of transition rate constants within these mammals over the past 100 million years. The TREx metric estimates the extent to which silent nucleotide substitutions accumulate in different genes, on different chromosomes, with different compositions, in different lineages, and at different times.
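The f2 metric described in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes two aligned, in-frame coding sequences and restricts the count to the nine amino acids that are encoded only by a two-fold block in the standard genetic code; the names `count_two_fold_sites` and `fraction_identical` are hypothetical.

```python
# Assumption: only amino acids encoded solely by a two-fold block in the
# standard genetic code are counted, keyed by the first two codon positions.
PYRIMIDINE_PREFIXES = {"TT", "TA", "CA", "AA", "GA", "TG"}  # Phe Tyr His Asn Asp Cys (third base C/T)
PURINE_PREFIXES = {"CA", "AA", "GA"}  # Gln Lys Glu (third base A/G)


def count_two_fold_sites(seq_a, seq_b):
    """Return (same_y, n_y, same_r, n_r) for two aligned, in-frame sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    same_y = n_y = same_r = n_r = 0
    for i in range(0, len(seq_a) - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a[:2] != codon_b[:2]:
            continue  # first two positions differ, so the site is not conserved
        prefix, third_a, third_b = codon_a[:2], codon_a[2], codon_b[2]
        if prefix in PYRIMIDINE_PREFIXES and third_a in "CT" and third_b in "CT":
            n_y += 1
            same_y += int(third_a == third_b)
        elif prefix in PURINE_PREFIXES and third_a in "AG" and third_b in "AG":
            n_r += 1
            same_r += int(third_a == third_b)
    return same_y, n_y, same_r, n_r


def fraction_identical(same, n):
    """Fraction of counted sites holding the same nucleotide, or None if n == 0."""
    return same / n if n else None


# Toy example: one pyrimidine site (changed) and two purine sites (one changed).
same_y, n_y, same_r, n_r = count_two_fold_sites("TTTAAACAGGGG", "TTCAAACAAGGA")
f2 = fraction_identical(same_y + same_r, n_y + n_r)
```

Keeping the pyrimidine and purine counts separate makes it possible to compute f_2Y and f_2R individually, as the figure captions do.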

Figures

**Figure 1**
A first-order exponential describes the fraction of two-fold sites that are identical (f₂) versus the number of changes per site. Schematic showing the fraction of residues at two-fold redundant sites conserved after a time t, with an end point of 0.53. Note that in this plot, if we assume that the rate constant for transition is time-invariant, the x axis corresponds to time.
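The curve in this caption can be sketched in Python under a stated assumption: a generic first-order approach to equilibrium, f₂(kt) = f_eq + (1 − f_eq)·exp(−kt), with the end point f_eq = 0.53 taken from the caption. The exact parametrization used in the paper is not given here, so `kt` below is a lumped rate-constant-times-time quantity, and the function names are hypothetical.

```python
import math

F_EQ = 0.53  # end point quoted in the Figure 1 caption


def f2_expected(kt, f_eq=F_EQ):
    """Expected fraction of identical two-fold sites after kt changes (assumed form)."""
    return f_eq + (1.0 - f_eq) * math.exp(-kt)


def trex_distance(f2, f_eq=F_EQ):
    """Invert the assumed exponential to recover kt from an observed f2."""
    if not f_eq < f2 <= 1.0:
        raise ValueError("f2 must lie in (f_eq, 1]; at or below f_eq the sites have equilibrated")
    return -math.log((f2 - f_eq) / (1.0 - f_eq))


# Round trip: a distance computed from f2 reproduces f2 under the same model.
kt = trex_distance(0.89)
recovered = f2_expected(kt)
```

Because the exponential already describes the net outcome of repeated exchange between two states, inverting it needs no separate correction for multiple substitutions; the cost is that the distance becomes undefined as f₂ approaches the end point.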

**Figure 2**
Schematic showing possible intertaxa relationships for a hypothetical gene family that is found in two taxa, T and U, that shared a last common ancestor (LCA) in which two paralogs of the gene, A and B, were already present as a consequence of a gene duplication that predates the speciation, after which sequences within lineages T and U diverged independently. A_T and A_U represent true orthologs. Pair B_T1 and B_T2 represent paralogs. Other pairs of modern proteins are neither orthologs nor paralogs.

**Figure 3**
Histogram showing the f_2Y (a) and f_2R (b) values of all mouse:rat intertaxa homolog pairs containing 50 or more characters. The peaks centered at ca. 0.88 (a) and ca. 0.90 (b) reflect true orthologs. Pairs with f₂ values near 0.53 diverged so long ago that the silent sites have equilibrated.

**Figure 4**
Histogram showing the frequency of n, the number of characters used to calculate the f_2Y (a) and f_2R (b) values, in the mouse:rat intertaxa orthologs. The mean (λ) of the Poisson distribution for f_2Y is 136.6 (95% ci 130.1.7–141.2), while that for f_2R is 138.2 (95% ci 134.3–140.5). ci: confidence interval. The bin size is 25 sites.

**Figure 5**
Histograms showing the frequency of f_2Y and f_2R values of mouse:rat intertaxa ortholog pairs. (a) The histogram of observed data (f_2Y) from all ortholog pairs (n > 50), with the best fit Gaussian superimposed. μ = 0.88 (95% ci 0.877–0.884), σ = 0.040 (95% ci 0.039–0.042). (c) The theoretical histogram from simulated data based on the null hypothesis for f_2Y of mouse:rat intertaxa ortholog pairs. μ = 0.88 (95% ci 0.878–0.882), σ = 0.030 (95% ci 0.028–0.031). (b) The histogram of observed data (f_2R) from all ortholog pairs (n > 50), with the best fit Gaussian superimposed. μ = 0.90 (95% ci 0.880–0.903), σ = 0.034 (95% ci 0.033–0.035). (d) The theoretical histogram from simulated data based on the null hypothesis for f_2R of mouse:rat intertaxa ortholog pairs. μ = 0.90 (95% ci 0.888–0.903), σ = 0.028 (95% ci 0.027–0.029). ci: confidence interval.

**Figure 6**
Histogram showing the frequency of f₂ values of mouse:rat intertaxa ortholog pairs. (a) Observed data from all ortholog pairs (n > 100), with the best fit Gaussian superimposed. μ = 0.89 (95% ci 0.886–0.893), σ = 0.029 (95% ci 0.028–0.031). (b) Theoretical histogram that assumes the null hypothesis that all sites diverge with equal rate constants, based on a simulation with the same distribution of characters. μ = 0.89 (95% ci 0.888–0.891), σ = 0.022 (95% ci 0.021–0.024). ci: confidence interval.
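A null-hypothesis simulation of the kind this caption describes might be sketched as follows. This is an assumption-laden illustration rather than the authors' procedure: every site in every gene is treated as an independent draw with the same probability `p_same` of holding the same nucleotide, so each simulated f₂ value is a binomial proportion. The function name, the site counts, and the seed are hypothetical.

```python
import random
import statistics


def simulate_null_f2(site_counts, p_same, seed=0):
    """Simulate one f2 value per gene, assuming identical rate constants at all sites."""
    rng = random.Random(seed)
    values = []
    for n in site_counts:
        if n <= 0:
            raise ValueError("each gene needs at least one two-fold site")
        # Each site independently keeps the same nucleotide with probability p_same.
        same = sum(rng.random() < p_same for _ in range(n))
        values.append(same / n)
    return values


# Hypothetical input: 2000 ortholog pairs, each with 200 two-fold sites.
site_counts = [200] * 2000
simulated = simulate_null_f2(site_counts, p_same=0.89)
mean_f2 = statistics.mean(simulated)
sigma_f2 = statistics.pstdev(simulated)
```

In practice `site_counts` would be the observed number of characters per ortholog pair, so that the simulated and observed histograms share the same sampling noise. An observed σ larger than the simulated σ then indicates overdispersion beyond what sampling alone produces.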

**Figure 7**
Histogram showing the frequency of f₂ values for intertaxa ortholog pairs (n > 100) between humans and rodents. (a) Human:mouse ortholog pairs. (b) Human:rat ortholog pairs. (c) Human:mouse ortholog pairs; for pairs from families that had more than one intertaxon pair, the pair with the highest f₂ value is taken, to preferentially extract orthologs. (d) Human:rat ortholog pairs; for pairs from families that had more than one intertaxon pair, the pair with the highest f₂ value is taken.

**Figure 8**
Histogram showing the frequency of f₂ values in chicken:mouse intertaxa gene pairs (n > 100). (a) All intertaxon pairs. (b) For pairs from families that had more than one intertaxon pair, the pair with the highest f₂ value.

**Figure 9**
Histogram showing the frequency of f₂ values in Takifugu:human intertaxa gene pairs (n > 100). (a) All intertaxon pairs. (b) For pairs from families that had more than one intertaxon pair, the pair with the highest f₂ value.

**Figure 10**
Histogram showing the orthologs of mouse:rat intertaxon pairs using the f₄ metric, the fraction identical for four-fold redundant codon systems (n > 100). While the separation of orthologs from paralogs is larger, the distribution is wider. We do not reject f₄ as a dating tool, but ask only that its use recognize its particular advantages (broader sample size) and limitations (greater heterogeneity in microscopic rate constants).

**Figure 11**
Histogram showing the frequency of orthologs in sister genome pairs, with the best fit Gamma curve superimposed, using TREx and maximum likelihood dS (mldS) metrics: (a) TREx of human:mouse, (b) mldS of human:mouse, (c) TREx of human:rat, (d) mldS of human:rat, (e) TREx of mouse:rat, (f) mldS of mouse:rat.

**Figure 12**
For individual rat-mouse ortholog pairs, a plot of the likelihood that the null hypothesis is rejected under the disparity metric of Kumar and Gadagkar (x axis) versus the f₂ value. There is no obvious correlation between disparity and the f₂ value.

**Figure 13**
The f₂ values for putative ortholog pairs in rat and mouse are higher if they lie on the X chromosome (panel (b), mean f₂ ≈ 0.93) than for pairs on autosomal chromosomes (panel (a), mean f₂ ≈ 0.90), implying that the X chromosome genes have accumulated fewer silent transitions at two-fold redundant sites than the typical pair of orthologs. Since fewer than 5% of the genes lie on the X chromosome, this can account for only some of the overdispersion in the f₂ values for rat-mouse orthologs. Interestingly, an analogous phenomenon was not observed in human-canine ortholog pairs (data not shown).

Cited by

The natural history of class I primate alcohol dehydrogenases includes gene duplication, gene loss, and gene conversion.
Carrigan MA, Uryasev O, Davis RP, Zhai L, Hurley TD, Benner SA. PLoS One. 2012;7(7):e41175. doi: 10.1371/journal.pone.0041175. Epub 2012 Jul 31. PMID: 22859968.

References

1. Thomson JM, Gaucher EA, Burgan MF, De Kee DW, Li T, Aris JP, Benner SA. Resurrecting ancestral alcohol dehydrogenases from yeast. Nat Genet. 2005;37:630–635. doi: 10.1038/ng1553.
2. Gaucher EA, Graddy LG, Li T, Simmen RC, Simmen FA, Schreiber DR, Liberles DA, Janis CM, Benner SA. The planetary biology of cytochrome P450 aromatases. BMC Biol. 2004;2:19. doi: 10.1186/1741-7007-2-19.
3. Kumar S, Gadagkar SR. Disparity index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics. 2001;158:1321–1327.
4. Li WH, Wu CI, Luo CC. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol Biol Evol. 1985;2:150–174.
5. Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986;3:418–426.
