BMC Evol Biol. 2007 Apr 23;7:66.
doi: 10.1186/1471-2148-7-66.

Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata


Leah J DeRose-Wilson et al. BMC Evol Biol.

Abstract

Background: There has been remarkably little study of nucleotide substitution rate variation among plant nuclear genes, in part because orthology is difficult to establish. Orthology is even more problematic for intergenic regions of plant nuclear genomes, because plant genomes generally harbor a wealth of repetitive DNA. In theory, orthologous intergenic data are valuable for studying rate variation, because nucleotide substitutions in these regions should be under little selective constraint compared with coding regions. As a result, evolutionary rates in intergenic regions may more accurately reflect genomic features, such as recombination and GC content, that contribute to nucleotide substitution.

Results: We generated a set of 66 intergenic sequences in Arabidopsis lyrata, a close relative of Arabidopsis thaliana. The intergenic regions included transposable element (TE) remnants and regions flanking the TEs. We verified orthology of these amplified regions both by comparing existing A. lyrata–A. thaliana genetic maps and by using molecular features. We compared substitution rates among the 66 intergenic loci, which exhibit ~5-fold rate variation, and compared intergenic rates with those of a set of 64 orthologous coding sequences. Our chief observations were that the average rate of nucleotide substitution is slower in intergenic regions than at synonymous sites; that rate variation in both intergenic and coding regions correlates with GC content; that GC content alone is not sufficient to explain differences in rates between intergenic and coding regions; and that rates of evolution in intergenic regions correlate negatively with gene density.
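A minimal sketch of the per-locus quantities compared in the Results: divergence between an orthologous A. lyrata and A. thaliana sequence pair, and the GC content of that pair. The abstract does not state which distance measure or site filter was used, so this sketch assumes that columns with gaps or ambiguous bases are skipped and applies the Jukes-Cantor (JC69) correction; `pairwise_stats` and `jukes_cantor` are hypothetical names.

```python
from __future__ import annotations

import math

VALID_BASES = set("ACGT")


def pairwise_stats(seq_a: str, seq_b: str) -> tuple[float, float, int]:
    """Return (p-distance, GC fraction, compared sites) for two aligned sequences.

    Columns with a gap or an ambiguous base in either sequence are skipped.
    """
    diffs = gc = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        sites += 1
        diffs += a != b
        # GC is counted in both sequences, so divide by 2 * sites below.
        gc += (a in "GC") + (b in "GC")
    if sites == 0:
        raise ValueError("no comparable sites")
    return diffs / sites, gc / (2 * sites), sites


def jukes_cantor(p: float) -> float:
    """JC69-corrected distance d = -3/4 * ln(1 - 4/3 * p)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For one locus, `p, gc, n = pairwise_stats(lyrata_seq, thaliana_seq)` followed by `jukes_cantor(p)` gives a (divergence, GC) point of the kind plotted against each other in the figures below; a different substitution model would change only `jukes_cantor`.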

Conclusion: Our observations indicated that mutation rates vary among genomic regions as a function of base composition, suggesting that previous observations of "selective constraint" on non-coding regions could more accurately be attributed to a GC effect than to selection. The negative correlation between nucleotide substitution rate and gene density provides a potential neutral explanation for a previously documented correlation between gene density and polymorphism levels within A. thaliana. Finally, we discuss potential forces that could contribute to rapid synonymous rates, and provide evidence suggesting that transcription-related mutation contributes to rate differences between intergenic and synonymous sites.


Figures

Figure 1
A box plot of genetic distances in the two sequence classes, intergenic and coding. The box represents the interquartile range, and the whiskers extend across the range of the data. Points beyond the whiskers are mild outliers, with values greater than 1.5× the upper bound of the interquartile range.
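The outlier rule in the Figure 1 caption can be sketched with Tukey's fences. The caption's wording is compact, so this sketch assumes the conventional fences Q3 + 1.5 × IQR above and Q1 − 1.5 × IQR below; quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the plotting software used for the figure. The helper name and the example values are illustrative, not data from the study.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (q1, q3, low_fence, high_fence, outliers) using Tukey's fences.

    k = 1.5 flags "mild" outliers; k = 3 is the usual cut-off for extreme ones.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return q1, q3, low, high, outliers
```

With the illustrative values `[1, 2, 3, 4, 5, 6, 7, 8, 100]`, the quartiles are 2.5 and 7.5, the upper fence is 15, and only 100 is flagged.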
Figure 2
The correlation between recombination rate (x-axis) and genetic distance is not significant for combined coding and non-coding data (r = -0.10). Filled circles represent coding loci, and empty circles are non-coding loci.
Figure 3
The correlation between gene density, based on the number of genes in a 0.5 Mb window, and divergence is negative for both coding and non-coding data. Filled circles represent coding loci, and empty circles are non-coding loci. The higher regression line is based on coding data.
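The gene-density measure in Figure 3 can be sketched as a window count. The caption gives only the 0.5 Mb window size, so this sketch assumes a window centered on each locus and counts a gene when its midpoint falls inside the window; `gene_density` and the coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right

WINDOW_BP = 500_000  # 0.5 Mb window size from the Figure 3 caption


def gene_density(locus_pos: int, gene_midpoints: list, window: int = WINDOW_BP) -> int:
    """Count genes whose midpoint lies within window/2 bp of locus_pos (inclusive).

    gene_midpoints must be sorted ascending and come from the locus's chromosome.
    """
    half = window // 2
    lo = bisect_left(gene_midpoints, locus_pos - half)
    hi = bisect_right(gene_midpoints, locus_pos + half)
    return hi - lo
```

Binary search keeps each query logarithmic in the number of genes, which matters when many loci are scored against a whole-chromosome annotation; fixed, non-overlapping 0.5 Mb bins would be an equally plausible reading of the caption.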
Figure 4
A) The correlation between GC content and genetic distance across both data types (r = 0.35; p < 0.0001). B) Analysis of covariance with sequence type, GC content and genetic distance. GC content contributes significantly (p < 0.003) to the variance in divergence, but there is an additional effect of sequence type on genetic distance that is not accounted for by GC content (p < 0.001). For both graphs, filled circles represent coding data and empty circles represent intergenic data.
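The two analyses in Figure 4 can be sketched with the standard library: a Pearson correlation (panel A) and an ANCOVA-style comparison of nested linear models (panel B) that asks whether a coding/intergenic indicator explains variance in divergence beyond GC content. This sketch assumes a common-slope model with no GC × type interaction, which the caption does not specify; the helper names are hypothetical, and the code returns an F statistic rather than a p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_rss(columns, y):
    """Residual sum of squares for y regressed on an intercept plus predictor columns."""
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Normal equations A b = c, with A = X'X and c = X'y.
    A = [[sum(row[r] * row[s] for row in X) for s in range(k)] for r in range(k)]
    c = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for s in range(p, k):
                A[r][s] -= f * A[p][s]
            c[r] -= f * c[p]
    # Back substitution for the coefficients b.
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (c[r] - sum(A[r][s] * b[s] for s in range(r + 1, k))) / A[r][r]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))


def ancova_type_effect(gc, dist, is_coding):
    """F statistic for adding a coding/intergenic indicator to a GC-only model.

    Raises ZeroDivisionError if the full model fits the data exactly.
    """
    indicator = [1.0 if coding else 0.0 for coding in is_coding]
    rss_reduced = ols_rss([gc], dist)           # dist ~ GC
    rss_full = ols_rss([gc, indicator], dist)   # dist ~ GC + type (common slope)
    df_full = len(dist) - 3                     # intercept, GC slope, type offset
    f_stat = (rss_reduced - rss_full) / (rss_full / df_full)
    return f_stat, (1, df_full)
```

A p-value can then be taken from the F distribution with the returned degrees of freedom, for example `scipy.stats.f.sf(f_stat, 1, df_full)` if SciPy is available. Solving the normal equations directly is adequate for a handful of predictors; a QR-based solver is numerically safer for larger or ill-conditioned designs.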
Figure 5
Distribution of insertion sizes in A. lyrata and A. thaliana intergenic regions. A. thaliana insertions are presented in black, A. lyrata in grey.
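Insertion sizes like those summarized in Figure 5 can be tallied from gap runs in a pairwise alignment. This sketch assumes that a gap run in the A. thaliana row marks an insertion in A. lyrata and vice versa; properly distinguishing insertions from deletions requires an outgroup, which the caption does not describe. `gap_run_lengths` and `insertion_sizes` are hypothetical helpers.

```python
from __future__ import annotations

import re
from collections import Counter


def gap_run_lengths(aligned_row: str) -> list[int]:
    """Lengths of consecutive '-' runs in one row of a pairwise alignment."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_row)]


def insertion_sizes(thaliana_row: str, lyrata_row: str) -> dict[str, Counter]:
    """Size histograms, treating a gap in one species' row as an insertion in the other."""
    if len(thaliana_row) != len(lyrata_row):
        raise ValueError("aligned rows must have equal length")
    return {
        "A. lyrata": Counter(gap_run_lengths(thaliana_row)),
        "A. thaliana": Counter(gap_run_lengths(lyrata_row)),
    }
```

Summing these counters over all intergenic alignments gives one size distribution per species, which is the kind of data plotted in Figure 5.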
