[Preprint]. 2025 Jan 14:2023.05.02.538944.
doi: 10.1101/2023.05.02.538944.

A Narrow Range of Transcript-error Rates Across the Tree of Life

Weiyi Li et al. bioRxiv.

Abstract

The expression of genomically-encoded information is not error-free. Transcript-error rates are dramatically higher than DNA-level mutation rates, and despite their transient nature, the steady-state load of such errors must impose some burden on cellular performance. However, a broad perspective on the degree to which transcript-error rates are constrained by natural selection and diverge among lineages remains to be developed. Here, we present a genome-wide analysis of transcript-error rates across the Tree of Life using a modified rolling-circle sequencing method, revealing that the range in error rates is remarkably narrow across diverse species. Transcript errors tend to be randomly distributed, with little evidence supporting local control of error rates associated with gene-expression levels. A majority of transcript errors result in missense errors if translated, and as with a fraction of nonsense transcript errors, these are underrepresented relative to random expectations, suggesting the existence of mechanisms for purging some such errors. To quantitatively understand how natural selection and random genetic drift might shape transcript-error rates across species, we present a model based on cell biology and population genetics, incorporating information on cell volume, proteome size, average degree of exposure of individual errors, and effective population size. However, while this model provides a framework for understanding the evolution of this highly conserved trait, as currently structured it explains only 20% of the variation in the data, suggesting a need for further theoretical work in this area.

Figures

Figure 1.
Rates and molecular spectra of transcript errors across species. A) A summary of all available estimates of transcript-error rates from mRNAs transcribed from nuclear/nucleoid chromosomes and RNAs from mitochondria- and chloroplast-encoded RNA polymerases. Estimates for A. tumefaciens, B. subtilis, E. coli, M. florum, C. elegans, D. melanogaster, H. sapiens, and M. musculus were obtained from previous studies using the same modified CirSeq method (Table S1). Error bars denote standard errors of the mean transcript-error rates calculated from biological replicates. The phylogeny was generated according to the NCBI taxonomy database. B) The molecular spectra of transcript errors from nuclear chromosomal mRNAs. The conditional error rates of each type of ribonucleotide (rNTP) substitution were calculated from the number of detected errors divided by the number of corresponding rNTPs evaluated. Error bars indicate standard errors.
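The conditional error rate described for panel B can be illustrated with a minimal sketch. The function name, the input layout, and the toy counts below are hypothetical and are not taken from the study; the sketch only shows the stated ratio, errors detected for a substitution type divided by the number of reference rNTPs of that type evaluated.

```python
from collections import Counter


def conditional_error_rates(errors, bases_evaluated):
    """Return the error rate per substitution type.

    errors: iterable of (reference_base, observed_base) pairs, one per detected error.
    bases_evaluated: dict mapping each reference base to the number of such
        ribonucleotides that were evaluated.
    """
    counts = Counter(errors)
    rates = {}
    for (ref, alt), n_errors in counts.items():
        # Condition on the reference base: divide by the number of rNTPs of
        # that type evaluated, not by the total number of nucleotides.
        rates[f"{ref}>{alt}"] = n_errors / bases_evaluated[ref]
    return rates


# Hypothetical toy counts, for illustration only.
errors = [("C", "U")] * 6 + [("G", "A")] * 3 + [("A", "G")]
bases_evaluated = {"A": 1_000_000, "C": 2_000_000, "G": 1_500_000, "U": 1_000_000}
rates = conditional_error_rates(errors, bases_evaluated)
```

Conditioning on the reference base keeps the spectrum comparable across species whose transcriptomes differ in base composition.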
Figure 2.
Characterization of potential functional effects of transcript errors. A) Fractions of transcript errors from chromosomal protein-coding genes classified as synonymous, missense, and nonsense if translated. Stop-codon loss errors, detected at minimal percentages in A. thaliana, C. reinhardtii, P. tetraurelia, and P. caudatum, are not displayed. B) The ratio of observed to expected rates for each type of error. The expected error rates were calculated assuming random generation of transcript errors according to the ribonucleotide substitution-rate biases and codon usage of each species, in the absence of any correction mechanisms. Each dot represents the ratio for one species. A ratio of 1.0 is indicated by a dashed line.
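One way to picture the random expectation in panel B is sketched below. This is not the authors' pipeline: it assumes the standard genetic code (some species in the study use variant codes), and the function name, the input dictionaries, and the uniform toy inputs are hypothetical. Each single-nucleotide substitution in each sense codon is weighted by codon usage and by the substitution rate, then classified by its effect if translated.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in UCAG order (an assumption of this sketch).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_fractions(codon_usage, sub_rates):
    """Expected share of synonymous, missense, and nonsense errors.

    codon_usage: dict codon -> relative usage (sense codons).
    sub_rates: dict (ref_base, alt_base) -> conditional substitution rate.
    """
    totals = {"synonymous": 0.0, "missense": 0.0, "nonsense": 0.0}
    for codon, usage in codon_usage.items():
        aa = CODE[codon]
        if aa == "*":
            continue  # stop-codon loss is not modeled in this sketch
        for pos, ref in enumerate(codon):
            for alt in BASES:
                if alt == ref:
                    continue
                mutant = codon[:pos] + alt + codon[pos + 1:]
                new_aa = CODE[mutant]
                if new_aa == aa:
                    kind = "synonymous"
                elif new_aa == "*":
                    kind = "nonsense"
                else:
                    kind = "missense"
                totals[kind] += usage * sub_rates[(ref, alt)]
    grand_total = sum(totals.values())
    return {kind: value / grand_total for kind, value in totals.items()}


# Hypothetical uniform inputs: every sense codon and every substitution equally likely.
uniform_usage = {codon: 1.0 for codon, aa in CODE.items() if aa != "*"}
uniform_rates = {(r, a): 1.0 for r in BASES for a in BASES if r != a}
expected = expected_fractions(uniform_usage, uniform_rates)
```

Dividing an observed fraction by the matching expected fraction gives a ratio of the kind plotted in panel B; a value below 1.0 would indicate underrepresentation relative to the random expectation.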
Figure 3.
A) Transcript-error and genomic-mutation rates (substitutions per nucleotide site) for species across the Tree of Life. Genomic-mutation rate data are derived from Lynch et al. (2016), Long et al. (2017), and Lynch and Trickovic (2020). Diagonal dashed lines are isoclines for ratios of the two rates. B) Log-log regression of the observed transcript-error rates for unicellular species as a function of the composite parameter described in the text; the best fit (given by the regression line) is obtained with exponent value x=0.5 (with the statistical fits with alternative x values given in the inset, and the support interval within which the regression remains significant at the 0.05 level denoted by the dashed line). The data points for multicellular eukaryotes are given for reference, but not used in the regression. PL denotes the total proteome size (summed over all codons, in megabases). Standard errors of the individual measures of transcript-error rates are generally smaller than the widths of the points. All data, including those for unlabeled bacteria (which include all species except for Kineococcus, for which Ne data were unavailable), are contained in Supplementary Table S1. At = Arabidopsis thaliana; Ce = Caenorhabditis elegans; Cr = Chlamydomonas reinhardtii; Dm = Drosophila melanogaster; Hs = Homo sapiens; Mf = Mesoplasma florum; Mm = Mus musculus; Pc = Paramecium caudatum; Pt = Paramecium tetraurelia; Sc = Saccharomyces cerevisiae.
Figure 4.
Transcript-error rates of protein-coding genes at different expression levels. A generalized linear model (Methods) was applied to expression levels (FPKM) and transcript-error rates of individual protein-coding genes to evaluate potential correlations. Expression levels of genes were obtained from CirSeq reads, which provide estimates for expression levels consistent with regular RNA-seq reads (Figure S1). Slopes with P-values less than 0.05/21 (Bonferroni correction for regression analyses for 21 species) are considered statistically significant. Significant positive and negative regressions are highlighted with blue and red lines, respectively. For visualization purposes only, protein-coding genes are ranked according to expression levels (low to high) and grouped into bins of equal cumulative width in the total number of sequenced nucleotides with 5% increments. Discrete FPKM of some genes in M. musculus, P. caudatum, and M. florum result in a few of the 20 bins being empty, and these are combined with adjacent bins on the larger FPKM side. Error bars denote standard errors of the mean error rates calculated from $\sqrt{u(1-u)/n}$, where u is the mean error rate of the bin and n denotes the number of nucleotides assayed in the corresponding bin.
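The two numerical conventions in this caption can be made concrete with a small sketch. The helper names and the toy values are hypothetical, and the standard error is assumed to take the usual binomial form, the square root of u(1 - u)/n.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def binomial_standard_error(u, n):
    """Standard error of a mean error rate u estimated from n assayed nucleotides.

    Assumes the binomial form sqrt(u * (1 - u) / n).
    """
    return math.sqrt(u * (1.0 - u) / n)


# 21 species-level regressions, as stated in the caption.
threshold = bonferroni_threshold(0.05, 21)

# Hypothetical bin: mean error rate 1e-5 over 10 million assayed nucleotides.
se = binomial_standard_error(1e-5, 10_000_000)
```

With 21 tests the per-test threshold is roughly 0.0024, and for error rates this small the standard error is close to the square root of u/n.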
