Genome Res. 2008 Aug;18(8):1216-23.
doi: 10.1101/gr.076570.108. Epub 2008 May 7.

Transcription induces strand-specific mutations at the 5' end of human genes

Paz Polak et al. Genome Res. 2008 Aug.

Abstract

A regional analysis of nucleotide substitution rates along human genes and their flanking regions allows us to quantify the effect of mutational mechanisms associated with transcription in germ line cells. Our analysis reveals three distinct patterns of substitution rates. First, the deamination rate of methylated CpG dinucleotides declines sharply in the vicinity of the 5' end of genes. Second, a strand asymmetry in complementary substitution rates, associated with transcription-coupled repair, extends from the 5' end to 1 kbp downstream of the 3' end. Finally, a localized strand asymmetry, an excess of C→T over G→A substitutions in the nontemplate strand, is confined to the first 1-2 kbp downstream of the 5' end of genes. We hypothesize that higher exposure of the nontemplate strand near the 5' end of genes leads to a higher cytosine deamination rate. Up to now, only the somatic hypermutation (SHM) pathway has been known to mediate localized and strand-specific mutagenic processes associated with transcription in mammals. The mutational patterns in SHM are induced by cytosine deaminase, which targets only single-stranded DNA. This DNA conformation is induced by R-loops, which preferentially occur at the 5' ends of genes. We predict that R-loops are extensively formed at the beginning of transcribed regions in germ line cells.

Figures

Figure 1.
Regions of analysis around the 5′ end of genes. The substitution analysis was done in 10,000-bp-long regions centered on the 5′ end of each gene (denoted by two vertical lines). A region was further truncated if the next upstream gene was closer than 10,000 bp or if the 3′ end of the gene was closer than 5000 bp. All exons were also excluded. Bold lines depict the sequences that were ultimately analyzed.
Figure 2.
Substitution rates in introns and in intergenic regions in the vicinity of the 5′ and 3′ ends of human genes. The plots show the 12 estimated single-nucleotide substitution rates and the CpG deamination rates in nonoverlapping 200-bp-long windows. The distances of the windows’ centers from the 5′ or 3′ end are indicated on the X-axes. Substitution frequencies were estimated on the nontemplate strand.
Figure 3.
Ratios between complementary transition rates, and the GC content, plotted against distance from the 5′ end of genes. Values were calculated in 200-bp-long windows along the nontemplate strand, pooled across all genes, and presented for six different genomic contexts. (Intronic) Genes that were used in Figure 2; (introns w/o 200bp) the 200 bp at the introns’ edges were excluded; (GC-rich windows and AT-rich windows) DNA sequences with GC content of >50% and <41%, respectively. Windows that contained less than 100 kbp of DNA sequence were omitted (see Supplemental Fig. S15 for the amount of sequence in each window).
Figure 4.
Correlation between strand asymmetry and transcription status of genes in embryonic stem cells (ESC). The ratios between complementary transition rates, and the GC content, are calculated in three gene classes (Guenther et al. 2007): genes that experienced initiation and transcription (exp+init+); genes that experienced initiation but not complete transcription (exp−init+); genes that experienced neither initiation nor complete transcription (exp−init−). The length and nucleotide composition of the sequences that were used for substitution estimation in each window are presented in Supplemental Figure S16. Substitution frequencies were estimated on the nontemplate strand.
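
The windowed ratio of complementary transition rates described in the captions of Figures 3 and 4 can be illustrated with a minimal sketch. It is not the estimation procedure used in the paper: it assumes a gap-free pairwise alignment of an inferred ancestral sequence and its present-day descendant, both read on the nontemplate strand, and it simply counts differences per window without correcting for multiple substitutions or CpG context. The function name `windowed_transition_ratio` and its parameters are hypothetical.

```python
from collections import Counter

WINDOW = 200  # window length in bp, matching the figure captions


def windowed_transition_ratio(ancestral, derived, window=WINDOW):
    """Return a list of (window_center, ratio) for nonoverlapping windows.

    ratio = (C->T per ancestral C) / (G->A per ancestral G).
    Assumption: both sequences are aligned, gap-free, of equal length,
    and given on the nontemplate strand. A window gets ratio None when
    it has no ancestral C, no ancestral G, or no G->A change.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must have equal length")

    results = []
    # Trailing bases that do not fill a complete window are ignored.
    for start in range(0, len(ancestral) - window + 1, window):
        pairs = Counter(zip(ancestral[start:start + window],
                            derived[start:start + window]))
        c_sites = sum(n for (a, _), n in pairs.items() if a == "C")
        g_sites = sum(n for (a, _), n in pairs.items() if a == "G")
        c_to_t = pairs[("C", "T")]
        g_to_a = pairs[("G", "A")]

        if c_sites == 0 or g_sites == 0 or g_to_a == 0:
            ratio = None
        else:
            ratio = (c_to_t / c_sites) / (g_to_a / g_sites)
        results.append((start + window // 2, ratio))
    return results
```

Under these assumptions, a ratio above 1 in a window corresponds to the excess of C→T over G→A substitutions on the nontemplate strand that the abstract reports for the first 1-2 kbp downstream of the 5′ end.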

References

    1. Aerts S., Thijs G., Dabrowski M., Moreau Y., De Moor B. Comprehensive analysis of the base composition around the transcription start site in Metazoa. BMC Genomics. 2004;5:34. doi: 10.1186/1471-2164-5-34.
    2. Aguilera A., Gomez-Gonzalez B. Genome instability: A mechanistic view of its causes and consequences. Nat. Rev. Genet. 2008;9:204–217.
    3. Aladjem M.I. Replication in context: Dynamic regulation of DNA replication patterns in metazoans. Nat. Rev. Genet. 2007;8:588–600.
    4. Arndt P.F., Hwa T. Identification and measurement of neighbor-dependent nucleotide substitution processes. Bioinformatics. 2005;21:2322–2328.
    5. Arndt P.F., Petrov D.A., Hwa T. Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation. Mol. Biol. Evol. 2003;20:1887–1896.