Nucleic Acids Res. 2005 May 24;33(9):2908-16.
doi: 10.1093/nar/gki609. Print 2005.

Prevalence of quadruplexes in the human genome

Julian L Huppert et al. Nucleic Acids Res. 2005.

Abstract

Guanine-rich DNA sequences of a particular form have the ability to fold into four-stranded structures called G-quadruplexes. In this paper, we present a working rule to predict which primary sequences can form this structure, and describe a search algorithm to identify such sequences in genomic DNA. We count the number of quadruplexes found in the human genome and compare that with the figure predicted by modelling DNA as a Bernoulli stream or as a Markov chain, using windows of various sizes. We demonstrate that the distribution of loop lengths is significantly different from what would be expected in a random case, providing an indication of the number of potentially relevant quadruplex-forming sequences. In particular, we show that there is a significant repression of quadruplexes in the coding strand of exonic regions, which suggests that quadruplex-forming patterns are disfavoured in sequences that will form RNA.
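A search of the kind the abstract describes could be sketched as a regular-expression scan. The loop range of 1 to 7 bases and the three-loop layout follow the figure captions on this page. The minimum G-tract length of three, the function name `find_putative_quadruplexes`, and the choice to report non-overlapping matches are assumptions made for illustration, since the abstract does not state the working rule or the counting convention.

```python
import re

# Assumed motif: four G-tracts of at least MIN_TRACT bases separated by
# three loops of 1-7 bases. MIN_TRACT = 3 is an assumption, not taken
# from the abstract; the 1-7 loop range follows the figure captions.
MIN_TRACT = 3
LOOP_MIN, LOOP_MAX = 1, 7


def _motif(base):
    """Build the regex for runs of `base` ('G' or 'C')."""
    tract = f"{base}{{{MIN_TRACT},}}"
    loop = f"[ACGT]{{{LOOP_MIN},{LOOP_MAX}}}"
    return re.compile(loop.join([tract] * 4))


def find_putative_quadruplexes(seq):
    """Return (start, end, strand) for non-overlapping motif matches.

    G-rich matches are reported on '+'; C-rich matches are reported on '-'
    because they imply a G-rich motif on the complementary strand.
    """
    seq = seq.upper()
    hits = []
    for strand, base in (("+", "G"), ("-", "C")):
        for m in _motif(base).finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)


# Human telomeric repeat sequence used as a small illustration.
print(find_putative_quadruplexes("AGGGTTAGGGTTAGGGTTAGGG"))
# -> [(1, 22, '+')]
```

Because `finditer` returns non-overlapping matches, a long G-rich stretch is reported as one or a few hits rather than as every possible folding; a count produced this way need not match the figure reported in the paper.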

Figures

Figure 1
Left: hydrogen bond pattern in a G-tetrad. A monovalent cation occupies the central position. Right: schematic diagram of a unimolecular G-quadruplex structure.
Figure 2
Process for generating Markov windowed simulations. A real chromosome (top) is separated into discrete windows. For each window, a table of base and dyad (dinucleotide) frequencies is generated (middle) and used to generate a simulated window (bottom). The simulated windows are then joined to produce the replicate chromosome.
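The windowed procedure in this caption can be sketched as a first-order Markov chain fitted separately to each window. This is a minimal illustration under stated assumptions: the names `simulate_window` and `simulate_chromosome` are hypothetical, the first base of each window is drawn from the window's base frequencies, and the fallback to base frequencies when a base has no observed successor is a choice made here, not a detail from the source.

```python
import random
from collections import Counter


def simulate_window(window, rng):
    """Generate one simulated window from the real window's statistics."""
    if not window:
        return ""
    bases = Counter(window)                   # single-base counts
    dyads = Counter(zip(window, window[1:]))  # adjacent-pair counts
    alphabet = sorted(bases)
    base_weights = [bases[b] for b in alphabet]

    # First base: drawn from the base frequencies (assumption).
    out = [rng.choices(alphabet, weights=base_weights)[0]]
    while len(out) < len(window):
        prev = out[-1]
        weights = [dyads[(prev, b)] for b in alphabet]
        if sum(weights) == 0:
            # No observed successor for `prev`: fall back to base frequencies.
            weights = base_weights
        out.append(rng.choices(alphabet, weights=weights)[0])
    return "".join(out)


def simulate_chromosome(chromosome, window_size, seed=0):
    """Split into discrete windows, simulate each, and join the results."""
    rng = random.Random(seed)
    return "".join(
        simulate_window(chromosome[i:i + window_size], rng)
        for i in range(0, len(chromosome), window_size)
    )


replicate = simulate_chromosome("ACGTGGGTTAGGGCCCATATAT" * 5, window_size=20)
print(len(replicate))  # same length as the input chromosome
```

Fitting each window separately keeps local composition (for example, GC-rich regions) in the replicate, which a single genome-wide model would average away.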
Figure 3
Left: frequency distributions of loops of lengths 1–7 bases for the entire human genome. Right: percentage excesses of loop 2 counts over the averages of loops 1 and 3 for the entire human genome.
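One plausible reading of the right-hand panel is that, for each loop length, the count observed in loop 2 is compared with the mean of the counts observed in loops 1 and 3 and expressed as a percentage. The sketch below computes that quantity. The function name `percent_excess` and the input numbers are hypothetical and serve only to show the arithmetic; they are not data from the paper.

```python
def percent_excess(loop1, loop2, loop3):
    """Percentage excess of loop-2 counts over the mean of loops 1 and 3.

    Each argument maps loop length -> number of sequences with that length
    in the given loop position. Lengths whose mean is zero are skipped.
    """
    result = {}
    for length in sorted(loop2):
        mean_13 = (loop1.get(length, 0) + loop3.get(length, 0)) / 2
        if mean_13 == 0:
            continue
        result[length] = 100.0 * (loop2[length] - mean_13) / mean_13
    return result


# Hypothetical counts, for illustration only.
print(percent_excess({1: 100, 2: 80}, {1: 150, 2: 60}, {1: 100, 2: 40}))
# -> {1: 50.0, 2: 0.0}
```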
Figure 4
Mosaic plot representing the loop lengths of all putative quadruplexes found in the human genome. The seven principal columns represent the lengths of the first loop, the seven rows the lengths of the second loop, and the seven segments in each box the lengths of the third loop. The area of each box is proportional to the number of sequences found with that combination of loop lengths. The plot was produced using the program R, using the command mosaicplot.

