PLoS One. 2012;7(9):e45050.
doi: 10.1371/journal.pone.0045050. Epub 2012 Sep 27.

Asymmetry indices for analysis and prediction of replication origins in eukaryotic genomes

Marie-Claude Marsolier-Kergoat. PLoS One. 2012.

Abstract

DNA replication was recently shown to induce the formation of compositional skews in the genomes of the yeasts Saccharomyces cerevisiae and Kluyveromyces lactis. In this work, I further characterized GC and TA skew variations in the vicinity of S. cerevisiae replication origins and termination sites, and defined asymmetry indices for origin analysis and prediction. The presence of skew jumps at some termination sites in the S. cerevisiae genome was established. The majority of S. cerevisiae replication origins are marked by an oriented consensus sequence called ACS, but no evidence could be found for asymmetric origin firing linked to ACS orientation. Asymmetry indices related to GC and TA skews were defined, and a global asymmetry index I(GC,TA) was described. I(GC,TA) was found to correlate strongly with origin efficiency in S. cerevisiae and to allow the determination of sets of intergenes significantly enriched in origin loci. The generalized use of asymmetry indices for origin prediction in naive genomes requires determining the direction of the skews, i.e. identifying which strand, leading or lagging, is enriched in G and which is enriched in T. Recent work indicates that in Candida albicans and in several related species, centromeres contain early and efficient replication origins. It has been proposed that the skew jumps observed at these positions reflect the activity of these origins, which would allow the direction of the skews in these genomes to be determined. However, I show here that the skew jumps at C. albicans centromeres are not related to replication and that replication-associated GC and TA skews in C. albicans in fact have directions opposite to those proposed.

Conflict of interest statement

Competing Interests: The author has declared that no competing interests exist.

Figures

Figure 1. GC and TA skew jumps at replication origins (A) and at termination sites (B) in Saccharomyces cerevisiae.
Average values of GC and TA skews computed for third codon positions (thick, red line and thick, blue line, respectively) and for intergenes (thin, red line and thin, blue line, respectively) were calculated every 500 bp over a sliding window of 10 kb. Position 0 corresponds to the position of the extended ACS sequence 5′-(T/A)(T/G)TTTAT(G/A)TTT(T/A)(G/C)(T/G)T-3′ for origins (A) and to the midpoint of termination regions (B). Average GC and TA skew values related to protein coding were subtracted from the total GC and TA skew values calculated for third codon positions, so that the skew curves for intergenes and for third codon positions could be plotted on the same graph. ACS orientation is marked by an arrow in (A).
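The sliding-window computation described in this caption can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the conventional definitions GC skew = (G − C)/(G + C) and TA skew = (T − A)/(T + A), which the text above does not spell out, and it counts every base in the window rather than restricting to third codon positions or intergenes. The function names `skew` and `sliding_skews` are hypothetical.

```python
from typing import List, Optional, Tuple


def skew(seq: str, plus: str, minus: str) -> Optional[float]:
    """Return (n_plus - n_minus) / (n_plus + n_minus) for one sequence.

    Assumed convention: GC skew uses plus="G", minus="C";
    TA skew uses plus="T", minus="A". Returns None when neither base occurs.
    """
    s = seq.upper()
    n_plus, n_minus = s.count(plus), s.count(minus)
    total = n_plus + n_minus
    return (n_plus - n_minus) / total if total else None


def sliding_skews(
    seq: str, window: int = 10_000, step: int = 500
) -> List[Tuple[int, Optional[float], Optional[float]]]:
    """Compute (window_center, gc_skew, ta_skew) every `step` bp.

    The defaults mirror the 10 kb window and 500 bp step quoted in the caption.
    Only full-length windows are evaluated.
    """
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        rows.append(
            (start + window // 2, skew(chunk, "G", "C"), skew(chunk, "T", "A"))
        )
    return rows
```

Calling `sliding_skews(chromosome_sequence)` returns one row per window; restricting the input to third codon positions or to intergenic bases, as in the caption, would need gene annotations that are outside the scope of this sketch.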
Figure 2. Schematics illustrating the computation of asymmetry indices.
(A) Definition of the four strand segments. (B) Four simple cases are shown for different positions x. Segments synthesized as leading and lagging strands are shown as blue and red lines, respectively. The small, black rectangles represent coding sequences with their transcriptional orientation given by the associated arrows. The large, black arrowheads indicate the orientation of the replication forks.
Figure 3. Variations of asymmetry indices around replication origins (A) and termination sites (B) in Saccharomyces cerevisiae.
The average values of I(GC,cod) (thick, red line), I(TA,cod) (thick, blue line), I(GC,int) (thin, red line) and I(TA,int) (thin, blue line) were computed every 500 bp using a window length L = 10 kb. Position 0 corresponds to the position of the extended ACS sequence 5′-(T/A)(T/G)TTTAT(G/A)TTT(T/A)(G/C)(T/G)T-3′ for origins (A) and to the midpoint of termination regions (B). All origins were oriented according to their ACS, as indicated by the arrow in (A).
Figure 4. Origin efficiency correlates with the global asymmetry index I(GC,TA) in Saccharomyces cerevisiae.
(A,B) Box plots displaying the differences in I(GC,TA) values between replication origins annotated as confirmed, likely, and dubious (A) or as chromosomally active and inactive (B). The horizontal lines show the median values. The bottoms and tops of the boxes show the 25th and the 75th percentiles, respectively. (C,D) I(GC,TA) values are plotted as a function of origin efficiency and average replication time, respectively. The dashed lines correspond to least-squares fits.
Figure 5. ROC curves displaying the discriminative power of the asymmetry indices I(GC,cod), I(TA,cod), I(GC,int) and I(TA,int).
(A) Intergenes were classified according to their values of I(GC,cod) (thick, red line), I(TA,cod) (thick, blue line), I(GC,int) (thin, red line) and I(TA,int) (thin, blue line). (B) Intergenes were classified according to their values of I(GC,cod)+I(TA,cod) (thin, red line), I(GC,int)+I(TA,int) (thin, blue line) and I(GC,TA) (thick, red line). The ROC curve corresponding to I(GC,cod) (thin, black line) is shown for comparison.
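How an ROC curve of this kind can be built from index values is sketched below. This is a generic illustration under stated assumptions, not the procedure used in the paper: it assumes each intergene has one index value and a binary label (1 if it contains an origin, 0 otherwise), and that higher index values indicate origins. The names `roc_points` and `auc` are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    Items are ranked by decreasing score (assumption: high score = origin),
    and one point is emitted per distinct score used as a threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all items tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

An index with no discriminative power gives an area near 0.5, while a perfect ranking gives 1.0, which is one way to compare the curves for the different indices.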
Figure 6. Variations of GC skew around Candida albicans centromeres (A) and around Saccharomyces cerevisiae replication origins (B).
(A) Average values of GC skew were computed every 200 bp using sliding windows of 1.5 kb and taking into account all sequences, as described in . (B) Average GC skew values were computed every 200 bp using sliding windows of 1.5 kb and taking into account third codon positions. The horizontal lines correspond to the mean GC skew values, averaged over all positions. The dashed, vertical lines mark positions −3 and 3 (kb) for C. albicans and positions −20 and 20 (kb) for S. cerevisiae.
Figure 7. Variations of average GC and TA skews across interorigin intervals in Candida albicans.
The skews, given in percent, were computed for third codon positions (A) and for intergenes (B). The lines correspond to the fits determined using quasibinomial models (see Materials and Methods).

