Nucleic Acids Res. 2003 Jul 1;31(13):3843-9.
doi: 10.1093/nar/gkg627.

GeneFizz: A web tool to compare genetic (coding/non-coding) and physical (helix/coil) segmentations of DNA sequences. Gene discovery and evolutionary perspectives

Edouard Yeramian et al. Nucleic Acids Res. 2003.

Abstract

The GeneFizz (http://pbga.pasteur.fr/GeneFizz) web tool permits the direct comparison between two types of segmentations for DNA sequences (possibly annotated): the coding/non-coding segmentation associated with genomic annotations (simple genes or exons in split genes) and the physics-based structural segmentation between helix and coil domains (as provided by the classical helix-coil model). There appears to be a varying degree of coincidence for different genomes between the two types of segmentations, from almost perfect to non-relevant. Following these two extremes, GeneFizz can be used for two purposes: ab initio physics-based identification of new genes (as recently shown for Plasmodium falciparum) or the exploration of possible evolutionary signals revealed by the discrepancies observed between the two types of information.
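
The abstract speaks of a "degree of coincidence" between the two segmentations without stating a formula here. As a rough illustration only, the sketch below scores per-base agreement between an annotation-derived coding mask and a helix/coil mask obtained by thresholding opening probabilities. The function names, the 0.5 threshold, and the pairing of coding with helix and non-coding with coil are all assumptions made for this example, not the authors' stated measure.

```python
from typing import List, Sequence, Tuple


def coding_mask(n: int, features: Sequence[Tuple[int, int]]) -> List[bool]:
    """Mark positions covered by annotated coding features.

    `features` holds 0-based, half-open (start, end) intervals (an assumed
    convention for this sketch).
    """
    mask = [False] * n
    for start, end in features:
        for i in range(max(start, 0), min(end, n)):
            mask[i] = True
    return mask


def coil_mask(open_prob: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Call a base pair 'coil' when its opening probability reaches the threshold."""
    return [p >= threshold for p in open_prob]


def coincidence(coding: Sequence[bool], coil: Sequence[bool]) -> float:
    """Fraction of positions where coding pairs with helix or non-coding with coil.

    The coding/helix pairing is an assumption of this sketch.
    """
    if len(coding) != len(coil):
        raise ValueError("masks must have the same length")
    if not coding:
        return 0.0
    agree = sum(1 for is_coding, is_coil in zip(coding, coil) if is_coding != is_coil)
    return agree / len(coding)


# Toy data: one annotated gene on a six-base sequence.
probs = [0.9, 0.8, 0.1, 0.1, 0.2, 0.9]
score = coincidence(coding_mask(len(probs), [(2, 5)]), coil_mask(probs))
```

A score near 1 would correspond to the "almost perfect" end of the range described above, while a score near 0.5 on balanced data would suggest no relationship under this particular scoring.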

Figures

Figure 1
Schematic representation of the helix-coil model. (A) With increasing temperature, disruptions occur in the double helix, and specific regions (depending on the sequence) switch from the helical state to the coiled state. For a linear molecule, in addition to internal loops, the disruptions lead to single-stranded free ends. (B) For the statistical mechanics calculations, a simplified representation is adopted in which each base pair is either in the closed (helical) state or in the open (coiled) state. For a sequence of length n, the partition function is the sum of the weights associated with the 2^n possible configurations. From the partition function, various quantities of interest (such as the opening probability along the sequence) are readily calculated. The weight attributed to a given configuration, such as the one represented diagrammatically, corresponds to the equilibrium constant for its formation (from two single strands). For base pairs in the helical state, the weight corresponds to the sequence-dependent stacking energies. For denaturation bubbles, the weight corresponds to loop entropies (power laws in j^(−α), where j is the length of the loop, with a penalty σ_0 for loop opening in the range 10^−5 to 10^−6). (C) Calculation scheme for long genomic sequences, which are sliced into stretches of 10 000 bp. Successive stretches overlap by 1000 bp. For a given stretch, the probabilities for the last 500 bp (represented in red) are discarded and replaced by the probabilities calculated for the same base pairs in the next stretch (whose first 500 probabilities are in turn discarded). This scheme avoids end-effects for the linear helix-coil model calculations. Whenever ‘nnnnn’ or ‘NNNNNN’ stretches are encountered, end-effects cannot be avoided at either extremity of such stretches.
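
The slicing scheme in panel (C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeneFizz implementation: `plan_windows` and `stitched_profile` are hypothetical names, `window_probs` stands in for whatever helix-coil calculation returns one opening probability per base pair of a stretch, and stretches of ‘n’ characters are not handled specially.

```python
from typing import Callable, List, Sequence, Tuple


def plan_windows(
    n: int, window: int = 10_000, overlap: int = 1_000
) -> List[Tuple[int, int, int, int]]:
    """Plan overlapping stretches over a sequence of length n.

    Each tuple is (start, end, keep_start, keep_end), 0-based and half-open.
    Half of the overlap (500 bp by default) is trimmed at every internal
    stretch boundary, so the kept ranges tile the sequence exactly once.
    """
    if n <= 0:
        return []
    trim = overlap // 2
    stride = window - overlap
    plans = []
    start = 0
    while True:
        end = min(start + window, n)
        is_first = start == 0
        is_last = end == n
        keep_start = start if is_first else start + trim
        keep_end = end if is_last else end - trim
        plans.append((start, end, keep_start, keep_end))
        if is_last:
            break
        start += stride
    return plans


def stitched_profile(
    seq: str,
    window_probs: Callable[[str], Sequence[float]],
    window: int = 10_000,
    overlap: int = 1_000,
) -> List[float]:
    """Run `window_probs` on each stretch and keep only the trimmed interior."""
    profile: List[float] = []
    for start, end, keep_start, keep_end in plan_windows(len(seq), window, overlap):
        probs = window_probs(seq[start:end])
        profile.extend(probs[keep_start - start:keep_end - start])
    return profile


# Toy stand-in for the helix-coil calculation: a constant profile per stretch.
demo = stitched_profile("A" * 25_000, lambda s: [0.5] * len(s))
```

With the default sizes, consecutive stretches start 9000 bp apart, and the kept range of one stretch ends exactly where the kept range of the next begins, so every base pair receives a single value taken at least 500 bp from an artificial stretch end (except at the true ends of the sequence).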
Figure 2
Outputs of GeneFizz and basic features. (A–C) GeneFizz outputs for T.whipplei TW08/27 (accession: BX251411). (A) Corresponds to a 20 kb stretch. The GC% is plotted in green, in addition to the probability of helix opening curves (for the GC% curve the scale [0–100%] corresponds to the scale [0–1] for the probabilities). Probability curves, corresponding to increasing temperatures [following the colour legend below the panel in (B)], are superimposed. The zoom button at the right-hand side of each panel allows a 2× zoom of the output [thus in (A) the sequence extends from 80 000 to 100 000 bp; with the zoom in (B) the sequence extends from 80 000 to 90 000 bp; with the zoom in (C) the sequence extends from 85 000 to 90 000 bp]. Genes (with names as in the annotation) are represented as horizontal bars, above the probability curves. (D–F) GeneFizz outputs for D.melanogaster (accession: AE003417). (E) Corresponds to a 2× zoom of (D). The red arrows in (E) indicate the occurrence of ‘nnnnnn’ stretches in the sequence (represented in the GeneFizz output as constant horizontal plots, at the value 0.5 throughout the lengths of the stretches). For split genes, the ‘gene’ and ‘CDS’ features are represented respectively as continuous horizontal bars in magenta (with the names of the genes) and split horizontal bars in blue (associated with the exons, as detailed in the ‘CDS’ feature of the annotation).
Figure 3
GeneFizz analyses for the P.falciparum genome. (A–C) GeneFizz outputs for chromosome 11 (accession: NC_004315). Each output corresponds to a 20 kb stretch. The GC% is plotted in green, in addition to the probability of helix opening curves (temperatures, following the colours, as indicated in the legend). The genes (following the annotations in NC_004315) are indicated as horizontal bars, with the interruptions corresponding to the exons in split genes. (D and E) Close-up views plotted with the ‘results file’ downloaded from the GeneFizz output (text files for the probability curves, at the different temperatures). The region in (D) corresponds to the region underlined in blue in (C). Red arrows show putative missed exons.
Figure 4
GeneFizz analyses for the D.melanogaster genome. (A, C and E) GeneFizz outputs for chromosome 2L (accession: AE003621, version: 14-FEB-2003). Conventions are as in Figure 3. (B, D and F) Close-up views of the regions underlined in blue in (A), (C) and (E), respectively. These views were plotted with the ‘results file’ downloaded from the GeneFizz output.
