PLoS Comput Biol. 2007 Dec;3(12):e254.
doi: 10.1371/journal.pcbi.0030254. Epub 2007 Nov 14.

Analysis of sequence conservation at nucleotide resolution

Saurabh Asthana et al. PLoS Comput Biol. 2007 Dec.

Abstract

One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans, as is evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved "chunks." Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Examples of SCONE p-Value Scores for Coding (A), Highly Conserved Noncoding (B), and Nonconserved (C) Regions
Positions likely to be conserved (p < 0.05) are in light green; other positions are dark. Below each plot is the portion of the multiple sequence alignment used to generate scores for each sequence region. Deviations from human sequence (green) are indicated in red. (A) A portion of an exon from the MET gene (chr7:115,933,744–115,933,793). The pattern of conserved positions is indicative of the triplet structure of the genetic code. (B) A highly conserved intronic sequence in the FOXP2 gene (chr7:113,646,877–113,646,926). (C) An intergenic region near the AXIN1 gene (chr16:343,046–343,095) showing little overall conservation, but containing a significant number of individually conserved positions.
Figure 2. Rare Derived Allele Frequency in Conserved versus Nonconserved Sites
Positions are partitioned according to (i) ENCODE MCS elements for all ENCODE positions, (ii) SCONE conservation score for all ENCODE positions, and (iii) SCONE conservation score for all ENCODE positions outside of MCS elements. p-Values are calculated using Fisher's exact test.
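As an illustration of the test named in the Figure 2 caption, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of rare versus common derived alleles in conserved versus nonconserved sites. The function name, the table layout, and the counts are assumptions made for this example; they are not taken from the paper, and the paper's own implementation is not described here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration only):
        rows    = conserved sites, nonconserved sites
        columns = rare derived alleles, common derived alleles
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Invented counts, not data from the study.
    print(fisher_exact_two_sided(30, 10, 45, 40))
```

The sketch uses only the standard library; with SciPy available, `scipy.stats.fisher_exact` performs the same test.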
Figure 3. Conservation in Various Functional Classes
For each functional class, the fraction of positions with SCONE p-value ≤ 0.005 is shown, both including (dark) and excluding (light) positions falling within MCS elements. Ancestral repeats are included as a control.
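The quantity plotted in Figure 3 can be sketched as a simple fraction over per-position p-values, computed once with and once without positions inside MCS elements. The function name, the input representation (parallel lists of p-values and MCS membership flags), and the example values are assumptions for illustration only.

```python
def conserved_fraction(p_values, in_mcs, threshold=0.005, exclude_mcs=False):
    """Fraction of positions whose p-value is at or below the threshold.

    p_values: per-position conservation p-values (hypothetical input).
    in_mcs:   parallel booleans, True where the position lies in an MCS element.
    """
    kept = [p for p, inside in zip(p_values, in_mcs) if not (exclude_mcs and inside)]
    if not kept:
        return 0.0
    return sum(p <= threshold for p in kept) / len(kept)


if __name__ == "__main__":
    # Invented values, not data from the study.
    scores = [0.001, 0.2, 0.004, 0.5]
    mcs_flags = [True, False, False, False]
    print(conserved_fraction(scores, mcs_flags))
    print(conserved_fraction(scores, mcs_flags, exclude_mcs=True))
```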
Figure 4. Islands of Conservation in Functionally Annotated Regions
(A) Localization of short (5–12 bp) conserved islands in functionally annotated regions. Shown is the fraction of all islands that fall within a particular region. Nongenic transcribed regions were omitted to preserve scale, but contain 93% of short conserved islands. (B) Fold excess of short (5–12 bp) conserved islands in functionally annotated regions compared to ancestral repeat regions. Shown is the ratio of the density in each region (number of clusters divided by total number of positions in the region) to the density in ancestral repeat positions.
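The density ratio described in panel (B) can be sketched as follows. The caption does not say how islands are delimited, so this example assumes an island is a run of consecutive positions with p-value at or below a threshold; that definition, the 0.005 threshold, and all names and counts are assumptions for illustration.

```python
def find_islands(p_values, threshold=0.005, min_len=5, max_len=12):
    """Return (start, end) half-open index pairs of conserved runs.

    Assumption: an island is a maximal run of consecutive positions with
    p <= threshold whose length falls within [min_len, max_len].
    """
    islands, start = [], None
    # A trailing sentinel closes any run that reaches the end of the sequence.
    for i, p in enumerate(list(p_values) + [float("inf")]):
        if p <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if min_len <= i - start <= max_len:
                islands.append((start, i))
            start = None
    return islands


def fold_excess(n_islands, n_positions, n_islands_ar, n_positions_ar):
    """Island density in a region divided by the density in ancestral repeats."""
    density_region = n_islands / n_positions
    density_ar = n_islands_ar / n_positions_ar
    return density_region / density_ar


if __name__ == "__main__":
    # Invented p-values: runs of 6, 3, and 13 conserved positions.
    scores = [0.5] + [0.001] * 6 + [0.5] + [0.001] * 3 + [0.5] + [0.001] * 13
    print(find_islands(scores))
    # Invented counts, not data from the study.
    print(fold_excess(30, 1000, 10, 1000))
```

Only the 6 bp run qualifies in this toy input: the 3 bp run is too short and the 13 bp run exceeds the 12 bp upper bound.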
