PLoS Genet. 2006 Nov 24;2(11):e204.
doi: 10.1371/journal.pgen.0020204. Epub 2006 Oct 18.

Genomic selective constraints in murid noncoding DNA

Daniel J Gaffney et al. PLoS Genet.

Abstract

Recent work has suggested that there are many more selectively constrained, functional noncoding than coding sites in mammalian genomes. However, little is known about how selective constraint varies amongst different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5' end of murid genes. Our analysis also reveals that mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. We find that, among others, developmental and neuronal genes are associated with the greatest numbers of putatively functional noncoding sites compared with genes involved in electron transport and a variety of metabolic processes. Combining our estimates of the total number of constrained coding and noncoding bases we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence and that the total genomic deleterious point mutation rate is 0.91 per diploid genome, per generation. This estimated rate is over twice as large as a previous estimate in murids.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Mean Nucleotide Substitution Rates in Different Sequence Types
Substitution rates were estimated at all sites (dark bars) and non-CpG-prone sites (light bars). Intronic substitution rates were estimated from all intronic sites, excluding splice regions which were assumed to occur in the first 20 and last 40 bp. 95% confidence intervals were estimated by bootstrapping the dataset by 1-Mb block, 1,000 times.
Figure 2. Substitution Rates at Simulated 4-Fold Degenerate, Intronic, and Repetitive Sites
Values shown are means over 100 simulated replicates. In each replicate, a single sequence containing ~8 Mb of coding, intronic, and repetitive sequence was evolved along two lineages.
Figure 3. Estimated Mean Nucleotide Substitution Rate in Transposable Elements
Substitution rates were estimated at all sites (dark bars) and non-CpG-prone sites (light bars). Elements are subdivided into those found in intronic and intergenic sequence. 95% confidence intervals were estimated by bootstrapping the dataset by 1-Mb block, 1,000 times.
Figure 4. Change in Intronic Constraint with Distance from the Splice Sites
Constraint was estimated at non-CpG-prone sites in first (A and B) and non-first (C and D) introns. Dashed lines show 95% confidence intervals estimated by bootstrapping the dataset by 1-Mb block, 1,000 times.
Figure 5. Change in Intergenic Constraint with Distance from Transcription Start and Stop Points
Constraint was estimated at non-CpG-prone sites. Dots show 95% confidence intervals estimated by bootstrapping the data by 1-Mb block, 1,000 times.
Figure 6. Mouse Intron Length Selective Constraint and Ordinal Number
Bars show the 95% confidence interval of constraint obtained by bootstrapping the data by 1-Mb block, 1,000 times.
Figure 7. Proportion of G/C Bases and CpG Dinucleotides in Different Mouse Sequence Classes
