Genome Res. 1999 Sep;9(9):815-24.
doi: 10.1101/gr.9.9.815.

Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs


N Jareborg et al. Genome Res. 1999 Sep.

Erratum in

  • Genome Res 1999 Nov;9(11):1156

Abstract

A data set of 77 genomic mouse/human gene pairs was compiled from the EMBL nucleotide database, and their corresponding features were determined. This set was used to analyze the degree of conservation of noncoding sequences between mouse and human. A new alignment algorithm was developed because, as a result of genetic drift, large parts of noncoding sequences cannot be aligned in a meaningful way. This algorithm, DNA Block Aligner (DBA), finds colinear conserved blocks flanked by nonconserved sequences of varying length. The noncoding regions of the data set were aligned with DBA. The proportion of the noncoding regions covered by blocks >60% identical was 36% for upstream regions, 50% for 5′ UTRs, 23% for introns, and 56% for 3′ UTRs. These high-identity blocks were roughly evenly distributed along the length of the features, except in upstream regions, where the first 100 bp upstream of the transcription start site were covered in up to 70% of the gene pairs. This data set complements earlier cDNA-based data sets and should be useful for further comparative studies. [This paper contains supplementary data that can be found at http://www.genome.org [corrected]].
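The coverage percentages above can in principle be recomputed from per-feature block lists. The following is a minimal sketch, not the authors' code. It assumes each feature is given as its length plus a list of hypothetical `(start, end, identity)` block tuples in half-open coordinates, that overlapping blocks are merged before counting, and that coverage is pooled over all features of one type (total covered bp divided by total bp). The abstract does not say whether coverage was pooled or averaged per gene pair.

```python
def covered_bases(blocks, min_identity):
    """Count bases covered by blocks whose identity exceeds min_identity.

    blocks: hypothetical (start, end, identity) tuples, half-open coordinates.
    Overlapping or touching spans are merged so no base is counted twice.
    """
    spans = sorted((s, e) for s, e, ident in blocks if ident > min_identity)
    total = 0
    cur_start = cur_end = None
    for s, e in spans:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged span: close it, open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps or touches the current span: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def pooled_coverage(features, min_identity=0.60):
    """Fraction of all feature bases covered by blocks above min_identity.

    features: list of (feature_length, blocks) pairs for one feature type,
    e.g. all introns. Pooling across features is an assumption.
    """
    total_len = sum(length for length, _ in features)
    covered = sum(covered_bases(blocks, min_identity) for _, blocks in features)
    return covered / total_len if total_len else 0.0


# Toy example with made-up blocks (not data from the paper).
introns = [
    (100, [(0, 20, 0.65), (10, 30, 0.95), (50, 60, 0.55)]),
    (100, [(90, 100, 0.75)]),
]
print(pooled_coverage(introns))
```

In the toy input the first two blocks of the first intron overlap and merge into 30 bp, the 55% block falls below the threshold, and the second intron contributes 10 bp, so 40 of 200 bases are counted.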


Figures

Figure 1
Finite-state model of the DBA algorithm. States are shown as labeled circles and transitions between states as arrows. Emission probability values in brackets are the default parameters for DBA. The Match A–D states represent conserved regions of 65%, 75%, 85%, and 95% identity, respectively. The two Unmatch states represent large gaps between the conserved blocks. The process starts and ends in the Unmatch states.
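To make the finite-state model concrete, here is a heavily simplified pair-HMM Viterbi sketch in the spirit of Figure 1. It keeps the four Match levels (65%, 75%, 85%, 95% identity) and two Unmatch states, and it forces the path to start and end in an Unmatch state. Everything else is an assumption, not DBA's actual model: the two Unmatch states are taken to emit unaligned bases from the first and the second sequence respectively, Match states emit only ungapped aligned pairs, the transition probabilities are made-up round numbers rather than DBA's defaults, and the function name `dba_viterbi` is hypothetical.

```python
import math
from itertools import groupby

NEG_INF = float("-inf")
LEVELS = [0.65, 0.75, 0.85, 0.95]  # Match A-D identity levels from Figure 1
UX, UY = 0, 1                      # assumed Unmatch states: emit from x only / y only
N_STATES = 2 + len(LEVELS)         # states 2..5 are Match A-D
START = -1                         # back-pointer marker for the silent begin state


def _transitions(stay_u=0.89, switch_u=0.05, stay_m=0.95):
    """Hypothetical transition log-probabilities (NOT DBA's published defaults)."""
    t = [[NEG_INF] * N_STATES for _ in range(N_STATES)]
    enter_m = (1.0 - stay_u - switch_u) / len(LEVELS)
    for u, other in ((UX, UY), (UY, UX)):
        t[u][u] = math.log(stay_u)
        t[u][other] = math.log(switch_u)
        for k in range(len(LEVELS)):
            t[u][2 + k] = math.log(enter_m)
    for k in range(len(LEVELS)):
        s = 2 + k
        t[s][s] = math.log(stay_m)  # extend the current block
        t[s][UX] = t[s][UY] = math.log((1 - stay_m) / 2)  # leave to Unmatch
    return t


def dba_viterbi(x, y):
    """Return conserved blocks as dicts with half-open x/y ranges and identity."""
    n, m = len(x), len(y)
    t = _transitions()
    log_bg = math.log(0.25)  # unmatched bases drawn from a uniform background
    V = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(N_STATES)]
    back = [[[None] * (m + 1) for _ in range(n + 1)] for _ in range(N_STATES)]
    for i in range(n + 1):
        for j in range(m + 1):
            for s in range(N_STATES):
                # State s at cell (i, j) has just emitted x[i-1] and/or y[j-1].
                if s == UX:
                    pi, pj = i - 1, j
                elif s == UY:
                    pi, pj = i, j - 1
                else:
                    pi, pj = i - 1, j - 1
                if pi < 0 or pj < 0:
                    continue
                if s in (UX, UY):
                    emit = log_bg
                else:
                    p = LEVELS[s - 2]
                    same = x[i - 1] == y[j - 1]
                    emit = math.log(p / 4 if same else (1 - p) / 12)
                best, arg = NEG_INF, None
                if pi == 0 and pj == 0 and s in (UX, UY):
                    best, arg = math.log(0.5), START  # path must begin in Unmatch
                for prev in range(N_STATES):
                    cand = V[prev][pi][pj] + t[prev][s]
                    if cand > best:
                        best, arg = cand, prev
                if arg is not None:
                    V[s][i][j] = best + emit
                    back[s][i][j] = arg
    # The path must also end in an Unmatch state.
    s = UX if V[UX][n][m] >= V[UY][n][m] else UY
    if V[s][n][m] == NEG_INF:
        return []
    path, i, j = [], n, m
    while s != START:
        path.append((s, i, j))
        prev = back[s][i][j]
        if s != UY:
            i -= 1
        if s != UX:
            j -= 1
        s = prev
    path.reverse()
    # Consecutive cells in the same Match state form one conserved block.
    blocks = []
    for s, run in groupby(path, key=lambda cell: cell[0]):
        if s < 2:
            continue
        cells = list(run)
        (_, i0, j0), (_, i1, j1) = cells[0], cells[-1]
        blocks.append({"x": (i0 - 1, i1), "y": (j0 - 1, j1), "identity": LEVELS[s - 2]})
    return blocks


core = "ACGTTGCAAGCTTACGGATCCATGCAGTCA"  # made-up 30-bp conserved block
print(dba_viterbi("G" * 8 + core + "G" * 8, "T" * 8 + core + "T" * 8))
```

The cubic-free O(n·m·states²) recursion is the standard pair-HMM Viterbi; the real DBA model is richer, so treat this only as an illustration of how Match and Unmatch states partition two sequences into conserved blocks and unaligned gaps.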
Figure 2
DBA coverage and block size distributions. Box plots of DBA block coverage for different noncoding features (a) and of DBA block lengths for the similarity categories A–D (b). The central box spans the middle half of the data, between the 25th and 75th percentiles; the solid lines mark the median of each distribution. Dots indicate extreme values below the 10th or above the 90th percentile.
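The box-plot conventions described for Figure 2 reduce to a handful of percentiles. A minimal sketch using Python's standard `statistics.quantiles`; the `'inclusive'` (linear interpolation) percentile method is an assumption, since the caption does not state which percentile convention the plotting software used, and the block lengths are made up.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, median, and the points drawn as dots (outside the 10th-90th percentiles)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "p10": p10,
        "p90": p90,
        # Values plotted individually as extreme points.
        "extremes": [v for v in data if v < p10 or v > p90],
    }


# Made-up DBA block lengths (bp) for one hypothetical similarity category.
lengths = [12, 20, 25, 31, 40, 48, 55, 70, 90, 150, 400]
print(box_plot_summary(lengths))
```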
Figure 3
DBA block and repeat coverage of noncoding features. Coverage is shown as a function of distance from the end of the upstream regions (a), the start of the 5′ UTRs (b), the end of the 5′ UTRs (c), the start of the introns (d), the end of the introns (e), the start of the 3′ UTRs (f), and the end of the 3′ UTRs (g), for DBA blocks (solid lines) and repeats (dotted lines).
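One plausible reading of the Figure 3 profiles is: for each distance d from a feature boundary, the fraction of features whose base at that distance lies inside a block. The sketch below computes such a profile under that assumed definition. Features too short to have a base at distance d are excluded from the denominator (also an assumption), and the input intervals are hypothetical half-open `(start, end)` pairs. The same function applies unchanged to repeat annotations.

```python
def positional_coverage(features, max_offset, from_end=False):
    """Fraction of features covered at each distance from a boundary.

    features: list of (feature_length, intervals), intervals half-open.
    from_end=False counts from the feature start, True from its end.
    Returns a list of length max_offset; NaN where no feature is long enough.
    """
    profile = []
    for d in range(max_offset):
        eligible = covered = 0
        for length, intervals in features:
            if d >= length:
                continue  # feature too short to have a base at distance d
            pos = length - 1 - d if from_end else d
            eligible += 1
            if any(s <= pos < e for s, e in intervals):
                covered += 1
        profile.append(covered / eligible if eligible else float("nan"))
    return profile


# Two made-up features with hypothetical block intervals.
toy = [(4, [(0, 2)]), (6, [(0, 1), (4, 6)])]
print(positional_coverage(toy, 5))
print(positional_coverage(toy, 5, from_end=True))
```

For upstream regions, distances would be counted back from the transcription start site, which corresponds to `from_end=True` if each upstream region ends at that site.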

