Comparative Study
Genome Res. 2004 Dec;14(12):2406-11.
doi: 10.1101/gr.3199704. Epub 2004 Nov 15.

Intraspecies sequence comparisons for annotating genomes

Dario Boffelli et al. Genome Res. 2004 Dec.

Abstract

Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intraspecies sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents, and a set of genomic intervals were amplified, resequenced, and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom. It also raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species.

Figures

Figure 1.
Phylogenetic relationships of C. intestinalis subpopulations. Consensus sequences for the col5a1 interval, obtained for each of the six subpopulations analyzed in this study, were used to calculate the population tree. Subpopulations are defined by their collection locations, as in Table 1. The size of the circle surrounding each subpopulation is proportional to the heterozygosity of that subpopulation.
Figure 2.
(A) Mutation rate analysis of the genomic interval containing the 5′ region of the forkhead gene. The x-axis represents the position in the multiple alignment consensus sequence; the y-axis represents the log likelihood ratio for a fast- over a slow-mutation regime at that position. The plot is smoothed using a 20%-trimmed mean over the 24-base window centered at each aligned site. A lower ratio indicates a low mutation rate. The sequence of 33 individuals (total tree length = 0.28) was used to generate this plot. The blue bar labeled “P” indicates the position of the forkhead promoter; the red and purple bars indicate the positions of low- and high-mutation rate intervals, respectively, that were functionally analyzed in this study. (B) Transgenic analysis of intervals identified by mutation rate analysis of the 5′ region of the forkhead gene. C. intestinalis larvae were electroporated with reporter constructs containing the genomic fragments 1, 2, 4, 5, and 7, respectively, and expression was visualized by histochemical staining with X-gal. Constructs for region 2 never yielded LacZ expression, and the position marked on the plot corresponds to a segment previously analyzed (Di Gregorio et al. 2001). Red arrows indicate expression in the neural tube, yellow arrows expression in the notochord, and green arrows expression in the endoderm. Constructs for region 2 failed to yield tissue-specific expression.
Figure 3.
Mutation rate analysis of the genomic interval containing the 5′ region and the first exon of the snail gene. The plot was drawn as described in the Figure 2 legend. The sequence of 37 individuals (total tree length = 0.52) was used to generate this plot. The position of the first exon is indicated by the green bar labeled “E”; region 1 is snail's promoter, and region 2 is a constrained interval upstream of snail. The inset shows the transgenic analysis of region 2. C. intestinalis larvae were electroporated with a reporter construct containing region 2, and the expression was visualized by histochemical staining with X-gal. The red arrow indicates expression in the neural tube.
Figure 4.
Mutation rate analysis of the genomic intervals containing the 5′ regions of the col5a1 (A) and patched (B) genes. The plots were drawn as described in the Figure 2 legend. The sequences of 36 and 22 individuals were used to generate the col5a1 and patched plots, respectively (total tree lengths were 0.69 and 0.10, respectively). The blue bar labeled “P” indicates the position of col5a1's promoter; the numbered green bars indicate the positions of exons 1–4 of col5a1 and exons 18–25 of patched.
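
The Figure 2 legend states that each plot is smoothed with a 20%-trimmed mean over a 24-base window centered at each aligned site. The sketch below illustrates one way such smoothing could be computed; it is not the authors' code. It assumes that "20%-trimmed" means discarding 20% of the values from each tail, that the even-sized window covers 12 sites before and 11 sites after the current position, and that windows are truncated at the ends of the alignment. The function names `trimmed_mean` and `smooth` are hypothetical.

```python
from math import floor


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping `proportion` of the values from each tail.

    Assumption: trimming is applied to both tails; the legend does not
    say whether 20% is per tail or in total.
    """
    ordered = sorted(values)
    k = floor(len(ordered) * proportion)  # values dropped per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def smooth(scores, window=24, proportion=0.2):
    """Smooth per-site scores with a sliding trimmed mean.

    Assumption: the window spans window // 2 sites before the current
    site and the remainder from the site onward, and is truncated at
    the sequence ends rather than padded.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + window - half)
        smoothed.append(trimmed_mean(scores[lo:hi], proportion))
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-site log likelihood ratios with one outlier.
    ratios = [0.0] * 30
    ratios[15] = 100.0
    print(smooth(ratios))
```

A trimmed mean is less sensitive than a plain moving average to isolated extreme sites, so a single outlying position does not shift the smoothed curve across its whole window.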

References

    1. Ansari-Lari, M.A., Oeltjen, J.C., Schwartz, S., Zhang, Z., Muzny, D.M., Lu, J., Gorrell, J.H., Chinault, A.C., Belmont, J.W., Miller, W., et al. 1998. Comparative sequence analysis of a gene-rich cluster at human chromosome 12p13 and its syntenic region in mouse chromosome 6. Genome Res. 8: 29-40.
    2. Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391-1394.
    3. Bray, N. and Pachter, L. 2003. MAVID multiple alignment server. Nucleic Acids Res. 31: 3525-3526.
    4. Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S. 2003. A vision for the future of genomics research. Nature 422: 835-847.
    5. Corbo, J.C., Erives, A., Di Gregorio, A., Chang, A., and Levine, M. 1997a. Dorsoventral patterning of the vertebrate neural tube is conserved in a protochordate. Development 124: 2335-2344.

Web site references

    1. http://bonaire.lbl.gov/newshadower/; phylogenetic shadowing.
    2. www.phrap.org; consed suite.
    3. www.ebi.ac.uk/clustalw/index.html; multiple sequence alignment.
    4. baboon.math.berkeley.edu/mavid; multiple sequence alignment.
