BMC Genomics. 2012 May 30;13:206.
doi: 10.1186/1471-2164-13-206.

Limitations of the rhesus macaque draft genome assembly and annotation

Xiongfei Zhang et al. BMC Genomics. 2012.

Abstract

Finished genome sequences and assemblies are available for only a few vertebrates. Thus, investigators studying many species must rely on draft genomes. Using the rhesus macaque as an example, we document the effects of sequencing errors, gaps in sequence and misassemblies on one automated gene model pipeline, Gnomon. The combination of a draft genome with automated gene-finding software can result in spurious sequences. We estimate that approximately 50% of the rhesus gene models are missing, incomplete or incorrect. The problems identified in this work likely apply to all draft vertebrate genomes annotated with any automated gene model pipeline and thus represent a pervasive challenge to the analysis of draft genomes.

Figures

Figure 1
ACTRT1. Incorrect insertion results in a frameshift. Arrow points to a sequencing error (incorrect insertion) in the rhesus draft sequence for the single ACTRT1 exon. The reverse complement is shown to facilitate comparison with the corrected sequence and translated proteins. Red "G" indicates sequencing error. Yellow highlighting indicates nucleotide sequence. Green highlighting indicates correct protein sequence. Pink highlighting indicates spurious protein sequence caused by the sequencing error (insertion)
Figure 2
ADBR1. Incorrect sequence results in a premature stop codon. Arrow points to a sequencing error ("A" instead of "C") in the rhesus draft sequence for the single ADBR1 exon. Yellow highlighting indicates nucleotide sequence. Green highlighting indicates correct protein sequence. Pink highlighting indicates premature truncation of protein sequence caused by the sequencing error
Figure 3
ADCY3. Missing exon results in spurious sequence. Red "exon 15" indicates false exon created by Gnomon from intronic sequence. Correct exons are indicated in green boxes. Green letters at the bottom of the panel indicate boundaries of correct exons. Yellow highlighting indicates nucleotide sequence. Green highlighting indicates correct protein sequence. Pink highlighting indicates spurious protein sequence caused by the false exon
Figure 4
AADAT. Missing exon results in spurious sequence. Red "exon 12" indicates false exon created by Gnomon from intronic sequence. Correct exons are indicated in green boxes. Green letters at the bottom of the panel indicate boundaries of correct exons. Yellow highlighting indicates nucleotide sequence. Green highlighting indicates correct protein sequence. Pink highlighting indicates spurious protein sequence caused by the false exon
Figure 5
SERPINB6. Gene split between two chromosomes. Red exon boxes indicate exons assigned to the wrong chromosome. Green exon boxes indicate exons assigned to the correct chromosome. Accession numbers at left indicate genomic (top) and mRNA (bottom) sequences. Range of mRNA corresponding to exons is indicated by black numbers under exon boxes. Exons not drawn to scale
Figure 6
RALY. Gene split between two chromosomes. Red exon box indicates exon assigned to the wrong chromosome. Green exon boxes indicate exons assigned to the correct chromosome. Accession numbers at left indicate genomic (top) and mRNA (bottom) sequences. Range of mRNA corresponding to exons is indicated by black numbers under exon boxes. Exons not drawn to scale
Figure 7
CCDC135. Gene split between two chromosomes. Red exon boxes indicate exons assigned to the wrong chromosome. Green exon boxes indicate exons assigned to the correct chromosome. Accession numbers at left indicate genomic (top) and mRNA (bottom) sequences. Range of mRNA corresponding to exons is indicated by black numbers under exon boxes. Exons not drawn to scale
Figure 8
VPS13D. Gene split between two chromosomes and failure to integrate an unlocalized contig. Red exon boxes indicate exons assigned to the wrong chromosome. Green exon boxes indicate exons assigned to the correct chromosome. Accession numbers at left indicate genomic (top) and mRNA (bottom) sequences. Range of mRNA corresponding to exons is indicated by black numbers under exon boxes. Forward slashes in panel a indicate breakpoints where genomic fragments were not incorporated in the chromosome 1 file. Three dots in panel a indicate exons not shown (exons 5–51 and 56–69). Exons not drawn to scale
Figure 9
SHE. Gene split between two chromosomes. Red exon box indicates exon assigned to the wrong chromosome. Green exon boxes indicate exons assigned to the correct chromosome. Accession numbers at left indicate genomic (top) and mRNA (bottom) sequences. Range of mRNA corresponding to exons is indicated by black numbers under exon boxes. Exons not drawn to scale
Figure 10
BBS1. Genomic fragment containing exon in the wrong orientation. Red exon box indicates exon in the wrong orientation with respect to the other exons in this gene. Green exon boxes indicate exons in the correct orientation. Accession numbers at left indicate genomic (top) and mRNA (bottom) sequences. Range of mRNA corresponding to exons is indicated by black numbers under exon boxes. Exons not drawn to scale
Figure 11
Pie chart illustrating categories of gene annotations
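
The captions for Figures 1 and 2 describe two ways a single-base sequencing error can corrupt a predicted protein: an inserted base that shifts the reading frame, and a substituted base that creates a premature stop codon. The following is a minimal sketch of both effects in Python. The DNA sequence is an invented toy example, not rhesus data, and the helper names (`translate`, `insert_base`, `substitute_base`) are hypothetical; this is not how Gnomon works, only an illustration of why such errors change the translation.

```python
from itertools import product

# Standard genetic code, built with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from position 0 in frame, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_base(dna, pos, base):
    """Return dna with one extra base inserted before index pos."""
    return dna[:pos] + base + dna[pos:]


def substitute_base(dna, pos, base):
    """Return dna with the base at index pos replaced."""
    return dna[:pos] + base + dna[pos + 1:]


# Invented toy coding sequence (assumption, for illustration only).
correct = "ATGGCTGAACTGTACAAATGA"

# Spurious extra "G": every codon downstream of the insertion is misread.
frameshifted = insert_base(correct, 6, "G")

# "A" instead of "C": the codon TAC becomes the stop codon TAA.
truncated = substitute_base(correct, 14, "A")

print(translate(correct))       # MAELYK
print(translate(frameshifted))  # MAGTVQM
print(translate(truncated))     # MAEL
```

In the toy case the insertion leaves only the first two residues intact, while the substitution yields a correct but shortened protein; these correspond to the spurious and prematurely truncated sequences highlighted in pink in the two figures.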
