Review
ILAR J. 2013;54(2):144-53.
doi: 10.1093/ilar/ilt037.

Improving genome assemblies and annotations for nonhuman primates

Robert B Norgren Jr. ILAR J. 2013.

Abstract

The study of nonhuman primates (NHPs) is key to understanding human evolution, and NHPs are also important models for biomedical research, especially translational medicine. There are now exciting opportunities to greatly increase the utility of these models by incorporating Next Generation (NextGen) sequencing into study design. Unfortunately, the draft status of nonhuman primate genomes greatly constrains what can currently be accomplished with available technology. Although all genomes contain errors, draft assemblies and annotations contain so many mistakes that currently available nonhuman primate genomes are misleading to investigators conducting evolutionary studies, and they are of insufficient quality to serve as references for NextGen studies. Fortunately, NextGen sequencing can itself be used to produce greatly improved genomes. Existing Sanger sequences can be supplemented with NextGen whole-genome and exome sequences to create new, more complete and correct assemblies. Additional physical mapping, together with information about gene structure, can be used to improve the assignment of scaffolds to chromosomes. In addition, mRNA-seq data can be used to economically acquire transcriptome information for annotation. Some highly polymorphic and complex regions, for example the MHC class I and immunoglobulin loci, will require extra effort to assemble and annotate properly. However, for the vast majority of genes, a modest investment of money and a somewhat greater investment of time can improve assemblies and annotations enough to produce true reference-grade nonhuman primate genomes. Such resources can reasonably be expected to transform nonhuman primate research.

Keywords: ape; evolution; genome annotation; genome assembly; lemur; monkey; nonhuman primates; translational research.

Figures

Figure 1:
Phylogeny of nonhuman primates. The time of divergence from the last common ancestor shared with humans is indicated at each branching point in millions of years ago (MYA). Strepsirrhines include the aye-aye (Daubentonia madagascariensis), grey mouse lemur (Microcebus murinus), and sifaka (Propithecus coquereli). New World monkeys include the common marmoset (Callithrix jacchus). Old World monkeys include rhesus macaques (Macaca mulatta), cynomolgus monkeys (Macaca fascicularis), pigtail macaques (Macaca nemestrina), baboons (Papio anubis), African green monkeys (genus: Chlorocebus), and sooty mangabeys (Cercocebus atys). Gibbons are lesser apes, including the white-cheeked gibbon (Nomascus leucogenys). Orangutans (Pongo abelii), gorillas (Gorilla gorilla), chimpanzees (Pan troglodytes), and bonobos (Pan paniscus) are all great apes. Bonobos and chimpanzees last shared a common ancestor about one million years ago. Draft genomes are (or will be) available for all the species listed (see Table 1).
Figure 2:
Schematic diagrams illustrating assembly and annotation errors in the rhesus macaque draft genome. (A) Scaffold assigned to the wrong chromosome: The scaffold containing exons 1, 2, 4, 5, and 6 of the SRC homology 2 domain containing E (SHE) gene is correctly assigned to chromosome 1 in the rhesus draft genome. However, the scaffold containing exon 3 of the SHE gene was incorrectly assigned to chromosome X. (B) Scaffold with exon in the wrong orientation: An unlocalized scaffold from the draft rhesus genome contains exons 9-13 of the Bardet-Biedl syndrome 1 (BBS1) gene. It was not included in the rhesus chromosome 14 file with the scaffold that contains exons 1-8 of BBS1. This is likely because the contig containing exon 9 was in the wrong orientation with respect to the rest of the scaffold. (C) Sequencing error results in apparent nonsense mutation: The rhesus draft genomic DNA had a sequencing error in the adrenergic, beta-1, receptor (ADRB1) gene. This introduced a premature stop codon (arrow, top panel), and as a result NCBI labeled this locus a pseudogene. Our targeted sequencing of this region revealed the correct sequence (JN589014.1, bottom panel). (D) Missing exon results in substitution of intronic sequence: The original rhesus draft genome did not contain the sequence for exon 15 of the adenylate cyclase 3 (ADCY3) gene. Instead, intronic sequence between exons 14 and 16 was substituted (top panel) when this gene was annotated. This led to spurious protein sequence (original protein) and a premature stop codon. The missing exon 15 was sequenced and deposited in GenBank (HM067826.1). NCBI then corrected the rhesus ADCY3 gene model and now reports a correct protein sequence for this gene (bottom panel). Figure 2 is redrawn from Zhang et al. 2012.
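To make panels (B) and (C) concrete, the following is a minimal Python sketch, not the procedure used by the author or by NCBI. It shows how a single-base sequencing error can create an in-frame premature stop codon, and how reverse-complementing restores a contig placed in the wrong orientation. The function names and the short toy sequences are hypothetical and invented for illustration; they are not rhesus ADRB1 or BBS1 data.

```python
# Toy illustration only: the sequences below are invented, not rhesus data.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def first_premature_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # The final codon is skipped because a complete coding sequence
    # is expected to end with a stop codon.
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical coding sequence: ATG GCT TGG CGT TAA
correct_cds = "ATGGCTTGGCGTTAA"
# Same sequence with one miscalled base (TGG -> TGA), as in panel (C)
draft_cds = "ATGGCTTGACGTTAA"

print(first_premature_stop(correct_cds))  # no premature stop found
print(first_premature_stop(draft_cds))    # index of the spurious stop codon

# Panel (B): a contig stored in the wrong orientation is the reverse
# complement of the true sequence; applying the operation again restores it.
misoriented = reverse_complement(correct_cds)
print(reverse_complement(misoriented) == correct_cds)
```

A check of this kind only flags a candidate error. As the caption notes, confirming that a stop codon is spurious required targeted resequencing of the region.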
Figure 3:
Flowchart describing assembly and annotation procedures. The flowchart shows the steps involved in creating a high-quality genome. Sequencing can include the conventional Sanger technique and/or several NextGen technologies, including 454, Illumina, and Ion Torrent (see Table 1). Contig and scaffold assembly can use several assemblers, including Atlas (Havlak et al. 2004), ABySS (Simpson et al. 2009), ALLPATHS-LG (Gnerre et al. 2011), Celera assembler (Myers et al. 2000), MaSuRCA (http://www.genome.umd.edu/masurca.html; accessed on July 19, 2013), and SOAPdenovo (Li et al. 2010). Chromosome mapping can use genetic information, radiation hybrids, or fluorescence in situ hybridization (FISH). “Breaking” misassembled scaffolds and placing them on chromosomes can involve extensive manual work. Expressed sequence tags (ESTs) are usually partial transcripts obtained from Sanger sequencing. mRNA-seq is often performed with Illumina technology but can also be conducted with Ion Torrent machines.
