Mamm Genome. 2015 Oct;26(9-10):379-90.
doi: 10.1007/s00335-015-9585-8. Epub 2015 Jul 28.

Mouse genome annotation by the RefSeq project

Kelly M McGarvey et al. Mamm Genome. 2015 Oct.

Abstract

Complete and accurate annotation of the mouse genome is critical to the advancement of research conducted on this important model organism. The National Center for Biotechnology Information (NCBI) develops and maintains many useful resources to assist the mouse research community. In particular, the reference sequence (RefSeq) database provides high-quality annotation of multiple mouse genome assemblies using a combinatorial approach that leverages computation, manual curation, and collaboration. Implementation of this conservative and rigorous approach, which focuses on representation of only full-length and non-redundant data, produces high-quality annotation products. RefSeq records explicitly link sequences to current knowledge in a timely manner, updating public records regularly and rapidly in response to nomenclature updates, addition of new relevant publications, collaborator discussion, and user feedback. Whole genome re-annotation is also conducted at least every 12-18 months, and often more frequently in response to assembly updates or availability of informative data. This article highlights key features and advantages of RefSeq genome annotation products and presents an overview of NCBI processes to generate these data. Further discussion of NCBI's resources highlights useful features and the best methods for accessing our data.
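
As one possible illustration of programmatic access to RefSeq records, the sketch below builds a request URL for NCBI's Entrez E-utilities `efetch` endpoint. The abstract does not name E-utilities, so treating it as a suitable access route is an assumption here, and the helper name `efetch_url` is hypothetical. The accession used in the usage note is one that appears in the Fig. 2 caption.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities efetch endpoint (an assumed access route,
# not one named in the abstract).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "gb") -> str:
    """Build an efetch URL for a nucleotide record (hypothetical helper).

    rettype "gb" requests a GenBank flat file; "fasta" requests FASTA.
    """
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession.version, e.g. NM_144531.3
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"
```

For example, `efetch_url("NM_144531.3")` yields a URL whose response could be read with `urllib.request.urlopen`; the network call is left out so the sketch stays side-effect free.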


Figures

Fig. 1
Overview of NCBI’s eukaryotic annotation pipeline from http://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/#process. Briefly, genomic sequences are repeat masked (gray), and transcripts (blue), proteins (green), RNA-seq reads (orange), and curated RefSeq sequences (pink) are aligned to the genome. Based on these alignments, gene model predictions are calculated (brown), best models are selected, named and accessioned (purple), and finally annotation products are released publicly (yellow). During re-annotations, models and genes are given special attention and are tracked from one annotation release to the next.
Fig. 2
Examples of loci benefitting from manual curation. (a) The first 397 nucleotides of NM_144531.3 and NM_001109684.1 are missing from the GRCm38 reference genome assembly. The 5′ portion of the chromosome 4 gene Kazn (GeneID: 71529) was screen captured from NCBI's sequence viewer in the Gene resource and labels were edited (http://www.ncbi.nlm.nih.gov/gene/?term=71529#genomic-regions-transcripts-products). The partial alignment of the 5′ end of these RefSeq records is indicated by the double black arrows and by the qualifier statement, which is revealed upon hovering the mouse over the RefSeq transcript graphic. (b) Supporting evidence is reported on the NM_144531.3 record (http://www.ncbi.nlm.nih.gov/nuccore/NM_144531.3). The comments section shows that the full exon combination represented by NM_144531.3 is supported by the messenger RNA transcript AK173090.1. This type of supporting evidence is associated with ECO ID:0000332. The set of ECO IDs reported has been previously described (Pruitt et al. 2014). (c) The Apela gene on chromosome 8 (GeneID: 100038489) was defined as a non-coding locus in Mus musculus Annotation Release 104 (represented by NR_040692.1), but manual curation resulted in an update of the locus type to protein-coding in Annotation Release 105 (represented by NM_001297554.1/NP_001284483.1). The graphical display of RefSeq genome annotation shown in (a) and (c) was screen captured from NCBI's sequence viewer in the Gene resource and labels were edited (http://www.ncbi.nlm.nih.gov/gene/?term=100038489#genomic-regions-transcripts-products).
Fig. 3
Graphical display of mouse RefSeq transcripts using NCBI’s Gene resource. (a) Genomic context. Coordinates on multiple mouse genome assemblies and a graphical display of the location and orientation of genes neighboring the Fst (GeneID: 14313) locus are shown here. (b) Genomic regions, transcripts, and products. Tracks displayed with the default settings are indicated with red arrows. The configure button (red circle) may be used to customize tracks. (c) Zoom and pan features allow easy identification of differences between transcript variants. Quantitative RNA-seq intron features data are displayed in this view. (d) Shown here is a subset of the links and related information displayed in the sidebar of each Gene record.
Fig. 4
The UCSC Genome Browser does not accurately represent RefSeq data. (a) NCBI Sequence Viewer. Coordinates on mouse chromosome 17 (NC_000083.6 from 25,957,500 to 25,988,800) and a graphical display of the neighboring loci, 1700022N22Rik (GeneID: 69431) and Capn15 (GeneID: 50817), were screen captured from NCBI's sequence viewer in the Gene resource and labels were edited. (b) UCSC Genome Browser. Coordinates on mouse chromosome 17 (NC_000083.6) and the RefSeq Genes track were screen captured from the UCSC Genome Browser and labels were edited. No RefSeq models are displayed in the RefSeq Genes track.
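
The captions above refer to several RefSeq accessions (NC_, NM_, NR_, NP_). As a minimal sketch of how such identifiers could be handled in a script, the Python below splits an accession into its prefix, number, and version, and maps the prefix to a molecule type. The prefix table is a partial list based on NCBI's published accession conventions rather than on anything stated in this record, and the names `RefSeqAccession` and `parse_accession` are hypothetical.

```python
import re
from typing import NamedTuple

# Partial prefix table (assumption: drawn from NCBI's RefSeq accession
# conventions, not from this article's text).
REFSEQ_PREFIXES = {
    "NC": "genomic, complete molecule",
    "NM": "mRNA, known",
    "NR": "non-coding RNA, known",
    "NP": "protein, known",
    "XM": "mRNA, model",
    "XR": "non-coding RNA, model",
    "XP": "protein, model",
}

# Two uppercase letters, underscore, digits, dot, version digits.
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)\.(\d+)$")


class RefSeqAccession(NamedTuple):
    prefix: str
    number: str      # kept as a string to preserve leading zeros
    version: int
    molecule: str


def parse_accession(text: str) -> RefSeqAccession:
    """Split a versioned RefSeq accession such as NM_144531.3."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a versioned RefSeq accession: {text!r}")
    prefix, number, version = match.groups()
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix not in this partial table: {prefix}")
    return RefSeqAccession(prefix, number, int(version), REFSEQ_PREFIXES[prefix])
```

With this sketch, `parse_accession("NR_040692.1")` and `parse_accession("NM_001297554.1")` would report the non-coding and protein-coding transcript types that the Fig. 2 caption describes for Apela in Annotation Releases 104 and 105.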
