RefSeq: an update on mammalian reference sequences

PMID: 24259432
PMCID: PMC3965018
DOI: 10.1093/nar/gkt1114

Kim D Pruitt et al. Nucleic Acids Res. 2014 Jan.

Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63.

doi: 10.1093/nar/gkt1114. Epub 2013 Nov 19.

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.

Figures

**Figure 1.**
Number of vertebrate and other eukaryotic genome annotations released by NCBI per year since 2001. Additional information about recently annotated genomes is available at http://www.ncbi.nlm.nih.gov/genome/annotation_euk/status/#recent.

**Figure 2.**
Both known and model RefSeq records may be associated with the same locus. A portion of the ‘Genomic regions, transcripts, and products’ section of the Gene record for human *GSTZ1* (NCBI GeneID 2954) is shown. Chromosome 14 coordinates corresponding to annotation of assembly GRCh37.p13 (NC_000014.8), NCBI annotation release 105 are shown at the top. The gene is associated with three known RefSeq transcripts (e.g. NM_145870.2, NM_145871.2 and NM_001513.3) and three model transcripts (e.g. XM_005267557.1, XM_005267558.1 and XM_005267559.1). The first exon of the overlapping *POMT2* gene is also visible in this display. Supplementing curated RefSeqs (NM, NR, or NP prefixes) with model RefSeqs (XM, XR and XP accessions) enables better representation of alternative splice variants and exons.
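
The prefix convention in this caption can be illustrated with a short Python sketch. It assumes only what the caption states: NM, NR and NP prefixes mark curated (known) records, and XM, XR and XP prefixes mark model records. The function name `refseq_status` and the `"other"` fallback are hypothetical choices made for this illustration, not part of any NCBI tool.

```python
# Prefix sets taken from the Figure 2 caption.
KNOWN_PREFIXES = {"NM", "NR", "NP"}   # curated (known) RefSeq records
MODEL_PREFIXES = {"XM", "XR", "XP"}   # model RefSeq records


def refseq_status(accession):
    """Classify a RefSeq accession as 'known', 'model' or 'other' by its prefix.

    The prefix is the text before the first underscore, e.g. 'NM' in
    'NM_145870.2'. Anything outside the two sets above is reported as 'other'.
    """
    prefix = accession.split("_", 1)[0].upper()
    if prefix in KNOWN_PREFIXES:
        return "known"
    if prefix in MODEL_PREFIXES:
        return "model"
    return "other"


if __name__ == "__main__":
    # Accessions quoted in the Figure 2 caption.
    for acc in ("NM_145870.2", "XM_005267557.1", "NC_000014.8"):
        print(acc, refseq_status(acc))
```

Applied to the accessions in the caption, the *GSTZ1* NM records fall in the known group and the XM records in the model group, while the chromosome record NC_000014.8 falls outside both sets.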

**Figure 3.**
Structured comments provide information on supporting evidence and biological attributes. A portion of the COMMENT section of the NM_006440.4 record is displayed, illustrating the two structured comments. (A) The Evidence Data comment reports supporting evidence for the exon combination represented in the record. (B) The RefSeq Attributes comment reports biological attributes. Each comment type includes the attribute category on the left and supporting evidence on the right. Structured comments include special formatting and are bracketed by START and END to support parsing.
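
Because the structured comments are bracketed by START and END, they can be pulled out of a COMMENT section with a small parser. The Python sketch below is one possible approach under stated assumptions: it assumes delimiter lines of the form `##Name-START##` and `##Name-END##`, and a ` :: ` separator between the category on the left and the evidence on the right. The sample text is invented for illustration and is not copied from NM_006440.4; the function name `parse_structured_comments` is hypothetical.

```python
import re

# Assumed delimiter shape: '##<Name>-START##' opens a block.
START_RE = re.compile(r"^##(?P<name>.+)-START##$")


def parse_structured_comments(comment_text):
    """Return {block_name: {category: evidence}} for each bracketed block.

    Lines outside a START/END pair are ignored. Lines inside a block are
    split on the first '::' (assumed separator). Values wrapped across
    several lines are not rejoined in this sketch.
    """
    blocks = {}
    current_name = None
    for raw_line in comment_text.splitlines():
        line = raw_line.strip()
        start = START_RE.match(line)
        if start:
            current_name = start.group("name")
            blocks[current_name] = {}
        elif current_name is not None and line == f"##{current_name}-END##":
            current_name = None
        elif current_name is not None and "::" in line:
            key, value = line.split("::", 1)
            blocks[current_name][key.strip()] = value.strip()
    return blocks


# Invented sample input; keys and values are placeholders.
SAMPLE = """\
Free text before the structured comments is skipped.
##Evidence-Data-START##
Transcript exon combination :: hypothetical-accession-1, hypothetical-accession-2
##Evidence-Data-END##
##RefSeq-Attributes-START##
example attribute :: example supporting evidence
##RefSeq-Attributes-END##
"""

if __name__ == "__main__":
    for name, fields in parse_structured_comments(SAMPLE).items():
        print(name, fields)
```

Keying the result by block name keeps the two comment types (Evidence Data and RefSeq Attributes) separate, which mirrors panels (A) and (B) of the figure.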
