Entrez Gene: gene-centered information at NCBI

Donna Maglott¹, Jim Ostell, Kim D Pruitt, Tatiana Tatusova

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA. maglott@ncbi.nlm.nih.gov

PMID: 17148475
PMCID: PMC1761442
DOI: 10.1093/nar/gkl993

Nucleic Acids Res. 2007 Jan;35(Database issue):D26-31. Epub 2006 Dec 5.

Abstract

Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. Entrez Gene includes records from genomes that have been completely sequenced, that have an active research community to contribute gene-specific information or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of both curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases and from other databases within NCBI. Records in Entrez Gene are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is provided via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities) and, for bulk transfer, by FTP.
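As a rough illustration of the E-Utilities route mentioned in the abstract, the sketch below builds an ESearch request against the Gene database and extracts the integer GeneIDs from a JSON reply. It is a minimal sketch, not NCBI's own client: the function names `build_gene_search_url` and `parse_gene_ids` are hypothetical, and it assumes the public ESearch endpoint with its `db`, `term`, `retmode` and `retmax` parameters and a reply of the form `{"esearchresult": {"count": ..., "idlist": [...]}}`.

```python
import json
from urllib.parse import urlencode

# Base URL of the public E-utilities service (an assumption of this sketch;
# the abstract does not give the address).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_gene_search_url(term, retmax=20):
    """Build an ESearch URL that queries the Gene database (db=gene)."""
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_gene_ids(esearch_json):
    """Return (total hit count, list of GeneIDs) from an ESearch JSON reply.

    GeneIDs arrive as strings; they are converted to integers because
    Entrez Gene identifiers are stable integers.
    """
    result = json.loads(esearch_json)["esearchresult"]
    return int(result["count"]), [int(uid) for uid in result["idlist"]]
```

To run a live query, fetch the URL (for example with `urllib.request.urlopen`) and pass the decoded response body to `parse_gene_ids`. Keeping URL construction and parsing separate from the network call makes both steps easy to check offline.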

Figures

**Figure 1**
Representative ‘Summary’ report of query results. Result of a query to retrieve information about partitioning-defective genes in mammals. This figure illustrates several points: (i) the display when Limits is invoked to restrict result sets; (ii) spell checking; (iii) use of My NCBI to customize tabs to highlight subcategories of records in the result set; (iv) use of My NCBI to alter the display of the links menu. Limits:mammalia indicates that mammalia was selected from the page accessed via the Limits tab to restrict results to genes in mammals. The term partitioning had no matches in the database; the ‘details’ page explains that only the term ‘defective’ was processed. Entrez identifies possible misspellings and suggests an alternate query (Did you mean: partitioning defective?). Of the 522 results that were returned, the tabs indicate that 448 are current (current only), 350 have genotype information available in dbSNP (Gene Genotype), 455 can be viewed in Map Viewer (Gene Map Viewer) and 386 have expression data in UniGene (Gene UniGene). Because the My NCBI environment replaces the default links menu with text, the databases connected to each record are displayed directly on the results page. The summary display includes the species of origin, preferred and alternate symbols, preferred and other descriptive names, chromosome localization, the GeneID and the MIM number when appropriate. Click on any symbol to link to the full report (Figure 2). The top black navigation bar and the blue side-bar at the left provide general links to other sites, including genome-specific resource guides (Genomic Biology), the FTP site, forms to submit feedback (Feedback) and forms to subscribe to mail lists (Mailing Lists).

**Figure 2**
(a) Representative Entrez Gene full-report page, part 1. The standard gene-specific report page starts with summary information about the gene, a table of contents and a links menu. The summary section includes names and symbol aliases. If the gene has official names provided by a nomenclature authority, those names are reported as official symbol and official full name, with the named source anchoring a link. The database identifier provided by that source is displayed, anchoring a link to that source's specific record. The review status of all RefSeq RNAs for the gene is reported as RefSeq status. If the gene has been annotated on a RefSeq genomic sequence, a graphic is provided diagramming the intron/exon organization of the gene (genomic regions, transcripts and products) with the accessions for the genomic, mRNA and protein RefSeqs anchoring links to the sequence records in NCBI's Entrez system and, in the case of proteins, to BLink (1). If a RefSeq protein is a member of a CCDS group (2), the CCDS identifier to the right of the RefSeq protein accession anchors a link to the CCDS database. The genomic context section diagrams the placement of the gene and its neighbors. Each symbol anchors a link to another record in Entrez Gene. A link to NCBI's Map Viewer is in this section, identical to the one in the links menu. All citations in PubMed associated with a record in Entrez Gene can be accessed by clicking on PubMed in the bibliography section or in the links menu. Navigation to PubMed for citations associated with specific information, such as GeneRIFs or gene ontology terms, is repeated explicitly with those elements (b). The links menu should be used to determine the types and sources of additional information that may be available about a gene. In this example, information about expression is available from GENSAT, GEO, UniGene and MGI; homology from HomoloGene; variation from SNP; cDNAs supporting genomic annotation from Evidence Viewer and ModelMaker; pathways from KEGG, etc.
(b) Representative Entrez Gene full-report page, part 2. This portion of a full-report display includes the sections of the record indicated in the table of contents (a) as bibliography, alleles, general gene information and general protein information. In the GeneRIFs section, the icons to the right of the text anchor a link to the PubMed citation that supports the GeneRIF. The data in the alleles and gene ontology sections were imported from MGI, as indicated in the links anchored by MGI. Alternate names for the gene, and the protein it encodes, are listed under general protein information/names. If this gene encoded an enzyme, the E.C. designation would be in this section as well.
(c) Representative Entrez Gene full-report page, part 3. This portion of a full-report display includes the sections of the record indicated in the table of contents (a) as reference sequences, related sequences and additional links. The reference sequences section is subdivided into subsections based on the type of RefSeq being reported. The first section (RefSeqs maintained independently of annotated genomes) reports the RefSeq genomic, RNA and protein accessions that can be updated at any time, and thus may differ in version or number from what was included in a genomic annotation (2). The sequences reported under RefSeqs of annotated genomes are the genomic RefSeqs for the chromosomes and contigs of reference and alternate assemblies. Each of these RefSeq sections, and the related sequences sections below, anchors links to records in NCBI's Entrez system, where standard tools are provided to process the sequence (e.g. altering the range, displaying annotated SNPs or downloading in multiple formats). The related sequences section lists the accessions and strains of public sequences of this gene or its encoded protein. The items in the additional links section are included in the links menu (a), but are repeated here to make them easier to reach, for example to display the UniGene cluster number.

Update of

Entrez Gene: gene-centered information at NCBI.
Maglott D, Ostell J, Pruitt KD, Tatusova T. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D54-8. doi: 10.1093/nar/gki031. PMID: 15608257.

References

1. Wheeler D.L., Barrett T., Benson D.A., Bryant S.H., Canese K., Chetvernin V., Church D.M., DiCuccio M., Edgar R., Federhen S., et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2007 (Submitted)
2. Pruitt K.D., Tatusova T., Maglott D. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007 (Submitted)
3. Benson D.A., Karsch-Mizrachi I., Lipman D.J., Ostell J., Wheeler D.L. GenBank. Nucleic Acids Res. 2007 (Submitted)
4. Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G., Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D., Altschul S.F., et al. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc. Natl Acad. Sci. USA. 2002;99:16899–16903.
