Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation
- PMID: 29126148
- PMCID: PMC5753299
- DOI: 10.1093/nar/gkx1031
Abstract
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, the HUGO Gene Nomenclature Committee, Mouse Genome Informatics and the University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps maintain the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
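The core idea of "identically annotated" coding regions can be illustrated with a minimal sketch. It is not the CCDS pipeline itself: the data model, the function name `consensus_cds`, and the transcript identifiers are all hypothetical, and the quality assurance checks and manual curation described above are not modelled. It only shows how coding regions with exactly matching coordinates in two independent annotation sets could be paired.

```python
from typing import Dict, Tuple

# Hypothetical data model: a CDS is (chromosome, strand, exon intervals).
Interval = Tuple[int, int]
Cds = Tuple[str, str, Tuple[Interval, ...]]


def consensus_cds(set_a: Dict[str, Cds], set_b: Dict[str, Cds]) -> Dict[Cds, Tuple[str, str]]:
    """Return CDS structures found in both sets, mapped to one ID from each set.

    If several transcripts in one set share the same CDS, only one of them
    is reported; a fuller implementation would keep all of them.
    """
    # Index the second set by CDS structure for constant-time lookup.
    index_b = {cds: tid for tid, cds in set_b.items()}
    return {
        cds: (tid_a, index_b[cds])
        for tid_a, cds in set_a.items()
        if cds in index_b
    }


# Made-up annotations: the first CDS matches exactly, the second differs by one exon end.
annot_a = {
    "A_tx1": ("chr1", "+", ((100, 200), (300, 400))),
    "A_tx2": ("chr2", "-", ((50, 150),)),
}
annot_b = {
    "B_tx1": ("chr1", "+", ((100, 200), (300, 400))),
    "B_tx2": ("chr2", "-", ((50, 160),)),
}

matches = consensus_cds(annot_a, annot_b)
for cds, (id_a, id_b) in matches.items():
    print(id_a, id_b, cds)
```

Requiring an exact match on every exon boundary is the point of the sketch: a coding region that differs by even one coordinate between the two sets is not treated as consensus.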
Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Similar articles
- Current status and new features of the Consensus Coding Sequence database. Nucleic Acids Res. 2014 Jan;42(Database issue):D865-72. doi: 10.1093/nar/gkt1059. Epub 2013 Nov 11. PMID: 24217909. Free PMC article.
- Tracking and coordinating an international curation effort for the CCDS Project. Database (Oxford). 2012 Mar 20;2012:bas008. doi: 10.1093/database/bas008. Print 2012. PMID: 22434842. Free PMC article.
- The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009 Jul;19(7):1316-23. doi: 10.1101/gr.080531.108. Epub 2009 Jun 4. PMID: 19498102. Free PMC article.
- NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database (Oxford). 2020 Jan 1;2020:baaa062. doi: 10.1093/database/baaa062. PMID: 32761142. Free PMC article. Review.
- Scaling national and international improvement in virtual gene panel curation via a collaborative approach to discordance resolution. Am J Hum Genet. 2021 Sep 2;108(9):1551-1557. doi: 10.1016/j.ajhg.2021.06.020. Epub 2021 Jul 29. PMID: 34329581. Free PMC article. Review.
Cited by
- Using Mechanistic Models and Machine Learning to Design Single-Color Multiplexed Nascent Chain Tracking Experiments. bioRxiv [Preprint]. 2023 Jan 26:2023.01.25.525583. doi: 10.1101/2023.01.25.525583. Update in: Front Cell Dev Biol. 2023 May 30;11:1151318. doi: 10.3389/fcell.2023.1151318. PMID: 36747627. Free PMC article. Updated. Preprint.
- Regulatory and coding sequences of TRNP1 co-evolve with brain size and cortical folding in mammals. Elife. 2023 Mar 22;12:e83593. doi: 10.7554/eLife.83593. PMID: 36947129. Free PMC article.
- Application of single-cell RNA sequencing on human testicular samples: a comprehensive review. Int J Biol Sci. 2023 Apr 9;19(7):2167-2197. doi: 10.7150/ijbs.82191. eCollection 2023. PMID: 37151874. Free PMC article. Review.
- Improved analysis of CRISPR fitness screens and reduced off-target effects with the BAGEL2 gene essentiality classifier. Genome Med. 2021 Jan 6;13(1):2. doi: 10.1186/s13073-020-00809-3. PMID: 33407829. Free PMC article.
- Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature. 2021 Sep;597(7877):527-532. doi: 10.1038/s41586-021-03855-y. Epub 2021 Aug 10. PMID: 34375979. Free PMC article.