Nucleic Acids Res. 2018 Jan 4;46(D1):D221-D228.
doi: 10.1093/nar/gkx1031.

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

Shashikant Pujar et al. Nucleic Acids Res.

Abstract

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, the HUGO Gene Nomenclature Committee, Mouse Genome Informatics and the University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps maintain the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.

Figures

Figure 1.
Number of CCDS IDs and genes represented in the human (A) and mouse (B) CCDS releases. The X-axis indicates the year in which a CCDS dataset was made public. Details about CCDS releases are available on the CCDS Releases and Statistics web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=SHOW_STATISTICS).
Figure 2.
Fraction of all genes in a CCDS release that are represented by at least two current CCDS IDs.
Figure 3.
Changes in the human (A) and mouse (B) datasets with every new CCDS release. ‘New’ = new CCDS IDs added; ‘dropped’ = CCDS ID present in the previous release but withdrawn in the subsequent release; ‘updated’ = CCDS IDs that have an incremented accession version compared to the previous release, indicating a sequence update in the coding region.
Figure 4.
A view of the graphical display accessed from the report page of CCDS3542.1 (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=ALLFIELDS&DATA=CCDS3542&ORGANISM=0&BUILDS=CURRENTBUILDS) using the purple ‘S’ icon. (A) Transcripts and proteins from NCBI Annotation Release 108. (B) Transcripts and proteins from Ensembl Release 85. The green bar indicates the gene; transcripts are shown in purple and proteins are shown in red. Positioning the cursor over any of these objects (gene, transcript or protein) opens a tool tip that includes additional information and links. Proteins in the NCBI annotation display that are in the CCDS set include a link to the CCDS ID in the tool tip. The gray box to the right (indicated by the vertical arrow) is the tool tip corresponding to the protein accession NP_002514.1. Differences between any two objects can also be revealed as vertical lines (indicated by horizontal arrows) when the objects (NM_002523.2 and ENST00000265634 in the figure) are selected while holding the ‘Control’ or ‘Command’ key.
Figure 5.
Distribution of human and mouse CCDS IDs by their ‘Review status’ in the current human (Release 20) and mouse (Release 21) CCDS releases at the time of data freeze. Details of the review status categories and sub-categories are provided in Table 1. Reviewed 1 = CCDS IDs reviewed ‘by RefSeq and HAVANA’, Reviewed 2 = CCDS IDs reviewed ‘by CCDS collaboration’, Reviewed 3 = CCDS IDs reviewed ‘by RefSeq, HAVANA and CCDS collaboration’.
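
The release-to-release categories defined in the Figure 3 caption ('new', 'dropped', 'updated') can be illustrated with a minimal Python sketch. It is not the CCDS pipeline's own method. It assumes each release is given as a plain list of versioned accessions of the form CCDS3542.1, and that a withdrawn ID is simply absent from the later release. The function names and the example accessions are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_accession(accession: str) -> Tuple[str, int]:
    """Split a versioned accession such as 'CCDS3542.1' into ('CCDS3542', 1)."""
    ccds_id, _, version = accession.partition(".")
    return ccds_id, int(version)


def compare_releases(previous: List[str], current: List[str]) -> Dict[str, List[str]]:
    """Classify CCDS IDs between two releases (hypothetical helper).

    Assumption: an ID counts as 'dropped' when it is absent from the
    current list; real release files may flag withdrawals differently.
    """
    prev = dict(parse_accession(a) for a in previous)
    curr = dict(parse_accession(a) for a in current)
    return {
        # IDs that appear only in the current release
        "new": sorted(i for i in curr if i not in prev),
        # IDs that were in the previous release but are no longer present
        "dropped": sorted(i for i in prev if i not in curr),
        # IDs in both releases whose accession version was incremented
        "updated": sorted(i for i in curr if i in prev and curr[i] > prev[i]),
    }


# Made-up accessions, for illustration only
previous_release = ["CCDS1.1", "CCDS2.1", "CCDS3.1"]
current_release = ["CCDS1.1", "CCDS2.2", "CCDS4.1"]
print(compare_releases(previous_release, current_release))
```

In this toy input, CCDS4 is new, CCDS3 is dropped, CCDS2 is updated because its version moved from 1 to 2, and CCDS1 is unchanged and so appears in no category.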
