Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2012 Mar 20:2012:bas008.
doi: 10.1093/database/bas008. Print 2012.

Tracking and coordinating an international curation effort for the CCDS Project

Affiliations

Tracking and coordinating an international curation effort for the CCDS Project

Rachel A Harte et al. Database (Oxford). .

Abstract

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. DATABASE URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
The flowchart outlines the CCDS review process (light gray boxes). CCDS IDs undergo status changes during and following the review process, as indicated by the colored boxes, where light green indicates ‘Public’ status, red indicates an ongoing review that has not yet reached consensus, orange indicates a pending update or withdrawal that has reached consensus, and purple indicates ‘Withdrawn’ status.
Figure 2.
Figure 2.
UCSC Genome Browser view of the human KLHL35 (kelch-like 35) gene. CCDS8237.1 was based on AK091109.1 (mRNA track, blue). This CCDS ID has now been withdrawn because a retained intron introduces a premature termination codon, rendering the transcript an NMD candidate. CCDS44685.2 representing the completely processed full-length variant remains valid for this gene.
Figure 3.
Figure 3.
UCSC Genome Browser view of CCDS4929.1, which was updated to version 2, representing a variant of the human CRISP3 (cysteine-rich secretory protein 3) gene. The CDS was extended at the 5′-end. (a) Both the longer protein (258 amino acids) encoded by the update and the shorter protein (245 amino acids) have predicted signal peptides (SignalPv4.0) of 32 amino acids and 19 amino acids, respectively. (b and c) Base-level view. The upstream AUG start codon (b) has the weaker Kozak context (blue box) and is only conserved among primates (red box), whereas the downstream AUG (c) is conserved among more mammals (46-way alignment and conservation track).

References

    1. Pruitt KD, Harrow J, Harte RA, et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009;19:1316–1323. - PMC - PubMed
    1. Flicek P, Amode MR, Barrell D, et al. Ensembl 2012. Nucleic Acids Res. 2012;40:D84–D90. - PMC - PubMed
    1. Wilming LG, Gilbert JGR, Howe K, et al. The vertebrate genome annotation (Vega) database. Nucleic Acids Res. 2008;36:D753–D760. - PMC - PubMed
    1. Harrow J, Denoeud F, Frankish A, et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7:S4. - PMC - PubMed
    1. Pruitt KD, Tatusova T, Brown GR, et al. NCBI reference sequences (RefSeq): current status, new features and genome annotation policy. Nucl. Acids Res. 2012;40:D130–D135. - PMC - PubMed

Publication types