Tracking and coordinating an international curation effort for the CCDS Project

Rachel A Harte¹, Catherine M Farrell, Jane E Loveland, Marie-Marthe Suner, Laurens Wilming, Bronwen Aken, Daniel Barrell, Adam Frankish, Craig Wallin, Steve Searle, Mark Diekhans, Jennifer Harrow, Kim D Pruitt

PMID: 22434842
PMCID: PMC3308164
DOI: 10.1093/database/bas008

Rachel A Harte et al. Database (Oxford). 2012 Mar 20:2012:bas008. doi: 10.1093/database/bas008. Print 2012.

Affiliation

¹ Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.

Abstract

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with the goal of producing a conservative set of high-quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID), and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation, both to improve initial annotation consistency and to reduce time spent discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines.

Database URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi
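The abstract describes comparing automated annotation builds to find identical CDS annotations. The following is a minimal sketch of that idea, not the project's actual pipeline: it assumes each annotation is given as a chromosome, a strand and a list of coding exon intervals, and it treats two annotations as identical only when all three match exactly. The function names and the annotation IDs in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: (chromosome, strand, [(start, end), ...]) per annotation.
Annotation = Tuple[str, str, List[Tuple[int, int]]]


def cds_key(chrom: str, strand: str, exons: List[Tuple[int, int]]):
    """Build a hashable key; exon order in the input does not matter."""
    return (chrom, strand, tuple(sorted(exons)))


def find_identical_cds(
    set_a: Dict[str, Annotation], set_b: Dict[str, Annotation]
) -> List[Tuple[str, str]]:
    """Return (id_a, id_b) pairs whose CDS structures match exactly."""
    # Index the first annotation set by CDS structure.
    index: Dict[tuple, List[str]] = {}
    for ann_id, (chrom, strand, exons) in set_a.items():
        index.setdefault(cds_key(chrom, strand, exons), []).append(ann_id)

    # Look up each annotation from the second set in that index.
    matches = []
    for ann_id, (chrom, strand, exons) in set_b.items():
        for a_id in index.get(cds_key(chrom, strand, exons), []):
            matches.append((a_id, ann_id))
    return sorted(matches)
```

For example, with hypothetical inputs `{"A1": ("chr1", "+", [(100, 200), (300, 400)])}` and `{"B1": ("chr1", "+", [(300, 400), (100, 200)]), "B2": ("chr1", "+", [(100, 200), (300, 410)])}`, only the pair `("A1", "B1")` is reported, because `B2` differs by a single exon boundary.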

Figures

**Figure 1.**
The flowchart outlines the CCDS review process (light gray boxes). CCDS IDs undergo status changes during and following the review process, as indicated by the colored boxes, where light green indicates ‘Public’ status, red indicates an ongoing review that has not yet reached consensus, orange indicates a pending update or withdrawal that has reached consensus, and purple indicates ‘Withdrawn’ status.
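The status categories in this caption can be modeled as a small state machine. The sketch below is an illustration only: the caption names the status categories, but the specific transitions allowed here, the split of the pending status into update and withdrawal, and the rule that a completed update increments the version number are assumptions, and all class and member names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PUBLIC = "Public"
    UNDER_REVIEW = "Under review"              # no consensus yet
    PENDING_UPDATE = "Pending update"          # consensus reached
    PENDING_WITHDRAWAL = "Pending withdrawal"  # consensus reached
    WITHDRAWN = "Withdrawn"


# Assumed transition table; the real review process may allow other paths.
ALLOWED = {
    Status.PUBLIC: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {
        Status.PUBLIC,
        Status.PENDING_UPDATE,
        Status.PENDING_WITHDRAWAL,
    },
    Status.PENDING_UPDATE: {Status.PUBLIC},
    Status.PENDING_WITHDRAWAL: {Status.WITHDRAWN},
    Status.WITHDRAWN: set(),
}


@dataclass
class CcdsRecord:
    ccds_id: str
    version: int = 1
    status: Status = Status.PUBLIC

    def transition(self, new_status: Status) -> None:
        """Move to new_status, rejecting transitions not in ALLOWED."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        # Assumption: a completed update is published as a new version.
        if self.status is Status.PENDING_UPDATE and new_status is Status.PUBLIC:
            self.version += 1
        self.status = new_status

    @property
    def accession(self) -> str:
        return f"{self.ccds_id}.{self.version}"
```

Keeping the transitions in one table makes it straightforward to adjust the model if the actual review process permits paths that are not captured here.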

**Figure 2.**
UCSC Genome Browser view of the human *KLHL35* (kelch-like 35) gene. CCDS8237.1 was based on AK091109.1 (mRNA track, blue). This CCDS ID has now been withdrawn because a retained intron introduces a premature termination codon, rendering the transcript an NMD candidate. CCDS44685.2 representing the completely processed full-length variant remains valid for this gene.
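A premature termination codon is commonly flagged as an NMD trigger when it lies well upstream of the last exon-exon junction. The sketch below applies that rule of thumb in transcript coordinates. The 50-nucleotide default threshold is a widely cited approximation and an assumption here, not a value taken from this page, and the function name and parameters are hypothetical.

```python
from typing import Sequence


def is_nmd_candidate(
    exon_lengths: Sequence[int],
    stop_codon_end: int,
    min_distance: int = 50,
) -> bool:
    """Apply a simple distance rule for nonsense-mediated decay candidates.

    exon_lengths: exon lengths in transcript order (5' to 3').
    stop_codon_end: 0-based, exclusive transcript coordinate of the end
        of the stop codon.
    min_distance: assumed threshold, in nucleotides, between the stop
        codon and the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        # A single-exon transcript has no splice junction to compare against.
        return False
    # Transcript coordinate of the last exon-exon junction.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_codon_end > min_distance
```

With hypothetical exon lengths `[200, 150, 300]`, the last junction falls at transcript position 350, so a stop codon ending at position 250 is flagged, while one ending at position 320 or anywhere in the last exon is not.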

**Figure 3.**
UCSC Genome Browser view of CCDS4929.1, which was updated to version 2, representing a variant of the human *CRISP3* (cysteine-rich secretory protein 3) gene. The CDS was extended at the 5′-end. (a) Both the longer protein (258 amino acids) encoded by the update and the shorter protein (245 amino acids) have predicted signal peptides (SignalP v4.0) of 32 amino acids and 19 amino acids, respectively. (b and c) Base-level view. The upstream AUG start codon (b) has the weaker Kozak context (blue box) and is conserved only among primates (red box), whereas the downstream AUG (c) is conserved among more mammals (46-way alignment and conservation track).
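Kozak context strength is often summarized by two positions relative to the A of the start codon (position +1): a purine at -3 and a G at +4. The sketch below uses that simplified heuristic to grade a candidate start codon. It is not the curators' actual procedure, the three labels are a common convention rather than terms from this page, and the function name is hypothetical.

```python
def kozak_strength(sequence: str, start_index: int) -> str:
    """Grade the Kozak context of the start codon at start_index.

    sequence: transcript sequence (DNA or RNA letters).
    start_index: 0-based index of the A in the ATG/AUG start codon.
    Returns "strong", "moderate" or "weak".
    """
    seq = sequence.upper().replace("U", "T")
    if start_index < 3 or start_index + 4 > len(seq):
        raise ValueError("need 3 bases upstream and 1 base downstream of ATG")
    if seq[start_index:start_index + 3] != "ATG":
        raise ValueError("no ATG/AUG at start_index")

    purine_at_minus3 = seq[start_index - 3] in "AG"  # position -3
    g_at_plus4 = seq[start_index + 3] == "G"         # position +4

    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "moderate"
    return "weak"
```

Sequence context is only one line of evidence; as the caption notes, conservation of the start codon across species is weighed alongside it.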
