Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs

PMID: 16683036
PMCID: PMC1449903
DOI: 10.1371/journal.pgen.0020062

Norihiro Maeda et al. PLoS Genet. 2006 Apr;2(4):e62.

Affiliation

¹ Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Japan. rgscerg@gsc.riken.jp

Abstract

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs remained to be collected and identified. To pursue a complete gene catalog covering all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs have continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs, and we developed a Web-based annotation interface that simplifies the annotation procedures in order to reduce manual annotation errors. Automated coding sequence and function prediction was followed by manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Of these 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing, to our knowledge, the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome and could be applied to the transcriptomes of other species.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

**Figure 1. Annotation Pipelines for Transcript Description and for GO Terms**
(A) Pipeline for transcript description. Query sequences falling into categories (black boxes) 1–3 were assigned the description of the matched target sequence DNA entry in MGI symbols, and synonyms were also transferred to our annotation database. Queries falling into categories 4–10 were assigned a transcript description corresponding to the matched protein name. For query sequences falling into category 5 or 6, the keyword “homolog” was appended to the matching protein name. Sequences assigned to category 7 or 8 were denoted with the prefix “similar to” attached to the target sequence name. The prefix “weakly similar” was used to identify sequences assigned to category 9 or 10. For all sequences in categories 5–10, the name of the organism corresponding to the matched protein was appended to the assigned transcript description. If a query was assigned to category 14, its transcript description was “hypothetical [InterPro domain name] containing protein.” Query sequences assigned to categories 17 and 19 were annotated as “hypothetical protein” and “unclassifiable,” respectively. Query sequences grouped into category N1 or N2 were assigned the description of the matched target ncRNA entry. For query sequences falling into category N2, the keyword “homolog of” was appended to the matching ncRNA name. (B) Pipeline for GO terms. DB, database.
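
The naming rules in panel (A) can be read as a small lookup from category to description template. The following Python sketch is only an illustration of those rules, not the FANTOM3 pipeline itself. The function name `describe_transcript`, its parameters, the placement of the organism name in parentheses, and the exact word order of each template (for example, writing "weakly similar to" and putting "homolog of" before the ncRNA name) are assumptions; the caption states only which keyword goes with which category. Categories whose rule the caption does not give return `None`.

```python
from typing import Optional


def describe_transcript(category: str, target_name: str = "",
                        organism: str = "", domain: str = "") -> Optional[str]:
    """Build a transcript description for a Figure 1A category.

    Hypothetical helper: string layouts are assumptions, and categories
    not described in the caption yield None.
    """
    # Categories 1-3: description of the matched DNA entry (MGI symbol).
    if category in {"1", "2", "3"}:
        return target_name
    # N1/N2: description of the matched ncRNA entry; N2 adds "homolog of".
    if category == "N1":
        return target_name
    if category == "N2":
        return f"homolog of {target_name}"  # assumed word order
    # Category 4: matched protein name, no keyword and no organism.
    if category == "4":
        return target_name
    # Categories 5-10 also carry the organism of the matched protein.
    if category in {"5", "6"}:
        return f"{target_name} homolog ({organism})"
    if category in {"7", "8"}:
        return f"similar to {target_name} ({organism})"
    if category in {"9", "10"}:
        # The caption gives the prefix as "weakly similar"; "to" is assumed.
        return f"weakly similar to {target_name} ({organism})"
    if category == "14":
        return f"hypothetical {domain} containing protein"
    if category == "17":
        return "hypothetical protein"
    if category == "19":
        return "unclassifiable"
    # Categories 11-13, 15, 16, and 18 are not described in the caption.
    return None


if __name__ == "__main__":
    # Placeholder names, not real database entries.
    print(describe_transcript("7", "protein X", "Homo sapiens"))
    print(describe_transcript("14", domain="domain Y"))
```

Returning `None` for the unstated categories keeps the sketch from guessing at rules the caption does not spell out.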
