Targeted discovery of novel human exons by comparative genomics
- PMID: 17989246
- PMCID: PMC2099585
- DOI: 10.1101/gr.7128207
Abstract
A complete and accurate set of human protein-coding gene annotations is perhaps the single most important resource for genomic research after the human-genome sequence itself, yet the major gene catalogs remain incomplete and imperfect. Here we describe a genome-wide effort, carried out as part of the Mammalian Gene Collection (MGC) project, to identify human genes not yet in the gene catalogs. Our approach was to produce gene predictions by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, then to test predicted novel genes by RT-PCR. We have identified 734 novel gene fragments (NGFs) containing 2188 exons with, at most, weak prior cDNA support. These NGFs correspond to an estimated 563 distinct genes, of which >160 are completely absent from the major gene catalogs, while hundreds of others represent significant extensions of known genes. The NGFs appear to be predominantly protein-coding genes rather than noncoding RNAs, unlike novel transcribed sequences identified by technologies such as tiling arrays and CAGE. They tend to be expressed at low levels and in a tissue-specific manner, and they are enriched for roles in motor activity, cell adhesion, connective tissue, and central nervous system development. Our results demonstrate that many important genes and gene fragments have been missed by traditional approaches to gene discovery but can be identified by their evolutionary signatures using comparative sequence data. However, they suggest that hundreds, not thousands, of protein-coding genes are completely missing from the current gene catalogs.
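The abstract does not spell out how a prediction is judged to be "novel" relative to the gene catalogs. As a minimal sketch, assuming a predicted exon counts as a novelty candidate when it overlaps no exon in a reference annotation, the core check reduces to an interval-overlap test. This is a simplification, not the authors' pipeline, which also weighed prior cDNA support and validated candidates by RT-PCR. The names `Interval`, `overlaps`, and `novel_exons`, the 0-based half-open coordinates, and the strand-aware comparison are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Interval:
    """Hypothetical exon record: 0-based, half-open [start, end) coordinates."""
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome and strand share at least one base.

    Half-open intervals that merely touch (a.end == b.start) do not overlap.
    Requiring the same strand is an assumption: an antisense prediction is
    treated as distinct from a sense-strand annotation.
    """
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def novel_exons(predicted: Iterable[Interval],
                annotated: Iterable[Interval]) -> List[Interval]:
    """Return predicted exons that overlap no annotated exon (hypothetical criterion).

    Brute force, O(len(predicted) * len(annotated)); a genome-scale version
    would use a sorted sweep or an interval index instead.
    """
    known = list(annotated)
    return [p for p in predicted if not any(overlaps(p, k) for k in known)]


if __name__ == "__main__":
    predicted = [
        Interval("chr1", 100, 200, "+"),  # overlaps the annotated chr1 exon
        Interval("chr1", 300, 400, "+"),  # only touches it at base 300
        Interval("chr1", 500, 600, "+"),  # no annotated exon nearby
        Interval("chr2", 100, 200, "-"),  # same span as an annotation, opposite strand
    ]
    annotated = [
        Interval("chr1", 150, 300, "+"),
        Interval("chr2", 100, 200, "+"),
    ]
    for exon in novel_exons(predicted, annotated):
        print(exon)
```

In this toy input, only the first prediction is discarded as already annotated; the other three are reported as candidates. Whether touching intervals or opposite-strand matches should count as "known" is a policy decision that a real annotation comparison would need to make explicitly.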
Similar articles
- Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes. Nucleic Acids Res. 2005 Apr 4;33(6):1935-9. doi: 10.1093/nar/gki328. Print 2005. PMID: 15809229. Free PMC article.
- [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes]. Yi Chuan Xue Bao. 2004 May;31(5):431-43. PMID: 15478601. Chinese.
- Conserved introns reveal novel transcripts in Drosophila melanogaster. Genome Res. 2009 Jul;19(7):1289-300. doi: 10.1101/gr.090050.108. Epub 2009 May 20. PMID: 19458021. Free PMC article.
- An Experimental Approach to Genome Annotation: This report is based on a colloquium sponsored by the American Academy of Microbiology held July 19-20, 2004, in Washington, DC. Washington (DC): American Society for Microbiology; 2004. PMID: 33001599. Free Books & Documents. Review.
- The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays. 2019 Nov;41(11):e1900066. doi: 10.1002/bies.201900066. Epub 2019 Sep 23. PMID: 31544971. Review.
Cited by
- Computational analysis of whole-genome differential allelic expression data in human. PLoS Comput Biol. 2010 Jul 8;6(7):e1000849. doi: 10.1371/journal.pcbi.1000849. PMID: 20628616. Free PMC article.
- Comparative assessment of methods for aligning multiple genome sequences. Nat Biotechnol. 2010 Jun;28(6):567-72. doi: 10.1038/nbt.1637. Epub 2010 May 23. PMID: 20495551.
- The evolution of epitype. Plant Cell. 2010 Jun;22(6):1658-66. doi: 10.1105/tpc.110.075481. Epub 2010 Jun 15. PMID: 20551346. Free PMC article.
- PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform. 2011 Jan;12(1):41-51. doi: 10.1093/bib/bbq072. Epub 2010 Dec 21. PMID: 21278375. Free PMC article.
- Between a chicken and a grape: estimating the number of human genes. Genome Biol. 2010;11(5):206. doi: 10.1186/gb-2010-11-5-206. Epub 2010 May 5. PMID: 20441615. Free PMC article. Review.