Gene. 2005 Dec 30;364:99-107.
doi: 10.1016/j.gene.2005.05.036. Epub 2005 Sep 26.

Investigation of protein functions through data-mining on integrated human transcriptome database, H-Invitational database (H-InvDB)

Chisato Yamasaki et al. Gene.

Abstract

H-Invitational Database (H-InvDB) is a human transcriptome database containing integrative annotation of 41,118 full-length cDNA clones originating from 21,037 loci. H-InvDB is a product of the H-Invitational project, an international collaboration to systematically and functionally validate human genes by analyzing a unique set of high-quality full-length cDNA clones using automatic annotation and human curation under unified criteria. Here, 19,574 proteins encoded by these cDNAs were classified into 11,709 function-known proteins and 7,865 function-unknown hypothetical proteins by similarity with protein databases and motif prediction (InterProScan). The proportion of "hypothetical proteins" in H-InvDB was as high as 40.4%. In this study, we therefore conducted data-mining in H-InvDB with the aim of assigning advanced functional annotations to those hypothetical proteins. First, by data-mining in the H-InvDB version of GTOP, we identified 337 SCOP domains within the 7,865 H-Inv hypothetical proteins. Second, by data-mining the subcellular localization predicted by SOSUI and TMHMM in H-InvDB, we found 1,032 transmembrane proteins among the H-Inv hypothetical proteins. These results clearly demonstrate that structural prediction is effective for functional annotation of proteins with unknown functions. All the data in H-InvDB are shown in two main views, the cDNA view and the Locus view, and in five auxiliary databases with web-based viewers: DiseaseInfo Viewer, H-ANGEL, Clustering Viewer, G-integra and TOPO Viewer. The data are also provided as flat files and XML files. The data consist of descriptions of gene structures, novel alternative splicing isoforms, functional RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs in relation to orphan diseases, gene expression profiling, and comparisons with mouse full-length cDNAs in the context of molecular evolution. This unique integrative platform for conducting in silico data-mining represents a substantial contribution to the resources required for the exploration of human biology and pathology.
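
The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the authors' pipeline: the constant and function names are hypothetical, and the two derived ratios (the transmembrane share of hypothetical proteins and clones per locus) are computed here from the quoted counts and are not reported in the abstract.

```python
# Counts quoted in the abstract (variable names are illustrative assumptions).
TOTAL_PROTEINS = 19_574
FUNCTION_KNOWN = 11_709
HYPOTHETICAL = 7_865
TRANSMEMBRANE_HYPOTHETICAL = 1_032
CDNA_CLONES = 41_118
LOCI = 21_037


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# The two classes should partition the full protein set.
partition_ok = FUNCTION_KNOWN + HYPOTHETICAL == TOTAL_PROTEINS

# Derived ratios (not stated in the abstract).
tm_share = share(TRANSMEMBRANE_HYPOTHETICAL, HYPOTHETICAL)
clones_per_locus = CDNA_CLONES / LOCI

print(f"classes sum to total: {partition_ok}")
print(f"transmembrane share of hypothetical proteins: {tm_share:.1f}%")
print(f"full-length cDNA clones per locus: {clones_per_locus:.2f}")
```

Under these counts, roughly one in eight hypothetical proteins is predicted to be a transmembrane protein, and each locus is represented by about two full-length cDNA clones on average.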
