A new web-based data mining tool for the identification of candidate genes for human genetic disorders
- PMID: 12529706
- DOI: 10.1038/sj.ejhg.5200918
Abstract
Identifying the gene underlying a human genetic disorder can be difficult and time-consuming. Typically, positional data delimit a chromosomal region that contains between 20 and 200 genes. The choice then lies between sequencing large numbers of genes and setting priorities by combining positional data with the available expression and phenotype data, which are spread across different internet databases. This process of examining positional candidates for possible functional clues may be performed in many different ways, depending on the investigator's knowledge and experience. Here, we report on a new tool called the GeneSeeker, which automatically gathers and combines positional data and expression/phenotypic data from nine different web-based databases. This results in a quick overview of interesting candidate genes in the region of interest. The GeneSeeker system is built in a modular fashion, allowing for easy addition or removal of databases if required. Databases are searched directly through the web, which obviates the need for data warehousing. To evaluate the GeneSeeker tool, we analysed syndromes whose causative gene is known. For each of 10 syndromes, the GeneSeeker programme generated a shortlist that contained a significantly reduced number of candidate genes from the critical region, yet still contained the causative gene. On average, a list of 163 genes based on position alone was reduced to a more manageable list of 22 genes based on position and expression or phenotype information. We are currently expanding the tool by adding other databases. The GeneSeeker is available via the web interface (http://www.cmbi.kun.nl/GeneSeeker/).
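The filtering idea described in the abstract (position combined with expression or phenotype evidence) can be illustrated with plain set operations. This is only a minimal sketch and not the GeneSeeker implementation: the function name `shortlist`, the gene symbols, and the in-memory sets are hypothetical stand-ins for the results that the tool retrieves from its web-based databases.

```python
def shortlist(positional, expression, phenotype):
    """Keep genes from the critical region that have expression OR phenotype support.

    All three arguments are collections of gene symbols (hypothetical inputs;
    the real tool gathers them by querying remote databases).
    """
    functional_support = set(expression) | set(phenotype)
    # Position AND (expression OR phenotype), sorted for a stable result.
    return sorted(set(positional) & functional_support)


# Made-up example data, for illustration only.
positional = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
expression = {"GENE_B", "GENE_X"}
phenotype = {"GENE_B", "GENE_D", "GENE_Y"}

candidates = shortlist(positional, expression, phenotype)
print(candidates)  # ['GENE_B', 'GENE_D']
```

In this toy case, five positional candidates shrink to two, mirroring on a small scale the reported average reduction from 163 to 22 genes.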