Nat Methods. 2015 Sep;12(9):841-3.
doi: 10.1038/nmeth.3484. Epub 2015 Jul 20.

Phenolyzer: phenotype-based prioritization of candidate genes for human diseases

Hui Yang et al. Nat Methods. 2015 Sep.

Abstract

Prior biological knowledge and phenotype information may help to identify disease genes from human whole-genome and whole-exome sequencing studies. We developed Phenolyzer (http://phenolyzer.usc.edu), a tool that uses prior information to implicate genes involved in diseases. Phenolyzer exhibits superior performance over competing methods for prioritizing Mendelian and complex disease genes, based on disease or phenotype terms entered as free text.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare competing financial interests: details are available in the online version of the paper.

Figures

Figure 1
Workflow of Phenolyzer. (1) Disease match: each disease or phenotype query term is separately translated into sets of disease names by word match, offspring search, synonym retrieval and phenotype interpretation in disease name databases. (2) Gene query: each retrieved disease name is queried in the gene-disease databases based on an exact match, to get a list of genes. (3) Gene score system: a score based on the type and confidence of the gene-disease relationship is generated for each gene corresponding to each disease name. Then, for each input term, a weighted sum score is calculated for each reported gene by adding all the scores retrieved in previous step. The seed gene set is generated by collating all the genes of all input terms, and each gene score is normalized. (4) Seed gene growth: candidate disease genes are expanded beyond the seed gene set based on four types of gene-gene relationships; scores are calculated for all genes that connect with seed genes. (5) Gene ranking: all the information is integrated to generate a score for each gene, with the weights trained from a logistic regression model. The scores are renormalized to the final prioritized gene list. HPRD, Human Protein Reference Database; HTRI, Human Transcriptional Regulation Interaction Database.
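
The five steps in the Figure 1 caption can be illustrated with a small Python sketch. This is not Phenolyzer's implementation: the lookup tables, the confidence and relationship values, the function names, and the two combination weights are all hypothetical placeholders chosen for illustration. The sketch uses only a word match for step 1, collapses the four types of gene-gene relationships into a single table, and substitutes fixed weights for the weights that the caption says are trained with a logistic regression model.

```python
from collections import defaultdict

# Toy stand-ins for the disease-name, gene-disease and gene-gene databases.
# All names and numbers below are hypothetical illustration values.
DISEASE_NAMES = {
    "alzheimer": ["Alzheimer disease", "Alzheimer disease, familial"],
}
GENE_DISEASE = {
    # disease name -> [(gene, confidence of the gene-disease relationship)]
    "Alzheimer disease": [("APP", 1.0), ("PSEN1", 0.8)],
    "Alzheimer disease, familial": [("PSEN1", 1.0), ("PSEN2", 0.6)],
}
GENE_GENE = {
    # seed gene -> [(connected gene, relationship strength)]
    "APP": [("APBB1", 0.5)],
    "PSEN1": [("NCSTN", 0.7), ("APP", 0.4)],
}


def match_diseases(term):
    """Step 1: translate a free-text term into disease names (word match only)."""
    term = term.lower()
    names = []
    for keyword, diseases in DISEASE_NAMES.items():
        if keyword in term:
            names.extend(diseases)
    return names


def seed_scores(terms):
    """Steps 2-3: exact-match gene query, per-gene sum, normalization to [0, 1]."""
    scores = defaultdict(float)
    for term in terms:
        for disease in match_diseases(term):
            for gene, confidence in GENE_DISEASE.get(disease, []):
                scores[gene] += confidence
    top = max(scores.values(), default=0.0)
    return {g: s / top for g, s in scores.items()} if top > 0 else {}


def grow_seeds(seeds):
    """Step 4: score every gene that connects to a seed gene."""
    grown = defaultdict(float)
    for seed, seed_score in seeds.items():
        for neighbor, strength in GENE_GENE.get(seed, []):
            grown[neighbor] += seed_score * strength
    return dict(grown)


def rank_genes(terms, seed_weight=1.0, growth_weight=0.5):
    """Step 5: combine both scores with placeholder weights and renormalize."""
    seeds = seed_scores(terms)
    grown = grow_seeds(seeds)
    combined = {
        gene: seed_weight * seeds.get(gene, 0.0) + growth_weight * grown.get(gene, 0.0)
        for gene in set(seeds) | set(grown)
    }
    top = max(combined.values(), default=0.0)
    if top == 0:
        return []
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(((g, s / top) for g, s in combined.items()),
                  key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for gene, score in rank_genes(["Alzheimer"]):
        print(f"{gene}\t{score:.3f}")
```

In this toy setup, genes reached only through seed gene growth (here APBB1 and NCSTN) still receive a score, which is how the workflow can surface candidates that no gene-disease database lists for the query term.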
Figure 2
Comparison between Phenolyzer and other tools to find well-known monogenic disease genes and predict recently published novel disease genes. (a) The ranking distribution of genes for 14 monogenic diseases. (b) The ranking distribution of 55 recently published disease genes from four human genetics journals.
