Bioinformatics. 2017 Jun 15;33(12):1751-1757.
doi: 10.1093/bioinformatics/btx028.

HIPred: an integrative approach to predicting haploinsufficient genes

Hashem A Shihab et al. Bioinformatics.

Abstract

Motivation: A major cause of autosomal dominant disease is haploinsufficiency, whereby a single copy of a gene is not sufficient to maintain the gene's normal function. A large proportion of existing methods for predicting haploinsufficiency incorporate biological networks, e.g. protein-protein interaction networks, which have recently been shown to introduce study bias. As a result, these methods tend to perform best on well-studied genes and underperform on less-studied genes. The advent of large genome sequencing consortia, such as the 1000 Genomes Project, the NHLBI Exome Sequencing Project and the Exome Aggregation Consortium, creates an urgent need for unbiased haploinsufficiency prediction methods.

Results: Here, we describe a machine learning approach, called HIPred, that integrates genomic and evolutionary information from ENSEMBL with functional annotations from the Encyclopaedia of DNA Elements consortium and the NIH Roadmap Epigenomics Project to predict haploinsufficiency without the study bias described earlier. We benchmark HIPred using several datasets and show that our unbiased method performs as well as, and in most cases outperforms, existing biased algorithms.

Availability and implementation: HIPred scores for all gene identifiers are available at https://github.com/HAShihab/HIPred.

Contact: h.shihab@bristol.ac.uk or tom.gaunt@bristol.ac.uk.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1. Methods for integrating feature groups: (a) feature groups are combined at the data level and fed into a single classifier; (b) feature groups are encoded as base kernels and combined using MKL; and (c) feature groups are used to construct heterogeneous base classifiers which are then combined using a stacking approach.
Fig. 2. Informative features used for predicting haploinsufficient genes.
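
The difference between strategies (a) and (c) in Fig. 1 may be easier to see in code. The following is a minimal sketch on synthetic data, not the HIPred implementation: the two feature groups (`genomic`, `annotation`), the toy labels, and the hand-written logistic regression are all assumptions made for illustration. The paper describes heterogeneous base classifiers, whereas this sketch uses the same simple model throughout and a hold-out split to train the meta-classifier. Strategy (b), MKL, is omitted.

```python
import math
import random

random.seed(0)

def make_gene(label):
    """Return two synthetic feature groups for one hypothetical gene."""
    genomic = [random.gauss(1.5 * label, 1.0) for _ in range(2)]
    annotation = [random.gauss(1.5 * label, 1.0) for _ in range(2)]
    return genomic, annotation

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(model, x):
    """Predicted probability of the positive class for one feature vector."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit(X, y, epochs=300, lr=0.5):
    """Logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict((w, b), xi) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def accuracy(probs, y):
    return sum((p >= 0.5) == (yi == 1) for p, yi in zip(probs, y)) / len(y)

# Synthetic labels: 1 = haploinsufficient, 0 = not (toy data only).
labels = [i % 2 for i in range(600)]
genes = [make_gene(lab) for lab in labels]
train, blend, test = slice(0, 200), slice(200, 400), slice(400, 600)

# (a) Data-level integration: concatenate groups, fit one classifier.
concat = [g + a for g, a in genes]
single = fit(concat[train], labels[train])
acc_data_level = accuracy([predict(single, x) for x in concat[test]], labels[test])

# (c) Stacking: one base classifier per feature group, then a meta-classifier
# trained on base predictions for genes the base models did not see.
groups = [[g for g, _ in genes], [a for _, a in genes]]
bases = [fit(grp[train], labels[train]) for grp in groups]

def base_outputs(rows):
    """Base-classifier probabilities for the given gene indices."""
    return [[predict(m, grp[i]) for m, grp in zip(bases, groups)] for i in rows]

meta = fit(base_outputs(range(200, 400)), labels[blend])
stacked = [predict(meta, x) for x in base_outputs(range(400, 600))]
acc_stacking = accuracy(stacked, labels[test])

print(f"data-level accuracy: {acc_data_level:.2f}")
print(f"stacking accuracy:   {acc_stacking:.2f}")
```

In the data-level variant the classifier sees every raw feature at once, while in the stacking variant the meta-classifier sees only one predicted probability per feature group. The accuracies printed here reflect the toy data and say nothing about the benchmark results reported in the paper.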
