PLoS One. 2016 Feb 26;11(2):e0149621.

doi: 10.1371/journal.pone.0149621. eCollection 2016.

The Implicitome: A Resource for Rationalizing Gene-Disease Associations

Kristina M Hettne¹, Mark Thompson¹, Herman H H B M van Haagen¹, Eelke van der Horst¹, Rajaram Kaliyaperumal¹, Eleni Mina¹, Zuotian Tatum¹, Jeroen F J Laros¹, Erik M van Mulligen¹˒², Martijn Schuemie², Emmelien Aten¹, Tong Shu Li³, Richard Bruskiewich⁴, Benjamin M Good³, Andrew I Su³, Jan A Kors², Johan den Dunnen¹, Gert-Jan B van Ommen¹, Marco Roos¹, Peter A C 't Hoen¹, Barend Mons¹˒⁵, Erik A Schultes¹˒⁶

Affiliations

¹ Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.
² Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands.
³ Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America.
⁴ STAR Informatics / Delphinai Corporation, Port Moody, BC, Canada.
⁵ Dutch Techcentre for Life Sciences, Utrecht, The Netherlands.
⁶ Leiden Institute for Advanced Computer Science, Leiden, The Netherlands.

PMID: 26919047
PMCID: PMC4769089
DOI: 10.1371/journal.pone.0149621

Kristina M Hettne et al. PLoS One. 2016.

Abstract

High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or even unintended by, the original authors, but they vastly extend the reach of existing biomedical knowledge for the identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate the usefulness of the implicitome by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.

Conflict of interest statement

Competing Interests: Richard Bruskiewich is the Founder/CEO of Delphinai Corporation, a Canadian federally incorporated startup company performing scientific information systems development for the life sciences. "STAR Informatics" is a trade name of this company. Kristina M. Hettne has performed paid consultancy since November 1, 2015, for Euretos b.v, a startup founded in 2012 that develops knowledge management and discovery services for the life sciences, with the Euretos Knowledge Platform as a marketed product. The paid consultancy did not specifically fund this study. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Figures

**Fig 1. Gene-Disease LWAS using concept profiles and networks of implicit information.**
**a)** Concepts X and Z share an association in a hypothetical concept network via an explicit link (co-occurrence) and multiple implicit links (indirect connections via the intermediate concepts Y1, Y2, and Y3). The concept profile for concept X is depicted, where the weights (w) between concepts reflect the co-occurrence frequencies of each concept in the data source. **b)** Concepts X and Z have explicit links to concepts Y1, Y2, and Y3 but no explicit link between themselves, as reflected in their corresponding concept profiles. **c)** The intermediate concepts shared between concept profiles X and Z constitute implicit information, indirectly linking X and Z (red dotted line). The strength of the implicit link (match score) is computed as the inner product of the weights of matching concepts in the concept profiles. **d & e)** The distribution of concept profile size for gene (median 1142, maximum 56,028) and disease (median 995, maximum 81,562) concepts. **f)** The distribution of the number of overlapping concepts between gene and disease concept profiles (median 180, maximum overlap 40,725). Only 23 concept pairs had no overlapping concepts. **g)** Concept profiles for the human gene *CWH43* (left) and the disease “Hyperphosphatasia with Mental Retardation” (right), which share no explicit co-occurrence. The 37 overlapping concepts are shown clustered in between. Both the number and the weights of these overlapping links contribute to the strength of the implicit association. **h)** The distribution of match scores (higher numbers indicating stronger associations) for the 204 million LWAS-derived gene-disease pairs, for both the explicit (black) and implicit (red) associations.
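
The match score described in panel c can be illustrated with a minimal sketch. It assumes each concept profile is a mapping from concept identifier to weight; the function name `match_score`, the concept names, and the weights are hypothetical and are not taken from the paper.

```python
def match_score(profile_a, profile_b):
    """Inner product of the weights of concepts present in both profiles.

    Assumption: a concept profile is a dict {concept_id: weight}.
    Concepts found in only one profile contribute nothing to the score.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(profile_a[c] * profile_b[c] for c in shared)


# Hypothetical profiles for a gene X and a disease Z that never co-occur
# directly but share the intermediate concepts Y1, Y2, and Y3.
gene_profile = {"Y1": 0.5, "Y2": 0.25, "Y3": 0.125, "Q": 0.9}
disease_profile = {"Y1": 0.5, "Y2": 0.5, "Y3": 1.0, "R": 0.3}

score = match_score(gene_profile, disease_profile)
print(score)  # 0.5 = 0.5*0.5 + 0.25*0.5 + 0.125*1.0
```

As in panel g, both the number of shared concepts and their weights raise the score; two profiles with no shared concepts score zero.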

**Fig 2. Correction of literature bias in the match score.**
**a, b)** Distribution of genes and diseases recognized by LWAS when sorted by publication abundance (log number of MEDLINE abstracts). Red lines indicate the 5-abstract cut-off, below which concept profiles are not constructed. **c, d)** Distribution of gene and disease rank orders, binned in 10-percentile intervals (x-axis); higher numbers on the y-axis indicate stronger associations.

**Fig 3. The relative distribution of LWAS association types.**
Distribution of the 105 highest-ranking implicit gene-disease pairs, classified by manual inspection: *Type I Gene family member* (n = 71) represents gene-disease associations where a family member of the gene causes the disease or a disease with very large phenotypic overlap; *Type II Negation* (n = 4) and *Type III Homonym* (n = 11) represent different classes of LWAS false positives, together making up 14% of the cases. *Type IV Novel association* (n = 19) indicates gene-disease associations that are promising for follow-up investigations.

**Fig 4. Overlapping implicit gene-disease associations between LWAS and GWAS.**
Green area: GWAS p-value cutoff of 10⁻⁵; yellow area: GWAS p-value cutoff of 10⁻⁸; red horizontal area: LWAS 99th-percentile cutoff; blue horizontal area: LWAS 95th-percentile cutoff.
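
The thresholds in Fig 4 can be combined as in the sketch below, which selects gene-disease pairs that pass both an LWAS percentile cutoff and a GWAS p-value cutoff. The record layout, the function names, the nearest-rank percentile definition, and the inclusive comparisons are all assumptions made for illustration; the paper may compute these differently.

```python
def percentile_cutoff(scores, pct):
    """Nearest-rank percentile (assumed definition) for an integer pct in 1..100."""
    ordered = sorted(scores)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def overlapping_pairs(pairs, lwas_pct=95, gwas_p=1e-5):
    """Return pairs at or above the LWAS percentile and at or below the GWAS p-value."""
    cutoff = percentile_cutoff([p["match_score"] for p in pairs], lwas_pct)
    return [
        p["pair"]
        for p in pairs
        if p["match_score"] >= cutoff and p["p_value"] <= gwas_p
    ]


# Hypothetical records: 20 gene-disease pairs with made-up scores and p-values.
pairs = [
    {
        "pair": ("G%d" % i, "D"),
        "match_score": float(i),
        "p_value": 1e-9 if i % 2 == 0 else 1e-3,
    }
    for i in range(1, 21)
]

print(overlapping_pairs(pairs, lwas_pct=95, gwas_p=1e-5))
```

Tightening either threshold (for example `lwas_pct=99` or `gwas_p=1e-8`) can only keep the selection the same size or shrink it, since each is a stricter version of the same filter.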

**Fig 5. Overview of LWAS workflow (concept profile creation and analysis).**
