PLoS One. 2016 Feb 26;11(2):e0149621.
doi: 10.1371/journal.pone.0149621. eCollection 2016.

The Implicitome: A Resource for Rationalizing Gene-Disease Associations

Kristina M Hettne et al. PLoS One. 2016.

Abstract

High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or even unintended by, the original authors, but they vastly extend the reach of existing biomedical knowledge for the identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate its usefulness by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.

Conflict of interest statement

Competing Interests: Richard Bruskiewich is the Founder/CEO of Delphinai Corporation, a Canadian federally incorporated startup company performing scientific information systems development for the life sciences. "STAR Informatics" is a trade name of this company. Kristina M. Hettne has performed paid consultancy since November 1, 2015, for Euretos b.v, a startup founded in 2012 that develops knowledge management and discovery services for the life sciences, with the Euretos Knowledge Platform as a marketed product. The paid consultancy did not specifically fund this study. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Gene-Disease LWAS using concept profiles and networks of implicit information.
a) Concepts X and Z share an association in a hypothetical concept network via an explicit link (co-occurrence) and multiple implicit links (indirect connections via the intermediate concepts Y1, Y2, and Y3). The concept profile for concept X is depicted, where the weights (w) between concepts reflect the co-occurrence frequencies of each concept in the data source. b) Concepts X and Z have explicit links to concepts Y1, Y2, and Y3 but no explicit link between themselves, as reflected in their corresponding concept profiles. c) The intermediate concepts shared between concept profiles X and Z constitute implicit information, indirectly linking X and Z (red dotted line). The strength of the implicit link (match score) is computed as the inner product of the weights of matching concepts in the concept profiles. d & e) The distribution of concept profile size for gene (median 1,142, maximum 56,028) and disease (median 995, maximum 81,562) concepts. f) The distribution of the number of overlapping concepts between gene and disease concept profiles (median 180, maximum overlap 40,725). Only 23 concept pairs had no overlapping concepts. g) Concept profiles for the human gene CWH43 (left) and the disease “Hyperphosphatasia with Mental Retardation” (right), which share no explicit co-occurrence. The 37 overlapping concepts are shown clustered in between. Both the number and the weights of these overlapping links contribute to the strength of the implicit association. h) The distribution of match scores (higher numbers indicating stronger associations) for the 204 million LWAS-derived gene-disease pairs for both the explicit (black) and implicit (red) associations.
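The match score described in panel c can be illustrated with a minimal sketch. It assumes each concept profile is a plain mapping from concept identifier to weight; the function name `match_score`, the profile contents, and the weight values are hypothetical and are not taken from the paper's implementation.

```python
def match_score(profile_x, profile_z):
    """Inner product of the weights of concepts present in both profiles.

    Each profile is assumed to be a dict mapping concept id -> weight.
    """
    shared = profile_x.keys() & profile_z.keys()
    return sum(profile_x[c] * profile_z[c] for c in shared)


# Hypothetical weights, chosen only for illustration.
profile_x = {"Y1": 0.5, "Y2": 0.25, "Y3": 0.125, "A": 0.75}
profile_z = {"Y1": 0.5, "Y2": 0.5, "Y3": 1.0, "B": 0.25}

# X and Z share Y1, Y2, and Y3; concepts A and B do not contribute.
score = match_score(profile_x, profile_z)
print(score)  # 0.5*0.5 + 0.25*0.5 + 0.125*1.0 = 0.5
```

In this sketch both the number of shared concepts and their weights raise the score, which mirrors the caption's remark in panel g.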
Fig 2. Correction of literature bias in the match score.
a,b) Distribution of genes and diseases recognized by LWAS when sorted by publication abundance (log number of MEDLINE abstracts). Red lines indicate the 5-abstract cut-off, below which concept profiles are not constructed. c,d) Distribution of gene and disease rank orders, binned in 10-percentile intervals (x-axis). Higher numbers indicate stronger associations (y-axis).
Fig 3. The relative distribution of LWAS association types.
Distribution of the 105 highest-ranking implicit gene-disease pairs, as determined by manual inspection: Type I Gene family member (n = 71) represents gene-disease associations where a family member of the gene causes the disease or a disease with very large phenotypic overlap; Type II Negation (n = 4) and Type III Homonym (n = 11) represent different classes of LWAS false positives, together composing 14% of the cases. Type IV Novel association (n = 19) indicates gene-disease associations that are promising for follow-up investigations.
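The counts in this caption can be checked with a short calculation. The sketch below uses only the four n values stated above; the variable names are illustrative.

```python
# Counts per association type, as stated in the Fig 3 caption.
counts = {
    "Type I Gene family member": 71,
    "Type II Negation": 4,
    "Type III Homonym": 11,
    "Type IV Novel association": 19,
}

total = sum(counts.values())

# Types II and III are the two false-positive classes.
false_positives = counts["Type II Negation"] + counts["Type III Homonym"]
false_positive_share = false_positives / total

print(total)                              # 105
print(round(100 * false_positive_share))  # 14
```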
Fig 4. Overlapping implicit gene-disease associations between LWAS and GWAS.
Green area: GWAS p-value cutoff of 10^-5; yellow area: GWAS p-value cutoff of 10^-8; red horizontal area: LWAS 99th-percentile cutoff; blue horizontal area: LWAS 95th-percentile cutoff.
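As a rough illustration of how the four cutoffs partition gene-disease pairs, the sketch below assigns a pair to a GWAS band and an LWAS band. The function name `classify_pair`, the band labels, and the choice of inclusive comparisons are assumptions made for this example; the caption does not specify how boundary values are treated.

```python
def classify_pair(gwas_p, lwas_percentile):
    """Return (gwas_band, lwas_band) for one gene-disease pair.

    gwas_p: GWAS p-value; lwas_percentile: LWAS match-score percentile (0-100).
    Inclusive comparisons are an assumption, not taken from the paper.
    """
    if gwas_p <= 1e-8:
        gwas_band = "p <= 1e-8"
    elif gwas_p <= 1e-5:
        gwas_band = "p <= 1e-5"
    else:
        gwas_band = None

    if lwas_percentile >= 99:
        lwas_band = "99th percentile"
    elif lwas_percentile >= 95:
        lwas_band = "95th percentile"
    else:
        lwas_band = None

    return gwas_band, lwas_band


# Hypothetical pairs, for illustration only.
print(classify_pair(3e-9, 99.5))  # ('p <= 1e-8', '99th percentile')
print(classify_pair(2e-6, 96.0))  # ('p <= 1e-5', '95th percentile')
print(classify_pair(0.01, 50.0))  # (None, None)
```

A pair that receives a band on both axes falls in an overlap region of the figure.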
Fig 5. Overview of LWAS workflow (concept profile creation and analysis).
