Sci Rep. 2016 Jul 18;6:29915.
doi: 10.1038/srep29915.

Relational Network for Knowledge Discovery through Heterogeneous Biomedical and Clinical Features

Huaidong Chen et al. Sci Rep. 2016.

Abstract

Biomedical big data, taken as a whole, covers numerous features, but each individual dataset captures only a subset of them. "Full feature spectrum" knowledge discovery across heterogeneous data sources remains a major challenge. We developed a method called bootstrapping for unified feature association measurement (BUFAM) for pairwise association analysis and combined it with relational dependency network (RDN) modeling for global module detection on features across breast cancer cohorts. Discovered knowledge was cross-validated using data from the electronic medical records of Wake Forest Baptist Medical Center and annotated with BioCarta signaling signatures. The clinical potential of the discovered modules was demonstrated by stratifying patients by drug response. A series of discovered associations provided new insights into breast cancer, such as the effect of patients' cultural backgrounds on their preferences for surgical procedures. We also discovered two groups of highly associated features, the HER2 and ER modules, each of which describes how phenotypes are associated with molecular signatures, diagnostic features, and clinical decisions. The discovered "ER module", which was dominated by cancer immunity, was used as an example for patient stratification and for predicting drug responses to tamoxifen and chemotherapy. BUFAM-derived RDN modeling demonstrated a unique ability to discover clinically meaningful and actionable knowledge across highly heterogeneous biomedical big data sets.


Figures

Figure 1. Bootstrapping for unified feature association measurement (BUFAM).
(A) Flowchart of the BUFAM algorithm. (B) Statistical measurements for specific combinations of feature data types. Mathematical details are in the Supplement.
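The caption defers BUFAM's statistics to the Supplement, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' implementation. For each feature pair it pools every patient, from any cohort, who has both features recorded; picks an association statistic by data-type combination (Spearman correlation for numeric/ordinal pairs, Cramér's V for nominal/binary pairs, and the correlation ratio for mixed pairs, a mapping chosen here purely for illustration); and resamples patients with replacement to obtain a 95% percentile bootstrap interval. All function names are hypothetical, and ordinal values are assumed to be integer-coded.

```python
import math
import random
from collections import Counter

# Illustrative grouping of data types; BUFAM's actual measures are given in
# the paper's Supplement and may differ from the ones chosen here.
NUMERIC_LIKE = {"numeric", "ordinal"}  # ordinal assumed integer-coded
CATEGORICAL = {"nominal", "binary"}


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return _pearson(_ranks(x), _ranks(y))


def cramers_v(x, y):
    """Cramér's V from the chi-squared statistic of the x-by-y contingency table."""
    n, rows, cols, joint = len(x), Counter(x), Counter(y), Counter(zip(x, y))
    chi2 = sum((joint[(r, c)] - rows[r] * cols[c] / n) ** 2 / (rows[r] * cols[c] / n)
               for r in rows for c in cols)
    k = min(len(rows), len(cols)) - 1
    return 0.0 if k == 0 else math.sqrt(chi2 / (n * k))


def correlation_ratio(categories, values):
    """Eta: square root of the share of numeric variance explained by the groups."""
    grand = sum(values) / len(values)
    groups = {}
    for c, v in zip(categories, values):
        groups.setdefault(c, []).append(v)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    total = sum((v - grand) ** 2 for v in values)
    return 0.0 if total == 0 else math.sqrt(between / total)


def association(x, y, tx, ty):
    """Pick a statistic from the pair of data types (illustrative mapping)."""
    known = NUMERIC_LIKE | CATEGORICAL
    if tx not in known or ty not in known:
        raise ValueError(f"unknown data type: {tx!r}, {ty!r}")
    if tx in NUMERIC_LIKE and ty in NUMERIC_LIKE:
        return abs(spearman(x, y))
    if tx in CATEGORICAL and ty in CATEGORICAL:
        return cramers_v(x, y)
    return correlation_ratio(x, y) if tx in CATEGORICAL else correlation_ratio(y, x)


def pairwise_complete(cohorts, fa, fb):
    """Pool every patient, from any cohort, who has both features recorded."""
    xs, ys = [], []
    for cohort in cohorts:
        for patient in cohort:
            if patient.get(fa) is not None and patient.get(fb) is not None:
                xs.append(patient[fa])
                ys.append(patient[fb])
    return xs, ys


def bootstrap_association(x, y, tx, ty, n_boot=1000, seed=0):
    """Point estimate plus a 95% percentile bootstrap interval over patients."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(association([x[i] for i in idx], [y[i] for i in idx], tx, ty))
    stats.sort()
    interval = (stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1])
    return association(x, y, tx, ty), interval
```

Pooling only pairwise-complete patients is what lets a pair be tested even when no single cohort records both features together, which is the situation the abstract describes; the percentile bootstrap is one simple resampling choice among several.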
Figure 2. BUFAM feature association discovery.
(A) Overview of the measures of pairwise feature associations, presented as [formula image]. Gray regions represent associations that could not be tested. (B) Validation of discovered associations using the EMR of Wake Forest Baptist Medical Center (WakeOne), presented as [formula image]. The associations between Oncotype DX score and histologic grade and between Oncotype DX score and progesterone receptor status are shown as examples.
Figure 3. Comparison with KNN algorithm.
(A) Missing values of 6 numeric features were imputed with the KNN algorithm, and the BUFAM p-values for the corresponding 15 pairwise associations were compared before (black circles) and after (red stars) imputation. Associations were sorted by their pre-imputation BUFAM p-values. (B) Comparison of normalized original (black) and imputed (red) values of the BP metagene (stars) and tumor size (circles). Patients were sorted by normalized BP metagene value.
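The caption does not state the KNN settings, so the following is a minimal stdlib-only sketch of generic k-nearest-neighbour imputation under our own assumptions: distance is the root-mean-squared difference over columns observed in both rows, and each missing value is replaced by the mean of that column among the k nearest rows that have it. The function name `knn_impute`, the default `k=3`, and the feature names are hypothetical. It also shows why 6 numeric features yield 15 pairwise associations: there are C(6, 2) = 15 unordered pairs.

```python
import math
from itertools import combinations


def knn_impute(rows, k=3):
    """Return a copy of `rows` with None entries filled by k-nearest-neighbour means.

    Distance between two rows uses only the columns observed in both
    (root-mean-squared difference); rows sharing no observed column are
    never used as donors. These conventions are assumptions, not the
    settings behind Figure 3.
    """
    def distance(a, b):
        shared = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((p - q) ** 2 for p, q in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observed column j, nearest first.
            donors = sorted(
                (distance(row, other), other[j])
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Six hypothetical numeric features give C(6, 2) = 15 pairwise associations.
features = [f"feature_{i}" for i in range(1, 7)]
pairs = list(combinations(features, 2))
```

Re-running the association test on `pairs` before and after `knn_impute` mirrors the comparison in panel (A); imputed values are model-derived rather than observed, which is why their effect on p-values is worth checking.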
Figure 4. Comparison with meta-analysis.
(A) Comparison of the feature pairs analyzed by BUFAM and by cohort-based analyses. (B) Number of supporting patients for each pairwise association found with BUFAM compared with meta-analysis. (C) Comparison of association results from BUFAM and meta-analysis. The two horizontal dashed lines mark p-values of 0.01 and 0.05. Black: BUFAM; Blue: WFCCC; Green: MDACC; Red: TCGA.
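The caption does not name the meta-analysis method used for comparison. As one common way to merge per-cohort results, the sketch below implements Fisher's method for combining independent p-values; treating it as the comparator here is our assumption. The statistic is X = -2 Σ ln p_i, which under the joint null follows a chi-squared distribution with 2k degrees of freedom, and for even degrees of freedom its survival function has the closed form exp(-x/2) Σ_{j<k} (x/2)^j / j!, so no external library is needed. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent per-cohort p-values with Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-squared with 2k degrees of freedom under the
    null. For even degrees of freedom the upper tail is
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!, evaluated here term by term.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be in (0, 1]")
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j  # (x/2)^j / j!
        total += term
    return math.exp(-half) * total
```

A combined value can then be read against the same 0.01 and 0.05 lines as in panel (C). Fisher's method only merges cohort-level evidence, whereas pooling patients who have both features (as BUFAM's larger supporting-patient counts in panel (B) suggest) changes the sample on which each test is run.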
Figure 5. RDN module detection and annotation.
(A) Visualization of the RDN topology of biomedical concepts of different data types (numeric, ordinal, nominal, and binary, represented by node shapes) and association polarity (positive, negative, or not applicable), with the HER2 module (labeled A) and the ER module (labeled B). (B) Characteristics of the HER2 and ER modules: pivot concepts, phenotypic features, molecular underpinnings, and most common treatments.
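Learning an RDN (one conditional model per feature, fitted relationally) is well beyond a short example, so the sketch below is a deliberately simpler stand-in and not the paper's method: it keeps only associations with p below a threshold, takes the connected components of the resulting graph as candidate modules, and reports the highest-degree member of each as a candidate "pivot". The function name, the `alpha=0.01` default, and the edge format `(feature_a, feature_b, p_value)` are assumptions.

```python
from collections import defaultdict


def association_modules(edges, alpha=0.01):
    """Group features into candidate modules from significant pairwise associations.

    edges: iterable of (feature_a, feature_b, p_value).
    A module is a connected component of the graph keeping edges with p < alpha;
    its pivot is the member with the most significant associations (ties go to
    the alphabetically first name). This is a simplified stand-in for RDN modeling.
    """
    graph = defaultdict(set)
    for a, b, p in edges:
        if p < alpha:
            graph[a].add(b)
            graph[b].add(a)

    seen, modules = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        members = sorted(component)
        pivot = max(members, key=lambda n: len(graph[n]))
        modules.append({"pivot": pivot, "members": members})
    return modules
```

Connected components ignore association polarity and data type, both of which Figure 5 encodes; a fuller treatment would carry those as edge and node attributes.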
Figure 6. ER module discovery and patient subtyping for drug responses.
(A) Feature availability in the TCGA (orange), WFCCC (purple), and MDACC (blue) cohorts. Feature names are shown on the right; unavailable features are shown in gray. (B) Pairwise associations among features in the ER module, presented as [formula image]. The yellow box highlights associations shared by the two groups. (C) Patient subtyping using the BioCarta signatures (top region) associated with the ER module. These signatures revealed four subtypes: Immune Inert, Neutral, Active, and Responsive. (D) Differential treatment responses assessed by Kaplan-Meier survival analysis of the WFCCC cohort, using distant-metastasis-free survival time as the endpoint. Survival curves of the Immune Neutral (top) and Immune Active (bottom) subtypes under different treatments are labeled with patient numbers and log-rank test p-values. More details are given in the Supplement.
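Panel (D) relies on two standard survival tools. The following is a minimal textbook sketch of both, not the authors' analysis code: the Kaplan-Meier product-limit estimator, and the two-group log-rank test, whose statistic (observed minus expected events in one group, squared, over its hypergeometric variance) is compared with a chi-squared distribution with 1 degree of freedom, giving p = erfc(sqrt(chi2 / 2)). Function names are hypothetical; `events[i]` is assumed to be 1 for an observed event (here, distant metastasis) and 0 for a censored patient.

```python
import math


def kaplan_meier(times, events):
    """Product-limit survival estimate: [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, survival, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        # Group tied times; all patients with time t are at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def log_rank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-squared statistic, p-value), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)           # at risk overall
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)      # events at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Comparing, for example, treated and untreated patients within one immune subtype with `log_rank_p` corresponds to one pair of curves in panel (D); the per-time-point loops are quadratic, which is fine for cohort-sized inputs but not for very large ones.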
