Review
Circ Res. 2022 Apr 29;130(9):1423-1444.
doi: 10.1161/CIRCRESAHA.121.319969. Epub 2022 Apr 28.

Harnessing Big Data to Advance Treatment and Understanding of Pulmonary Hypertension

Christopher J Rhodes et al. Circ Res.

Abstract

Pulmonary hypertension is a complex disease with multiple causes, corresponding to phenotypic heterogeneity and variable therapeutic responses. Advancing understanding of pulmonary hypertension pathogenesis is likely to hinge on integrated methods that leverage data from health records, imaging, novel molecular -omics profiling, and other modalities. In this review, we summarize key data sets generated thus far in the field and describe analytical methods that hold promise for deciphering the molecular mechanisms that underpin pulmonary vascular remodeling, including machine learning, network medicine, and functional genetics. We also detail how genetic and subphenotyping approaches enable earlier diagnosis, refined prognostication, and optimized treatment prediction. We propose strategies that identify functionally important molecular pathways, bolstered by findings across multi-omics platforms, which are well-positioned to individualize drug therapy selection and advance precision medicine in this highly morbid disease.

Keywords: big data; information dissemination; phenotype; precision medicine; pulmonary arterial hypertension.

Conflict of interest statement

Disclosures statement: The authors disclose no direct conflicts in this work. Dr Maron discloses Consultant roles for Actelion Pharmaceuticals and Tenax, and a Grant / Contract from Deerfield Company.

Figures

Figure 1. Overview of applied machine learning (ML) in pulmonary hypertension (PH).
Falling under the umbrella of artificial intelligence (AI), ML describes a family of algorithms used to make predictions or infer patterns in complex datasets. Supervised ML algorithms are trained to predict a known sample label (i.e. clinical feature or outcome), where a variety of data types can be input for prediction of a continuous or categorical feature (regression or classification). Unsupervised ML algorithms are most often applied for clustering, where patterns and structure are agnostically inferred in unlabeled datasets. Deep learning, an emerging sub-field of ML in which algorithms are built on artificial neural networks mimicking human brain connectivity, can be employed for supervised or unsupervised learning tasks. This figure summarizes potential high-yield applications of supervised and unsupervised ML methods in PH.
Figure 2. Disrupting conventional risk stratification in PAH through networks.
Classical strategies for risk stratification hinge on linear regression methods, including (A) univariate and (B) multivariate analyses, which estimate clinical risk based on the association between individual parameters and outcomes. N, patient number. V, variable. (C) However, this approach does not inform clinical phenotypes based on functional relationships between variables, and it is also prone to over- and under-estimation of risk driven by extreme datapoints that alter regression slopes. (D) Phenotype networks offer an alternative approach to identifying patient subgroups, which relies on functional associations between clinical parameters as the starting point for cluster identification. (E) By focusing on a subset of functionally important clinical parameters to subgroup patients, Euclidean methods can be used to match patients at point-of-care with similar patients for whom the outcome has been determined previously. This approach has been used to prognosticate cohorts at risk for pulmonary arterial hypertension referred for invasive cardiopulmonary exercise testing. Panel E adapted from Oldham et al.
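The Euclidean matching step described in panel E can be illustrated with a minimal sketch. This is not the published method from Oldham et al.; it only assumes that each patient is represented by a short vector of clinical parameters, that variables are standardized before distances are computed, and that the closest previously characterized patients are returned. The function names and the toy values are hypothetical.

```python
import math

def zscore_columns(rows):
    """Standardize each column (mean 0, SD 1) so no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column has zero spread, to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    scaled = [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]
    return scaled, means, sds

def nearest_patients(reference, query, k=3):
    """Return indices of the k reference patients closest to the query patient."""
    scaled, means, sds = zscore_columns(reference)
    q = [(x - m) / s for x, m, s in zip(query, means, sds)]
    dists = [(math.dist(q, row), i) for i, row in enumerate(scaled)]
    return [i for _, i in sorted(dists)[:k]]

# Hypothetical reference cohort: two made-up clinical parameters per patient.
reference_cohort = [[10, 40], [11, 42], [25, 20], [24, 22]]
new_patient = [10.5, 41]
matches = nearest_patients(reference_cohort, new_patient, k=2)
```

In this toy cohort the new patient is matched to the first two reference patients, whose known outcomes could then inform the prognosis estimate.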
Figure 3. Leveraging networks to optimize phenotyping in pulmonary arterial hypertension.
(A) Data from large-scale screens profiling protein-protein interactions (PPIs) have been compiled from various sources to establish a centralized protein-protein interactome. Thus, PPIs in the interactome reflect functional or physical associations between proteins and in this way generate a wiring map of molecular pathways inclusive of functionally relevant signaling pathways. (B) Multiplex data generating transcriptomic or proteomic signatures collected from PAH patients and controls can be analyzed further using networks. Data that are differentially expressed between groups are mapped to the interactome, and protein-protein interaction partners are carried forward to generate a PPI network. Thus, the network is based on biological data from patients but transformed to a network output. Since only valid functional or physical PPIs are in the interactome, the output is a wiring map of biologically important PPIs informed by the patient samples. (C) The integration of PPIs with clinical phenotypic data defines reticulotypes, as reported previously for patients with World Symposium Pulmonary Hypertension Group 2 pulmonary hypertension due to cardiomyopathy.
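The mapping step in panel B can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the interactome is taken to be a list of undirected protein pairs, and an interaction is carried forward only when both partners are differentially expressed. The function name and the placeholder protein labels (`P1` to `P6`) are hypothetical and do not represent real interactions.

```python
def build_ppi_subnetwork(interactome_edges, differential_proteins):
    """Keep interactome edges whose two endpoints are both differentially expressed.

    Returns an adjacency map: protein -> set of interaction partners.
    """
    de = set(differential_proteins)
    adjacency = {}
    for a, b in interactome_edges:
        if a in de and b in de:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency

# Toy interactome with placeholder labels (not real proteins).
interactome = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
# Hypothetical set of proteins differing between patients and controls.
differential = ["P1", "P2", "P3", "P5"]

network = build_ppi_subnetwork(interactome, differential)
```

Here `P5` is differentially expressed but drops out because its only interactome partner is not, which mirrors the idea that the output contains only interactions supported by both the interactome and the patient data.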
Figure 4. Translating PH sub-phenotype discovery into actionable knowledge and clinical applicability.
A blueprint is offered for possible investigations that may occur following initial unsupervised ML-based discovery of PH sub-phenotypes. First, independent cohort validation is required prior to additional follow-up studies. Because many initial sub-phenotyping studies may be limited to cross-sectional blood omics profiling, follow-up longitudinal analyses will be warranted to understand if and how phenotype-specific molecular profiles evolve during the disease course. To gain insight into the mechanistic underpinnings of a sub-phenotype, signaling pathways can be inferred in silico from public molecular interaction networks or enrichment databases. Implicated pathways can then be functionally validated in animal models of PH or examined in molecular studies of lung tissue from PH patients. To permit feasible sub-phenotype classification in the clinic or drug trial setting, parsimonious classifiers must be developed that require only a limited number of readily obtainable input variables. A classifier could be implemented for secondary analysis of data and samples from completed clinical trials. Data from this secondary analysis might provide the foundation for design of an innovative clinical trial (split-phase, master protocol, or adaptive design).
Figure 5.
Illustrative example of how molecular profiles derived from large datasets could be integrated into standard clinical risk stratification to improve risk estimates. Established clinical risk factors enable identification of very high-risk and very low-risk patients, but risk in intermediate cases is harder to establish. Independent molecular stratifiers could provide additional information on whether a patient should be considered higher or lower risk and thereby guide clinical decision making.
Central Illustration. Deep phenotyping in the next era of pulmonary hypertension.
A substantial number of large registries collecting clinical and biological data, established through foundation support, international collaborations, and institutional initiatives, have emerged over the past 10 years. Integration of these data, coupled with unbiased and biased analytical methodologies, will be critical for clarifying the phenotypic profile of pulmonary hypertension. Expanding the range of collected data to non-conventional parameters, including perinatal events, nutritional patterns, toxic exposures, and pharmacovigilance programs (as already reported in France), via electronic medical record-based systems and other personalized device mechanisms will be critical for integrating genetic risk with acquired disease determinants. Ultimately, phenotyping patients will hinge on the identification of functionally important molecular pathways, inclusive of various -omics data, to generate reticulotypes. In turn, valid reticulotypes are well-positioned to optimize clinical trial design and establish a path toward individualized drug selection and, hence, precision medicine.

