Eur Respir J. 2021 Jun 24;57(6):2002591.
doi: 10.1183/13993003.02591-2020. Print 2021 Jun.

Identifying early pulmonary arterial hypertension biomarkers in systemic sclerosis: machine learning on proteomics from the DETECT cohort

Yasmina Bauer et al. Eur Respir J.

Abstract

Pulmonary arterial hypertension (PAH) is a devastating complication of systemic sclerosis (SSc). Screening for PAH in SSc has increased detection, allowed early treatment for PAH and improved patient outcomes. Blood-based biomarkers that reliably identify SSc patients at risk of PAH, or with early disease, would significantly improve screening, potentially leading to improved survival, and provide novel mechanistic insights into early disease. The main objective of this study was to identify a proteomic biomarker signature that could discriminate SSc patients with and without PAH using a machine learning approach and to validate the findings in an external cohort.

Serum samples from patients with SSc and PAH (n=77) and SSc without pulmonary hypertension (non-PH) (n=80) were randomly selected from the clinical DETECT study and underwent proteomic screening using the Myriad RBM Discovery platform consisting of 313 proteins. Samples from an independent validation SSc cohort (PAH n=22 and non-PH n=22) were obtained from the University of Sheffield (Sheffield, UK).

Random forest analysis identified a novel panel of eight proteins, comprising collagen IV, endostatin, insulin-like growth factor binding protein (IGFBP)-2, IGFBP-7, matrix metallopeptidase-2, neuropilin-1, N-terminal pro-brain natriuretic peptide and RAGE (receptor for advanced glycation end products), that discriminated PAH from non-PH in SSc patients in the DETECT Discovery Cohort (average area under the receiver operating characteristic curve 0.741, 65.1% sensitivity/69.0% specificity), which was reproduced in the Sheffield Confirmatory Cohort (81.1% accuracy, 77.3% sensitivity/86.5% specificity).

This novel eight-protein biomarker panel has the potential to improve early detection of PAH in SSc patients and may provide novel insights into the pathogenesis of PAH in the context of SSc.
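The sensitivity, specificity and accuracy figures quoted in the abstract are ratios taken from a confusion matrix. The sketch below only illustrates that arithmetic, assuming PAH is treated as the positive class and non-PH as the negative class; the function name and the counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    Assumption of this sketch: 'positive' means PAH, 'negative' means non-PH.
    """
    sensitivity = tp / (tp + fn)  # share of PAH patients correctly flagged
    specificity = tn / (tn + fp)  # share of non-PH patients correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all patients classified correctly
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


# Hypothetical counts for illustration only.
print(classification_metrics(tp=8, fn=2, tn=9, fp=1))
```

Sensitivity and specificity are each computed within one group, so unlike accuracy they do not depend on the PAH/non-PH mix of the cohort.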

Conflict of interest statement

Conflict of interest: Y. Bauer is a former employee of Actelion Pharmaceuticals Ltd and Idorsia Pharmaceuticals Ltd, and is now an employee of Galapagos GmbH. Conflict of interest: S. de Bernard reports grants from Idorsia, during the conduct of the study. Conflict of interest: P. Hickey has nothing to disclose. Conflict of interest: K. Ballard is an employee of Myriad RBM. Conflict of interest: J. Cruz is an employee of Myriad RBM. Conflict of interest: P. Cornelisse is a former employee of Actelion Pharmaceuticals Ltd. Conflict of interest: H. Chadha-Boreham is a former employee of Actelion Pharmaceuticals Ltd. Conflict of interest: O. Distler reports personal fees for consultancy from Amgen, AbbVie, Acceleron Pharma, AnaMar, Actelion, Alexion, Arxx Therapeutics, Baecon Discovery, Blade Therapeutics, Corbuspharma, CSL Behring, ChemomAb, Horizon Pharmaceuticals, Ergonex, Galapagos NV, Glenmark Pharmaceuticals, GSK, Inventiva, Italfarmaco, iQone, iQvia, Kymera, Lilly, Medac, Sanofi, Target Bio Science and UCB, grants and personal fees for consultancy and lectures from Bayer and Boehringer Ingelheim, personal fees for interviewing from Catenion, grants from Competitive Drug Development International Ltd, personal fees for consultancy and lectures from Medscape, MSD, Pfizer and Roche, grants and personal fees for consultancy from Mitsubishi Tanabe Pharma, personal fees for lectures from Novartis, outside the submitted work; and has an issued patent on mir-29 for the treatment of systemic sclerosis (US8247389, EP2331143). Conflict of interest: D. Rosenberg is an employee of and holds shares in Johnson and Johnson. Conflict of interest: M. Doelberg is an employee of Actelion Pharmaceuticals Ltd. Conflict of interest: S. Roux is a former employee of Actelion Pharmaceuticals Ltd. Conflict of interest: O. Nayler is a former employee and former stock owner of Actelion Pharmaceuticals Ltd, and current employee and stock owner of Idorsia Pharmaceuticals Ltd.
Conflict of interest: A. Lawrie reports grants from the British Heart Foundation and Medical Research Council, grants, personal fees and other (conference attendance and travel) from Actelion Pharmaceuticals, grants and personal fees from GlaxoSmithKline, outside the submitted work.

Figures

FIGURE 1
Patients and analytes for a) the DETECT Discovery Cohort and b) the Sheffield Confirmatory Cohort. RHC: right heart catheterisation; PH: pulmonary hypertension; PAH: pulmonary arterial hypertension; STH-ObS: The Sheffield Teaching Hospitals Observational Study of Patients with Pulmonary Hypertension, Cardiovascular and Lung Disease; CTD: connective tissue disease; ILD: interstitial lung disease. #: enrolled 2008–2011; : enrolled 2008–2015; +: 271 protein analytes passed quality control in the DETECT Discovery Cohort and 238 protein analytes passed quality control in the Sheffield Confirmatory Cohort (238 protein analytes were suitable for investigation in both cohorts).
FIGURE 2
Variables (proteins) of importance to classify pulmonary arterial hypertension. Variable importance output of random forests applied to a) the DETECT Discovery Cohort, b) the Sheffield Confirmatory Cohort and c) the 238 proteins common to both cohorts, analysed in the DETECT Discovery Cohort. The plots show the most important variables (y-axis) as assessed by the mean decrease of the Gini index (x-axis). Proteins are ordered from most (top) to least (bottom) important. The eight variables common to all analyses appear in red. See supplementary table S1 for details of the proteins on the Myriad RBM Discovery platform.
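Figure 2 ranks proteins by the mean decrease of the Gini index. Under the usual definition, each split in a tree lowers the Gini impurity 1 − Σ p_k² of its node, and a protein's importance sums these weighted decreases over every split that uses it, averaged over the trees of the forest. The sketch below computes the contribution of a single split at one node only; the helper names and toy data are hypothetical and this is not the study's code.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity 1 - sum_k p_k**2, where p_k is the share of class k in the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values: list[float], labels: list[str], threshold: float) -> float:
    """Impurity decrease when one node is split on `value <= threshold`.

    Child impurities are weighted by the fraction of the node's samples they receive.
    """
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy node: a hypothetical protein level that perfectly separates the two groups.
levels = [1.0, 2.0, 3.0, 4.0]
groups = ["non-PH", "non-PH", "PAH", "PAH"]
print(gini_decrease(levels, groups, threshold=2.5))  # -> 0.5
```

A perfectly separating split removes all of the parent's impurity (0.5 for two balanced classes), while a split that leaves one child empty removes none.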
FIGURE 3
Serum concentrations of the eight best-performing proteins common to all analyses for predicting pulmonary arterial hypertension (PAH) in the DETECT Discovery Cohort and the Sheffield Confirmatory Cohort: a) collagen IV, b) endostatin, c) insulin-like growth factor binding protein (IGFBP)-2, d) IGFBP-7, e) matrix metallopeptidase (MMP)-2, f) neuropilin-1, g) N-terminal pro-brain natriuretic peptide (NT-proBNP) and h) RAGE (receptor for advanced glycation end products). PH: pulmonary hypertension. Boxes indicate median and interquartile range; whiskers indicate the full range of the data. Individual patient samples are represented by dots. p-values are from the Wilcoxon rank-sum test between the two patient groups.
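The p-values in Figure 3 come from the Wilcoxon rank-sum test, which pools both groups, ranks every value and asks whether one group's rank sum is larger than chance would allow. A minimal sketch using the large-sample normal approximation follows; it assigns mid-ranks to ties but omits tie and continuity corrections, and the caption does not state which variant the authors used. The function name is hypothetical; for small groups an exact test (for example via `scipy.stats.mannwhitneyu`) may be preferable.

```python
import math


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test using the large-sample normal approximation.

    Tied values receive mid-ranks, but the variance is not tie-corrected and no
    continuity correction is applied. Returns (W, p), where W is the rank sum of x.
    """
    pooled = sorted((value, idx) for idx, value in enumerate(x + y))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run pooled[i..j] of tied values.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = mid_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # pooled indices 0..n1-1 come from x
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, p


# Hypothetical concentrations for two small groups.
print(rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```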
FIGURE 4
a, b) Performance of the panel of six common protein biomarkers in a) the DETECT Discovery Cohort and b) the Sheffield Confirmatory Cohort: receiver operating characteristic (ROC) curves of the pulmonary arterial hypertension (PAH) versus non-pulmonary hypertension (non-PH) classifier. ROC-AUC: area under the ROC curve; RAGE: receptor for advanced glycation end products; IGFBP: insulin-like growth factor binding protein; MMP: matrix metallopeptidase; SSc: systemic sclerosis; NT-proBNP: N-terminal pro-brain natriuretic peptide. The six selected proteins are the subset from the eight common proteins that produced the best ROC-AUC in the DETECT Discovery Cohort (0.751). c, d) Addition of c) NT-proBNP or d) NT-proBNP plus neuropilin-1 to the six selected proteins.
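The ROC-AUC in this caption can be read as the probability that a randomly chosen PAH patient receives a higher classifier score than a randomly chosen non-PH patient. The sketch below computes it directly from that definition and adds an exhaustive subset search in the spirit of picking the best-performing subset of the eight proteins (255 non-empty subsets for eight proteins). The caption does not say how each subset was scored, so the `evaluate` callback, which might fit a model and return a cross-validated AUC, is a hypothetical placeholder, as are the function names.

```python
from itertools import combinations
from typing import Callable, Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random PAH score exceeds a random non-PH score.

    Ties count as half a 'win'; this equals the area under the empirical ROC curve.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def best_subset(
    proteins: Sequence[str],
    evaluate: Callable[[tuple[str, ...]], float],
) -> tuple[tuple[str, ...], float]:
    """Score every non-empty protein subset with `evaluate` and return the best one.

    `evaluate` is a hypothetical caller-supplied function returning, e.g., an AUC.
    """
    candidates = (
        subset
        for size in range(1, len(proteins) + 1)
        for subset in combinations(proteins, size)
    )
    scored = ((subset, evaluate(subset)) for subset in candidates)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical classifier scores.
print(roc_auc([0.9, 0.8], [0.1, 0.8]))
```

Exhaustive search is feasible here only because the candidate set is small; scoring every subset on the same cohort used to choose it tends to overstate performance, which is one reason an external cohort is valuable.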
FIGURE 5
Sparse partial least squares association of pulmonary vascular resistance (PVR) to six common biomarker proteins: a) N-terminal pro-brain natriuretic peptide (NT-proBNP), b) RAGE (receptor for advanced glycation end products), c) insulin-like growth factor binding protein (IGFBP)-7, d) pyruvate carboxylase (cFib), e) vascular cell adhesion molecule (VCAM)-1 and f) surfactant protein D (SP-D). Correlation plots for each individual biomarker variable with PVR, showing Pearson's correlation coefficient between the logarithm of the two variables and the corresponding p-value.
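Figure 5 reports Pearson's correlation between the logarithms of each biomarker and PVR. Working on the log scale turns power-law relationships into straight lines, and the choice of log base does not change r. The sketch below (hypothetical function name) computes r on log-transformed data together with the usual t statistic r·√((n − 2)/(1 − r²)); converting t into a p-value with n − 2 degrees of freedom is left to a statistics library.

```python
import math


def log_pearson(x: list[float], y: list[float]) -> tuple[float, float]:
    """Pearson's r between log(x) and log(y), plus the t statistic for testing r = 0.

    All values must be strictly positive. The two-sided p-value would come from a
    Student t distribution with n - 2 degrees of freedom (not computed here).
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sx = math.sqrt(sum((a - mx) ** 2 for a in lx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ly))
    r = cov / (sx * sy)
    n = len(x)
    # A perfect correlation (|r| == 1) gives an infinite t with the sign of r.
    t = math.copysign(math.inf, r) if abs(r) >= 1.0 else r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


# Hypothetical data following a power law y = 3 * x**2: log-linear, so r is (numerically) 1.
print(log_pearson([1.0, 2.0, 4.0, 8.0], [3.0, 12.0, 48.0, 192.0]))
```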
