Sci Rep. 2023 Apr 17;13(1):6236.
doi: 10.1038/s41598-023-31850-y.

Circulating proteins to predict COVID-19 severity

Chen-Yang Su #  1   2   3 Sirui Zhou #  1   4 Edgar Gonzalez-Kozlova #  5 Guillaume Butler-Laporte  1   4 Elsa Brunet-Ratnasingham  6 Tomoko Nakanishi  1   7   8   9 Wonseok Jeon  2 David R Morrison  1 Laetitia Laurent  1 Jonathan Afilalo  1   4 Marc Afilalo  10 Danielle Henry  1 Yiheng Chen  1   7 Julia Carrasco-Zanini  11 Yossi Farjoun  1 Maik Pietzner  11   12 Nofar Kimchi  1 Zaman Afrasiabi  1 Nardin Rezk  1 Meriem Bouab  1 Louis Petitjean  1 Charlotte Guzman  1 Xiaoqing Xue  1 Chris Tselios  1 Branka Vulesevic  1 Olumide Adeleye  1 Tala Abdullah  1 Noor Almamlouk  1 Yara Moussa  1 Chantal DeLuca  1 Naomi Duggan  1 Erwin Schurr  13 Nathalie Brassard  6 Madeleine Durand  6 Diane Marie Del Valle  14 Ryan Thompson  15 Mario A Cedillo  16 Eric Schadt  15 Kai Nie  17 Nicole W Simons  15 Konstantinos Mouskas  15 Nicolas Zaki  17 Manishkumar Patel  14 Hui Xie  17 Jocelyn Harris  17 Robert Marvin  17 Esther Cheng  15 Kevin Tuballes  14 Kimberly Argueta  17 Ieisha Scott  17 Mount Sinai COVID-19 Biobank Team Celia M T Greenwood  1   4 Clare Paterson  18 Michael A Hinterberg  18 Claudia Langenberg  11   12 Vincenzo Forgetta  1 Joelle Pineau  2 Vincent Mooser  7 Thomas Marron  19 Noam D Beckmann  15 Seunghee Kim-Schulze  17 Alexander W Charney  20 Sacha Gnjatic  5   14   17 Daniel E Kaufmann  6   21   22 Miriam Merad  14 J Brent Richards  23   24   25   26
Chen-Yang Su et al. Sci Rep. 2023;13(1):6236.

Abstract

Predicting COVID-19 severity is difficult, and the biological pathways involved are not fully understood. To approach this problem, we measured 4701 circulating human protein abundances in two independent cohorts totaling 986 individuals. We then trained prediction models including protein abundances and clinical risk factors to predict COVID-19 severity in 417 subjects and tested these models in a separate cohort of 569 individuals. For severe COVID-19, a baseline model including age and sex provided an area under the receiver operating characteristic curve (AUC) of 65% in the test cohort. Selecting 92 proteins from the 4701 unique protein abundances improved the AUC to 88% in the training cohort, and the AUC remained relatively stable at 86% in the testing cohort, suggesting good generalizability. Proteins selected for the different COVID-19 severity outcomes were enriched for cytokines and cytokine receptors, but more than half of the enriched pathways were not immune-related. Taken together, these findings suggest that circulating proteins measured at early stages of disease progression are reasonably accurate predictors of COVID-19 severity. Further research is needed to understand how to incorporate protein measurement into clinical care.
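The train-then-test workflow summarized in the abstract can be illustrated with a small sketch. The figure captions identify the protein model as an L1-regularized logistic regression; the code below fits one from scratch by proximal gradient descent (ISTA) on simulated data, assuming a mean log-loss objective plus `lam * ||w||_1`, and then scores a separate held-out set with the ROC AUC. The data, the helper names (`fit_l1_logistic`, `auc`, `simulate`), the learning rate and the `lam` value are hypothetical illustrations, not the authors' pipeline; in particular, `lam` here is not on the same scale as the lambda = 10.0 reported for the study.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrink toward zero, returning exactly 0 when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=300):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent (ISTA).

    The intercept b is not penalized. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth log-loss, then soft-threshold for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def auc(scores, labels):
    # ROC AUC = probability that a random case outscores a random control (ties count 1/2).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


def simulate(n, p, rng):
    # Hypothetical data: only the first two of p "protein" features carry signal.
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        y.append(1 if rng.random() < sigmoid(2.0 * x[0] - 1.5 * x[1]) else 0)
        X.append(x)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(150, 8, rng)  # stands in for the training cohort
X_test, y_test = simulate(150, 8, rng)    # stands in for the independent test cohort
w, b = fit_l1_logistic(X_train, y_train, lam=0.05)
test_scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x in X_test]
print("nonzero features:", [j for j, wj in enumerate(w) if wj != 0.0])
print("test AUC:", round(auc(test_scores, y_test), 3))
```

The soft-thresholding step is what drives coefficients to exactly zero, which is how an L1 penalty selects a subset of features from a much larger panel. In practice a library solver, for example scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`, would usually replace this hand-written loop.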


Conflict of interest statement

J.B.R. has served as an advisor to GlaxoSmithKline and Deerfield Capital and is the Founder of 5 Prime Sciences. The Lady Davis Institute has previously received funding from GlaxoSmithKline, Eli Lilly, and Biogen for research programs at Dr. Richards’ laboratory unrelated to this manuscript. C.P. and M.H. are employees of SomaLogic. T.N. has received speaking fees from Boehringer Ingelheim for talks unrelated to this research. S.G. reports other research funding from Bristol-Myers Squibb, Boehringer-Ingelheim, Celgene, Genentech, Regeneron, and Takeda. All other authors declare no conflicts of interest.

Figures

Figure 1
Overall Study design. Schematic of training and testing stages of this study. Severe COVID-19 is defined as death or use of any form of oxygen supplementation. Critical COVID-19 is defined as death or severe respiratory failure (non-invasive ventilation, high flow oxygen therapy, intubation, or extracorporeal membrane oxygenation).
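The outcome definitions in this caption translate directly into labeling rules. The sketch below assumes one boolean flag per clinical event; the `Outcome` field names are hypothetical, and counting every severe-respiratory-failure modality as a form of oxygen supplementation (so that every critical case is also severe) is an assumption, not something the caption states.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    # Hypothetical per-patient flags; names are illustrative, not from the study data.
    died: bool = False
    any_oxygen: bool = False               # any form of oxygen supplementation
    non_invasive_ventilation: bool = False
    high_flow_oxygen: bool = False
    intubation: bool = False
    ecmo: bool = False                     # extracorporeal membrane oxygenation


def severe_respiratory_failure(o: Outcome) -> bool:
    return o.non_invasive_ventilation or o.high_flow_oxygen or o.intubation or o.ecmo


def is_critical(o: Outcome) -> bool:
    # Critical: death or severe respiratory failure.
    return o.died or severe_respiratory_failure(o)


def is_severe(o: Outcome) -> bool:
    # Severe: death or any oxygen supplementation. Treating the respiratory-failure
    # modalities as oxygen supplementation is an assumption (critical implies severe).
    return o.died or o.any_oxygen or severe_respiratory_failure(o)


patient = Outcome(any_oxygen=True)
print(is_severe(patient), is_critical(patient))  # prints True False
```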
Figure 2
AUC score results. (a) L1-regularized logistic regression training and testing results for severe COVID-19 and (b) for critical COVID-19. Blue and red represent the protein model and the baseline model, respectively; solid and dotted lines represent testing and training performance, respectively. Shaded areas denote the 95% confidence intervals for the training cohort. (c) Two-by-two contingency tables from the test set for predicting severe COVID-19 (top left, bottom left) and critical COVID-19 (top right, bottom right) using the protein model (blue) and the baseline model (red). The threshold for calling cases was determined during training using Youden’s J statistic, which selects the threshold that maximizes J = sensitivity + specificity − 1 (equivalently, the sum of sensitivity and specificity). PPV, positive predictive value; NPV, negative predictive value.
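Threshold selection with Youden’s J and the resulting two-by-two table can be sketched as follows. As in the caption, the cutoff is chosen on training scores and then applied unchanged to test scores. The function names and toy scores are hypothetical, and calling a case when score >= cutoff is an assumed convention.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A subject is called a case when score >= cutoff; candidates are the observed scores.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_t, best_j = t, j
    return best_t, best_j


def two_by_two(scores, labels, cutoff):
    # Rows: predicted case / predicted control; columns: true case / true control.
    tp = fp = fn = tn = 0
    for s, l in zip(scores, labels):
        if s >= cutoff:
            if l == 1:
                tp += 1
            else:
                fp += 1
        elif l == 1:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "table": [[tp, fp], [fn, tn]],
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }


# Toy training scores: pick the cutoff here, then apply it to separate test scores.
train_scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
train_labels = [0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(train_scores, train_labels)
result = two_by_two([0.2, 0.4, 0.5, 0.8], [0, 1, 0, 1], cutoff)
print(cutoff, round(j, 3), result)
```

Fixing the cutoff on the training data keeps the test-set PPV and NPV honest: choosing the threshold on the test set itself would inflate those figures.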
Figure 3
Feature importance and correlation of SOMAmers selected in the protein model to predict severe and critical COVID-19. (Left) Coefficient values of the 92 nonzero SOMAmer reagents in the final trained L1-regularized logistic regression protein model fitted to predict severe COVID-19. The original data contained 4984 SOMAmer reagents and 4 other variables: age, sex, sample processing time, and hospital site. In addition to the 92 SOMAmer reagents, age and sample processing time remained in the model (not shown). The model was trained on the entire BQC19 cohort using lambda = 10.0 (log10 lambda = 1.0), the best value found in the hyperparameter search. (Bottom) Coefficient values of the 67 nonzero SOMAmer reagents in the final trained L1-regularized logistic regression protein model fitted to predict critical COVID-19, starting from the same 4984 SOMAmer reagents and 4 other variables (age, sex, sample processing time, and hospital site). In addition to the 67 SOMAmer reagents, age and sample processing time remained in the model (not shown). This model was also trained on the entire BQC19 cohort using lambda = 10.0 (log10 lambda = 1.0), the best value found in the hyperparameter search. (Right) Spearman’s rank correlations between the 92 proteins associated with severe COVID-19 and the 67 proteins associated with critical COVID-19. Although 14 proteins were selected by both models (SFTPD, CXCL10, RAB3A, NAGPA, CDH5, IFNA7, ZNRF3, CBS, CCL7, SETMAR, TNXB, CDHR1, CXCL13, and CBLN1), the protein levels were in general not strongly correlated with one another: of the 6164 pairwise correlations (92 × 67), 6150 (99.8%) had an absolute Spearman’s ρ < 0.8.
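The correlation summary in this caption (Spearman’s ρ for every pair drawn from the two selected protein sets, then the share of pairs with |ρ| < 0.8) can be sketched in plain Python. Spearman’s ρ is computed here as the Pearson correlation of average ranks, which handles ties. The panel names and values are toy data, not study measurements, and the function names are hypothetical. The last line checks the caption’s arithmetic.

```python
import math


def average_ranks(values):
    # 1-based ranks; tied values share the mean of the positions they occupy.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    # Spearman's rho = Pearson correlation of the (average) ranks.
    return pearson(average_ranks(a), average_ranks(b))


def count_below(panel_a, panel_b, cutoff=0.8):
    # Correlate every protein in panel_a with every protein in panel_b across the
    # same subjects; return (# pairs with |rho| < cutoff, total # pairs).
    rhos = [spearman(x, y) for x in panel_a.values() for y in panel_b.values()]
    return sum(1 for r in rhos if abs(r) < cutoff), len(rhos)


# Toy levels for 5 subjects; "P1" appears in both panels, like the 14 shared proteins.
severe_panel = {"P1": [1, 2, 3, 4, 5], "P2": [4, 5, 1, 3, 2]}
critical_panel = {"P1": [1, 2, 3, 4, 5], "Q1": [3, 1, 5, 2, 4]}
print(count_below(severe_panel, critical_panel))  # prints (2, 4)

# Caption arithmetic: 92 x 67 pairs, of which 6150 have |rho| < 0.8.
print(92 * 67, round(100 * 6150 / (92 * 67), 1))  # prints 6164 99.8
```

A protein present in both sets correlates perfectly with itself (ρ = 1), so each shared protein contributes at least one pair above the 0.8 cutoff; this is consistent with the caption’s 14 shared proteins and 6164 − 6150 = 14 pairs at or above the cutoff.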
