Mol Med. 2023 Feb 21;29(1):26.
doi: 10.1186/s10020-023-00610-z.

Organ and cell-specific biomarkers of Long-COVID identified with targeted proteomics and machine learning

Maitray A Patel et al. Mol Med. 2023.

Abstract

Background: Survivors of acute COVID-19 often suffer prolonged, diffuse symptoms post-infection, referred to as "Long-COVID". A lack of Long-COVID biomarkers and pathophysiological mechanisms limits effective diagnosis, treatment and disease surveillance. We performed targeted proteomics and machine learning analyses to identify novel blood biomarkers of Long-COVID.

Methods: A case-control study comparing the expression of 2925 unique blood proteins in Long-COVID outpatients versus COVID-19 inpatients and healthy control subjects. Targeted proteomics was accomplished with proximity extension assays, and machine learning was used to identify the most important proteins for identifying Long-COVID patients. Organ system and cell type expression patterns were identified with Natural Language Processing (NLP) of the UniProt Knowledgebase.

Results: Machine learning analysis identified 119 relevant proteins for differentiating Long-COVID outpatients (Bonferroni-corrected P < 0.01). Protein combinations were narrowed down to two optimal models, with nine and five proteins respectively, both having excellent sensitivity and specificity for Long-COVID status (AUC = 1.00, F1 = 1.00). NLP expression analysis highlighted the diffuse organ system involvement in Long-COVID and identified the involved cell types, including leukocytes and platelets, as key components associated with Long-COVID.
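For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how F1 and AUC can be computed from labels and classifier outputs. It is not the authors' pipeline: the function names and the 0/1 label encoding (1 = Long-COVID, 0 = other cohorts) are assumptions made for illustration only.

```python
def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))
```

Under this definition, AUC = 1.00 means every Long-COVID subject received a higher score than every comparison subject, and F1 = 1.00 means the predicted labels contained no false positives and no false negatives.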

Conclusions: Proteomic analysis of plasma from Long-COVID patients identified 119 highly relevant proteins and two optimal models with nine and five proteins, respectively. The identified proteins reflected widespread organ and cell type expression. Optimal protein models, as well as individual proteins, hold the potential for accurate diagnosis of Long-COVID and targeted therapeutics.

Keywords: Biomarker; COVID-19; Cell types; Long-COVID; Machine learning; Organ system; Targeted proteomics.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Identification of important blood proteins in Long-COVID outpatients. (A) Subjects plotted in two dimensions following t-SNE dimensionality reduction of all 119 important proteins determined by Boruta feature reduction; Long-COVID outpatients form a cluster separate from acutely ill COVID-19 ward/ICU inpatients and healthy control subjects. (B) Subjects plotted in two dimensions following t-SNE dimensionality reduction of the top 9 important proteins determined by Recursive Feature Selection with a 50% threshold; Long-COVID outpatients form a cluster separate from acutely ill COVID-19 ward/ICU inpatients and healthy control subjects. (C) Subjects plotted in two dimensions following t-SNE dimensionality reduction of the top 5 important proteins determined by Recursive Feature Selection with an 80% threshold; Long-COVID outpatients separate from acutely ill COVID-19 ward/ICU inpatients and healthy control subjects, with some mixing. (D) Heatmap of the pairwise cosine similarity between the cohorts' protein profiles for the top 9 proteins. A greater cosine similarity between subjects indicates similar protein profiles, while a smaller value indicates large differences between profiles (distance is pseudocolored on the bar scale). The protein profile of Long-COVID outpatients is distinctly different from that of all other cohorts. (E) Heatmap of the pairwise cosine similarity between the cohorts' protein profiles for the top 5 proteins. A greater cosine similarity between subjects indicates similar protein profiles, while a smaller value indicates large differences between profiles (distance is pseudocolored on the bar scale). The protein profile of Long-COVID outpatients is distinctly different from that of all other cohorts.
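The cosine similarity used for the heatmaps can be sketched as follows. This is a minimal illustration, assuming each subject's protein profile is a plain list of expression values in a fixed protein order; the function names are hypothetical and the authors' actual implementation is not given in this abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length protein profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarity(profiles):
    """Square matrix of cosine similarities between all pairs of profiles."""
    return [[cosine_similarity(p, q) for q in profiles] for p in profiles]
```

Because cosine similarity depends only on the direction of the profile vector, two subjects whose protein levels differ by a constant scale factor score close to 1, while profiles with unrelated patterns score closer to 0.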
Fig. 2
Protein expression of the optimal 9 proteins in Long-COVID. Blue points are Long-COVID outpatient measurements; the green filled area represents the 5th to 95th percentile range of protein expression in healthy control subjects. (A–D, F–I) Plots demonstrating elevated protein expression in Long-COVID compared to healthy controls versus time after acute infection for CXCL5, AP3S2, MAX, PDLIM7, EDAR, LTA4H, CRACR2A and CXCL3. (E) Plot demonstrating decreased FRZB expression in Long-COVID compared to healthy controls versus time after acute infection.
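The healthy-control reference band described in this caption can be sketched as below. This is an illustrative example only: the function names are hypothetical, and the choice of the "inclusive" interpolation method is an assumption, since the abstract does not state how the percentiles were computed.

```python
import statistics


def reference_range(control_values):
    """Return the (5th, 95th) percentile of healthy-control expression values."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(control_values, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def classify(value, low, high):
    """Label a patient measurement relative to the control reference range."""
    if value > high:
        return "elevated"
    if value < low:
        return "decreased"
    return "within range"
```

A Long-COVID measurement plotted above the band would be labeled "elevated" by this sketch, and one below the band "decreased", matching how the panels are described.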
Fig. 3
Frequency of protein expression in major organs/body systems and cell types. (A) Bar plot showing the percentage of proteins expressed in specific major organs and body systems, as determined by Natural Language Processing. A total of 60 of the 119 proteins (50%) had UniProt organ system expression information. The organ system classification combines NLP-identified organ, tissue, multi-level tissue and anatomical system entities. (B) Bar plot showing the percentage of proteins expressed in specific cell types, as determined by Natural Language Processing. A total of 44 of the 119 proteins (37%) had UniProt cell type expression information. Only cell types with percentages greater than 5% are shown for visual clarity.
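The percentages in this caption can be reproduced with a short sketch. The helper names and the input format (a mapping from protein name to a set of category labels) are hypothetical, and using only annotated proteins as the denominator for the per-category percentages is an assumption; the caption does not specify the denominator.

```python
from collections import Counter


def coverage_percent(n_annotated, n_total):
    """Percentage of proteins that carry any expression annotation."""
    return 100.0 * n_annotated / n_total


def expression_frequency(annotations, min_percent=5.0):
    """Percentage of annotated proteins expressed in each category.

    annotations: dict mapping protein name -> set of category labels
    (organ systems or cell types). Proteins with no labels are excluded
    from the denominator (an assumption of this sketch).
    """
    annotated = {p: labels for p, labels in annotations.items() if labels}
    if not annotated:
        return {}
    counts = Counter(label for labels in annotated.values() for label in set(labels))
    total = len(annotated)
    percents = {label: 100.0 * n / total for label, n in counts.items()}
    # Keep only categories above the display threshold, as in panel B.
    return {label: pct for label, pct in percents.items() if pct > min_percent}
```

For the counts given in the caption, `coverage_percent(60, 119)` and `coverage_percent(44, 119)` round to 50 and 37 respectively.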
