Patterns (N Y). 2023 Jan 13;4(1):100659.
doi: 10.1016/j.patter.2022.100659. Epub 2022 Dec 1.

Comprehensively identifying Long Covid articles with human-in-the-loop machine learning

Robert Leaman et al. Patterns (N Y). 2023.

Abstract

A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute sequelae of SARS-CoV-2 infection. However, identifying scientific articles relevant to Long Covid is challenging since there is no standardized or consensus terminology. We developed an iterative human-in-the-loop machine learning framework combining data programming with active learning into a robust ensemble model, demonstrating higher specificity and considerably higher sensitivity than other methods. Analysis of the Long Covid Collection shows that (1) most Long Covid articles do not refer to Long Covid by any name, (2) when the condition is named, the name used most frequently in the literature is Long Covid, and (3) Long Covid is associated with disorders in a wide variety of body systems. The Long Covid Collection is updated weekly and is searchable online at the LitCovid portal: https://www.ncbi.nlm.nih.gov/research/coronavirus/docsum?filters=e_condition.LongCovid.
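
To make the combination of data programming and active learning more concrete, here is a minimal sketch. It is not the authors' implementation: the three labeling functions, their keyword rules, and the simple vote averaging are all assumptions made for illustration (data programming in practice typically learns how much to trust each labeling function rather than averaging votes). The sketch shows weak labels being combined into a score, and the most uncertain articles being selected for human review.

```python
# Hypothetical labeling functions: each returns 1 (relevant),
# 0 (not relevant), or None (abstain). The keyword rules are invented.
def lf_names_long_covid(text):
    return 1 if "long covid" in text.lower() else None

def lf_persistent_symptoms(text):
    t = text.lower()
    return 1 if "persistent symptoms" in t or "post-acute" in t else None

def lf_vaccine_trial(text):
    return 0 if "vaccine trial" in text.lower() else None

LABELING_FUNCTIONS = [lf_names_long_covid, lf_persistent_symptoms, lf_vaccine_trial]

def weak_score(text):
    """Fraction of non-abstaining votes for 'relevant'; 0.5 if all abstain."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v is not None]
    return sum(votes) / len(votes) if votes else 0.5

def most_uncertain(articles, k=1):
    """Uncertainty sampling: the k articles whose score is closest to 0.5."""
    return sorted(articles, key=lambda a: abs(weak_score(a) - 0.5))[:k]

articles = [
    "Long Covid and persistent symptoms in adults",
    "A vaccine trial in healthy volunteers",
    "Fatigue reported months after infection",
]
to_review = most_uncertain(articles, k=1)
```

In a loop of this kind, the articles in `to_review` would be labeled by a human and the labels fed back into the next round of model training.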

Keywords: COVID-19; Long Covid; active learning; data programming; machine learning; natural language processing; post-acute sequelae of SARS-CoV-2 infection; text classification; weak supervision.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Receiver operating characteristic (ROC) curve of our results. Shown with the sensitivity/specificity points for our results thresholded at prediction ≥ 0.7 and several alternative methods of collecting articles relevant to Long Covid. The area under the curve (AUC) is 0.8454.
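
As an illustration of how a single sensitivity/specificity point on such a curve is obtained, the sketch below thresholds prediction scores at 0.7 and counts the four outcome types. The scores and labels are made-up toy values, not data from the paper, and the function name is hypothetical. It assumes both relevant and non-relevant articles are present, otherwise a division by zero occurs.

```python
def sensitivity_specificity(scores, labels, threshold=0.7):
    """Return (sensitivity, specificity) for predictions with score >= threshold."""
    tp = fn = tn = fp = 0
    for score, is_relevant in zip(scores, labels):
        predicted = score >= threshold
        if is_relevant and predicted:
            tp += 1          # relevant article correctly retrieved
        elif is_relevant:
            fn += 1          # relevant article missed
        elif predicted:
            fp += 1          # non-relevant article wrongly retrieved
        else:
            tn += 1          # non-relevant article correctly excluded
    return tp / (tp + fn), tn / (tn + fp)

# Toy values for illustration only.
toy_scores = [0.9, 0.75, 0.4, 0.8, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(toy_scores, toy_labels)
```

Sweeping `threshold` from 0 to 1 and plotting sensitivity against 1 − specificity traces out a ROC curve.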
Figure 2
Terms for Long Covid found most frequently by the Long Covid grammar. The grammar found a total of 7,378 mentions of Long Covid, representing 763 unique phrases, ignoring capitalization and punctuation.
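
Counting unique phrases "ignoring capitalization and punctuation" implies some normalization step before counting. The sketch below is one plausible version, not the authors' grammar: it lowercases each mention and replaces punctuation with spaces (treating a hyphen as a space is an assumption), then tallies the normalized phrases. The example mentions are invented.

```python
import re
from collections import Counter

def normalize(mention):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", mention.lower())
    return " ".join(no_punct.split())

def count_phrases(mentions):
    """Tally mentions by their normalized form."""
    return Counter(normalize(m) for m in mentions)

# Invented example mentions.
example_mentions = [
    "Long Covid", "long-COVID", "Long COVID.",
    "post-COVID syndrome", "Post COVID Syndrome",
]
phrase_counts = count_phrases(example_mentions)
```

Here five mentions collapse to two unique phrases, in the same way that the 7,378 mentions in the caption collapse to 763.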
Figure 3
Terms used to refer to Long Covid over time. Articles that mention Long Covid use the name Long Covid at least once. Articles that use an alternative term mention Long Covid at least once via an identifiable synonym. Articles that do not mention Long Covid do not contain an identifiable term for Long Covid. All articles listed are relevant to Long Covid.
Figure 4
Dendrogram of the disorders most frequently mentioned in the Long Covid Collection. Disorders are filtered if their annotation rate is less than in the general COVID-19 literature (p < 0.01, Fisher exact test). Disorders are clustered according to the number of ancestors in common in the Medical Subject Headings (MeSH) hierarchy.
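
For readers unfamiliar with the Fisher exact test used in this filter, the sketch below computes a one-sided p-value from a 2×2 table of article counts using the hypergeometric distribution. The caption does not state which tail the authors used, so the one-sided "enriched in the collection" form here is an assumption, and the function name and table layout are hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for a 2x2 table.

    a: collection articles annotated with the disorder
    b: collection articles without it
    c: background articles annotated with the disorder
    d: background articles without it
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)

# Small worked example: (16 + 1) / 70
p_value = fisher_exact_greater(3, 1, 1, 3)
```

A disorder would pass a filter of this kind only when its p-value falls below the chosen cutoff, 0.01 in the caption.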
Figure 5
Most frequent topics in the Long Covid Collection over time. (A–D) Topic names are manually generated but reflect the most common phrases in the topic. Study types (A) show increased rigor over time, while interventions (B) show a lack of treatments or tests specific to Long Covid. Long Covid is a complex, multisystemic condition that causes a wide variety of potentially serious systemic dysfunctions (C) and specific disorders (D).
Figure 6
System overview. The system diagram illustrates the flow of data for the three primary system processes: model creation, article prediction, and article annotation.
