Heterogeneity of diagnosis and documentation of post-COVID conditions in primary care: A machine learning analysis
- PMID: 40378166
- PMCID: PMC12083802
- DOI: 10.1371/journal.pone.0324017
Abstract
Background: Post-COVID conditions (PCC) have proven difficult to diagnose. In this retrospective observational study, we aimed to characterize the variation in PCC diagnoses across clinicians from several methodological angles and to determine whether natural language classifiers trained on clinical notes can reconcile differences in diagnostic definitions.
Methods: We used data from 519 primary care clinics across the United States that were in the American Family Cohort registry between October 1, 2021 (when the ICD-10 code for PCC was activated) and November 1, 2023. There were 6,116 patients with a diagnostic code for PCC (U09.9), and 5,020 with diagnostic codes for both PCC and COVID-19. We explored these data using four outcomes: 1) time between COVID-19 and PCC diagnostic codes; 2) count of patients with PCC diagnostic codes per clinician; 3) patient-specific probability of a PCC diagnostic code based on patient and clinician characteristics; and 4) performance of a natural language classifier trained on notes from 5,000 patients annotated by two physicians to indicate probable PCC.
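As an illustration of outcomes 1 and 2 only, the following is a minimal sketch of how the gap between diagnostic codes and the per-clinician counts might be computed. The record layout, the identifiers, and the function names are hypothetical and are not taken from the study; the dates are invented toy values.

```python
from collections import Counter
from datetime import date

# Hypothetical toy records (not study data):
# (patient_id, clinician_id, first COVID-19 code date, first PCC code date)
records = [
    ("p1", "c1", date(2022, 1, 3), date(2022, 2, 14)),
    ("p2", "c1", date(2022, 3, 1), date(2022, 7, 19)),
    ("p3", "c2", date(2022, 5, 10), date(2022, 6, 1)),
    ("p4", "c1", date(2022, 8, 2), date(2023, 1, 10)),
]


def share_within_weeks(rows, weeks=12):
    """Fraction of patients whose PCC code falls less than `weeks` weeks
    after their first recorded COVID-19 code (outcome 1)."""
    gaps_in_days = [(pcc - covid).days for _, _, covid, pcc in rows]
    return sum(gap < weeks * 7 for gap in gaps_in_days) / len(gaps_in_days)


def pcc_counts_per_clinician(rows):
    """Number of patients with a PCC code attributed to each clinician (outcome 2)."""
    return Counter(clinician for _, clinician, _, _ in rows)


print(share_within_weeks(records))
print(pcc_counts_per_clinician(records).most_common())
```

Sorting clinicians by these counts is one simple way to see how concentrated PCC diagnoses are among a small number of clinicians.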
Results: Of patients with diagnostic codes for PCC and COVID-19, 61.3% were diagnosed with PCC less than 12 weeks after initial recorded COVID-19. Clinicians in the top 1% of diagnostic propensity accounted for more than a third of all PCC diagnoses (35.8%). Comparing LASSO logistic regressions predicting documentation of PCC diagnosis, a log-likelihood test showed significantly better fit when clinician and practice site indicators were included (p < 0.0001). Inter-rater agreement between physician annotators on PCC diagnosis was moderate (Cohen's kappa: 0.60), and performance of the natural language classifiers was marginal (best AUC: 0.724, 95% credible interval: 0.555-0.878).
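For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Cohen's kappa for two annotators, using the standard definition (observed agreement corrected for agreement expected by chance). The label vectors are invented for illustration and do not reproduce the study's annotations; the function name is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical annotations: 1 = probable PCC, 0 = not PCC.
annotator_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
annotator_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.58
```

In this toy example the annotators agree on 8 of 10 notes, but chance agreement is 0.52, so kappa is about 0.58 rather than 0.80.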
Conclusion: We found evidence of substantial disagreement between clinicians on diagnostic criteria for PCC. The variation in diagnostic rates across clinicians points to the possibilities of under- and over-diagnosis for patients.
Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Conflict of interest statement
The authors have declared that no competing interests exist.