Diagnostic Accuracy of Artificial Intelligence in Virtual Primary Care

Dan Zeltzer et al. Mayo Clin Proc Digit Health. 2023 Sep 20;1(4):480-489. doi: 10.1016/j.mcpdig.2023.08.002. eCollection 2023 Dec.
Abstract

Objective: To evaluate the diagnostic accuracy of artificial intelligence (AI)-generated clinical diagnoses.

Patients and methods: A retrospective chart review of 102,059 virtual primary care clinical encounters from October 1, 2022, to January 31, 2023, was conducted. Patients underwent an AI medical interview, after which virtual care providers reviewed the interview summary and the AI-provided differential diagnoses, communicated with patients, and finalized diagnoses and treatment plans. Our accuracy measures were the rates of agreement among AI-generated diagnoses, virtual care provider diagnoses, and blind adjudicator diagnoses. We analyzed AI diagnostic agreement across diagnoses, presenting symptoms, patient demographic characteristics such as race, and provider levels of experience. We also evaluated model performance improvement with retraining.
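As an illustration of the two agreement measures, the following is a minimal sketch in Python; the encounter table, column names, and values are hypothetical stand-ins, not data or code from the study.

import pandas as pd

# Hypothetical encounters: the provider's final diagnosis and the AI's
# rank-ordered differential list (illustrative values, not study data).
encounters = pd.DataFrame({
    "provider_dx": ["sinusitis", "uti", "migraine"],
    "ai_differential": [
        ["sinusitis", "allergic rhinitis"],
        ["vaginitis", "uti"],
        ["tension headache", "migraine"],
    ],
})

# Any-rank agreement: the provider's diagnosis appears anywhere in the AI list.
any_agree = encounters.apply(
    lambda r: r.provider_dx in r.ai_differential, axis=1)

# Top-1 agreement: the provider's diagnosis matches the AI top-ranked diagnosis.
top1_agree = encounters.apply(
    lambda r: r.provider_dx == r.ai_differential[0], axis=1)

print(f"agreement with any AI diagnosis: {any_agree.mean():.1%}")
print(f"agreement with top-ranked AI diagnosis: {top1_agree.mean():.1%}")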

Results: Providers selected an AI diagnosis in 84.2% (n = 85,976) of cases and the top-ranked AI diagnosis in 60.9% (n = 62,130) of cases. Agreement rates varied by diagnosis, with greater than or equal to 95% provider agreement with an AI diagnosis for 35 diagnoses (47% of cases, n = 47,679) and greater than or equal to 90% agreement for 57 diagnoses (69% of cases, n = 70,697). The average agreement rate was greater than or equal to 90% for half of all presenting symptoms. Adjusting for case mix, diagnostic accuracy exhibited minimal variation across demographic characteristics. The adjudicators' consensus diagnosis, reached in 58.2% (n = 128) of adjudicated cases, was always included in the AI differential diagnosis. Provider experience did not affect agreement, and model retraining increased diagnostic accuracy for retrained conditions from 96.6% to 98.0%.
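The reported rates follow directly from the stated counts; as a quick arithmetic check using only the numbers above:

total = 102_059  # virtual primary care encounters in the study
print(f"{85_976 / total:.1%}")  # 84.2% of cases: provider selected an AI diagnosis
print(f"{62_130 / total:.1%}")  # 60.9% of cases: provider selected the top-ranked AI diagnosis
print(f"{47_679 / total:.0%}")  # 47% of cases: the 35 diagnoses with >=95% agreement
print(f"{70_697 / total:.0%}")  # 69% of cases: the 57 diagnoses with >=90% agreement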

Conclusion: Our findings show that, in the setting of this study, agreement between AI-generated and provider diagnoses was high in most cases. The results highlight the potential for AI to enhance primary care disease diagnosis and patient triage, with the capacity to improve over time.


Conflict of interest statement

During the development and conduct of this study, Zeltzer, Herzog, Pickman, Steuerman, Ilan Ber, Kugler, and Shaul received consulting fees from, or were paid employees of, K Health Inc and held stock or stock options in the company. Dr Ebbert’s institution received consulting fees from K Health Inc and EXACT Sciences. Dr Ebbert also received payments, royalties, and travel support from Applied Aerosol Technologies, MedInCell, and EXACT Sciences.

Figures

Figure 1
Accuracy distribution: AI and provider diagnostic agreement for 102,059 cases. The figure summarizes agreement between provider diagnoses and the AI-recommended differential diagnoses. The y-axis shows the rate of agreement, defined as the percentage of cases for which the provider selected a diagnosis from the AI-proposed differential diagnosis list. The x-axis shows the cumulative proportion of cases with diagnoses exceeding each level of agreement. Panel A groups results by each of the provider-selected diagnoses in the sample (the most common are labeled). Panel B groups results by each of the 215 presenting symptoms in the sample, as reported by the patient during automated intake (the most common are labeled). See the text for detailed sample and measure definitions. AI, artificial intelligence.
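A minimal sketch of how the Figure 1 curve could be constructed, under the same hypothetical column names as above: per-diagnosis agreement rates are sorted in descending order and plotted against the cumulative share of cases they account for.

import pandas as pd

# Hypothetical per-encounter data: the provider-selected diagnosis and whether
# it appeared in the AI differential (illustrative values, not study data).
encounters = pd.DataFrame({
    "provider_dx": ["sinusitis", "sinusitis", "uti", "uti", "migraine"],
    "agree": [True, True, True, False, True],
})

# Agreement rate and case count per diagnosis, sorted from highest agreement.
per_dx = (
    encounters.groupby("provider_dx")["agree"]
    .agg(rate="mean", n="size")
    .sort_values("rate", ascending=False)
)

# The x-axis of Figure 1: cumulative share of cases covered as the
# per-diagnosis agreement rate declines.
per_dx["cum_case_share"] = per_dx["n"].cumsum() / per_dx["n"].sum()
print(per_dx)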
Figure 2
Adjudication results: agreement between AI, virtual care provider, and adjudicator diagnoses. Panel A shows the share of cases in which the virtual care provider and each of the adjudicators selected a diagnosis that was within the AI list of differential diagnoses. Panel B shows the share of cases in which the adjudicators selected the same diagnosis as the virtual care provider or the AI top-ranked diagnosis. Error bars show 95% CIs. The y-axis scales differ between panels. The sample is described in detail in Supplementary Table 3 (available online at https://www.mcpdigitalhealth.org/). AI, artificial intelligence; VP, virtual care provider.

