Combining Rule-based NLP-lite with Rapid Iterative Chart Adjudication for Creation of a Large, Accurately Curated Cohort from EHR data: A Case Study in the Context of a Clinical Trial Emulation

Pradeep Mutalik¹,², Kei-Hoi Cheung¹,², Jennifer Green³, Melissa Buelt-Gebhardt⁴, Karen F Anderson¹,², Vales Jeanpaul¹, Linda McDonald⁵, Michael Wininger²,⁵, Yuli Li¹, Nallakkandi Rajeevan¹, Peter M Jessel³,⁶, Hans Moore⁷,⁸, Selçuk Adabag⁴,⁹, Merritt H Raitt³,⁶, Mihaela Aslan¹,²

Affiliations

¹ VA Cooperative Studies Program Clinical Epidemiology Research Center (CSP-CERC), VA Connecticut Healthcare System, West Haven, CT.
² Yale University School of Medicine, New Haven, CT.
³ VA Portland Health Care System, Portland, OR.
⁴ VA Minneapolis Health Care System, Minneapolis, MN.
⁵ Cooperative Studies Program Coordinating Center, VA Connecticut Healthcare System, West Haven, Connecticut.
⁶ Oregon Health and Sciences University, Portland, OR.
⁷ VA Washington DC Health Care, Washington, DC.
⁸ Georgetown University School of Medicine, Washington, DC.
⁹ University of Minnesota, Minneapolis, MN.

PMID: 40417550
PMCID: PMC12099393

Pradeep Mutalik et al. AMIA Annu Symp Proc. 2025 May 22;2024:847-856. eCollection 2024.

Abstract

The aim of this work was to create a gold-standard curated cohort of 10,000+ cases from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) for virtual emulation of a randomized clinical trial (CSP#592). The trial had six inclusion/exclusion criteria lacking adequate structured data. We therefore used a hybrid computer/human approach to extract information from clinical notes. Rule-based NLP output was iteratively adjudicated by a panel of trained non-clinician content experts and non-experts using an easy-to-use spreadsheet-based rapid adjudication display. This group-adjudication process iteratively sharpened both the computer algorithm and the clinical decision criteria, while simultaneously training the non-experts. The cohort was successfully created, with each inclusion/exclusion decision backed by a source document. Fewer than 0.5% of cases required referral to specialist clinicians. It is likely that such curated datasets, which capture specialist reasoning and use a process-supervised approach, will acquire greater importance as training tools for future clinical AI applications.

Figures

**Figure 1:**
Rapid Adjudication Display: This screenshot displays part of our final LVEF adjudication for the simplest cases. The first 7 columns (“Ser No” through “Report Button”) were algorithm-generated, and the last 4 columns were filled in by human adjudicators. The adjudicators had to do the following: i) Type “Y” in the “LVEF Correct?” column if the numbers in the “Low Value” and “High Value” columns corresponded to the highlighted EF, LVEF or Ejection fraction values in the “Snippet” column, otherwise type “N”. ii) Type “Y” in the “LVEF qualifies” column if the mean of the true LVEF values in the snippet was <=35%, else type “N”. iii) Type “Y” if the dates in the “Proc Date” and “Doc Date” columns were within 15 days of each other, else type “N” (dates are ‘x’-ed out here for privacy reasons). iv) Optionally type in comments in case of ambiguity, or to provide feedback. The adjudicators could click the “Report Button” to view the entire report if a decision could not be made based on the snippet alone.
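
The three checks described in this caption can be illustrated with a minimal sketch. It is not the authors' algorithm: the regular expression, the function names, and the choice to take the mean of the extracted low and high values are all assumptions made for illustration, and a real rule set would need to handle many more phrasings.

```python
import re
from datetime import date

# Hypothetical pattern (an assumption, not the authors' rules): matches
# "EF", "LVEF" or "ejection fraction" followed by a single percentage
# or a range such as "30-35%" or "20 to 25 %".
LVEF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection\s+fraction)\b[^0-9]{0,30}?"
    r"(\d{1,2})\s*(?:%?\s*(?:-|to)\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)


def extract_lvef(snippet):
    """Return (low, high) from the first LVEF mention, or None if absent."""
    match = LVEF_PATTERN.search(snippet)
    if match is None:
        return None
    low = int(match.group(1))
    # A single value is treated as a range whose low and high are equal.
    high = int(match.group(2)) if match.group(2) else low
    return low, high


def lvef_qualifies(low, high, threshold=35.0):
    """Check ii): mean of the low and high values is at or below the threshold."""
    return (low + high) / 2 <= threshold


def dates_within(proc_date, doc_date, max_days=15):
    """Check iii): procedure and document dates are within max_days of each other."""
    return abs((doc_date - proc_date).days) <= max_days


if __name__ == "__main__":
    values = extract_lvef("LVEF is estimated at 30-35%.")
    print(values, lvef_qualifies(*values))
    print(dates_within(date(2020, 1, 1), date(2020, 1, 16)))
```

In the workflow the caption describes, output of this kind would fill the algorithm-generated columns, and the human adjudicator would confirm or reject each value rather than trust it.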

**Figure 2:**
Rapid Adjudication Display for BiV exclusion, showing the implementation of the complex clinical decision tree. All columns except the last were algorithm-generated. The ‘reason’ column shows the algorithm’s justification for its Y/N decision. The adjudicator was required to flag patients who needed to be rejected.
