Combining Rule-based NLP-lite with Rapid Iterative Chart Adjudication for Creation of a Large, Accurately Curated Cohort from EHR data: A Case Study in the Context of a Clinical Trial Emulation
- PMID: 40417550
- PMCID: PMC12099393
Abstract
The aim of this work was to create a gold-standard curated cohort of 10,000+ cases from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) for virtual emulation of a randomized clinical trial (CSP#592). Six of the trial's inclusion/exclusion criteria lacked adequate structured data. We therefore used a hybrid computer/human approach to extract information from clinical notes. Rule-based NLP output was iteratively adjudicated by a panel of trained non-clinician content experts and non-experts using an easy-to-use, spreadsheet-based rapid adjudication display. This group-adjudication process iteratively sharpened both the computer algorithm and the clinical decision criteria, while simultaneously training the non-experts. The cohort was successfully created, with each inclusion/exclusion decision backed by a source document. Fewer than 0.5% of cases required referral to specialist clinicians. It is likely that such curated datasets, which capture specialist reasoning and use a process-supervised approach, will acquire greater importance as training tools for future clinical AI applications.
©2024 AMIA - All rights reserved.
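The hybrid approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rule, the criterion name, the column layout, and the function names (`extract_snippets`, `to_csv`) are all hypothetical and chosen only for illustration. A keyword rule pulls short snippets from note text, and each snippet becomes one spreadsheet row with an empty column for the human adjudicator.

```python
import csv
import io
import re

# Hypothetical rule set: one regex per criterion. The criterion name and
# search terms are illustrative only and are NOT taken from CSP#592.
RULES = {
    "hypothetical_criterion": re.compile(
        r"\b(?:atrial fibrillation|afib)\b", re.IGNORECASE
    ),
}

FIELDS = ["note_id", "criterion", "match", "snippet", "adjudication"]


def extract_snippets(note_id, text, rules=RULES, window=40):
    """Return one row per rule match, with surrounding context for review."""
    rows = []
    for criterion, pattern in rules.items():
        for m in pattern.finditer(text):
            start = max(0, m.start() - window)
            end = min(len(text), m.end() + window)
            rows.append({
                "note_id": note_id,          # links the decision to its source note
                "criterion": criterion,
                "match": m.group(0),
                "snippet": text[start:end].replace("\n", " "),
                "adjudication": "",          # left blank for the human reviewer
            })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text that a spreadsheet program can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    note = "Patient has history of AFib. No atrial fibrillation today."
    print(to_csv(extract_snippets("note-001", note)))
```

In an iterative workflow of the kind the abstract describes, reviewers' entries in the `adjudication` column would presumably feed back into refinements of the rules; that feedback step is left out of this sketch.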