AMIA Annu Symp Proc. 2025 May 22;2024:847-856.
eCollection 2024.

Combining Rule-based NLP-lite with Rapid Iterative Chart Adjudication for Creation of a Large, Accurately Curated Cohort from EHR data: A Case Study in the Context of a Clinical Trial Emulation

Pradeep Mutalik et al. AMIA Annu Symp Proc.

Abstract

The aim of this work was to create a gold-standard curated cohort of 10,000+ cases from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) for virtual emulation of a randomized clinical trial (CSP#592). The trial had six inclusion/exclusion criteria lacking adequate structured data. We therefore used a hybrid computer/human approach to extract information from clinical notes. Rule-based NLP output was iteratively adjudicated by a panel of trained non-clinician content experts and non-experts using an easy-to-use, spreadsheet-based rapid adjudication display. This group-adjudication process iteratively sharpened both the computer algorithm and the clinical decision criteria, while simultaneously training the non-experts. The cohort was successfully created, with each inclusion/exclusion decision backed by a source document. Fewer than 0.5% of cases required referral to specialist clinicians. It is likely that such curated datasets, which capture specialist reasoning and use a process-supervised approach, will acquire greater importance as training tools for future clinical AI applications.

Figures

Figure 1:
Rapid Adjudication Display: This screenshot displays part of our final LVEF adjudication for the simplest cases. The first 7 columns (“Ser No” through “Report Button”) were algorithm-generated, and the last 4 columns were filled in by human adjudicators. The adjudicators had to do the following: i) Type “Y” in the “LVEF Correct?” column if the numbers in the “Low Value” and “High Value” columns corresponded to the highlighted EF, LVEF or Ejection fraction values in the “Snippet” column, otherwise type “N”. ii) Type “Y” in the “LVEF qualifies” column if the mean of the true LVEF values in the snippet was <=35%, else type “N”. iii) Type “Y” if the dates in the “Proc Date” and “Doc Date” columns were within 15 days of each other, else type “N” (dates are ‘x’-ed out here for privacy reasons). iv) Optionally type in comments in case of ambiguity, or to provide feedback. The adjudicators could click the “Report Button” to view the entire report if a decision could not be made based on the snippet alone.
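The two numeric checks in the Figure 1 caption (mean LVEF <=35%, and procedure and document dates within 15 days of each other) can be sketched in Python as below. This is only an illustration of the stated rules, not the authors' implementation: the function names and constants are hypothetical, and treating "within 15 days" as inclusive of exactly 15 days is an assumption.

```python
from datetime import date
from statistics import mean

# Thresholds taken from the Figure 1 caption; the constant names are hypothetical.
LVEF_THRESHOLD_PCT = 35.0
MAX_DATE_GAP_DAYS = 15  # assumption: a gap of exactly 15 days still counts


def lvef_qualifies(true_lvef_values):
    """Return "Y" if the mean of the true LVEF values (in %) is <= 35, else "N"."""
    return "Y" if mean(true_lvef_values) <= LVEF_THRESHOLD_PCT else "N"


def dates_within_window(proc_date, doc_date):
    """Return "Y" if the two dates are at most 15 days apart, else "N"."""
    gap_days = abs((proc_date - doc_date).days)
    return "Y" if gap_days <= MAX_DATE_GAP_DAYS else "N"


if __name__ == "__main__":
    # A snippet reporting a range such as "EF 30-35%" yields two values.
    print(lvef_qualifies([30, 35]))
    print(dates_within_window(date(2020, 1, 1), date(2020, 1, 10)))
```

In the workflow the caption describes, these judgments were typed in by human adjudicators; a sketch like this would at most pre-fill or cross-check their entries.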
Figure 2:
Rapid Adjudication Display for BiV exclusion, showing the implementation of the complex clinical decision tree. All the columns except the last one were algorithm-generated. The ‘reason’ column shows the algorithm’s justification for its Y/N decision. The adjudicator was required to flag patients who needed to be rejected.

