Appl Clin Inform. 2021 Jan;12(1):170-178.
doi: 10.1055/s-0041-1723024. Epub 2021 Mar 10.

Extracting Medical Information from Paper COVID-19 Assessment Forms

Colin G White-Dzuro et al. Appl Clin Inform. 2021 Jan.

Abstract

Objective: This study examines the validity of optical mark recognition, a novel user interface, and crowdsourced data validation to rapidly digitize and extract data from paper COVID-19 assessment forms at a large medical center.

Methods: An optical mark recognition/optical character recognition (OMR/OCR) system was developed to identify fields that were selected on 2,814 paper assessment forms, each with 141 fields which were used to assess potential COVID-19 infections. A novel user interface (UI) displayed mirrored forms showing the scanned assessment forms with OMR results superimposed on the left and an editable web form on the right to improve ease of data validation. Crowdsourced participants validated the results of the OMR system. Overall error rate and time taken to validate were calculated. A subset of forms was validated by multiple participants to calculate agreement between participants.
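As a rough illustration of the optical mark recognition step described above, the sketch below decides whether a form field is selected by measuring how much of its bounding box is dark. The abstract does not describe the authors' actual algorithm. The function names, the field names (`fever`, `cough`), the grayscale cutoff (128), and the minimum fill fraction (0.2) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # grayscale rows: 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def dark_fraction(image: Image, box: Box, dark_below: int = 128) -> float:
    """Return the share of pixels inside `box` darker than `dark_below`."""
    top, left, bottom, right = box
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p < dark_below) / len(pixels)


def is_marked(image: Image, box: Box, min_fill: float = 0.2) -> bool:
    """Treat a field as selected when enough of its box is dark (assumed rule)."""
    return dark_fraction(image, box) >= min_fill


# Toy 4x8 "scan": a dark 2x2 mark sits inside the first field only.
page: Image = [
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255, 255, 255, 255],
    [255, 255, 255, 255, 255, 255, 255, 255],
]

# Hypothetical field layout; a real form would define one box per field.
fields: Dict[str, Box] = {"fever": (1, 1, 3, 3), "cough": (1, 5, 3, 7)}

selected = {name: is_marked(page, box) for name, box in fields.items()}
```

In a pipeline like the one described, results of this kind would be the values superimposed on the scanned form and pre-filled in the editable web form for a human validator to confirm or correct.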

Results: The OMR/OCR tools correctly extracted data from scanned form fields with an average accuracy of 70% and a median accuracy of 78% when the OMR/OCR results were compared with the results from crowd validation. Scanned forms were crowd-validated at a mean rate of 157 seconds per document and a volume of approximately 108 documents per day. A randomly selected subset of documents was reviewed by multiple participants, producing an interobserver agreement of 97% for documents when narrative-text fields were included and 98% when only Boolean and multiple-choice fields were considered.
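The accuracy and agreement figures above can be read as simple proportions. The following is a minimal sketch, assuming that accuracy is computed per document as the share of fields where the OMR/OCR value matches the crowd-validated value, and that interobserver agreement is the share of fields on which two validators entered the same value. The abstract does not state the exact formulas or the unit over which the mean and median were taken, and the sample documents below are invented for illustration. Only the 157 seconds and 108 documents per day come from the abstract.

```python
from statistics import mean, median
from typing import Dict, List

Form = Dict[str, object]  # field name -> extracted or validated value


def match_rate(a: Form, b: Form) -> float:
    """Share of shared fields on which two versions of a form agree."""
    keys = a.keys() & b.keys()
    if not keys:
        raise ValueError("forms share no fields")
    return sum(a[k] == b[k] for k in keys) / len(keys)


# Invented example: three forms with four hypothetical fields each.
crowd_docs: List[Form] = [
    {"fever": True, "cough": False, "travel": False, "contact": True},
    {"fever": False, "cough": True, "travel": False, "contact": False},
    {"fever": True, "cough": True, "travel": True, "contact": True},
]
omr_docs: List[Form] = [
    {"fever": True, "cough": False, "travel": False, "contact": True},   # 4/4 match
    {"fever": False, "cough": True, "travel": True, "contact": False},   # 3/4 match
    {"fever": False, "cough": False, "travel": False, "contact": True},  # 1/4 match
]

per_doc_accuracy = [match_rate(o, c) for o, c in zip(omr_docs, crowd_docs)]
mean_accuracy = mean(per_doc_accuracy)
median_accuracy = median(per_doc_accuracy)

# Agreement between two validators on the same form, computed the same way.
validator_a: Form = {"fever": True, "cough": False, "travel": False, "contact": True}
validator_b: Form = {"fever": True, "cough": False, "travel": True, "contact": True}
agreement = match_rate(validator_a, validator_b)

# Arithmetic on the reported throughput: total validation time per day.
SECONDS_PER_DOC = 157
DOCS_PER_DAY = 108
validation_hours_per_day = SECONDS_PER_DOC * DOCS_PER_DAY / 3600
```

A mean below the median, as in the reported 70% versus 78%, is what one would expect when a minority of items are extracted much less accurately than the rest and pull the average down.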

Conclusion: Due to the COVID-19 pandemic, it may be challenging for health care workers wearing personal protective equipment to interact with electronic health records. The combination of OMR/OCR technology, a novel UI, and crowdsourcing data-validation processes allowed for the efficient extraction of a large volume of paper medical documents produced during the COVID-19 pandemic.

Conflict of interest statement

None declared.

Figures

Fig. 1. Flow chart of exclusion criteria and subgroup analyses.
Fig. 2. Crowdsourcing user interface: (left) scanned page of an early form with physician circling, with optical character recognition results overlaid; (right) HTML form with selected fields marked.
Fig. 3. Cumulative volume of COVID-19 intake forms over time.
Fig. 4. Crowdsourced documents processed by date and worker.
