JCO Clin Cancer Inform. 2022 Feb:6:e2100105.
doi: 10.1200/CCI.21.00105.

A Scalable Quality Assurance Process for Curating Oncology Electronic Health Records: The Project GENIE Biopharma Collaborative Approach

Jessica A Lavery et al. JCO Clin Cancer Inform. 2022 Feb.

Abstract

Purpose: The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) Biopharma Collaborative (BPC) is a multi-institution effort to build a pan-cancer repository of genomic and clinical data curated from the electronic health record (EHR). For the research community to be confident that data extracted from EHR text are reliable, the approach used to ensure data quality must be transparent.

Materials and methods: Four institutions participating in AACR's Project GENIE created an observational cohort of patients with cancer for whom tumor molecular profiling data, therapeutic exposures, and treatment outcomes are available and will be shared publicly with the research community. A comprehensive approach to quality assurance included assessments of (1) feasibility of the curation model through pressure test cases; (2) accuracy through programmatic queries and comparison with source data; and (3) reproducibility via double curation and code review.
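Programmatic queries of the kind described above are usually written as rules that flag logically impossible or inconsistent combinations of curated values, so that a curator can review the record. The sketch below is a hypothetical Python illustration of that pattern, not the GENIE BPC query set: the field names (`birth_date`, `dx_date`, `regimen_start`, `death_date`) and the three rules are assumptions chosen only to show the mechanics.

```python
from datetime import date
from typing import Callable, Optional

# A curated record is modeled as a plain dict of dates; field names are hypothetical.
Record = dict[str, Optional[date]]

# Each rule pairs a message with a predicate that returns True when the
# record VIOLATES the rule, i.e., when the query should fire.
Rule = tuple[str, Callable[[Record], bool]]


def _before(rec: Record, earlier: str, later: str) -> bool:
    """True if both dates are present and the `later` field precedes the `earlier` one."""
    a, b = rec.get(earlier), rec.get(later)
    return a is not None and b is not None and b < a


# Hypothetical rules; a fired query marks a record for review, not a confirmed error.
RULES: list[Rule] = [
    ("diagnosis date precedes birth date", lambda r: _before(r, "birth_date", "dx_date")),
    ("regimen starts before diagnosis", lambda r: _before(r, "dx_date", "regimen_start")),
    ("regimen starts after death", lambda r: _before(r, "regimen_start", "death_date")),
]


def run_queries(records: dict[str, Record], rules: list[Rule] = RULES) -> list[tuple[str, str]]:
    """Return (record_id, message) for every rule that fires on every record."""
    return [
        (record_id, message)
        for record_id, rec in records.items()
        for message, violates in rules
        if violates(rec)
    ]


# Usage with made-up records: P001 has a regimen dated before its diagnosis.
records = {
    "P001": {"birth_date": date(1950, 3, 1), "dx_date": date(2018, 6, 1),
             "regimen_start": date(2018, 5, 1), "death_date": None},
    "P002": {"birth_date": date(1962, 1, 1), "dx_date": date(2019, 2, 1),
             "regimen_start": date(2019, 3, 1), "death_date": date(2020, 1, 1)},
}
print(run_queries(records))  # [('P001', 'regimen starts before diagnosis')]
```

Missing dates never trigger a rule here, which is one possible design choice; a real query set might instead flag required fields that are blank.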

Results: Assessments of feasibility resulted in critical modifications to the curation directives. Queries and comparison with source data identified errors that were rectified via data correction and curator retraining. Assessment of intercurator reliability indicated a reliable curation model.
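The abstract does not say which reliability statistic was used. One common choice for a categorical field abstracted independently by two curators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The minimal sketch below assumes each double-curated record yields one categorical label per curator; the example labels are made up.

```python
from collections import Counter


def cohens_kappa(primary: list[str], secondary: list[str]) -> float:
    """Cohen's kappa for two raters labeling the same items.

    p_o is the observed proportion of agreement; p_e is the agreement expected
    if each rater labeled independently according to their own marginal
    frequencies.
    """
    if len(primary) != len(secondary) or not primary:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(primary)
    p_o = sum(a == b for a, b in zip(primary, secondary)) / n
    freq_a, freq_b = Counter(primary), Counter(secondary)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical field, e.g., "progression documented?", for six double-curated records.
primary = ["yes", "yes", "no", "no", "yes", "no"]
secondary = ["yes", "no", "no", "no", "yes", "no"]
print(round(cohens_kappa(primary, secondary), 3))  # 0.667
```

Here raw agreement is 5/6, but because both curators answer "no" often, half of that agreement would be expected by chance, which pulls kappa down to about 0.67.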

Conclusion: The transparent quality assurance processes for the GENIE BPC data ensure that the data can be used for analyses that support clinical decision making and advances in precision oncology.

Conflict of interest statement

Jessica A. Lavery
Research Funding: AACR Project GENIE (Inst)

Eva M. Lepisto
Employment: Kiniksa Pharmaceuticals (I)
Stock and Other Ownership Interests: Kiniksa Pharmaceuticals (I)

Samantha Brown
Research Funding: AACR

Michele LeNoue-Newton
Employment: DaVita (I)
Stock and Other Ownership Interests: DaVita (I)
Research Funding: GE Healthcare (Inst)

Shawn Sweeney
Employment: ConcertAI (I)
Research Funding: Amgen, AstraZeneca, Bristol Myers Squibb, Genentech, Bayer, Boehringer Ingelheim, Janssen, Merck, Novartis, Analysis Group

Ben Ho Park
Leadership: Loxo
Stock and Other Ownership Interests: Loxo, Celcuity
Consulting or Advisory Role: Horizon Discovery, Loxo, Casdin Capital, Jackson Laboratory for Genomic Medicine, Celcuity, Sermonix Pharmaceuticals, Hologic, EQRx
Research Funding: AbbVie, Pfizer, GE Healthcare, Lilly
Patents, Royalties, Other Intellectual Property: Royalties paid through inventions at Johns Hopkins University by Horizon Discovery Ltd
Travel, Accommodations, Expenses: Lilly, Loxo
Uncompensated Relationships: Tempus

Jeremy L. Warner
This author is an Associate Editor for JCO Clinical Cancer Informatics. Journal policy recused the author from having any role in the peer review of this manuscript.
Stock and Other Ownership Interests: HemOnc.org
Consulting or Advisory Role: Westat, IBM, Roche, Flatiron Health
Travel, Accommodations, Expenses: IBM

Philippe L. Bedard
Consulting or Advisory Role: BMS (Inst), Pfizer (Inst), Seattle Genetics, Lilly, Amgen, Merck
Research Funding: Bristol Myers Squibb (Inst), Sanofi (Inst), AstraZeneca (Inst), Genentech/Roche (Inst), Servier (Inst), GlaxoSmithKline (Inst), Novartis (Inst), PTC Therapeutics (Inst), Nektar (Inst), Merck (Inst), Seattle Genetics (Inst), Mersana (Inst), Immunomedics (Inst), Lilly (Inst), Amgen (Inst), Bicara Therapeutics (Inst)

Gregory Riely
Research Funding: Novartis (Inst), Roche/Genentech (Inst), GlaxoSmithKline (Inst), Pfizer (Inst), Infinity Pharmaceuticals (Inst), Mirati Therapeutics (Inst), Merck (Inst), Takeda (Inst)
Patents, Royalties, Other Intellectual Property: Patent application submitted covering pulsatile use of erlotinib to treat or prevent brain metastases (Inst)
Travel, Accommodations, Expenses: Merck Sharp & Dohme
Other Relationship: Pfizer, Roche/Genentech, Takeda

Deborah Schrag
Stock and Other Ownership Interests: Merck (I)
Honoraria: Pfizer
Consulting or Advisory Role: JAMA-Journal of the American Medical Association
Research Funding: AACR (Inst), GRAIL (Inst)
Patents, Royalties, Other Intellectual Property: PRISSMM model is trademarked and curation tools are available to academic medical centers and government under creative commons license
Other Relationship: JAMA-Journal of the American Medical Association

Katherine S. Panageas
Stock and Other Ownership Interests: AstraZeneca, Pfizer, Sunesis Pharmaceuticals

No other potential conflicts of interest were reported.

Figures

FIG 1.
Flow diagram for QA processes and data release. QA, quality assurance.
FIG A1.
Example of (A) incorrectly curated data and (B) the corresponding REDCap data quality rule alert, based on simulated data for an example patient. REDCap, Research Electronic Data Capture.
FIG A2.
Snapshot of the quality assurance application summarizing source data verification findings. A dropdown on the left-hand side lets the user select a particular institution, cancer diagnosis, and summary level for the source data verification findings. The table on the left shows the number of forms reviewed per patient. The table on the right has a tab for each PRISSMM module and shows the number of forms compared with the electronic health record, along with the extent and type of major and minor issues.
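The per-module summary described for this application amounts to a filtered aggregation. The sketch below is a hypothetical reconstruction, not the application's code: it assumes one row per reviewed form carrying the most serious issue found on it, coded as `none`, `minor`, or `major`, and the module labels in the example are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    institution: str
    cancer: str
    module: str     # PRISSMM module of the form (labels here are illustrative)
    form_id: str
    severity: str   # hypothetical coding: "none", "minor", or "major"


def summarize(findings: list[Finding], institution: Optional[str] = None,
              cancer: Optional[str] = None) -> dict[str, dict[str, int]]:
    """Per module: forms compared with the EHR, plus counts of minor and major issues.

    The optional filters mimic the institution and cancer dropdowns. Assumes
    one Finding per reviewed form.
    """
    table: dict[str, dict[str, int]] = defaultdict(lambda: {"forms": 0, "minor": 0, "major": 0})
    for f in findings:
        if institution is not None and f.institution != institution:
            continue
        if cancer is not None and f.cancer != cancer:
            continue
        row = table[f.module]
        row["forms"] += 1
        if f.severity in ("minor", "major"):
            row[f.severity] += 1
    return dict(table)


# Made-up findings for two hypothetical institutions.
findings = [
    Finding("A", "NSCLC", "pathology", "f1", "none"),
    Finding("A", "NSCLC", "pathology", "f2", "minor"),
    Finding("A", "NSCLC", "imaging", "f3", "major"),
    Finding("B", "CRC", "imaging", "f4", "minor"),
]
print(summarize(findings, institution="A"))
# {'pathology': {'forms': 2, 'minor': 1, 'major': 0}, 'imaging': {'forms': 1, 'minor': 0, 'major': 1}}
```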
FIG A3.
Kaplan-Meier estimates of OS stratified by curator for (A) NSCLC, (B) CRC, and (C) BrCa. Curves are stratified by the primary and secondary curator among records that underwent double curation to assess reproducibility. The curves are not intended to estimate time-to-event end points; they demonstrate the assessment of reproducibility in curating time-to-event end points across curators. BrCa, breast cancer; CRC, colorectal cancer; NSCLC, non–small-cell lung cancer; OS, overall survival.
FIG A4.
Kaplan-Meier estimates of PFS-I and PFS-M stratified by curator for NSCLC and CRC: (A) PFS-I NSCLC, (B) PFS-I CRC, (C) PFS-M NSCLC, and (D) PFS-M CRC. Curves are stratified by the primary and secondary curator among records that underwent double curation to assess reproducibility. The curves are not intended to estimate time-to-event end points; they demonstrate the assessment of reproducibility in curating time-to-event end points across curators. CRC, colorectal cancer; NSCLC, non–small-cell lung cancer; PFS-I, progression-free survival according to imaging; PFS-M, progression-free survival according to medical oncologist.
