Observational Study

Clin Pharmacol Ther. 2020 Sep;108(3):644-652.

doi: 10.1002/cpt.1966. Epub 2020 Jul 18.

An Electronic Health Record Text Mining Tool to Collect Real-World Drug Treatment Outcomes: A Validation Study in Patients With Metastatic Renal Cell Carcinoma

Sylvia A van Laar¹, Kim B Gombert-Handoko¹, Henk-Jan Guchelaar¹, Juliëtte Zwaveling¹

PMID: 32575147
PMCID: PMC7484987
DOI: 10.1002/cpt.1966

Sylvia A van Laar et al. Clin Pharmacol Ther. 2020 Sep.

Affiliation

¹ Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Leiden, The Netherlands.

Abstract

Real-world evidence can close the inferential gap between marketing authorization studies and clinical practice. However, the current standard for extracting real-world data (RWD) from electronic health records (EHRs) for treatment evaluation is manual review (MR), which is time-consuming and laborious. Clinical Data Collector (CDC) is a novel natural language processing and text mining software tool that handles both structured and unstructured EHR data and shows only the relevant EHR sections, which improves efficiency. We investigated CDC as an RWD collection method by applying CDC queries for patient inclusion and information extraction to a cohort of patients with metastatic renal cell carcinoma (RCC) receiving systemic drug treatment. Baseline patient characteristics, disease characteristics, and treatment outcomes were extracted and compared with MR for validation. One hundred patients receiving 175 treatments were included using CDC, a 99% correspondence with MR. Calculated median overall survival was 21.7 months (95% confidence interval (CI) 18.7-24.8) vs. 21.7 months (95% CI 18.6-24.8), and progression-free survival was 8.9 months (95% CI 5.4-12.4) vs. 7.6 months (95% CI 5.7-9.4), for CDC vs. MR, respectively. The highest F1-scores were found for cancer-related variables (88.1-100), followed by comorbidities (71.5-90.4) and adverse drug events (53.3-74.5); scores varied most for the international metastatic RCC database criteria (51.4-100). Mean data collection time was 12 minutes (CDC) vs. 86 minutes (MR). In conclusion, CDC is a promising tool for retrieving RWD from EHRs because it can identify the correct patient population as well as relevant outcome data, such as overall survival and progression-free survival.
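As an illustration of the F1-scores reported in the abstract, the following is a minimal sketch of how such a score could be computed for one binary variable, treating manual review as the reference standard. The function name, the example labels, and the 0-100 scaling are assumptions made for illustration; the abstract does not describe the authors' exact computation.

```python
from typing import Sequence


def f1_score_percent(extracted: Sequence[bool], reference: Sequence[bool]) -> float:
    """F1-score (0-100) of tool-extracted labels against reference labels.

    `extracted` holds one flag per patient from the text mining tool;
    `reference` holds the matching flags from manual review.
    """
    if len(extracted) != len(reference):
        raise ValueError("both label lists must cover the same patients")

    pairs = list(zip(extracted, reference))
    true_pos = sum(1 for e, r in pairs if e and r)
    false_pos = sum(1 for e, r in pairs if e and not r)
    false_neg = sum(1 for e, r in pairs if not e and r)

    if true_pos == 0:
        return 0.0  # no correct positives: precision and recall are both 0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical data: presence of one comorbidity in five patients.
tool_flags = [True, True, False, True, False]
manual_flags = [True, False, False, True, True]
score = f1_score_percent(tool_flags, manual_flags)
print(round(score, 1))  # 66.7
```

Because F1 is the harmonic mean of precision and recall, it penalizes both items the tool finds that manual review does not confirm and items the tool misses.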

Conflict of interest statement

The authors declared no competing interests for this work.

Figures

**Figure 1**
Architecture of the Clinical Data Collector on‐premises isolation platform. (a) A copy of the electronic health record (EHR) data is transferred to, stored in, and cleaned in a local MSSQL Server relational database. (b) A natural language processing (NLP) transformation application programming interface (API) pseudonymizes the data. (c) The search engine is compatible with the structure used in the data warehouse. (d) Client in which a user builds queries. The results window in CDC shows only the parts of EHR documents that contain the user-defined criteria. (e) Text mining of (combinations of) keywords is supported by an online thesaurus. [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 2**
Data extraction approach from structured and unstructured data using Clinical Data Collector.

**Figure 3**
Flowchart of patient inclusion by manual review and by Clinical Data Collector (CDC). The two approaches yielded very similar patient samples, and therefore use of CDC is satisfactory for the intended purpose. DTC, Diagnosis Treatment Combination.

**Figure 4**
Kaplan–Meier survival plots determined from manual review and Clinical Data Collector data for cabozantinib, everolimus, nivolumab, pazopanib and sunitinib combined. (a) Overall survival, (b) Progression‐free survival. CI, confidence interval.
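To make the survival estimates behind these plots concrete, here is a minimal sketch of the Kaplan–Meier product-limit estimator and the median survival read from it. The function names and the follow-up data are hypothetical, and this is not the authors' analysis code; in practice a dedicated statistics package would normally be used.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each observed event time.

    `events[i]` is True if the event (e.g. death) was observed at
    `durations[i]`, and False if the patient was censored at that time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival is at or below 50%."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up in months; False marks a censored patient.
months = [5, 8, 12, 20, 25]
observed = [True, True, False, True, False]
km_curve = kaplan_meier(months, observed)
km_median = median_survival(km_curve)
print(km_median)  # 20
```

Censored patients contribute to the number at risk until their last follow-up, which is why the estimate differs from a simple proportion of patients still alive.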

**Figure 5**
Bland–Altman plots of continuous variables collected using CDC vs. manual review, with mean difference and 95% confidence interval. (a) Length: −0.21 cm (−4.2 to 4.8), (b) Weight: 1.1 kg (−6.6 to 8.7), (c) Age: −0.17 years (−0.27 to 0.24), (d) Estimated glomerular filtration rate (eGFR): 0.22 mL/min/1.73 m² (−5.3 to 5.8), (e) Alanine transaminase (ALAT): 0.19 U/L (−3.2 to 3.6), (f) Aspartate aminotransferase (ASAT): 0.24 U/L (−4.0 to 4.5).
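The quantities summarized in a Bland–Altman plot can be sketched as follows. This example assumes the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations of the paired differences); the caption does not confirm that this is how the reported intervals were derived, and the function name and weights are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(method_a: Sequence[float],
                 method_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit) for paired values.

    The limits are the mean difference +/- 1.96 sample standard
    deviations of the paired differences (method_a minus method_b).
    """
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical body weights (kg) for four patients.
tool_weights = [70.0, 82.5, 65.0, 90.0]
manual_weights = [69.0, 82.5, 66.0, 88.0]
bias, lower, upper = bland_altman(tool_weights, manual_weights)
print(f"{bias:.2f} kg ({lower:.2f} to {upper:.2f})")
```

A mean difference near zero with narrow limits suggests the two collection methods can be used interchangeably for that variable.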
