Development and Validation of a High-Quality Composite Real-World Mortality Endpoint
- PMID: 29756355
- PMCID: PMC6232402
- DOI: 10.1111/1475-6773.12872
Abstract
Objective: To create a high-quality electronic health record (EHR)-derived mortality dataset for retrospective and prospective real-world evidence generation.
Data sources/study setting: Oncology EHR data, supplemented with external commercial and US Social Security Death Index data, benchmarked to the National Death Index (NDI).
Study design: We developed a recent, linkable, high-quality mortality variable, amalgamated from multiple data sources to supplement EHR data, and benchmarked it against the NDI, the most complete source of U.S. mortality data. Data quality of version 2.0 of the mortality variable is reported here.
Principal findings: For advanced non-small-cell lung cancer, sensitivity of mortality information improved from 66 percent in EHR structured data to 91 percent in the composite dataset, with high date agreement compared to the NDI. For advanced melanoma, metastatic colorectal cancer, and metastatic breast cancer, sensitivity of the final variable was 85 to 88 percent. Kaplan-Meier survival analyses showed that improving mortality data completeness minimized overestimation of survival relative to NDI-based estimates.
Conclusions: For EHR-derived data to yield reliable real-world evidence, it needs to be of known and sufficiently high quality. Considering the impact of mortality data completeness on survival endpoints, we highlight the importance of data quality assessment and advocate benchmarking to the NDI.
Keywords: Mortality data; data quality; electronic health records; external validation; oncology.
© 2018 The Authors. Health Services Research published by Wiley Periodicals, Inc. on behalf of Health Research and Educational Trust.
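The Kaplan-Meier finding in the abstract can be illustrated with a toy example. The sketch below is not the authors' analysis: the five patients, their death times, and the month-12 data cutoff are invented for illustration, and the helper names `kaplan_meier` and `survival_at` are hypothetical. It assumes that a patient whose death is missing from the dataset appears alive and is censored at the cutoff.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each death time.

    times  : follow-up time per patient (months, in this toy example)
    events : 1 if a death was recorded at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
    return s


# Invented data: all five patients die, at months 2, 4, 6, 8 and 10.
complete = kaplan_meier([2, 4, 6, 8, 10], [1, 1, 1, 1, 1])

# Same patients, but the deaths at months 4 and 8 are missing from the
# dataset, so those two patients are censored at the month-12 cutoff.
incomplete = kaplan_meier([2, 12, 6, 12, 10], [1, 0, 1, 0, 1])

print(survival_at(complete, 10), survival_at(incomplete, 10))
```

In this toy case the incomplete dataset yields a higher estimated survival at month 10 than the complete one, which is the direction of bias the abstract describes.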
Figures

Notes. NDI data were used as the benchmark in this study and were assumed to have 100 percent completeness. Patients were excluded from this analysis if their death date fell before the advanced diagnosis date.
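As a minimal sketch of how sensitivity and date agreement against a benchmark could be computed under the assumptions in this note (benchmark treated as complete, patients excluded when the death date precedes the advanced diagnosis date): the function name, the patient records, and the 15-day agreement window below are all hypothetical and are not taken from the study.

```python
from datetime import date


def sensitivity_and_date_agreement(benchmark, candidate, advanced_dx, window_days=15):
    """Compare a candidate mortality dataset with a benchmark.

    benchmark   : dict patient_id -> death date in the benchmark (assumed complete)
    candidate   : dict patient_id -> death date in the dataset under evaluation
    advanced_dx : dict patient_id -> advanced diagnosis date
    window_days : hypothetical tolerance for calling two dates "in agreement"
    """
    # Exclude patients whose benchmark death date falls before advanced diagnosis.
    eligible = {
        pid: d for pid, d in benchmark.items() if d >= advanced_dx[pid]
    }
    if not eligible:
        raise ValueError("no eligible benchmark deaths")

    # Sensitivity: share of benchmark deaths that the candidate also records.
    captured = [pid for pid in eligible if pid in candidate]
    sensitivity = len(captured) / len(eligible)

    # Date agreement: among captured deaths, share within the tolerance window.
    if captured:
        agreeing = sum(
            abs((candidate[pid] - eligible[pid]).days) <= window_days
            for pid in captured
        )
        agreement = agreeing / len(captured)
    else:
        agreement = None
    return sensitivity, agreement


# Invented example records.
benchmark = {
    "p1": date(2016, 3, 10),
    "p2": date(2016, 7, 1),
    "p3": date(2017, 1, 15),
    "p4": date(2015, 5, 1),   # precedes advanced diagnosis, so excluded
}
advanced_dx = {
    "p1": date(2015, 1, 1),
    "p2": date(2015, 6, 1),
    "p3": date(2016, 2, 1),
    "p4": date(2015, 9, 1),
}
candidate = {"p1": date(2016, 3, 12), "p2": date(2016, 9, 1)}

sens, agree = sensitivity_and_date_agreement(benchmark, candidate, advanced_dx)
print(sens, agree)
```

With these invented records, two of the three eligible benchmark deaths are captured, and one of those two has a date within the window.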

Notes. Data were restricted to practices with ≥100 patients. Boxplots show the median sensitivity; the lower and upper hinges of the boxes correspond to the 25th and 75th percentiles, which bound the interquartile range (IQR); the lower and upper whiskers extend to the most extreme sensitivity values within 1.5 × IQR of the lower and upper quartiles, respectively; and points outside the whiskers show the rest of the data.
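The boxplot elements described in this note can be computed directly. The sketch below uses invented practice-level sensitivities and a hypothetical helper, `boxplot_stats`. It relies on `statistics.quantiles` with the `inclusive` method from the Python standard library, and plotting packages may compute quartiles slightly differently.

```python
import statistics


def boxplot_stats(values, k=1.5):
    """Median, quartiles, whisker ends, and outlying points for a boxplot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Invented per-practice sensitivities, not values from the study.
practice_sensitivity = [0.70, 0.82, 0.85, 0.88, 0.90, 0.91, 0.93, 0.95]
stats = boxplot_stats(practice_sensitivity)
print(stats)
```

For these invented values, the practice with sensitivity 0.70 falls below the lower fence and would be drawn as an individual point rather than covered by the whisker.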