Breast Cancer Res Treat. 2023 Jan;197(1):177-187.
doi: 10.1007/s10549-022-06764-4. Epub 2022 Nov 5.

Implications of missing data on reported breast cancer mortality

Jennifer K Plichta et al. Breast Cancer Res Treat. 2023 Jan.

Abstract

Background: National cancer registries are valuable tools to analyze patterns of care and clinical outcomes; yet, missing data may impact the accuracy and generalizability of these data. We sought to evaluate the association between missing data and overall survival (OS).

Methods: Using the NCDB (National Cancer Database) and SEER (Surveillance, Epidemiology, End Results Program), we assessed data missingness among patients diagnosed with invasive breast cancer from 2010 to 2014. Key variables included demographic (age, race, ethnicity, insurance, education, income), tumor (grade, ER, PR, HER2, TNM stages), and treatment (surgery in both databases; chemotherapy and radiation in NCDB). OS was compared between those with and without missing data using Cox proportional hazards models.
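As a minimal sketch of how missingness across the key variables might be tallied per patient, the snippet below groups variables into the three categories named in the Methods. The field names, the `missing_by_category` and `n_missing` helpers, and the use of `None` to mark a missing value are all assumptions made for illustration; real registry extracts often encode "unknown" with dedicated codes. TNM stage is counted here as three variables (T, N, M), an assumption that gives 14 variables when surgery is the only treatment variable.

```python
# Hypothetical field names, grouped as in the Methods (surgery-only treatment data).
KEY_VARIABLES = {
    "demographic": ["age", "race", "ethnicity", "insurance", "education", "income"],
    "tumor": ["grade", "er", "pr", "her2", "t_stage", "n_stage", "m_stage"],
    "treatment": ["surgery"],
}


def missing_by_category(record):
    """Return {category: [names of missing variables]} for one patient record.

    Assumption: a value is missing when the field is absent or set to None.
    """
    return {
        category: [name for name in names if record.get(name) is None]
        for category, names in KEY_VARIABLES.items()
    }


def n_missing(record):
    """Count how many key variables are missing for one patient record."""
    return sum(len(names) for names in missing_by_category(record).values())


# Toy records, invented for illustration only.
complete = {name: "known" for names in KEY_VARIABLES.values() for name in names}
partial = dict(complete, grade=None, her2=None)

for label, record in [("complete", complete), ("partial", partial)]:
    print(label, n_missing(record), missing_by_category(record))
```

A binary "any missing" indicator (`n_missing(record) > 0`) is the kind of covariate that could then be entered into a survival model to compare the two groups.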

Results: Overall, 775,996 patients in the NCDB and 263,016 in SEER were identified; 29% and 13%, respectively, were missing at least 1 key variable. Of those, the overwhelming majority (NCDB 80%; SEER 88%) were missing tumor variables. Compared with patients with complete data, those with missing data had a greater risk of death: NCDB HR 1.23 (99% CI 1.21-1.25) and SEER HR 2.11 (99% CI 2.05-2.18). Patients with complete tumor data had higher unadjusted 5-year OS estimates than the entire sample: NCDB 82.7% vs 81.8% and SEER 83.5% vs 81.7%.
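To illustrate why a complete-case analysis can overstate survival, the sketch below compares an unadjusted survival estimate for a whole toy cohort with the estimate for complete cases only. It assumes a Kaplan-Meier estimator, which the abstract does not specify, and the eight-patient cohort is invented for illustration; none of the numbers come from the NCDB or SEER.

```python
def kaplan_meier(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon`.

    times  -- follow-up time per patient (years)
    events -- 1 if the patient died at that time, 0 if censored
    """
    survival = 1.0
    death_times = sorted(
        {t for t, e in zip(times, events) if e == 1 and t <= horizon}
    )
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
    return survival


# Toy cohort (invented): (follow-up years, died, has_missing_data)
cohort = [
    (6.0, 0, False), (6.0, 0, False), (6.0, 0, False), (3.0, 1, False),
    (6.0, 0, False), (2.0, 1, True), (4.0, 1, True), (6.0, 0, True),
]


def five_year_os(rows):
    """Unadjusted 5-year overall survival for a list of cohort rows."""
    return kaplan_meier([r[0] for r in rows], [r[1] for r in rows], horizon=5.0)


os_all = five_year_os(cohort)
os_complete = five_year_os([r for r in cohort if not r[2]])

print(f"whole cohort 5-year OS:  {os_all:.3f}")
print(f"complete cases 5-year OS: {os_complete:.3f}")
```

In this toy cohort, deaths are concentrated among patients with missing data, so dropping those patients raises the 5-year estimate from 0.625 to 0.800. The direction matches the pattern reported above, although the size of the gap here is exaggerated by design.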

Conclusions: Missingness of select variables is not uncommon within large national cancer registries and is associated with a worse OS. Exclusion of patients with missing variables may introduce unintended bias into analyses and result in findings that underestimate breast cancer mortality.

Keywords: Breast cancer; Cancer registry; Data missingness; Databases; Outcomes; Survival.

Figures

Figure 1. Percent of breast cancer patients (diagnosed 2010–2014) missing (A) 0 to 8+ of the 14 (SEER) or 16 (NCDB) key variables; or (B) data by variable category. Key variables included: demographic (age, race, ethnicity, insurance, education, income), tumor (grade, ER, PR, HER2, TNM stages), and treatment (surgery in both databases; chemotherapy and radiation in NCDB). NCDB: National Cancer Database. SEER: Surveillance, Epidemiology, End Results Program.
Figure 2. Percent of breast cancer patients with missing data by diagnosis year (2010–2014). NCDB: National Cancer Database. SEER: Surveillance, Epidemiology, End Results Program.
Figure 3. Unadjusted overall survival for breast cancer patients (diagnosed 2010–2014) compared by missingness of any key variable (3A: NCDB; 3B: SEER), or compared by number of missing key variables (3C: NCDB; 3D: SEER). NCDB: National Cancer Database. SEER: Surveillance, Epidemiology, End Results Program.
Figure 4. Unadjusted overall survival for breast cancer patients (diagnosed 2010–2014) compared by missingness within and across categories. (4A) NCDB, demographic variables; (4B) NCDB, tumor variables; (4C) NCDB, treatment variables; (4D) NCDB, combined; (4E) SEER, demographic variables; (4F) SEER, tumor variables; (4G) SEER, treatment variables; (4H) SEER, combined. NCDB: National Cancer Database. SEER: Surveillance, Epidemiology, End Results Program.
