Stat Methods Med Res. 2023 Sep;32(9):1799-1810.
doi: 10.1177/09622802231192950. Epub 2023 Aug 24.

Smoothing Lexis diagrams using kernel functions: A contemporary approach

Philip S Rosenberg et al. Stat Methods Med Res. 2023 Sep.

Abstract

Lexis diagrams are rectangular arrays of event rates indexed by age and period. Analysis of Lexis diagrams is a cornerstone of cancer surveillance research. Typically, population-based descriptive studies analyze multiple Lexis diagrams defined by sex, tumor characteristics, race/ethnicity, geographic region, and so on. Inevitably, the amount of information per Lexis diagram diminishes with increasing stratification. Several methods have been proposed to smooth observed Lexis diagrams up front to clarify salient patterns and improve summary estimates of averages, gradients, and trends. In this article, we develop a novel bivariate kernel-based smoother that incorporates two key innovations. First, for any given kernel, we calculate its singular value decomposition and select an optimal truncation point (the number of leading singular vectors to retain) based on the bias-corrected Akaike information criterion (AICc). Second, we model-average over a panel of candidate kernels with diverse shapes and bandwidths. The resulting truncated model averaging (TMA) approach is fast and automatic, delivers excellent performance, and provides a variance-covariance matrix that takes model selection into account. We present an in-depth case study (invasive estrogen receptor-negative breast cancer incidence among non-Hispanic white women in the United States) and simulate operating characteristics for 20 representative cancers. The truncated model averaging approach consistently outperforms any fixed kernel. Our results support the routine use of the truncated model averaging approach in descriptive studies of cancer.
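The two innovations described above (truncating each kernel's singular value decomposition by AICc, then averaging over a panel of kernels) can be illustrated with a short NumPy sketch. This is not the authors' implementation, and it rests on assumptions of our own: the Lexis diagram is a matrix of rates (for example, log rates) with roughly Gaussian, equal-variance errors; each candidate kernel is a product of row-normalized one-dimensional kernels on the integer age and period grid; "truncation" is read as keeping the leading k singular triplets of the smoother matrix; the criterion is the AICc form for linear smoothers of Hurvich, Simonoff and Tsai; and averaging uses standard Akaike weights. The function names (`kernel_matrix`, `truncated_fit`, `tma_smooth`), the kernel shapes, and the bandwidths are hypothetical, and the sketch does not reproduce the paper's over-dispersion parameter or its model-selection-aware variance-covariance matrix.

```python
import numpy as np

# Hypothetical sketch of truncated model averaging (TMA) for a Lexis diagram.
# Assumptions (not from the paper): Gaussian equal-variance errors on the
# supplied scale, product kernels on an integer grid, rank-k truncation of
# the smoother matrix, Hurvich-Simonoff-Tsai AICc, and Akaike weights.


def kernel_matrix(size, bandwidth, shape="gaussian"):
    """Row-normalized 1-D kernel smoother on integer positions 0..size-1."""
    idx = np.arange(size)
    u = (idx[:, None] - idx[None, :]) / bandwidth
    if shape == "gaussian":
        w = np.exp(-0.5 * u**2)
    elif shape == "epanechnikov":
        w = np.clip(0.75 * (1.0 - u**2), 0.0, None)
    else:
        raise ValueError(f"unknown kernel shape: {shape}")
    return w / w.sum(axis=1, keepdims=True)


def truncated_fit(smoother, y):
    """Choose the SVD truncation point of one smoother by AICc.

    Returns (aicc, fitted values, number of singular triplets kept, edf).
    """
    n = y.size
    u, s, vt = np.linalg.svd(smoother)
    # The i-th singular triplet adds s_i * u_i * (v_i . y) to the fit.
    coef = s * (vt @ y)
    fits = np.cumsum(u * coef, axis=1)  # column k-1 uses the leading k triplets
    # Effective degrees of freedom = trace of the rank-k smoother.
    edf = np.cumsum(s * np.sum(u * vt.T, axis=0))
    rss = np.sum((y[:, None] - fits) ** 2, axis=0)
    aicc = np.full(n, np.inf)
    ok = (n - edf - 2.0) > 0  # AICc is undefined when edf is too large
    aicc[ok] = n * (np.log(rss[ok] / n) + 1.0
                    + 2.0 * (edf[ok] + 1.0) / (n - edf[ok] - 2.0))
    best = int(np.argmin(aicc))
    return aicc[best], fits[:, best], best + 1, edf[best]


def tma_smooth(rates, kernels):
    """Akaike-weighted average of truncated fits over a panel of kernels."""
    n_age, n_per = rates.shape
    y = rates.ravel()  # row-major: element (a, p) sits at a * n_per + p
    results = []
    for shape, h_age, h_per in kernels:
        # kron(K_age, K_per) applied to the row-major vector equals K_age @ Y @ K_per.T
        smoother = np.kron(kernel_matrix(n_age, h_age, shape),
                           kernel_matrix(n_per, h_per, shape))
        results.append(truncated_fit(smoother, y))
    aicc = np.array([r[0] for r in results])
    weights = np.exp(-0.5 * (aicc - aicc.min()))
    weights /= weights.sum()
    fitted = sum(w * r[1] for w, r in zip(weights, results))
    return fitted.reshape(n_age, n_per), weights


# Usage on a simulated 12 x 10 Lexis diagram (synthetic data, not from the paper).
rng = np.random.default_rng(0)
ages, periods = np.arange(12), np.arange(10)
truth = (0.5 * np.sin(ages[:, None] / 3.0) + 0.1 * ages[:, None]
         - 0.05 * periods[None, :])
observed = truth + rng.normal(scale=0.3, size=truth.shape)
panel = [(shape, ha, hp)
         for shape in ("gaussian", "epanechnikov")
         for ha in (1.5, 3.0)
         for hp in (1.5, 3.0)]
smoothed, weights = tma_smooth(observed, panel)
print("raw MSE:", np.mean((observed - truth) ** 2))
print("TMA MSE:", np.mean((smoothed - truth) ** 2))
```

Because every candidate here is a linear smoother, each truncated fit is a fixed matrix applied to the data, and the whole panel can be searched with one SVD per kernel. The Gaussian and Epanechnikov shapes are only stand-ins for the paper's panel of diverse shapes and bandwidths.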

Keywords: Lexis diagram; Surveillance, Epidemiology, and End Results Program; cancer surveillance research; kernel methods; nonparametric smoothing.

Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Estrogen receptor-negative breast cancer incidence among non-Hispanic white women. Raw data (panels A–C), benchmark kernel (panels D–F), and truncated model average (panels G–I). Left panels: Lexis diagram heat maps. Center panels: rates over time within 5-year age groups. Right panels: rates by age within 5-year calendar periods. Shaded envelopes show pointwise 95% confidence limits.
Figure 2.
Breast cancer averages, gradients, and trends: features extracted from the data shown in Figure 1. Raw data (panels A–C), benchmark kernel (panels D–F), and truncated model average (panels G–I). Left panels: marginal period curve (left axis) and its gradient (right axis). Center panels: marginal age curve (left axis) and its gradient (right axis). Right panels: age-specific period trends. Gradient estimates in the left and center panels are trimmed to exclude the first and last time points. Shaded envelopes show pointwise 95% confidence limits.
Figure 3.
Arrow plots of simulation results. Rows correspond to the 20 cancers summarized in Table 1; panels correspond to features. Blue circles show the percent reduction for the benchmark kernel versus raw data, and yellow triangles show the percent reduction for the truncated model average (TMA) versus raw data.
Figure 4.
Operating characteristics of the truncated model average. (A) Rows correspond to the 20 cancers summarized in Table 1. Median (squares) and 90% limits (bars) of the estimated over-dispersion parameter ($\tilde{\phi}^2$) versus true values (triangles). (B) Box plots of effective degrees of freedom (edf). (C) Best-fit kernels: frequency of selection across the 20 cancers.
