Electron J Stat. 2021;15(2):4420-4461.
doi: 10.1214/21-ejs1881. Epub 2021 Sep 14.

Envelope method with ignorable missing data

Linquan Ma et al. Electron J Stat. 2021.

Abstract

The envelope method was recently proposed to reduce the dimension of the responses in multivariate regression. When data are missing, however, applying the envelope method to the complete cases alone can give biased and inefficient results. In this paper, we generalize envelope estimation to settings where the predictors and/or the responses are missing at random. Specifically, we incorporate the envelope structure into the expectation-maximization (EM) algorithm. Because the parameters under the envelope method are not pointwise identifiable, the EM algorithm for the envelope method is not straightforward and requires a special decomposition. Our method is guaranteed to be at least as efficient as the standard EM algorithm. Moreover, it has the potential to outperform the full data MLE. We give asymptotic properties of our method under both normal and non-normal cases. The efficiency gain over the standard EM is confirmed in simulation studies and in an application to the Chronic Renal Insufficiency Cohort (CRIC) study.

Keywords: EM-algorithm; Efficiency gain; Missing data; Multivariate regression; Sufficient dimension reduction.
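
For readers unfamiliar with the envelope structure mentioned in the abstract, the following is a sketch of the standard response envelope formulation for multivariate regression. It is not copied from the paper: the dimensions r (number of responses) and u (envelope dimension) and the exact symbols are assumptions, and the paper's own notation may differ (the figure captions, for example, write the identity matrix as I_q). The second display illustrates why the parameters are not pointwise identifiable: rotating the basis Γ by any orthogonal matrix leaves β and Σ unchanged, so only the subspace span(Γ) is identified.

```latex
% Response envelope model (assumed notation: r responses, envelope dimension u)
\begin{align*}
  Y_i &= \alpha + \beta X_i + \epsilon_i, \qquad
         \mathrm{E}(\epsilon_i) = 0, \quad \mathrm{Var}(\epsilon_i) = \Sigma, \\
  \beta &= \Gamma \eta, \qquad
  \Sigma = \Gamma \Omega \Gamma^{\top} + \Gamma_0 \Omega_0 \Gamma_0^{\top},
\end{align*}
% Gamma is an r x u semi-orthogonal basis of the envelope,
% Gamma_0 is an r x (r-u) basis of its orthogonal complement.

% Non-identifiability: for any u x u orthogonal matrix O,
\begin{align*}
  (\Gamma O)(O^{\top}\eta) &= \Gamma \eta = \beta, \\
  (\Gamma O)(O^{\top}\Omega O)(\Gamma O)^{\top} &= \Gamma \Omega \Gamma^{\top},
\end{align*}
% so (Gamma, eta, Omega) and (Gamma O, O^T eta, O^T Omega O) give the same model.
```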

Figures

Figure 1:
Intuitive illustration of the envelope method without missing data. The two groups are shown as circles (X=0) and triangles (X=1). The solid line is the true envelope direction; the dashed lines show the estimated envelope. The density curves of the two groups under the envelope method are shown at the bottom of each subfigure.
Figure 2:
Intuitive illustration of the envelope method in the presence of missing data. The two groups are shown as circles (X=0) and triangles (X=1). Hollow markers indicate that one component of Y is missing: a hollow triangle has Y1 missing, and a hollow circle has Y2 missing. The solid line is the true envelope direction; the dashed lines show the envelope estimated by the different methods. The density curves of the two groups under the different methods are shown at the bottom of each subfigure.
Figure 3:
Histograms of the MSEs of the EM envelope estimator, the complete case (CC) envelope estimator, the full data envelope estimator, the standard EM estimator, the standard complete case (CC) estimator, and the full data MLE when Ω_0 = 1000 I_q.
Figure 4:
Histograms of the MSEs of the EM envelope estimator, the complete case (CC) envelope estimator, the full data envelope estimator, the standard EM estimator, the standard complete case (CC) estimator, and the full data MLE when the error term ϵ_i follows a t-distribution and X_i follows a Bernoulli distribution.
Figure 5:
Histograms of the MSEs of the EM envelope estimator, the complete case (CC) envelope estimator, the full data envelope estimator, the standard EM estimator, the standard complete case (CC) estimator, and the full data MLE when the error term ϵ_i and X_i each follow a t-distribution.
Figure 6:
The empirical cumulative distribution of the ratio between the standard errors of the standard EM and our method without adjusting for the established biomarkers.
Figure 7:
The empirical cumulative distribution of the ratio between the standard errors of the standard EM and our method adjusted for the established biomarkers.
Figure 8:
Histograms of the MSEs of the EM envelope estimator, the complete case (CC) envelope estimator, the full data envelope estimator, the standard EM estimator, the standard complete case (CC) estimator, and the full data MLE when Ω_0 = 10 I_q.
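
The hollow points in Figure 2 are observations with one component of Y missing. As a rough illustration of how an EM algorithm handles such points under a normal working model, the sketch below computes the conditional mean and covariance of the missing components given the observed ones. This is the generic multivariate normal E-step, not the envelope-specific decomposition developed in the paper; the function name `conditional_moments` is hypothetical, and in a regression setting `mu` would be the fitted mean for that observation.

```python
import numpy as np


def conditional_moments(y, mu, sigma):
    """Generic normal E-step for one observation (illustrative sketch).

    y     : response vector with np.nan marking missing components
    mu    : mean vector of y under the current parameter values
    sigma : covariance matrix of y under the current parameter values

    Returns the vector with missing entries replaced by their conditional
    means, and a matrix holding the conditional covariance of the missing
    block (zeros elsewhere).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    miss = np.isnan(y)
    obs = ~miss
    y_hat = y.copy()
    cov = np.zeros_like(sigma)

    if not miss.any():
        return y_hat, cov  # nothing to impute

    s_mm = sigma[np.ix_(miss, miss)]
    if not obs.any():
        # Everything is missing: fall back to the marginal moments.
        y_hat[miss] = mu[miss]
        cov[np.ix_(miss, miss)] = s_mm
        return y_hat, cov

    s_oo = sigma[np.ix_(obs, obs)]
    s_mo = sigma[np.ix_(miss, obs)]
    # w = S_mo S_oo^{-1}, computed with a linear solve instead of an inverse.
    w = np.linalg.solve(s_oo, s_mo.T).T

    y_hat[miss] = mu[miss] + w @ (y[obs] - mu[obs])
    cov[np.ix_(miss, miss)] = s_mm - w @ s_mo.T
    return y_hat, cov


# Example: Y1 is missing and Y2 = 2 is observed.
y_hat, cov = conditional_moments(
    [np.nan, 2.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]]
)
# Conditional mean of Y1 is 0.5 and its conditional variance is 0.875.
```

A complete case analysis would discard this observation entirely, whereas the E-step keeps the information carried by the observed component.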
