Robust Score Tests With Missing Data in Genomics Studies
- PMID: 31920211
- PMCID: PMC6951249
- DOI: 10.1080/01621459.2018.1514304
Abstract
Analysis of genomic data is often complicated by the presence of missing values, which may arise due to cost or other reasons. The prevailing approach of single imputation is generally invalid if the imputation model is misspecified. In this paper, we propose a robust score statistic based on imputed data for testing the association between a phenotype and a genomic variable with (partially) missing values. We fit a semiparametric regression model for the genomic variable against an arbitrary function of the linear predictor in the phenotype model and impute each missing value by its estimated posterior expectation. We show that the score statistic with such imputed values is asymptotically unbiased under general missing-data mechanisms, even when the imputation model is misspecified. We develop a spline-based method to estimate the semiparametric imputation model and derive the asymptotic distribution of the corresponding score statistic with a consistent variance estimator using sieve approximation theory and empirical process theory. The proposed test is computationally feasible regardless of the number of independent variables in the imputation model. We demonstrate the advantages of the proposed method over existing methods through extensive simulation studies and provide an application to a major cancer genomics study.
Keywords: Association tests; Imputation; Integrative analysis; Multiple genomics platforms; Semiparametric models; Sieve estimation.
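The impute-then-score idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`imputed_score_test`, `degree`, the simulated variables) is hypothetical. It assumes a linear phenotype model with homoscedastic errors, replaces the spline-based semiparametric imputation model with a simple polynomial in the fitted linear predictor, and uses a naive variance that treats the imputed values as fixed, rather than the consistent variance estimator derived in the paper.

```python
import numpy as np

def imputed_score_test(y, X, g, observed, degree=3):
    """Score test for a genomic variable g in a linear phenotype model.

    y: (n,) phenotype; X: (n, p) covariates including an intercept column;
    g: (n,) genomic variable (entries where `observed` is False are ignored);
    observed: (n,) boolean mask. Returns (score, variance, statistic).
    """
    n, p = X.shape
    # Fit the null phenotype model y ~ X (no genomic effect).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    eta = X @ beta                      # fitted linear predictor
    resid = y - eta                     # null-model residuals
    sigma2 = resid @ resid / (n - p)    # residual variance estimate

    # Imputation model (assumption): polynomial in the standardized linear
    # predictor, fitted on subjects with observed g. A stand-in for a spline.
    z = (eta - eta.mean()) / eta.std()
    B = np.vander(z, degree + 1, increasing=True)
    gamma, *_ = np.linalg.lstsq(B[observed], g[observed], rcond=None)
    g_tilde = np.where(observed, g, B @ gamma)

    # Score for the genomic effect, evaluated at the null fit.
    score = resid @ g_tilde
    # Naive variance: project g_tilde off the covariates and treat it as fixed.
    coef, *_ = np.linalg.lstsq(X, g_tilde, rcond=None)
    g_perp = g_tilde - X @ coef
    variance = sigma2 * (g_perp @ g_perp)
    return score, variance, score**2 / variance

# Simulated data under the null: y depends on x but not on g.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
g = np.exp(0.5 * x) + rng.normal(size=n)   # not exactly polynomial in x
y = 1.0 + 0.8 * x + rng.normal(size=n)
observed = rng.random(n) < 0.6             # about 40% of g is missing
g_missing = np.where(observed, g, np.nan)

score, variance, stat = imputed_score_test(y, X, g_missing, observed)
```

In this simplified setting the statistic would be compared with a chi-square distribution with one degree of freedom. Because the null-model residuals have mean zero given the covariates, the score stays centered at zero under the null even though the polynomial imputation model is not the true one, which is the kind of robustness the abstract describes. The paper's treatment of general missing-data mechanisms and its variance estimator are beyond this sketch.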