Cox regression increases power to detect genotype-phenotype associations in genomic studies using the electronic health record
- PMID: 31684865
- PMCID: PMC6829851
- DOI: 10.1186/s12864-019-6192-1
Abstract
Background: The growth of DNA biobanks linked to data from electronic health records (EHRs) has enabled the discovery of numerous associations between genomic variants and clinical phenotypes. Nonetheless, although clinical data are generally longitudinal, standard approaches for detecting genotype-phenotype associations in such linked data, notably logistic regression, do not naturally account for variation in the period of follow-up or the time at which an event occurs. Here we explored the advantages of quantifying associations using Cox proportional hazards regression, which can account for the age at which a patient first visited the healthcare system (left truncation) and the age at which a patient either last visited the healthcare system or acquired a particular phenotype (right censoring).
Results: In comprehensive simulations, we found that, compared to logistic regression, Cox regression had greater power at equivalent Type I error. We then scanned for genotype-phenotype associations using logistic regression and Cox regression on 50 phenotypes derived from the EHRs of 49,792 genotyped individuals. Consistent with the findings from our simulations, Cox regression had approximately 10% greater relative sensitivity for detecting known associations from the NHGRI-EBI GWAS Catalog. In terms of effect sizes, the hazard ratios estimated by Cox regression were strongly correlated with the odds ratios estimated by logistic regression.
Conclusions: As longitudinal health-related data continue to grow, Cox regression may improve our ability to identify the genetic basis for a wide range of human phenotypes.
Keywords: Cox regression; Electronic health record; GWAS; Time-to-event modeling.
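The left truncation and right censoring described in the Background can be made concrete with a small sketch. The code below is not the authors' implementation. It is a minimal, pure-Python illustration, assuming a single covariate (a hypothetical allele count), age as the time scale, and the Breslow approximation for tied event ages. The function name `fit_cox_one_covariate` and the toy data are invented for illustration. At each event age, the risk set contains only individuals who had already entered the health system (entry age below the event age) and had not yet exited (exit age at or above it), which is how delayed entry and censoring enter the partial likelihood.

```python
import math


def fit_cox_one_covariate(entry_age, exit_age, event, x, max_iter=50, tol=1e-10):
    """Fit a one-covariate Cox model by Newton's method on the partial likelihood.

    entry_age: age at first visit (left truncation)
    exit_age:  age at phenotype onset or at last visit
    event:     1 if the phenotype was observed at exit_age, 0 if right-censored
    x:         covariate value per individual (here, a hypothetical allele count)
    Returns (beta, standard_error); exp(beta) is the estimated hazard ratio.
    """
    n = len(x)
    event_rows = [i for i in range(n) if event[i] == 1]
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # negative second derivative (observed information)
        for i in event_rows:
            t = exit_age[i]
            s0 = s1 = s2 = 0.0
            for j in range(n):
                # Risk set at age t: entered before t, not yet exited.
                if entry_age[j] < t <= exit_age[j]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean_x = s1 / s0
            score += x[i] - mean_x
            info += s2 / s0 - mean_x * mean_x
        if info <= 0.0:
            break  # covariate does not vary within any risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the information at the last evaluated beta.
    se = 1.0 / math.sqrt(info) if info > 0.0 else float("inf")
    return beta, se


# Toy data (invented): ages in years, one row per individual.
entry_age = [40, 45, 50, 42, 55, 48, 60, 41]
exit_age = [52, 70, 58, 66, 61, 75, 68, 59]
event = [1, 0, 1, 1, 1, 0, 1, 0]
allele_count = [2, 0, 1, 0, 2, 1, 0, 1]

beta, se = fit_cox_one_covariate(entry_age, exit_age, event, allele_count)
hazard_ratio = math.exp(beta)
print(f"log hazard ratio = {beta:.3f}, SE = {se:.3f}, HR = {hazard_ratio:.3f}")
```

A logistic regression on the same data would use only the `event` column and the covariate, discarding the entry and exit ages. In practice an established survival analysis package would be used rather than a hand-written optimizer; the sketch is only meant to show where the ages enter the calculation.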
Conflict of interest statement
The authors declare they have no competing interests.