Biostatistics. 2021 Oct 13;22(4):772-788.
doi: 10.1093/biostatistics/kxz065.

A mixed-model approach for powerful testing of genetic associations with cancer risk incorporating tumor characteristics

Haoyu Zhang et al. Biostatistics. 2021.

Abstract

Cancers are routinely classified into subtypes according to various features, including histopathological characteristics and molecular markers. Previous genome-wide association studies have reported heterogeneous associations between loci and cancer subtypes. However, the optimal modeling strategy for handling correlated tumor features, missing data, and increased degrees of freedom in the underlying tests of association is not evident. We propose to test for genetic associations using a mixed-effect two-stage polytomous model score test (MTOP). In the first stage, a standard polytomous model is used to specify all possible subtypes defined by the cross-classification of the tumor characteristics. In the second stage, the subtype-specific case-control odds ratios are specified using a more parsimonious model based on the case-control odds ratio for a baseline subtype and the case-case parameters associated with tumor markers. Further, to reduce the degrees of freedom, we specify case-case parameters for additional exploratory markers using a random-effect model. We use the Expectation-Maximization algorithm to account for missing data on tumor markers. Through simulations across a range of realistic scenarios and data from the Polish Breast Cancer Study (PBCS), we show that MTOP outperforms alternative methods for identifying heterogeneous associations between risk loci and tumor subtypes. The proposed methods have been implemented in a user-friendly and high-speed R statistical package called TOP (https://github.com/andrewhaoyu/TOP).

Keywords: Cancer subtypes; EM algorithm; Etiologic heterogeneity; Score tests; Susceptibility variants; Two-stage polytomous model.
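
The two-stage structure described in the abstract can be illustrated with a small sketch. It is not the TOP package's code or API. It assumes hypothetical marker names and levels, an additive second stage with no interactions, a linear coding of the ordinal marker, and made-up parameter values. It only shows how the cross-classification of markers yields many subtypes while the second stage describes their log odds ratios with a handful of parameters. The random-effect terms, the EM step for missing markers, and the score test itself are not covered.

```python
from itertools import product

# Hypothetical marker names and levels, chosen for illustration only.
MARKERS = {
    "ER": [0, 1],
    "PR": [0, 1],
    "HER2": [0, 1],
    "grade": [1, 2, 3],  # ordinal marker, coded linearly (an assumption)
}


def enumerate_subtypes(markers):
    """First stage: one subtype per cell of the marker cross-classification."""
    names = list(markers)
    return [dict(zip(names, levels)) for levels in product(*markers.values())]


def second_stage_row(subtype, baseline):
    """Additive second-stage design row: 1 for the baseline log odds ratio,
    then each marker's offset from its level in the baseline subtype."""
    return [1] + [subtype[m] - baseline[m] for m in subtype]


def subtype_log_odds_ratios(markers, theta):
    """Map second-stage parameters theta to one log odds ratio per subtype."""
    subtypes = enumerate_subtypes(markers)
    baseline = subtypes[0]  # assumption: the first cell is the baseline subtype
    betas = []
    for subtype in subtypes:
        row = second_stage_row(subtype, baseline)
        betas.append(sum(z * t for z, t in zip(row, theta)))
    return subtypes, betas


# Made-up values: baseline log odds ratio, then one case-case parameter per marker.
theta = [0.10, 0.05, 0.00, -0.02, 0.03]
subtypes, betas = subtype_log_odds_ratios(MARKERS, theta)
print(len(subtypes), "subtypes described by", len(theta), "parameters")
```

With three binary markers and one three-level marker, the cross-classification gives 2 x 2 x 2 x 3 = 24 subtypes, matching the count quoted in the figure captions, while the additive second stage uses five parameters instead of 24.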

Figures

Fig. 1.
Power comparison among MTOP, FTOP, standard logistic regression, the two-stage model with only complete data, and the polytomous model with [formula] random samples. For the three panels in the first row, four tumor markers were included in the analysis. Three binary tumor markers and one ordinal tumor marker defined 24 cancer subtypes. Around [formula] cases would be incomplete. For the three panels in the second row, two extra binary tumor markers were included in the analysis. The six tumor markers defined 96 subtypes. Around [formula] cases would be incomplete. The power was estimated by controlling the type I error [formula].
Fig. 2.
Power comparison of the global association test with pairwise interactions. Four methods were evaluated: FTOP with additive structure, MTOP with additive structure (ER fixed), FTOP with pairwise interactions, and MTOP with pairwise interactions (ER fixed). For the three panels in the first row, four tumor markers were included in the analysis. Three binary tumor markers and one ordinal tumor marker defined 24 cancer subtypes. Around [formula] cases were incomplete. For the three panels in the second row, two extra binary tumor markers were included in the analysis. The six tumor markers defined 96 subtypes. Around [formula] cases were incomplete. The total sample size was 25 000, 50 000, or 100 000. We generated [formula] random replicates. The power was estimated by controlling the type I error [formula].
Fig. 3.
Manhattan plot of the genome-wide association analysis of PBCS using four different methods. PBCS has 2078 invasive breast cancer cases and 2219 controls. In total, 7 017 694 SNPs on the 22 autosomes with MAF greater than 5% were included in the analysis. ER, PR, HER2, and grade were used to define breast cancer subtypes.
