Genome Med. 2020 Dec 28;12(1):115.
doi: 10.1186/s13073-020-00818-2.

Statistical power in COVID-19 case-control host genomic study design

Yu-Chung Lin et al. Genome Med. 2020.

Abstract

The identification of genetic variation that directly impacts susceptibility to SARS-CoV-2 infection and the severity of COVID-19 is an important step towards risk stratification, personalized treatment plans, and therapeutic and vaccine development and deployment. Given the importance of study design in infectious disease genetic epidemiology, we use simulation and draw on current estimates of exposure, infectivity, and test accuracy of COVID-19 to demonstrate the feasibility of detecting host genetic factors associated with susceptibility and severity in published COVID-19 study designs. We demonstrate that limited phenotypic data and exposure/infection information in the early stages of the pandemic significantly impact the ability to detect most genetic variants with moderate effect sizes, especially when studying susceptibility to SARS-CoV-2 infection. Our insights can aid in the interpretation of genetic findings emerging in the literature and guide the design of future host genetic studies.

Keywords: Genetic epidemiology; Genome-wide association studies; Statistical genetics; Study design.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Statistical power (i.e., the probability of detecting an association when it truly exists) to detect association between a genetic variant and infection susceptibility at the genome-wide significance level (5e−8) [18]. A 1:1 case-control study design was used for all parameter settings. Reported effect sizes are on the odds ratio (OR) scale, parameterized as log-additive for each additional protective allele. a Assuming perfect test accuracy and baseline infection susceptibility at 80% based on recent estimates [15], there is low statistical power to detect true associations when there is either low population-level exposure to SARS-CoV-2 or moderate genetic (protective) effect sizes (OR = 0.7). Detecting rare variants (MAF = 0.01) remains challenging even with a much larger protective effect size (OR = 0.2). b Reducing sensitivity for testing SARS-CoV-2 infection not only reduces statistical power but also negates gains that result from increasing population exposure. c Assuming 20% population exposure rate seen in the hardest-hit regions, baseline infection susceptibility, in the absence of the contributing protective genetic allele, can also severely impact power. Higher infection susceptibility (i.e., higher infectivity) can diminish any chance of detecting true signals with currently available sample sizes. Lower population exposure will further dampen statistical power as seen in a. MAF, minor allele frequency
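The qualitative pattern in panels a and b can be illustrated with a small analytic approximation. This is a minimal sketch, not the authors' simulation: it assumes Hardy-Weinberg genotype frequencies, a two-sided allelic z-test on a 1:1 design, and that anyone who is not a test-confirmed infection (unexposed, exposed but uninfected, or a false negative) ends up in the control pool. The function names, the `n_cases` value of 10,000, and the way `sensitivity` enters the model are all illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def case_control_allele_freqs(maf, odds_ratio, baseline_risk, exposure, sensitivity=1.0):
    """Return (case, control) frequencies of the minor (protective) allele."""
    # Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of the minor allele.
    geno = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    base_odds = baseline_risk / (1 - baseline_risk)
    p_case = []
    for copies in range(3):
        # Log-additive model: infection odds are multiplied by OR per allele copy.
        odds = base_odds * odds_ratio ** copies
        risk_if_exposed = odds / (1 + odds)
        # Assumption: a case must be exposed, infected, and test positive.
        p_case.append(exposure * sensitivity * risk_if_exposed)
    case_w = [g * p for g, p in zip(geno, p_case)]
    # Assumption: everyone else is eligible as a control.
    ctrl_w = [g * (1 - p) for g, p in zip(geno, p_case)]

    def allele_freq(weights):
        return (weights[1] + 2 * weights[2]) / (2 * sum(weights))

    return allele_freq(case_w), allele_freq(ctrl_w)


def power(n_cases, maf, odds_ratio, baseline_risk, exposure, sensitivity=1.0, alpha=5e-8):
    """Approximate power of a two-sided allelic test with n_cases cases and n_cases controls."""
    p_case, p_ctrl = case_control_allele_freqs(
        maf, odds_ratio, baseline_risk, exposure, sensitivity
    )
    # Each person contributes two alleles, so each group has 2 * n_cases alleles.
    se = sqrt((p_case * (1 - p_case) + p_ctrl * (1 - p_ctrl)) / (2 * n_cases))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible contribution of the opposite tail is ignored.
    return NormalDist().cdf(abs(p_case - p_ctrl) / se - z_crit)


# Hypothetical sample size; OR, MAF, and baseline susceptibility follow the caption.
low_exposure = power(10_000, maf=0.2, odds_ratio=0.7, baseline_risk=0.8, exposure=0.2)
high_exposure = power(10_000, maf=0.2, odds_ratio=0.7, baseline_risk=0.8, exposure=0.8)
print(f"power at 20% exposure: {low_exposure:.3f}")
print(f"power at 80% exposure: {high_exposure:.3f}")
```

Under these assumptions, exposure and sensitivity enter only through their product, so halving test sensitivity has the same effect on the control pool as halving population exposure. That is one simple way to see why reduced sensitivity can cancel the gains from higher exposure.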
Fig. 2
Statistical power in hypothesis testing to detect a true association between a genetic variant and COVID-19 disease severity at the genome-wide significance level (5e−8). A 1:1 case-control study design was used for all parameter settings. Only one red curve is shown since the study design uses confirmed infected individuals with mild or no symptoms as controls (test-positive controls), which is unaffected by population-level infection rates and the corresponding case-control misclassification. Effect sizes are reported on the odds ratio (OR) scale for each additional risk allele (log-additive scale). Perfect test accuracy is assumed in all plots. a Assumes a common variant with large effect size (OR = 1.7, MAF = 0.2). Using test-positive controls (red) yields higher power than using population-based (untested) controls (blue). High population infection rates reduce the gap between the two study designs but remain unlikely in the current phase of the pandemic. b Detecting a common variant with moderate effect size (OR = 1.3, MAF = 0.2) is challenging without drastically increasing the number of participants included for either design. c Detecting a rare variant even with a large effect size (OR = 5, MAF = 0.01) is more difficult with currently available sample sizes. Using test-positive controls without misclassification once again demonstrates higher power compared to population-based untested controls with misclassification. d Assumes OR = 1.7 and MAF = 0.2. Relative reduction in sample size, $1 - \frac{n_{\text{test\_positive\_controls}}}{n_{\text{population\_controls}}}$, from using test-positive controls compared to population-based controls. $n_{\text{test\_positive\_controls}}$ and $n_{\text{population\_controls}}$ refer to the number of cases (1:1 case-control ratio) needed to achieve 80% power at the genome-wide significance level (5e−8) [18]. Relative reductions in sample size for other settings show a similar trend and can be found in Additional file 2: Table S1
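The relative reduction in panel d can be sketched with a closed-form sample-size approximation. This is an illustration under stated assumptions, not the authors' procedure: it assumes Hardy-Weinberg genotype frequencies, a two-sided allelic z-test, infection that is independent of genotype, and population controls drawn from everyone who is not a confirmed severe case. The `baseline_severity` of 0.1 and the 5% infection rate are hypothetical values chosen only to make the example concrete.

```python
from math import ceil
from statistics import NormalDist


def severity_allele_freqs(maf, odds_ratio, baseline_severity, infection_rate):
    """Return (case, control) risk-allele frequencies for a severity study.

    infection_rate=1.0 reproduces test-positive controls (infected, not severe);
    smaller values mimic untested population controls.
    """
    # Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of the risk allele.
    geno = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    base_odds = baseline_severity / (1 - baseline_severity)
    severe = []
    for copies in range(3):
        # Log-additive model: severity odds are multiplied by OR per risk allele.
        odds = base_odds * odds_ratio ** copies
        severe.append(odds / (1 + odds))
    case_w = [g * s for g, s in zip(geno, severe)]
    # Assumption: controls are everyone who is not an infected severe case.
    ctrl_w = [g * (1 - infection_rate * s) for g, s in zip(geno, severe)]

    def allele_freq(weights):
        return (weights[1] + 2 * weights[2]) / (2 * sum(weights))

    return allele_freq(case_w), allele_freq(ctrl_w)


def cases_needed(maf, odds_ratio, baseline_severity, infection_rate, power=0.8, alpha=5e-8):
    """Approximate number of cases (1:1 design) needed to reach the target power."""
    p_case, p_ctrl = severity_allele_freqs(maf, odds_ratio, baseline_severity, infection_rate)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_case * (1 - p_case) + p_ctrl * (1 - p_ctrl)
    # Each person contributes two alleles, hence the factor of 2 in the denominator.
    return ceil(z * z * variance / (2 * (p_case - p_ctrl) ** 2))


# OR and MAF follow panel d; baseline severity and infection rate are hypothetical.
n_test_positive = cases_needed(0.2, 1.7, baseline_severity=0.1, infection_rate=1.0)
n_population = cases_needed(0.2, 1.7, baseline_severity=0.1, infection_rate=0.05)
relative_reduction = 1 - n_test_positive / n_population
print(n_test_positive, n_population, round(relative_reduction, 3))
```

In this simplified model, raising `infection_rate` towards 1 makes the population-control pool converge to the test-positive pool, so the relative reduction shrinks to zero, which matches the caption's remark that high population infection rates narrow the gap between the two designs.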

References

    1. Burgner D, Jamieson SE, Blackwell JM. Genetic susceptibility to infectious diseases: big is beautiful, but will bigger be even better? Lancet Infect Dis. 2006;6(10):653–663. doi: 10.1016/S1473-3099(06)70601-6.
    2. Kellam P, Weiss RA. Infectogenomics: insights from the host genome into infectious diseases. Cell. 2006;124(4):695–697. doi: 10.1016/j.cell.2006.02.003.
    3. Murray MF, Kenny EE, Ritchie MD, et al. COVID-19 outcomes and the human genome. Genet Med. 2020;22(7):1175–1177. doi: 10.1038/s41436-020-0832-3.
    4. Williams FM, Freydin M, Mangino M, et al. Self-reported symptoms of COVID-19 including symptoms most predictive of SARS-CoV-2 infection, are heritable. medRxiv. 2020. doi: 10.1101/2020.04.22.20072124.
    5. COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur J Hum Genet. 2020;28(6):715–718. doi: 10.1038/s41431-020-0636-6.
