Review
PLoS Genet. 2023 Feb 23;19(2):e1010596.
doi: 10.1371/journal.pgen.1010596. eCollection 2023 Feb.

Strategies to investigate and mitigate collider bias in genetic and Mendelian randomisation studies of disease progression

Ruth E Mitchell et al. PLoS Genet.

Abstract

Genetic studies of disease progression can be used to identify factors that may influence survival or prognosis, which may differ from factors that influence disease susceptibility. Studies of disease progression feed directly into therapeutics for disease, whereas studies of incidence inform prevention strategies. However, studies of disease progression are known to be affected by collider bias (also known as "index event" bias), since the disease progression phenotype can only be observed for individuals who have the disease. This applies equally to observational and genetic studies, including genome-wide association studies and Mendelian randomisation (MR) analyses. In this paper, we review several statistical methods that can be used to detect and adjust for index event bias in studies of disease progression, and how they apply to genetic and MR studies using both individual- and summary-level data. Methods to detect the presence of index event bias include the use of negative controls, a comparison of associations between risk factors for incidence in individuals with and without the disease, and an inspection of Miami plots. Methods to adjust for the bias include inverse probability weighting (with individual-level data), or Slope-Hunter and Dudbridge et al.'s index event bias adjustment (when only summary-level data are available). We also outline two approaches for sensitivity analysis. We then illustrate how three methods to minimise bias can be used in practice with two applied examples. Our first example investigates the effects of blood lipid traits on mortality from coronary heart disease, while our second example investigates genetic associations with breast cancer mortality.
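
As a rough illustration of the summary-level adjustment idea mentioned above, the following Python sketch regresses simulated SNP effects on progression against SNP effects on incidence, then subtracts the fitted slope times the incidence effect from each progression effect. It is a simplified sketch in the spirit of an index event bias adjustment, not the published implementation. The data are simulated, all names (`TRUE_BIAS_SLOPE`, `beta_inc`, `beta_prog`, `ols_slope`, `mean_abs`) are hypothetical, no SNP is given a direct effect on progression, and the incidence effects are treated as known without error, which real analyses would need to account for.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def mean_abs(values):
    """Mean absolute value of a list of numbers."""
    return sum(abs(v) for v in values) / len(values)


rng = random.Random(7)

# Assumed (hypothetical) bias induced per unit of SNP effect on incidence.
TRUE_BIAS_SLOPE = -0.4
N_SNPS = 2000

# Simulated SNP effects on incidence, treated here as known exactly.
beta_inc = [rng.gauss(0, 0.05) for _ in range(N_SNPS)]

# Simulated case-only SNP effects on progression: no SNP has a direct effect,
# so each estimate is bias (slope * incidence effect) plus sampling noise.
beta_prog = [TRUE_BIAS_SLOPE * bi + rng.gauss(0, 0.01) for bi in beta_inc]

# Step 1: estimate the bias slope by regressing progression effects
# on incidence effects across SNPs.
slope_hat = ols_slope(beta_inc, beta_prog)

# Step 2: subtract the estimated bias from each progression effect.
beta_adj = [bp - slope_hat * bi for bi, bp in zip(beta_inc, beta_prog)]

print(f"estimated bias slope: {slope_hat:.3f}")
print(f"mean |effect| before adjustment: {mean_abs(beta_prog):.4f}")
print(f"mean |effect| after adjustment:  {mean_abs(beta_adj):.4f}")
```

In this toy setting the estimated slope should land close to the assumed value, and the adjusted effects should sit closer to zero on average than the unadjusted ones, since the simulated SNPs have no true effect on progression.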


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Directed acyclic graph demonstrating the introduction of collider bias in observational case-only studies.
Conditioning on disease incidence induces an association between the previously independent causal risk factors 1 and 2, shown by the dashed line. Because risk factor 1 is also a causal risk factor for disease progression, the case-only setting leads to a biased association between risk factor 2 and disease progression via the path RF2--RF1->DP (the induced association followed by the causal effect of risk factor 1). The association of risk factor 2 with disease progression when conditioning on incidence is entirely due to collider bias.
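
The scenario in this caption can be checked with a small simulation. The sketch below is illustrative only: the effect sizes, the liability threshold of 1.5, and all names (`simulate`, `pearson`, `corr_all`, `corr_cases`) are assumptions chosen for the example, not values from the paper. Risk factor 2 is simulated with no effect on progression, so any association seen among cases arises from conditioning on incidence.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=100_000, seed=1):
    """Simulate (rf1, rf2, is_case, progression) for n individuals."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rf1 = rng.gauss(0, 1)  # causes both incidence and progression
        rf2 = rng.gauss(0, 1)  # causes incidence only, independent of rf1
        # Disease occurs when a noisy liability exceeds an assumed threshold.
        is_case = rf1 + rf2 + rng.gauss(0, 1) > 1.5
        # Progression depends on rf1 and noise, never on rf2.
        progression = rf1 + rng.gauss(0, 1)
        rows.append((rf1, rf2, is_case, progression))
    return rows


rows = simulate()
cases = [r for r in rows if r[2]]

# Whole population: rf2 and progression should be roughly uncorrelated.
corr_all = pearson([r[1] for r in rows], [r[3] for r in rows])

# Cases only: conditioning on incidence links rf2 to rf1, and hence
# to progression.
corr_cases = pearson([r[1] for r in cases], [r[3] for r in cases])
corr_rf_cases = pearson([r[0] for r in cases], [r[1] for r in cases])

print(f"corr(RF2, DP), everyone:   {corr_all:.3f}")
print(f"corr(RF2, DP), cases only: {corr_cases:.3f}")
print(f"corr(RF1, RF2), cases only: {corr_rf_cases:.3f}")
```

Under these assumptions the whole-population correlation between risk factor 2 and progression should be near zero, while the case-only correlation should be clearly negative, mirroring the dashed line in the graph.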
Fig 2. Directed acyclic graph demonstrating the introduction of collider bias in genetic case-only studies.
(A) Conditioning on disease incidence induces an association between a previously independent causal risk factor and a causal genetic variant for disease incidence, shown by the dashed line. Because risk factor 1 is also a causal risk factor for disease progression (a confounder of disease incidence and progression), the case-only setting leads to a biased association between the genetic variant and disease progression via the path Genetic variant->Measured/unmeasured confounder->Disease progression. The association of the genetic variant with disease progression when conditioning on incidence is entirely due to collider bias. (B) Collider bias will induce an association between genetic variants that both cause disease incidence. This will make a noncausal genetic variant (risk factor 2) appear associated with disease progression. (C) A third scenario is where this induced path is in addition to the direct effect of the genetic variant on disease progression.
Fig 3. Directed acyclic graph demonstrating the introduction of collider bias in Mendelian randomisation case-only studies.
In Mendelian randomisation analyses, the exposure is proxied by a causal genetic instrument. Conditioning on disease incidence induces an association between the previously independent genetic instrument and a common cause of disease incidence and disease progression, shown by the dashed line. This would violate the MR independence assumption, invalidating the analysis.
Fig 4.
(A) Manhattan plot of a GWAS for age at recruitment in myocardial infarction (MI) incidence cases only in UK Biobank. Cases were defined as individuals who had had an acute MI event using the International Classification of Diseases 10th Revision codes (ICD-10: I21.0-I21.9). The GWAS was performed using PLINK. This plot shows one genetic signal on chromosome 5 that is strongly associated with age at recruitment (P < 5 × 10^−8). This signal could be induced by collider bias because, in a random sample of the general population, a GWAS for age should not show any signal. However, this signal could also be due to biases other than collider bias. (B) Manhattan plot of a GWAS for sex in MI incidence cases only in UK Biobank. Cases were defined as individuals who had had an acute MI event using the ICD-10 codes I21.0-I21.9. This plot does not show any strong signal associated with sex, so it provides no evidence of collider bias.
Fig 5. An example of a Miami plot comparing results from a GWAS of smoking initiation (top) and a GWAS of smoking cessation (bottom) in a population of smokers.
Plotted using publicly available summary statistics of Liu et al. [60]. Several loci are strongly associated with smoking cessation but show no strong evidence of association with smoking initiation (e.g., on chromosomes 11 and 19), suggesting that the association between these loci and smoking cessation is not the product of collider bias. However, further inspection of the magnitude of effect and confidence intervals is required to determine that these loci are not associated with initiation. The locus on chromosome 20 that reaches genome-wide significance for cessation also appears to be associated with smoking initiation, albeit not at genome-wide significance, suggesting that the association of this locus with smoking cessation may be affected by collider bias.
Fig 6. Contrasting scenarios in which index event bias is or is not expected, based on the likely causality of exposures for disease onset.
Solid black lines indicate assumed causality. Absent lines indicate assumed lack of causality. Dashed black line indicates induced association. Absent dashed line indicates lack of induced association. Boxes indicate a variable that has been conditioned on. CHD, coronary heart disease; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; SNP, single nucleotide polymorphism.
Fig 7. Slope-Hunter fitted model showing assignment of each SNP to “hunted” or “pleiotropic” clusters for an analysis examining the effect of a breast cancer susceptibility PRS on breast cancer–specific mortality.
“Hunted” refers to SNPs that only affect incidence (here, breast cancer risk), and “pleiotropic” refers to SNPs that affect both incidence and prognosis (here, breast cancer risk and breast cancer–specific mortality). In this example, there were 5 hunted SNPs (i.e., those used to generate a “correction factor” for index event bias) and 171 pleiotropic SNPs.

