Circ Cardiovasc Genet. 2017 Oct;10(5):e001616.
doi: 10.1161/CIRCGENETICS.116.001616.

Impact of Selection Bias on Estimation of Subsequent Event Risk

Yi-Juan Hu et al. Circ Cardiovasc Genet. 2017 Oct.

Abstract

Background: Studies of recurrent or subsequent disease events may be susceptible to bias caused by selection of subjects who both experience and survive the primary indexing event. Currently, the magnitude of any selection bias, particularly for subsequent time-to-event analysis in genetic association studies, is unknown.

Methods and results: We used empirically inspired simulation studies to explore the impact of selection bias on the marginal hazard ratio for risk of subsequent events among those with established coronary heart disease. The extent of selection bias was determined by the magnitudes of genetic and nongenetic effects on the indexing (first) coronary heart disease event. Unless the genetic hazard ratio was unrealistically large (>1.6 per allele) and assuming the sum of all nongenetic hazard ratios was <10, bias was usually <10% (downward toward the null). Despite the low bias, the probability that a confidence interval included the true effect decreased (undercoverage) with increasing sample size because of increasing precision. Importantly, false-positive rates were not affected by selection bias.
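
The dependence of the bias on genetic and nongenetic effects on the first event can be illustrated with a small exact calculation. This is a minimal sketch, not the authors' simulation design. It assumes a binary variant G and a single binary nongenetic factor U that are independent in the general population, a constant baseline hazard, and proportional hazards for the first event. The function names and all parameter values (the hazard ratios, the 30% prevalence of U, the baseline cumulative hazard) are hypothetical.

```python
import math


def first_event_risk(g, u, base_cum_hazard, hr_g, hr_u):
    """Cumulative risk of the first event under proportional hazards (assumed model)."""
    cum_hazard = base_cum_hazard * (hr_g ** g) * (hr_u ** u)
    return 1.0 - math.exp(-cum_hazard)


def prevalence_u_among_cases(g, p_u, base_cum_hazard, hr_g, hr_u):
    """P(U = 1 | G = g, first event occurred), obtained by Bayes' rule."""
    exposed = p_u * first_event_risk(g, 1, base_cum_hazard, hr_g, hr_u)
    unexposed = (1.0 - p_u) * first_event_risk(g, 0, base_cum_hazard, hr_g, hr_u)
    return exposed / (exposed + unexposed)


# Hypothetical inputs: U and G are independent before selection.
params = dict(p_u=0.3, base_cum_hazard=0.05, hr_g=1.3, hr_u=3.0)

non_carriers = prevalence_u_among_cases(0, **params)
carriers = prevalence_u_among_cases(1, **params)

print(f"P(U=1 | non-carrier, first event) = {non_carriers:.4f}")
print(f"P(U=1 | carrier, first event)     = {carriers:.4f}")
```

Under these assumptions, U is slightly less common among carriers than among non-carriers once the sample is restricted to people with a first event. If U also raises the risk of a subsequent event, that imbalance would pull the marginal estimate for G toward the null, and the imbalance grows as either hazard ratio grows.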

Conclusions: In most empirical settings, selection bias is expected to have a limited impact on genetic effect estimates of subsequent event risk. Nevertheless, because of undercoverage increasing with sample size, most confidence intervals will be over precise (not wide enough). When there is no effect modification by history of coronary heart disease, the false-positive rates of association tests will be close to nominal.
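
The link between a small fixed bias and undercoverage at large sample sizes can be sketched numerically. The example assumes the log hazard ratio estimate is normally distributed around the true value plus a constant bias, with a standard error that shrinks as 1/sqrt(n). The 5% bias on the log scale, the value of `unit_sd`, and the sample sizes are hypothetical choices for illustration, not results from the study.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ci_coverage(bias, se, z=1.96):
    """P(estimate +/- z*se contains the truth) when estimate ~ N(truth + bias, se^2)."""
    shift = bias / se
    return normal_cdf(z - shift) - normal_cdf(-z - shift)


true_log_hr = math.log(1.3)     # HR of 1.3, as in the Figure 4 setting
bias = -0.05 * true_log_hr      # hypothetical 5% downward bias on the log scale
unit_sd = 2.0                   # hypothetical constant so that se = unit_sd / sqrt(n)

sample_sizes = [1000, 5000, 25000, 100000]
coverages = [ci_coverage(bias, unit_sd / math.sqrt(n)) for n in sample_sizes]

for n, cov in zip(sample_sizes, coverages):
    print(f"n = {n:>6}: coverage of nominal 95% CI = {cov:.3f}")
```

Because the bias stays fixed while the interval narrows, the ratio of bias to standard error grows with n and the interval misses the true value more often. With zero bias, the same function returns the nominal 0.95 at any sample size.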

Keywords: alleles; confidence intervals; genetic association studies; risk; sample size; selection bias.

Figures

Figure 1. Three populations
D1 denotes the first/index event. S is the indicator of surviving the first event. Population 1 = general population; population 2 = those with a first event (fatal and non-fatal cases); population 3 = those with a non-fatal first event.
Figure 2. Directed acyclic graphs
(a) Scenario 1: the genetic variant (G) associates with risk of first event (D1), survival (S), and risk of subsequent event (D2). (b) Scenario 2: the genetic variant encodes a biomarker (Z) that associates with risk of first event, survival, and risk of subsequent event.
Figure 3. Results of the estimated hazard ratio (HR) for a genetic variant that associates with risk of first event, survival, and risk of a subsequent CHD event (scenario 1)
Power under the HR of 1 for G means type 1 error. The black bars pertain to population 2 (selection of subjects with fatal or non-fatal first events) and the grey bars to population 3 (selection of subjects with non-fatal first events). The dashed line in the middle panel indicates the expected coverage of 0.95. The dashed line in the lower panel indicates the nominal significance level of 0.05. Sample size is set at 25000.
Figure 4. Results of the estimated hazard ratio (HR) for a genetic variant (scenario 1) with different sample sizes
The HR of G is set to 1.3. The black bars pertain to population 2 (selection of subjects with fatal or non-fatal first events) and the grey bars to population 3 (selection of subjects with non-fatal first events). The dashed line in the middle panel indicates the expected coverage of 0.95. The dashed line in the lower panel indicates the nominal significance level of 0.05.
Figure 5. Results of the estimated hazard ratio (HR) for a genetic variant that encodes a biomarker that associates with risk of first event, survival, and risk of a subsequent CHD event (scenario 2)
Power under the HR of 1 for Z means type 1 error. The black bars pertain to population 2 (selection of subjects with fatal or non-fatal first events) and the grey bars to population 3 (selection of subjects with non-fatal first events). The dashed line in the middle panel indicates the expected coverage of 0.95. The dashed line in the lower panel indicates the nominal significance level of 0.05.
Figure 6. Results of the estimated hazard ratio (HR) for a non-genetic biomarker that associates with risk of first event, survival, and risk of a subsequent CHD event (scenario 2)
Power under the HR of 1 for Z means type 1 error. The black bars pertain to population 2 (selection of subjects with fatal or non-fatal first events) and the grey bars to population 3 (selection of subjects with non-fatal first events). The dashed line in the middle panel indicates the expected coverage of 0.95. The dashed lines in the lower panel indicate 1.00 and 0.05 (the nominal significance level).
