Epidemiology. 2015 Mar;26(2):229-37.
doi: 10.1097/EDE.0000000000000218.

On the assumption of bivariate normality in selection models: a Copula approach applied to estimating HIV prevalence

Mark E McGovern et al. Epidemiology. 2015 Mar.

Abstract

Background: Heckman-type selection models have been used to correct HIV prevalence estimates for selection bias, which arises when participation in HIV testing and HIV status remain associated after controlling for observed variables. These models typically rely on the strong assumption that the error terms in the participation and outcome equations that make up the model follow a bivariate normal distribution.

Methods: We introduce a novel approach for relaxing the bivariate normality assumption in selection models using copula functions. We apply this method to estimate HIV prevalence, with new confidence intervals (CI), in the 2007 Zambia Demographic and Health Survey (DHS), using interviewer identity as the selection variable that predicts participation (consent to test) but not the outcome (HIV status).
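
For orientation, the structure described in the Background and Methods can be written out as below. This is a generic sketch of a selection model with a binary outcome, not the authors' exact specification: the symbols $z_i$, $x_i$, $\gamma$, $\beta$, $\rho$, $\theta$ and $C$ are assumed notation, with $z_i$ containing interviewer identity and $x_i$ excluding it.

```latex
% Participation equation (consent to test)
\begin{equation}
  s_i^{*} = z_i^{\top}\gamma + u_i, \qquad s_i = \mathbf{1}\!\left[s_i^{*} > 0\right]
\end{equation}
% Outcome equation (HIV status), observed only when s_i = 1
\begin{equation}
  y_i^{*} = x_i^{\top}\beta + \varepsilon_i, \qquad y_i = \mathbf{1}\!\left[y_i^{*} > 0\right]
\end{equation}
% Standard assumption: bivariate normal errors
\begin{equation}
  (u_i, \varepsilon_i) \sim N_2\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
  \right)
\end{equation}
% Copula relaxation: the joint distribution is built from the margins
% F_u and F_eps through a copula C with dependence parameter theta
\begin{equation}
  F(u_i, \varepsilon_i) = C\!\left(F_u(u_i),\, F_{\varepsilon}(\varepsilon_i);\ \theta\right)
\end{equation}
```

Choosing $C$ to be the Gaussian copula with standard normal margins recovers the bivariate normal case. Other copula families allow different shapes of dependence between participation and outcome.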

Results: We show in a simulation study that selection models can generate biased results when the bivariate normality assumption is violated. In the 2007 Zambia DHS, HIV prevalence estimates are similar irrespective of the structure of the association assumed between participation and outcome. For men, we estimate a population HIV prevalence of 21% (95% CI = 16%-25%) compared with 12% (11%-13%) among those who consented to be tested; for women, the corresponding figures are 19% (13%-24%) and 16% (15%-17%).

Conclusions: Copula approaches to Heckman-type selection models are a useful addition to the methodological toolkit of HIV epidemiology and of epidemiology in general. We develop the use of this approach to systematically evaluate the robustness of HIV prevalence estimates based on selection models, both empirically and in a simulation study.

Figures

Figure 1
Illustration of Modelling Dependence Using Copulae. Observations are drawn from the corresponding bivariate distributions with n=1,000 and τ = −0.50. See the eAppendix for the code for drawing from these distributions.
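
The authors' own code for these draws is in the eAppendix and is not reproduced here. As a rough illustration only, the sketch below draws n=1,000 pairs from a Gaussian copula with Kendall's τ = −0.50, using the standard relation ρ = sin(πτ/2) for that copula. The function names and the seed are hypothetical, and only the Gaussian family is shown.

```python
import math
import random


def tau_to_rho(tau):
    """Convert Kendall's tau to the correlation rho of a Gaussian copula."""
    return math.sin(math.pi * tau / 2.0)


def normal_cdf(z):
    """Standard normal CDF, used to map normal draws to uniform margins."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def draw_gaussian_copula(n, tau, rng):
    """Draw n pairs (u, v) with uniform margins and Gaussian-copula dependence."""
    rho = tau_to_rho(tau)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)  # correlated with z1
        pairs.append((normal_cdf(z1), normal_cdf(z2)))
    return pairs


def kendall_tau(pairs):
    """Empirical Kendall's tau by counting concordant and discordant pairs."""
    n = len(pairs)
    concordant = discordant = 0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            xj, yj = pairs[j]
            sign = (xi - xj) * (yi - yj)
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
sample = draw_gaussian_copula(1000, -0.50, rng)
tau_hat = kendall_tau(sample)
print(f"target tau = -0.50, empirical tau = {tau_hat:.3f}")
```

The empirical τ should land close to the target. Swapping in another copula family would change the shape of the scatter while holding τ fixed, which is the comparison the figure illustrates.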
Figure 2
Simulation Results for HIV Prevalence Estimates with Non-Normal Errors. This scenario illustrates the case with cubed normal errors. The distributions of the proportional error of HIV prevalence estimates obtained from the normal selection model (Gaussian copula), a copula selection model and an imputation model are shown. The simulation is based on the 2007 Zambia Demographic and Health Survey for men, with n=6,500 and 1,000 replications. For each replication, the proportional error for each estimator is calculated as mean (HIV_Model − HIV_True)/HIV_True. The copula model is defined as the copula with the best fit in each replication according to the Akaike Information Criterion (AIC). Errors for the latent variables for consent and HIV status were drawn from a bivariate normal distribution with mean = 0 and τ = −0.50, cubed, and then scaled to have mean 0. The mean true HIV prevalence was 21%; observed HIV prevalence (for those with consent=1) was 12%. Consent to be tested was 81%, and the F statistic for interviewer identity was 3.5. The F statistic is calculated as a joint test of significance for interviewer identity in a regression of consent on interviewer identity with the inclusion of the model control variables. See the eAppendix for further details, including the R code for replicating the simulations.
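
The replication code itself is in R in the eAppendix. The Python sketch below only illustrates two ingredients named in the caption: cubed normal errors and the proportional error. It assumes that "scaled to have mean 0" means subtracting the sample mean, and that the proportional error is averaged over (estimate, truth) pairs. All function names and the seed are hypothetical.

```python
import math
import random


def cubed_centered_errors(n, tau, rng):
    """Draw bivariate normal errors with Kendall's tau, cube them, then center.

    Cubing is strictly increasing, so Kendall's tau between the two error
    terms is unchanged; only the marginal distributions become non-normal.
    """
    rho = math.sin(math.pi * tau / 2.0)  # tau-to-rho relation for the normal case
    scale = math.sqrt(1.0 - rho * rho)
    raw = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        raw.append((z1 ** 3, z2 ** 3))
    mean_u = sum(u for u, _ in raw) / n
    mean_e = sum(e for _, e in raw) / n
    # Assumption: "scaled to have mean 0" is implemented as centering.
    return [(u - mean_u, e - mean_e) for u, e in raw]


def proportional_error(estimates, truths):
    """Mean of (estimate - truth) / truth over paired values."""
    ratios = [(est - true) / true for est, true in zip(estimates, truths)]
    return sum(ratios) / len(ratios)


rng = random.Random(2007)  # arbitrary seed
errors = cubed_centered_errors(6500, -0.50, rng)

# Using the caption's figures: observed prevalence among consenters (12%)
# taken as a naive estimate of the true prevalence (21%).
naive_error = proportional_error([0.12], [0.21])
print(round(naive_error, 3))  # -0.429
```

With the caption's figures, relying only on those who consented would understate prevalence by roughly 43% in proportional terms. That gap is what the selection models in the simulation aim to close.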
