Proc Natl Acad Sci U S A. 2016 Dec 20;113(51):14668-14673.
doi: 10.1073/pnas.1617258113. Epub 2016 Dec 7.

Estimating uncertainty in respondent-driven sampling using a tree bootstrap method


Aaron J Baraff et al. Proc Natl Acad Sci U S A.

Abstract

Respondent-driven sampling (RDS) is a network-based form of chain-referral sampling used to estimate attributes of populations that are difficult to access using standard survey tools. Although it has grown quickly in popularity since its introduction, the statistical properties of RDS estimates remain elusive. In particular, the sampling variability of these estimates has been shown to be much higher than previously acknowledged, and even methods designed to account for RDS result in misleadingly narrow confidence intervals. In this paper, we introduce a tree bootstrap method for estimating uncertainty in RDS estimates based on resampling recruitment trees. We use simulations from known social networks to show that the tree bootstrap method not only outperforms existing methods but also captures the high variability of RDS, even in extreme cases with high design effects. We also apply the method to data from injecting drug users in Ukraine. Unlike other methods, the tree bootstrap depends only on the structure of the sampled recruitment trees, not on the attributes being measured on the respondents, so correlations between attributes can be estimated as well as variability. Our results suggest that it is possible to accurately assess the high level of uncertainty inherent in RDS.

Keywords: HIV; hard-to-reach population; injecting drug user; snowball sampling; social network.
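
The abstract describes the tree bootstrap only as resampling recruitment trees. A minimal Python sketch of one plausible reading follows: seeds are drawn with replacement, and at every sampled node its recruits are drawn with replacement, recursively. The data layout (`seeds`, `children`, `values`), the use of a plain sample mean instead of an RDS-weighted estimator, and the percentile interval are all assumptions made for illustration, not details taken from the paper.

```python
import random
from statistics import mean


def resample_tree(node, children, rng):
    """Return the node ids in one bootstrap copy of the subtree rooted at `node`."""
    sampled = [node]
    recruits = children.get(node, [])
    # Draw as many recruits as the node originally had, with replacement,
    # and recurse into each draw.
    for _ in recruits:
        pick = rng.choice(recruits)
        sampled.extend(resample_tree(pick, children, rng))
    return sampled


def tree_bootstrap(seeds, children, values, n_boot=1000, seed=0):
    """Return `n_boot` bootstrap estimates of the mean attribute value."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = []
        # Draw as many seeds as the original sample had, with replacement.
        for _ in seeds:
            start = rng.choice(seeds)
            sample.extend(resample_tree(start, children, rng))
        # Assumption: unweighted mean; an RDS estimator could be used instead.
        estimates.append(mean(values[i] for i in sample))
    return estimates


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval from the bootstrap distribution (an assumption)."""
    ordered = sorted(estimates)
    alpha = (1 - level) / 2
    lower = ordered[int(alpha * (len(ordered) - 1))]
    upper = ordered[int((1 - alpha) * (len(ordered) - 1))]
    return lower, upper


# Hypothetical toy data: two seeds, recruits keyed by recruiter, binary attribute.
seeds = ["a", "b"]
children = {"a": ["c", "d"], "c": ["e"], "b": ["f"]}
values = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1, "f": 0}

estimates = tree_bootstrap(seeds, children, values, n_boot=500, seed=1)
interval = percentile_interval(estimates)
```

Because the resampling touches only the tree structure, the same resampled node lists could be reused for several attributes at once, which is in keeping with the abstract's remark that correlations between attributes can be estimated.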

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
(A) Example of an RDS recruitment tree. (B) Resample taken from the RDS recruitment tree by the tree bootstrap method. Individuals are shaded according to their attribute value.
Fig. 2.
Estimating population proportions via RDS with the Project 90 data. Coverage probabilities and widths of 95% confidence intervals estimated by the following methods: (i) the naive proportion variance estimator; (ii) the Volz–Heckathorn variance estimator; (iii) the Salganik bootstrap; (iv) the Yamanis bootstrap; (v) the Gile successive-sampling bootstrap; and (vi) the tree bootstrap. For the coverage probabilities and widths in A and C, sampling was performed with replacement, and for the coverage probabilities in B, sampling was performed without replacement. Attributes are in decreasing order of prevalence in the network. The dashed vertical black lines in A and B are at 0.95, so that for a perfectly calibrated method, the symbol would lie on the line. The short black lines in C are the expected 95% interval widths based on 10,000 simulated samples.
Fig. 3.
Estimating population proportions via RDS with the Add Health data. Mean coverage probabilities of 95% confidence intervals across the 84 school pairs estimated by the following methods: (i) the naive proportion variance estimator; (ii) the Volz–Heckathorn variance estimator; (iii) the Salganik bootstrap; (iv) the Gile successive-sampling bootstrap; and (v) the tree bootstrap. Sampling was performed with replacement, and attributes are in decreasing order of mean prevalence over the 84 networks. The dashed vertical black lines are at 0.95, so that for a perfectly calibrated method, the symbol would lie on the line.
Fig. 4.
Ukrainian IDU data: estimates of the proportion of IDUs hospitalized in state drug treatment in-patient clinics during 2010 (A), the proportion of IDUs who participated in the state SMT program (B), the proportion of IDUs registered at NGOs that provide HIV prevention services (C), and the average number of HIV rapid tests distributed by NGOs that are used by each registered IDU (D). The figures show 80% (dark) and 95% (light) confidence intervals obtained from the naive proportion variance estimator, the Volz–Heckathorn variance estimator, and the tree bootstrap for two Ukrainian cities, Simferopol and Bila Tserkva.
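
The coverage probabilities and interval widths reported in Figs. 2 and 3 can be illustrated with a short Python sketch. It assumes each simulated sample yields one (lower, upper) interval and that the true population value is known, as it is when sampling from a fully observed network. The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
def coverage_and_width(intervals, truth):
    """Return (coverage probability, mean width) for a list of (lower, upper) intervals."""
    # An interval covers the truth when the true value lies within its bounds.
    covered = sum(1 for lower, upper in intervals if lower <= truth <= upper)
    widths = [upper - lower for lower, upper in intervals]
    return covered / len(intervals), sum(widths) / len(widths)


# Hypothetical intervals from four simulated samples; true proportion is 0.3.
toy_intervals = [(0.10, 0.30), (0.25, 0.50), (0.35, 0.60), (0.00, 0.20)]
coverage, mean_width = coverage_and_width(toy_intervals, truth=0.3)
```

For a well-calibrated 95% method, the coverage computed this way over many simulated samples should be close to 0.95, which is what the dashed lines in Figs. 2 and 3 mark.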

