PLoS One. 2018 Apr 26;13(4):e0195959.
doi: 10.1371/journal.pone.0195959. eCollection 2018.

One-step estimation of networked population size: Respondent-driven capture-recapture with anonymity


Bilal Khan et al. PLoS One.

Abstract

Size estimation is particularly important for populations whose members experience disproportionate health issues or pose elevated health risks to the ambient social structures in which they are embedded. Efforts to derive size estimates are often frustrated when the population is hidden or hard to reach in ways that preclude conventional survey strategies, as is the case when social stigma is associated with group membership or when group members are involved in illegal activities. This paper extends prior research on the problem of network population size estimation, building on established survey/sampling methodologies commonly used with hard-to-reach groups. Three novel one-step, network-based population size estimators are presented, for use with uniform random sampling, with respondent-driven sampling, and when networks exhibit significant clustering effects. We give provably sufficient conditions for the consistency of these estimators in large configuration networks. Simulation experiments across a wide range of synthetic network topologies validate the performance of the estimators, which also perform well on a real-world location-based social networking data set with significant clustering. Finally, the proposed schemes are extended so that they can be used in settings where participant anonymity is required. Systematic experiments show favorable tradeoffs between anonymity guarantees and estimator performance. Taken together, we demonstrate that reasonable population size estimates can be derived from anonymous respondent-driven samples of 250–750 individuals drawn from ambient populations of 5,000–40,000. The method thus represents a novel and cost-effective means for health planners and agencies concerned with health and disease surveillance to estimate the size of hidden populations. We discuss limitations and future work in the concluding section.
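The abstract does not spell out the estimators, but the capture-recapture idea behind a one-step network size estimate can be illustrated with a minimal sketch. It assumes a uniform random sample of r respondents drawn without replacement, in which each respondent reports its degree $d_i$ and the identities of its network neighbours; a neighbour who is also in the sample counts as a "recapture". Given that node i is sampled, each of its neighbours is sampled with probability $(r-1)/(n-1)$, so matching the observed recapture count to its expectation and solving for n gives a moment-style estimate. The helper names (`random_graph`, `estimate_size_uniform`) and the synthetic test network are hypothetical, and this is not claimed to be the paper's estimator $n_1$.

```python
import random
from typing import Dict, List, Set


def random_graph(n: int, avg_degree: float, seed: int = 0) -> Dict[int, Set[int]]:
    """Hypothetical synthetic network: n nodes joined by about n * avg_degree / 2 random edges."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    target_edges = int(n * avg_degree / 2)
    edges = 0
    while edges < target_edges:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:  # no self-loops or duplicate edges
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj


def estimate_size_uniform(adj: Dict[int, Set[int]], sample: List[int]) -> float:
    """Moment-style size estimate from a uniform sample drawn without replacement.

    Each respondent reports its degree d_i and its neighbours' identities.
    Given that i is sampled, each neighbour is also sampled with probability
    (r - 1) / (n - 1), so hits / sum(d_i) estimates that probability.
    """
    r = len(sample)
    in_sample = set(sample)
    degree_sum = sum(len(adj[i]) for i in sample)
    # Recaptures: ordered (respondent, neighbour) pairs where the neighbour was also sampled.
    hits = sum(1 for i in sample for j in adj[i] if j in in_sample)
    if hits == 0:
        return float("inf")  # no recaptures, so the ratio is undefined
    return 1 + (r - 1) * degree_sum / hits


n = 2000
adj = random_graph(n, avg_degree=10, seed=1)
sample = random.Random(2).sample(range(n), 500)
print(round(estimate_size_uniform(adj, sample)))
```

When no respondent names another sampled respondent, the ratio has no finite value, so the sketch returns infinity rather than guessing.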


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Estimator $n_1$ on uniform samples in populations of size $n = 5 \cdot 10^3$ to $40 \cdot 10^3$.
In each box, the thick line indicates the sample median; the top of the box is the median of the upper half of the estimated values (upper quartile, 75th percentile); the bottom of the box is the median of the lower half of the estimated values (lower quartile, 25th percentile); and the whiskers indicate the full range of estimated values. No (finite) outliers were removed.
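The five statistics drawn in each box can be reproduced with a short sketch. The captions define the box edges as medians of the lower and upper halves of the estimates; how the middle value is handled when the number of estimates is odd is not stated, so the sketch assumes Tukey's convention of counting it in both halves (the convention used by R's `fivenum`). The function name `box_summary` is hypothetical.

```python
from statistics import median
from typing import Iterable, Tuple


def box_summary(values: Iterable[float]) -> Tuple[float, float, float, float, float]:
    """Return (minimum, lower quartile, median, upper quartile, maximum) for one box.

    Quartiles are medians of the lower and upper halves, as the captions state;
    for an odd count the middle value is assumed to sit in both halves (Tukey).
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one estimate")
    k = len(xs)
    half = (k + 1) // 2  # size of each half; shares the middle value when k is odd
    lower, upper = xs[:half], xs[k - half:]
    # Whiskers span the full range because no (finite) outliers are removed.
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


print(box_summary(range(1, 10)))  # -> (1, 3, 5, 7, 9)
```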
Fig 2. Estimator $n_2$ on RDS samples in populations of size $n = 5 \cdot 10^3$ to $40 \cdot 10^3$.
In each box, the thick line indicates the sample median; the top of the box is the median of the upper half of the estimated values (upper quartile, 75th percentile); the bottom of the box is the median of the lower half of the estimated values (lower quartile, 25th percentile); and the whiskers indicate the full range of estimated values. No (finite) outliers were removed.
Fig 3. Estimator $n_3$ on RDS samples in populations of size $n = 5 \cdot 10^3$ to $40 \cdot 10^3$.
In each box, the thick line indicates the sample median; the top of the box is the median of the upper half of the estimated values (upper quartile, 75th percentile); the bottom of the box is the median of the lower half of the estimated values (lower quartile, 25th percentile); and the whiskers indicate the full range of estimated values. No (finite) outliers were removed.
Fig 4. The action of $\psi$ on $V$.
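One way to picture the anonymized setting is to treat $\psi$ as a map that replaces each identity in $V$ with a label from a set $\Omega$, so respondents report labels rather than identities and distinct people may share a label. The sketch below assumes $\psi$ assigns labels uniformly and independently at random, and corrects the recapture count for chance label matches: a sampled neighbour always matches a sampled label, while any other neighbour matches with probability $\ell/|\Omega|$, where $\ell$ is the number of distinct sampled labels. This collision correction, like the names `estimate_size_anonymous` and `random_graph`, is a hypothetical illustration and not necessarily how the paper defines $n_2^{\psi}$ or $n_3^{\psi}$.

```python
import random
from typing import Dict, List, Set


def random_graph(n: int, avg_degree: float, seed: int = 0) -> Dict[int, Set[int]]:
    """Hypothetical synthetic network: n nodes joined by about n * avg_degree / 2 random edges."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    target_edges = int(n * avg_degree / 2)
    edges = 0
    while edges < target_edges:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj


def estimate_size_anonymous(adj: Dict[int, Set[int]], sample: List[int],
                            psi: Dict[int, int], omega_size: int) -> float:
    """Size estimate when respondents report only psi-labels, never identities.

    Assumes psi draws each node's label uniformly and independently from
    range(omega_size). Then hits / sum(d_i) ~ q + (1 - q) * ell / omega_size,
    where q = (r - 1) / (n - 1); the code solves this for q and then for n.
    """
    r = len(sample)
    sampled_labels = {psi[i] for i in sample}
    ell = len(sampled_labels)
    degree_sum = sum(len(adj[i]) for i in sample)
    if degree_sum == 0:
        return float("inf")
    # Neighbour reports whose label coincides with some sampled label.
    label_hits = sum(1 for i in sample for j in adj[i] if psi[j] in sampled_labels)
    chance = ell / omega_size  # probability that an unsampled neighbour matches by accident
    q = (label_hits / degree_sum - chance) / (1 - chance)
    if q <= 0:
        return float("inf")  # no matches beyond chance level
    return 1 + (r - 1) / q


n, omega_size = 2000, 20000
adj = random_graph(n, avg_degree=10, seed=1)
rng = random.Random(3)
psi = {v: rng.randrange(omega_size) for v in adj}  # hypothetical anonymizing labels
sample = rng.sample(range(n), 500)
print(round(estimate_size_anonymous(adj, sample, psi, omega_size)))
```

Shrinking `omega_size` raises the chance-match rate, leaving the correction less signal to work with; this is a simplified picture of the anonymity/performance tradeoff that Figs 5 and 6 examine across $|\Omega|$.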
Fig 5. Estimator $n_2^{\psi}$ on RDS samples of size $r = 500$ with $|\Omega| = 2 \cdot 10^3$ to $256 \cdot 10^3$.
In each box, the thick line indicates the sample median; the top of the box is the median of the upper half of the estimated values (upper quartile, 75th percentile); the bottom of the box is the median of the lower half of the estimated values (lower quartile, 25th percentile); and the whiskers indicate the full range of estimated values. No (finite) outliers were removed.
Fig 6. Estimator $n_3^{\psi}$ on RDS samples of size $r = 500$ with $|\Omega| = 2 \cdot 10^3$ to $256 \cdot 10^3$.
In each box, the thick line indicates the sample median; the top of the box is the median of the upper half of the estimated values (upper quartile, 75th percentile); the bottom of the box is the median of the lower half of the estimated values (lower quartile, 25th percentile); and the whiskers indicate the full range of estimated values. No (finite) outliers were removed.
Fig 7. Estimators $n_2^{\psi}$ (above) and $n_3^{\psi}$ (below) on the Brightkite network; $|\Omega| = 2 \cdot 10^3$ to $256 \cdot 10^3$, with sample sizes $r = 250, 500, 750$.
In each box, the thick line indicates the sample median; the top of the box is the median of the upper half of the estimated values (upper quartile, 75th percentile); the bottom of the box is the median of the lower half of the estimated values (lower quartile, 25th percentile); and the whiskers indicate the full range of estimated values. Data points that exceeded the third quartile by more than 1.5 times the interquartile range (IQR) were treated as outliers and removed.
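The outlier rule stated for Fig 7 can be sketched as follows. The caption describes a one-sided rule (only values lying more than 1.5 IQR above the third quartile are dropped), and the sketch follows it; the quartile convention (Tukey's hinges) and the function names `hinges` and `drop_upper_outliers` are assumptions.

```python
from statistics import median
from typing import List, Tuple


def hinges(xs_sorted: List[float]) -> Tuple[float, float]:
    """Lower and upper quartiles as medians of the lower and upper halves (Tukey convention assumed)."""
    k = len(xs_sorted)
    half = (k + 1) // 2  # for odd k the middle value belongs to both halves
    return median(xs_sorted[:half]), median(xs_sorted[k - half:])


def drop_upper_outliers(values: List[float], factor: float = 1.5) -> List[float]:
    """Drop values more than factor * IQR above the third quartile; low values are kept."""
    q1, q3 = hinges(sorted(values))
    cutoff = q3 + factor * (q3 - q1)
    return [x for x in values if x <= cutoff]


print(drop_upper_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```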
Fig 8. Mean failure rate analysis of the proposed estimators.


