Sci Rep. 2022 Nov 10;12(1):19230.
doi: 10.1038/s41598-022-21924-8.

Impact of molecular sequence data completeness on HIV cluster detection and a network science approach to enhance detection


Sepideh Mazrouee et al. Sci Rep. 2022.

Abstract

Detection of viral transmission clusters using molecular epidemiology is critical to the Respond pillar of the Ending the HIV Epidemic initiative. Here, we studied whether inference from an incomplete dataset influences the accuracy of the reconstructed molecular transmission network. We analyzed viral sequence data from ~13,000 individuals diagnosed with HIV during 2012–2019, drawn from Houston Health Department surveillance data with 53% completeness (n = 6852 individuals with sequences). We extracted random subsamples and compared the resulting reconstructed networks with the full-size network. Increasing simulated completeness was associated with an increase in the number of detected clusters. We also subsampled based on each node's influence on viral transmission, measured as the Expected Force (ExF) of every node in the network. When we simulated removing the nodes with the highest and, separately, the lowest ExF from the full dataset, 4.7% and 60% of priority clusters were detected, respectively. These results highlight that nodes contribute non-uniformly to identifying transmission clusters, with high-influence nodes having the largest impact. Although increasing the completeness of sequence reporting is the way to fully detect HIV transmission patterns, reaching high completeness has remained challenging in practice. Hence, we suggest a network science approach, augmented by node influence information, to enhance the performance of molecular cluster detection.
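The random-subsampling experiment can be illustrated with a minimal sketch. It assumes clusters are connected components of a graph that links every pair of sequences whose pairwise genetic distance is at or below a threshold (the approach taken by tools such as HIV-TRACE). The 0.015 threshold, the function names `detect_clusters` and `clusters_at_completeness`, and the toy edge list are illustrative assumptions, not the study's pipeline or data.

```python
import random
from collections import defaultdict


def detect_clusters(nodes, edges, threshold=0.015):
    """Return clusters (sets of >= 2 nodes): connected components of the graph
    that links every pair of kept nodes whose genetic distance is <= threshold.
    Edges touching a node not in `nodes` (an unsequenced case) are ignored."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, dist in edges:
        if dist <= threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in parent:
        groups[find(n)].add(n)
    return [g for g in groups.values() if len(g) >= 2]


def clusters_at_completeness(nodes, edges, completeness, seed=0):
    """Keep a random `completeness` fraction of sequenced nodes, then re-detect."""
    rng = random.Random(seed)
    kept = rng.sample(sorted(nodes), round(completeness * len(nodes)))
    return detect_clusters(kept, edges)


# Hypothetical toy data: (node, node, pairwise genetic distance)
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B", 0.010), ("B", "C", 0.012), ("D", "E", 0.005), ("E", "F", 0.030)]

print(len(detect_clusters(nodes, edges)))  # 2: {A, B, C} and {D, E}; F stays a singleton
for completeness in (0.5, 0.75, 1.0):
    print(completeness, len(clusters_at_completeness(nodes, edges, completeness)))
```

Averaging the cluster count over many seeds at each completeness level would give a curve comparable in spirit to the detection trend described above.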


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Metadata—age distribution of people with diagnosed HIV (2012–2019).
Figure 2
Example of how missing one node in a network can change its clustering: (a) cluster A with 14 individuals and their pairwise genetic distances; (b) missing one node (degree = 6) splits cluster A into two smaller clusters and leaves 4 nodes as singletons; (c) cluster A again, this time with a different node (degree = 4) missing; (d) this missing node splits cluster A into three smaller clusters and leaves one node as a singleton.
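The fragmentation in this figure follows directly from treating a cluster as a connected component: deleting a node also deletes every link that passed through it. The hypothetical helper `split_after_removal` below removes one node and recomputes components with breadth-first search; the adjacency dictionary is a made-up example, not the cluster shown in the figure.

```python
from collections import deque


def split_after_removal(adjacency, removed):
    """Connected components of a cluster after deleting one node.
    `adjacency` maps each node to the set of nodes it is linked to
    (pairs within the distance threshold). Returns (clusters, singletons)."""
    seen, components = set(), []
    for start in adjacency:
        if start == removed or start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first search that never steps onto `removed`
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor != removed and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    clusters = [c for c in components if len(c) >= 2]
    singletons = [next(iter(c)) for c in components if len(c) == 1]
    return clusters, singletons


# Hypothetical toy cluster: hub H links A, B, C and D; A-B and D-E are also linked.
adjacency = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H", "E"},
    "E": {"D"},
}
clusters, singletons = split_after_removal(adjacency, "H")
print(clusters, singletons)  # two clusters ({A, B} and {D, E}) and one singleton (C)
```

Removing the low-degree node C instead leaves the other five nodes connected through H, which mirrors the figure's contrast between losing a well-connected node and a peripheral one.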
Figure 3
The distribution of clustered nodes across three demographic categories and transmission risk for full versus artificially subsampled data: (a) race/ethnicity, (b) transmission risk category, (c) sex assigned at birth, and (d) gender.
Figure 4
Houston/Harris County (2012–2019) data: (a) average infection risk improvement; (b) number of detected clusters (left y-axis) and singleton sequence rate (right y-axis) versus genotype data completeness.
Figure 5
Cluster detection comparison for two sampling methods (Houston/Harris County, 2012–2019): the left y-axis shows priority clusters detected in randomly subsampled data; the right y-axis shows priority clusters detected in data subsampled based on the Expected Force (ExF) node influence measure.
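The influence-based versus random comparison can be sketched as removing nodes ranked by a score, or at random, and counting the priority clusters that remain. This is a simplified illustration under stated assumptions: the `min_size=5` rule for a "priority cluster" is hypothetical, node degree stands in for ExF, and detection is measured by comparing cluster counts rather than matching individual clusters, so in larger networks the ratio can exceed 1 when one removal splits a large cluster into two.

```python
import random
from collections import defaultdict


def components(nodes, edges):
    """Connected components over `nodes`; edges are pairs already within the
    distance threshold, and edges touching a removed node are ignored."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def priority_clusters(nodes, edges, min_size=5):
    # Hypothetical stand-in for a priority-cluster rule: size >= min_size.
    return [c for c in components(nodes, edges) if len(c) >= min_size]


def remove_by_score(nodes, scores, k, highest=True):
    """Drop the k nodes with the highest (or lowest) influence score."""
    ranked = sorted(nodes, key=lambda n: scores[n], reverse=highest)
    dropped = set(ranked[:k])
    return [n for n in nodes if n not in dropped]


def remove_randomly(nodes, k, seed=0):
    """Drop k nodes uniformly at random (reproducible via seed)."""
    dropped = set(random.Random(seed).sample(list(nodes), k))
    return [n for n in nodes if n not in dropped]


def detection_rate(kept, nodes, edges, min_size=5):
    """Priority clusters found after removal, relative to the full data."""
    full = len(priority_clusters(nodes, edges, min_size))
    return len(priority_clusters(kept, edges, min_size)) / full if full else 0.0


# Hypothetical toy network: a star (hub H, leaves P1..P5) and a chain Q1-...-Q5.
nodes = ["H"] + [f"P{i}" for i in range(1, 6)] + [f"Q{i}" for i in range(1, 6)]
edges = [("H", f"P{i}") for i in range(1, 6)] + [(f"Q{i}", f"Q{i + 1}") for i in range(1, 5)]
degree = defaultdict(int)  # degree used here only as a stand-in for ExF
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(detection_rate(remove_by_score(nodes, degree, 1, highest=True), nodes, edges))   # 0.5
print(detection_rate(remove_by_score(nodes, degree, 1, highest=False), nodes, edges))  # 1.0
print(detection_rate(remove_randomly(nodes, 1), nodes, edges))
```

In the toy network, losing the single highest-scoring node (the hub) dissolves the star cluster, while losing one lowest-scoring leaf leaves both priority clusters intact, the same direction of effect reported in the abstract.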
Figure 6
Node influence (ExF) versus node degree for clustered sequences, Houston Health Department, 2012–2019.
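Expected Force, as introduced by Lawyer (2015), summarizes a node's spreading power as the entropy of the normalized degrees of all infected clusters that can form after two transmission events starting from that node. A minimal sketch follows. It assumes that a cluster's degree counts edges from the cluster to uninfected nodes and that each ordered transmission sequence is counted separately; both details are assumptions worth checking against a reference implementation before reuse, and `expected_force` and `build_adjacency` are hypothetical names.

```python
import math


def expected_force(adj, seed):
    """Entropy of cluster degrees over all two-step transmission sequences
    starting at `seed`. `adj` maps each node to the set of its neighbors."""
    cluster_degrees = []
    for first in adj[seed]:
        infected = {seed, first}
        # The second transmission can come from either infected node.
        for source in (seed, first):
            for second in adj[source]:
                if second in infected:
                    continue
                cluster = infected | {second}
                # Cluster degree: edges leading from the cluster to uninfected nodes.
                d = sum(1 for u in cluster for v in adj[u] if v not in cluster)
                cluster_degrees.append(d)
    total = sum(cluster_degrees)
    if total == 0:
        return 0.0
    probs = [d / total for d in cluster_degrees if d > 0]
    return sum(-p * math.log(p) for p in probs)


def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical path 1-2-3-4: end nodes score 0, interior nodes score ln 3.
path = build_adjacency([(1, 2), (2, 3), (3, 4)])
for node in sorted(path):
    print(node, len(path[node]), round(expected_force(path, node), 3))
```

Scoring every clustered node this way and plotting the result against its degree gives the kind of comparison shown in Figure 6; ExF generally increases with degree but also reflects the connectivity of a node's neighbors.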
