Front Bioinform. 2024 Jul 10:4:1400003.
doi: 10.3389/fbinf.2024.1400003. eCollection 2024.

AUTO-TUNE: selecting the distance threshold for inferring HIV transmission clusters

Steven Weaver et al. Front Bioinform.

Abstract

Molecular surveillance of viral pathogens and inference of transmission networks from genomic data play an increasingly important role in public health efforts, especially for HIV-1. For many methods, the genetic distance threshold used to connect sequences in the transmission network is a key parameter informing the properties of inferred networks. Using a distance threshold that is too high can result in a network with many spurious links, making it difficult to interpret. Conversely, a distance threshold that is too low can result in a network with too few links, which may not capture key insights into clusters of public health concern. Published research using the HIV-TRACE software package frequently uses the default threshold of 0.015 substitutions/site for HIV pol gene sequences, but in many cases, investigators heuristically select other threshold parameters to better capture the underlying dynamics of the epidemic they are studying. Here, we present a general heuristic scoring approach for tuning a distance threshold adaptively, which seeks to prevent the formation of giant clusters. The score prioritizes the ratio of the sizes of the largest and second-largest clusters while maximizing the number of clusters present in the network. We apply our scoring heuristic to outbreaks with different characteristics, such as regional or temporal variability, and demonstrate the utility of using the scoring mechanism's suggested distance threshold to identify clusters exhibiting risk factors that would otherwise have been more difficult to identify. For example, while we found that a 0.015 substitutions/site distance threshold is typical for US-like epidemics, recent outbreaks like the CRF07_BC subtype among men who have sex with men (MSM) in China have been found to have a lower optimal threshold of 0.005 substitutions/site to better capture the transition from injection drug use (IDU) to MSM as the primary risk factor. Alternatively, in communities surrounding Lake Victoria in Uganda, where there has been sustained heterosexual transmission for many years, we found that a larger distance threshold is necessary to capture a more risk factor-diverse population with sparse sampling over a longer period of time. Such identification may allow for more informed intervention by the respective public health officials.
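
The two quantities the abstract names, the number of clusters and the ratio of the largest to the second-largest cluster size, can be tabulated across candidate thresholds with a short sketch like the one below. This is an illustration only, not the AUTO-TUNE implementation: the function names, the toy distances, the choice to link pairs at or below the threshold, and the exclusion of unlinked sequences from the cluster count are all assumptions, and the paper's normalization and final score are not reproduced.

```python
from collections import Counter


def cluster_sizes(distances, threshold):
    """Sizes of connected components (largest first) at one threshold.

    `distances` maps (seq_a, seq_b) pairs to a genetic distance. A pair is
    linked when its distance is <= threshold (an assumption of this sketch).
    Sequences with no link at this threshold are not counted as clusters.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    sizes = Counter(find(node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


def sweep(distances, thresholds):
    """Return (threshold, cluster count, largest/second-largest ratio) rows.

    The ratio is None when fewer than two clusters exist.
    """
    rows = []
    for t in thresholds:
        sizes = cluster_sizes(distances, t)
        ratio = sizes[0] / sizes[1] if len(sizes) > 1 else None
        rows.append((t, len(sizes), ratio))
    return rows


# Hypothetical pairwise distances in substitutions/site.
toy = {
    ("A", "B"): 0.004,
    ("B", "C"): 0.012,
    ("D", "E"): 0.006,
    ("C", "D"): 0.020,
    ("F", "G"): 0.014,
}

for row in sweep(toy, [0.005, 0.015, 0.025]):
    print(row)
```

In this toy input the middle threshold yields the most clusters with a modest size ratio, while the highest threshold merges two clusters and raises the ratio, which is the kind of trade-off the heuristic is described as balancing.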

Keywords: HIV; molecular epidemiology; network; surveillance; transmission cluster.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Method flowchart for computing and recommending a distance threshold. See text for details on normalization and specific transforms used.
FIGURE 2
The user interface of the AUTO-TUNE web application (http://autotune.datamonkey.org/analyze). The platform provides a multi-faceted view of AUTO-TUNE’s analysis, including a score plot that visualizes trends across different genetic distance thresholds. It also displays graphs of the number of clusters and the R1/R2 ratio—both key metrics in AUTO-TUNE’s heuristic scoring system. These interactive visualizations aid researchers in making nuanced decisions for threshold selection, especially when multiple thresholds yield similar scores.
FIGURE 3
(A) Box plot of the AUTO-TUNE scores across ten random samples at 25%, 50%, and 75% of the Rhee et al. (2019) dataset, showing a trend of increasing confidence in score estimates with denser sampling. (B) Box plot of the selected distance thresholds across the same random samples at 25%, 50%, and 75% proportions, demonstrating improved consistency in threshold selection with increased sample size. (C) Scatterplot of the chosen thresholds (Y-axis) against their corresponding AUTO-TUNE scores (X-axis) for the three subsample proportions.
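
The subsampling design described in the Figure 3 caption (ten random samples at each of three proportions) could be set up roughly as follows. This is a hedged sketch under stated assumptions: the function names, the seed handling, and the representation of the data as a dictionary of pairwise distances are hypothetical, and the threshold selection that would be run on each subsample is not shown.

```python
import random


def subsample_distances(distances, fraction, rng):
    """Keep a random fraction of sequences and the pairs among them.

    `distances` maps (seq_a, seq_b) pairs to a genetic distance. The set of
    sequences is taken from the pairs themselves, so a sequence that appears
    in no pair is invisible to this sketch.
    """
    nodes = sorted({name for pair in distances for name in pair})
    k = max(1, round(fraction * len(nodes)))
    kept = set(rng.sample(nodes, k))
    sub = {
        pair: d
        for pair, d in distances.items()
        if pair[0] in kept and pair[1] in kept
    }
    return sub, kept


def replicate_subsamples(distances, fractions=(0.25, 0.5, 0.75),
                         replicates=10, seed=0):
    """Draw `replicates` subsampled distance tables per fraction."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return {
        f: [subsample_distances(distances, f, rng)[0]
            for _ in range(replicates)]
        for f in fractions
    }
```

Each subsampled table would then be scored across thresholds, and the spread of scores and selected thresholds compared between proportions.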
FIGURE 4
Examples of AUTO-TUNE score profiles. (A) Lowering the genetic distance threshold removes some of the edges from the network (shown in grey) and disconnects a large cluster into color-coded smaller clusters; here “None” means that the node is not connected to anything at the lower threshold. (B) Raising the genetic distance threshold adds edges to the network (shown in grey) and connects previously separate clusters into a larger component. (C) Each circle is a cluster in the larger-threshold network, with a proportion of its nodes removed when the threshold is lowered. (D) Changes to the node degree distribution (colors represent the counts of nodes with the same degree). (E) A significant enlargement of a small network at a higher threshold, with grey edges only present at the larger threshold.
FIGURE 5
Examples of how changing thresholds affects inferred networks. (A) A high-scoring network (Bbosa et al., 2020) has a distance threshold that achieves a number of clusters near the maximum while also avoiding the formation of a large (weakly connected) cluster. (B) A low-scoring network (Liu et al., 2020) shows a misalignment between the distance at which the maximum number of clusters is found and the distances at which the big jumps in the cluster size ratio occur. Here, AUTO-TUNE effectively optimizes the number of clusters while preventing excessive growth of the largest cluster.
FIGURE 6
Panels (A) and (B) present the effects of subsampling on network structure using different thresholds. (A) The proportion of subsampled nodes that remained clustered in both the original and the subsampled networks, with an observable increase in nodes captured as the threshold transitions from 1.5.

References

    1. Abidi S. H., Aibekova L., Davlidova S., Amangeldiyeva A., Foley B., Ali S. (2021). Origin and evolution of HIV-1 subtype A6. PLoS One 16, e0260604. doi: 10.1371/journal.pone.0260604
    2. Bartlett S. R., Wertheim J. O., Bull R. A., Matthews G. V., Lamoury F. M., Scheffler K., et al. (2017). A molecular transmission network of recent hepatitis C infection in people with and without HIV: implications for targeted treatment strategies. J. Viral Hepat. 24, 404–411. doi: 10.1111/jvh.12652
    3. Bbosa N., Ssemwanga D., Kaleebu P. (2020). Short communication: choosing the right program for the identification of HIV-1 transmission networks from nucleotide sequences sampled from different populations. AIDS Res. Hum. Retroviruses 36, 948–951. doi: 10.1089/AID.2020.0033
    4. Billings E., Kijak G. H., Sanders-Buell E., Ndembi N., O’Sullivan A. M., Adebajo S., et al. (2019). New subtype B containing HIV-1 circulating recombinant of sub-Saharan Africa origin in Nigerian men who have sex with men. J. Acquir. Immune Defic. Syndr. 81, 578–584. doi: 10.1097/QAI.0000000000002076
    5. Boender T. S., Smit C., Sighem A. v., Bezemer D., Ester C. J., Zaheri S., et al. (2018). AIDS therapy evaluation in The Netherlands (ATHENA) national observational HIV cohort: cohort profile. BMJ Open 8, e022516. doi: 10.1136/bmjopen-2018-022516