Viruses. 2023 Mar 13;15(3):737.
doi: 10.3390/v15030737.

An Automated Bioinformatics Pipeline Informing Near-Real-Time Public Health Responses to New HIV Diagnoses in a Statewide HIV Epidemic

Mark Howison et al. Viruses. 2023.

Abstract

Molecular HIV cluster data can guide public health responses towards ending the HIV epidemic. Currently, real-time data integration, analysis, and interpretation are challenging, leading to a delayed public health response. We present a comprehensive methodology for addressing these challenges through data integration, analysis, and reporting. We integrated heterogeneous data sources across systems and developed an open-source, automatic bioinformatics pipeline that provides molecular HIV cluster data to inform public health responses to new statewide HIV-1 diagnoses, overcoming data management, computational, and analytical challenges. We demonstrate implementation of this pipeline in a statewide HIV epidemic and use it to compare the impact of specific phylogenetic and distance-only methods and datasets on molecular HIV cluster analyses. The pipeline was applied to 18 monthly datasets generated between January 2020 and June 2022 in Rhode Island, USA, that provide statewide molecular HIV data to support routine public health case management by a multi-disciplinary team. The resulting cluster analyses and near-real-time reporting guided public health actions in 37 phylogenetically clustered cases out of 57 new HIV-1 diagnoses. Of the 37, only 21 (57%) clustered by distance-only methods. Through a unique academic-public health partnership, an automated open-source pipeline was developed and applied to prospective, routine analysis of statewide molecular HIV data in near-real-time. This collaboration informed public health actions to optimize disruption of HIV transmission.

Keywords: HIV transmission networks; contact tracing; molecular HIV clusters; molecular epidemiology; near-real-time data integration; phylogenetics.

Conflict of interest statement

M.H. is currently Sr. Data Scientist at Amazon.com, Inc., but conducted this research prior to starting that role.

Figures

Figure 1
Title: Integrated report for monthly academic-public health case management meetings. Legend: An example of the summary page from automated monthly reports generated by the pipeline is illustrated here with synthetic data to protect patient privacy. The table summarizes demographic and clinical data of all nine newly available RI sequences in the past month, their index case status, and their cluster analysis outcomes that are relevant to the monthly case management meeting of new HIV diagnoses in the state. Numbers in green and gray in parentheses indicate comparison to the prior month. Missing values are indicated by ‘-’. Viral load is in copies/mL, approximated. MSM, men who have sex with men; DRMs, drug resistance mutations; SDRMs, surveillance drug resistance mutations. Notes: All information shown in this table is synthetic and for illustrative purposes only. No information from real patients is shown.
Figure 2
Title: A growing cluster with differential detection between phylogenetic and distance-only methods. Legend: The figure shows a comparison between a phylogenetic cluster (panel (a)) and an HIV-TRACE cluster graph (panel (b)) for those same cases. The 100% bootstrap supported (red 100) cluster in panel (a) (red box) contains a new index case from dataset 18 (A, red); three prior index cases from datasets 7, 9, and 12 (D–F, blue), which formed a cluster in the analysis of dataset 12 by both phylogenetic methods and HIV-TRACE at the 0.5% threshold; and previously unclustered cases B, C, and G. Panel (a) also shows the nearest non-clustered case (H). The HIV-TRACE distance-based cluster graph in panel (b) of the same cases demonstrates that only cases D, E, and F remain part of the cluster in dataset 18 (blue edges). Cases A, B, C, and G are not clustered by HIV-TRACE as their pairwise distances (gray edges) are larger than distance thresholds established by the Centers for Disease Control and Prevention (CDC). Notes: Branch lengths in (a) are scaled by the estimated substitutions per site in the phylogeny, while edge thicknesses in (b) are scaled by the TN93 pairwise genetic distance calculated by HIV-TRACE. Phylogenetic bootstrap support (out of 100 bootstrap replicates) is shown in small text next to splits in the tree in (a), and the split with bootstrap support of 100 that defines the cluster is highlighted with a red dot.
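The distance-only behavior described in the Figure 2 legend can be illustrated with a minimal sketch. It assumes that two cases are linked when their pairwise genetic distance is at or below a threshold, and that clusters are the connected components of the resulting graph. This is not HIV-TRACE's implementation. The function name `threshold_clusters` is hypothetical, and the distances are invented for illustration rather than taken from the study.

```python
from itertools import combinations


def threshold_clusters(cases, distances, threshold=0.005):
    """Return clusters (size > 1) as sorted lists of case labels.

    Two cases are linked when their pairwise distance is <= threshold;
    clusters are the connected components of that graph (assumed rule).
    """
    parent = {c: c for c in cases}

    def find(c):
        # Follow parent pointers to the component root, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(cases, 2):
        d = distances.get(frozenset((a, b)))
        if d is not None and d <= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for c in cases:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Invented distances (substitutions per site); NOT data from the study.
cases = ["A", "D", "E", "F"]
distances = {
    frozenset(("D", "E")): 0.003,
    frozenset(("E", "F")): 0.004,
    frozenset(("D", "F")): 0.007,
    frozenset(("A", "D")): 0.012,
    frozenset(("A", "E")): 0.011,
    frozenset(("A", "F")): 0.013,
}

clusters = threshold_clusters(cases, distances, threshold=0.005)
print(clusters)
```

With these made-up values, D and F join one cluster through E even though their direct distance exceeds the threshold, while A stays unclustered because all of its distances are above it. This mirrors the situation in panel (b), where a case can be phylogenetically close to a cluster yet excluded by a fixed distance cutoff.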
Figure 3
Title: Visual representation of HIV cluster growth in RI. Legend: This figure illustrates one visual output of the automated pipeline that is used in monthly academic-public health case management discussions. Each row represents one cluster’s lifespan according to years of HIV diagnosis of its members (X axis). Red dots indicate new index HIV cases in the current month. Blue dots indicate new HIV index cases in the prior 18 datasets. Gray dots indicate cluster members who were diagnosed prior to the start of the study. To protect patient privacy, we use the rank ordering of diagnosis dates instead of the exact dates.
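The rank-ordering step mentioned in the Figure 3 legend can be sketched as follows. This is a minimal illustration, not the pipeline's code: the function name `rank_order` is hypothetical, the dates are invented, and giving tied dates the same rank is an assumption, since the legend does not say how ties are handled.

```python
from datetime import date


def rank_order(dates):
    """Replace each date with its 1-based chronological rank.

    Identical dates share a rank (dense ranking); this tie rule is an
    assumption made for the sketch.
    """
    ordered = sorted(set(dates))
    rank = {d: i + 1 for i, d in enumerate(ordered)}
    return [rank[d] for d in dates]


# Invented diagnosis dates; NOT data from the study.
diagnosis_dates = [
    date(2021, 5, 3),
    date(2019, 11, 20),
    date(2021, 5, 3),
    date(2020, 1, 15),
]

ranks = rank_order(diagnosis_dates)
print(ranks)
```

Plotting the ranks rather than the dates preserves the order of diagnoses within a cluster while hiding the actual calendar dates.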
