PLOS Digit Health. 2025 Apr 21;4(4):e0000825.
doi: 10.1371/journal.pdig.0000825. eCollection 2025 Apr.

Addressing data management and analysis challenges in viral genomics: The Swiss HIV cohort study viral next generation sequencing database

Marius Zeeb et al. PLOS Digit Health.

Abstract

Numerous HIV-related outcomes can be determined from the viral genome, for example resistance-associated mutations, population transmission dynamics, viral heritability traits, or time since infection. Viral sequences of people with HIV (PWH) are therefore essential for therapeutic and research purposes. While viral genomes were mainly sequenced using Sanger sequencing in the first three decades of the HIV pandemic, the last decade has seen a shift towards next-generation sequencing (NGS) as the preferred method. NGS can achieve near full-length genome coverage and, at the same time, accurately captures within-host diversity by characterizing HIV subpopulations. NGS opens new avenues for HIV research, but it also presents challenges concerning data management and analysis. We therefore set up the Swiss HIV Cohort Study Viral NGS Database (SHCND) to address key issues in the handling of NGS data, including large volumes of raw and processed NGS data, data storage solutions, downstream application of sophisticated bioinformatic tools, high-performance computing resources, and reproducibility. The database is nested within the Swiss HIV Cohort Study (SHCS) and the Zurich Primary HIV Infection Cohort Study (ZPHI), which together have enrolled 21,876 PWH since 1988 and include a biobank dating back to the early 1990s. Since its initiation in 2018, the SHCND has accumulated NGS sequences (of plasma and proviral origin) from 5,178 unique PWH. Here we describe the design, set-up, and use of this NGS database. Overall, the SHCND has contributed to several research projects on HIV pathogenesis, treatment, drug resistance, and molecular epidemiology, and has thereby become a central part of HIV genomics research in Switzerland.

Conflict of interest statement

I have read the journal's policy and the authors of this manuscript have the following competing interests: Within the last 5 years, K.J.M. has received travel grants and advisory board honoraria from Gilead Sciences and ViiV; and the University of Zurich received research grants from Gilead Sciences and Novartis for studies for which Dr Metzner serves as principal investigator. H.F.G. has received research grants from the Swiss National Science Foundation, Swiss HIV Cohort Study, Yvonne Jacob Foundation, NIH, Gilead, and ViiV, and is a subcontractor to a Bill and Melinda Gates Foundation grant, paid to his institution; personal honoraria for data safety monitoring board or advisory board consultations from Merck, ViiV Healthcare, Gilead Sciences, Janssen, Johnson and Johnson, Novartis, and GSK; and personal travel expenses from Gilead. All other authors declare no conflicts of interest.

Figures

Fig 1. From sample to genome.
Steps from blood sampling of people with HIV, through next-generation sequencing and its data handling, to the bioinformatic tools used for computational HIV genome analysis.
Fig 2. Illustration of the database storage.
Implementation of (A) NGS storage and (B) bioinformatic result storage in the SHCND. In essence, the storage forms a graph of UUID-labeled nodes. The graph structure itself (the edges) is also materialized in UUID-named files containing JSON (not shown). In addition, the metadata of each result container records the exact processing tool used to generate it.
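The storage layout described in this caption can be illustrated with a minimal sketch. It is not the SHCND implementation: the directory names (`nodes`, `edges`), the `metadata.json` file name, the JSON field names, and the tool label are all assumptions made for illustration. It only shows the general idea of UUID-labeled containers, edges stored as UUID-named JSON files, and the processing tool recorded in each result container's metadata.

```python
import json
import tempfile
import uuid
from pathlib import Path


def new_node(root: Path, kind: str, metadata: dict) -> str:
    """Create a UUID-named container directory with a metadata file (hypothetical layout)."""
    node_id = str(uuid.uuid4())
    node_dir = root / "nodes" / node_id
    node_dir.mkdir(parents=True)
    (node_dir / "metadata.json").write_text(json.dumps({"kind": kind, **metadata}))
    return node_id


def new_edge(root: Path, source: str, target: str) -> str:
    """Materialize one graph edge as a UUID-named JSON file."""
    edge_id = str(uuid.uuid4())
    edge_dir = root / "edges"
    edge_dir.mkdir(parents=True, exist_ok=True)
    (edge_dir / f"{edge_id}.json").write_text(
        json.dumps({"source": source, "target": target})
    )
    return edge_id


def children(root: Path, source: str) -> list[str]:
    """Return the node IDs reachable from `source` by one edge."""
    edge_dir = root / "edges"
    if not edge_dir.is_dir():
        return []
    found = []
    for edge_file in edge_dir.glob("*.json"):
        edge = json.loads(edge_file.read_text())
        if edge["source"] == source:
            found.append(edge["target"])
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # A raw-read container and a result container derived from it.
        raw = new_node(root, "raw_reads", {"sample": "example-sample"})
        result = new_node(root, "result", {"tool": "hypothetical-tool 1.0"})
        new_edge(root, raw, result)
        print(children(root, raw) == [result])
```

Keeping the edges in separate files, as the caption describes, means that a new result can be attached to existing data without rewriting the containers it derives from.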
Fig 3. Illustration of the bioinformatic pipeline processing.
From initial job submission to storage of processed results. For “global” processing tools, the FASTQ files of multiple samples and the output files of previous processing tool runs can be retrieved as inputs to the Processing step (not shown).
Fig 4. Sequence availability and timing.
(A) Sequences available per gene and for the whole genome in the Swiss HIV Cohort Study Viral NGS Database (SHCND), stratified by sample source. A viral gene sequence is counted if it covers at least 40% of the respective gene length relative to the HIV-1 HXB2 reference sequence (GenBank accession number K03455). (B) Cumulative sample counts by sampling year and by the year in which NGS was performed, stratified by HIV-1 plasma RNA and HIV-1 proviral DNA.
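The 40% coverage criterion in panel (A) can be sketched as follows. The caption does not state how coverage is computed, so this sketch makes two assumptions: the sequence is already aligned to reference coordinates, and a position counts as covered when it holds a called base rather than a gap (`-`) or `N`. The function names and the toy coordinates are hypothetical and are not real HXB2 gene positions.

```python
MISSING = set("-Nn")  # assumed symbols for uncovered positions


def gene_coverage(aligned: str, start: int, end: int) -> float:
    """Fraction of reference positions start..end (1-based, inclusive) with a called base."""
    region = aligned[start - 1:end]
    called = sum(1 for base in region if base not in MISSING)
    # Divide by the full gene length so positions past the end of a
    # truncated sequence count as uncovered.
    return called / (end - start + 1)


def passes_threshold(aligned: str, start: int, end: int, minimum: float = 0.40) -> bool:
    """Apply the 'at least 40% of the gene length' criterion."""
    return gene_coverage(aligned, start, end) >= minimum


if __name__ == "__main__":
    # Toy example: a 10-position "gene" with 4 called bases.
    toy = "ACGT------"
    print(gene_coverage(toy, 1, 10))
    print(passes_threshold(toy, 1, 10))
```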
