Virus Evol. 2022 Dec 5;8(2):veac114.
doi: 10.1093/ve/veac114. eCollection 2022.

Identifying high-confidence variants in human cytomegalovirus genomes sequenced from clinical samples

Salvatore Camiolo et al. Virus Evol.

Abstract

Understanding the intrahost evolution of viral populations has implications in pathogenesis, diagnosis, and treatment and has recently made impressive advances from developments in high-throughput sequencing. However, the underlying analyses are very sensitive to sources of bias, error, and artefact in the data, and it is important that these are addressed adequately if robust conclusions are to be drawn. The key factors include (1) determining the number of viral strains present in the sample analysed; (2) monitoring the extent to which the data represent these strains and assessing the quality of these data; (3) dealing with the effects of cross-contamination; and (4) ensuring that the results are reproducible. We investigated these factors by generating sequence datasets, including biological and technical replicates, directly from clinical samples obtained from a small cohort of patients who had been infected congenitally with the herpesvirus human cytomegalovirus, with the aim of developing a strategy for identifying high-confidence intrahost variants. We found that such variants were few in number and typically present in low proportions and concluded that human cytomegalovirus exhibits a very low level of intrahost variability. In addition to clarifying the situation regarding human cytomegalovirus, our strategy has wider applicability to understanding the intrahost variability of other viruses.

Keywords: congenital infection; human cytomegalovirus; intrahost evolution; sequence variability.

Conflict of interest statement

None declared.

Figures

Figure 1.
Viral population dynamics of the cohort samples. As indicated in the key, the colours of the lines represent the sample types, and the duration of antiviral therapy is represented by the coloured blocks. Approximate sampling time points (M, months) are indicated on the x axis and viral loads (log10 IU/mL) on the y axis.
Figure 2.
Schematic representation of the experimental design for generating biological and technical replicate sequencing libraries.
Figure 3.
Application of preprocessing steps to the cohort datasets. (A) Schematic representation of the filtering steps (see text for details), in which each line represents a paired-end read. (B) Correlation between the proportion of reads removed and the number of PCR cycles performed during the post-enrichment amplification step. The proportions of reads removed for sequencing libraries undergoing the same number of PCR cycles were averaged. (C) Number of SNVs detected using LoFreq after each filtering step and the number remaining after application of the Cov > 9 threshold. (D) Violin plots showing the distribution of the number of reads identifying the SNVs detected using LoFreq after applying the filtering steps at each time point. The number of supporting reads was set to 20 when >20.
Figure 4.
Application of dataset filtering steps to the public datasets (Supplementary Table S2). (A) Number of SNVs detected using LoFreq grouped by patient after each step. The colour of each point indicates the patient, and the shape indicates whether the dataset represented a single-strain (dot) or a multiple-strain (star) infection. (B) Violin plots showing the distribution of the number of reads identifying the SNVs detected using LoFreq for each dataset. The number of supporting reads was set to 20 when >20.
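
The figure captions mention two simple numeric operations: a Cov > 9 threshold applied to the detected SNVs, and a cap of 20 on the number of supporting reads for plotting. The following Python sketch illustrates how such steps might look. It is not the authors' pipeline. The record layout and the field names `cov` and `supporting_reads` are hypothetical, and reading "Cov" as a per-SNV coverage value is an assumption.

```python
from typing import Dict, List

# Hypothetical SNV records; the field names are illustrative only and are
# not taken from LoFreq output or from the paper.
Snv = Dict[str, int]


def filter_by_coverage(snvs: List[Snv], threshold: int = 9) -> List[Snv]:
    """Keep SNVs whose coverage is strictly greater than the threshold (Cov > 9)."""
    return [snv for snv in snvs if snv["cov"] > threshold]


def cap_supporting_reads(snvs: List[Snv], cap: int = 20) -> List[int]:
    """Return supporting-read counts, with values above the cap set to the cap."""
    return [min(snv["supporting_reads"], cap) for snv in snvs]


# Made-up example values, for illustration only.
example = [
    {"pos": 1200, "cov": 8, "supporting_reads": 3},
    {"pos": 5400, "cov": 150, "supporting_reads": 12},
    {"pos": 98000, "cov": 900, "supporting_reads": 47},
]

kept = filter_by_coverage(example)
capped = cap_supporting_reads(kept)
```

In this sketch the cap affects only the values used for display; the filtered records themselves are left unchanged.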
