BMC Genomics. 2016 Sep 5;17(1):708.
doi: 10.1186/s12864-016-3030-6.

Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler

Samuel S Shepard et al. BMC Genomics. 2016.

Abstract

Background: Deep sequencing makes it possible to observe low-frequency viral variants and sub-populations with greater accuracy and sensitivity than ever before. Existing platforms can be used to multiplex a large number of samples; however, analysis of the resulting data is complex and involves separating barcoded samples and various read manipulation processes ending in final assembly. Many assembly tools were designed with larger genomes and higher fidelity polymerases in mind and do not perform well with reads derived from highly variable viral genomes. Reference-based assemblers may leave gaps in viral assemblies while de novo assemblers may struggle to assemble unique genomes.

Results: The IRMA (iterative refinement meta-assembler) pipeline solves the problem of viral variation by the iterative optimization of read gathering and assembly. As with all reference-based assembly, reads are included in assembly when they match consensus template sets; however, IRMA provides for on-the-fly reference editing, correction, and optional elongation without the need for additional reference selection. This increases both read depth and breadth. IRMA also focuses on quality control, error correction, indel reporting, variant calling and variant phasing. In fact, IRMA's ability to detect and phase minor variants is one of its most distinguishing features. We have built modules for influenza and ebolavirus. We demonstrate usage and provide calibration data from mixture experiments. Methods for variant calling, phasing, and error estimation/correction have been redesigned to meet the needs of viral genomic sequencing.

Conclusion: IRMA provides a robust next-generation sequencing assembly solution that is adapted to the needs and characteristics of viral genomes. The software solves issues related to the genetic diversity of viruses while providing customized variant calling, phasing, and quality control. IRMA is freely available for non-commercial use on Linux and Mac OS X and has been parallelized for high-throughput computing.

Keywords: Deep sequencing; Ebola; High throughput; Influenza; NGS; Public health; Surveillance.
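
The iterative read gathering described in the Results can be illustrated with a toy sketch. This is not IRMA's implementation: it assumes ungapped read placement, a fixed mismatch threshold, and a simple majority-vote consensus, and the names `best_placement`, `refine`, `max_mismatches`, and `max_rounds` are hypothetical. It only shows why editing the reference between rounds can recruit reads that were too divergent to match at first.

```python
from collections import Counter


def best_placement(read, reference):
    """Return (offset, mismatches) for the ungapped placement with the fewest mismatches."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def refine(reference, reads, max_mismatches=1, max_rounds=4):
    """Toy iterative refinement: gather matching reads, rebuild the consensus, repeat.

    Returns (final_reference, reads_gathered_in_last_round, rounds_run).
    """
    gathered = 0
    rounds_run = 0
    for rounds_run in range(1, max_rounds + 1):
        columns = [Counter() for _ in reference]
        gathered = 0
        for read in reads:
            placement = best_placement(read, reference)
            if placement is None or placement[1] > max_mismatches:
                continue  # read is too divergent from the current reference
            gathered += 1
            for i, base in enumerate(read):
                columns[placement[0] + i][base] += 1
        # Majority vote per position; uncovered positions keep the reference base.
        updated = "".join(
            col.most_common(1)[0][0] if col else ref_base
            for col, ref_base in zip(columns, reference)
        )
        if updated == reference:
            break  # converged: another round would change nothing
        reference = updated
    return reference, gathered, rounds_run


if __name__ == "__main__":
    start = "ACGTTACTAG"                  # starting reference, two differences from the sample
    reads = ["ACGTT", "GTTGC", "GCAAG"]   # reads drawn from the sample ACGTTGCAAG
    print(refine(start, reads))
```

In this example the third read carries two mismatches against the starting reference and is rejected in round 1. Once the first round corrects one of those positions, the read falls within the threshold and is gathered in round 2, which increases both depth and breadth of coverage.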

Figures

Fig. 1
Iterative refinement meta-assembler (IRMA) workflow: the influenza module. (a) The general process of sequencing a segmented RNA virus and assembling with IRMA. (b) Diagram of IRMA steps 1 through 9, showing the iterative processes involved. Steps in (b) are also labeled under the steps of (a) where they correspond.
Fig. 2
Genetic diversity of 1097 influenza A(H3) hemagglutinins collected in 2012. (a) Upper triangle of the host group-to-group average pairwise distance matrix, plotted and expressed as the number of estimated mutations per 150 nucleotides. (b) Density plot for the upper triangle of the pairwise distance matrix over all sequences, plotted and expressed on the same scale. (c) Maximum likelihood phylogenetic tree of H3 HA sequences, with labeling by host group color.
Fig. 3
Sensitivity to influenza biological diversity. For each influenza type, subtype, and gene segment (39 alignments in all), randomly chosen subsequences of fixed length (150 nucleotides) were matched against alignment consensuses with the programs shown. Hamming distance is the number of mismatching nucleotides between subsequences and references. (a) Histograms, with binning width 5, give the count of the subsequences matched to a flu consensus sequence or not. (b) Line plots show normalized frequencies at each un-binned Hamming distance and further characterize matched fragments into the proportion of misclassified fragments (matched to the wrong influenza consensus). Tabular summaries represent the total count or proportion for each method across all 205,873 subsequences. The dashed vertical lines represent the general limit of detection for non-statistical approaches. (c) The minimum identity parameter (p) for BLAT was varied on the same dataset, with summary proportions and counts shown for each parameter value.
Fig. 4
Hemagglutinin cumulative coverage depth by IRMA maximum (read-gather) rounds. A randomly primed A/equine/Detroit/3/64-like sample is assembled against the influenza A H7 HA global consensus and three other full CDS references from the H7 tree, using IRMA with max rounds set to 1, 2, 3 and 4. Non-iterative assembly (Bowtie2, local, very-sensitive) to the same references is in light gray.
Fig. 5
Neuraminidase cumulative coverage depth by IRMA maximum (read-gather) rounds. A randomly primed A/equine/Detroit/3/64-like sample is assembled against the influenza A N7 NA global consensus and three other full CDS references from the N7 tree, using IRMA with max rounds set to 1, 2, 3 and 4. Non-iterative assembly (Bowtie2, local, very-sensitive) to the same references is in light gray.
Fig. 6
Assembled consensus differences to baseline. Known baseline consensus sequences correspond to an A/equine/Detroit/3/64-like sample. Differences versus the baselines are shown for (a) H7 HA and (b) N7 NA starting references and for the progression of assembled consensus sequences corresponding to each maximum IRMA round. A non-iterative assembly (Bowtie2, local, very-sensitive) using each starting reference is shown for comparison. Match, mismatch, deletion, and insertion states are relative to the baseline in blue, red, yellow, and green, respectively, while white is used for a baseline gap in the alignment created by non-baseline insertions. Percent identity versus the baseline sequence is shown to the right of each graphed sequence. The phylogenetic trees depict approximate placement of the starting references on our H7 and N7 datasets, with the baseline labeled as “+”.
Fig. 7
The artificial mixture, variant calling, and phasing of variants for an influenza A(H3N2) M gene. Donor viruses (1) are mixed (2) in a 99:1 ratio with new variants called (3) for the mixed virus and pairwise tested for phasing and visualization (4) by heat map. Consensus and minority phases are colored red and blue corresponding to the consensus alleles of each parent donor virus. Single nucleotide variants are shown with a triangle and colored according to their phase. Independent phase SNVs (without linkage to other minority variants) have green and gray triangles.
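
The Hamming distance used in Fig. 3 can be computed as in the minimal sketch below. The function names and the short example sequences are illustrative assumptions, not taken from the paper; only the definition (number of mismatching nucleotides) and the histogram bin width of 5 come from the caption.

```python
def hamming_distance(subsequence, reference):
    """Count mismatching positions between two equal-length nucleotide strings."""
    if len(subsequence) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(subsequence, reference))


def histogram_bin(distance, width=5):
    """Return the lower edge of the histogram bin holding this distance."""
    return (distance // width) * width


if __name__ == "__main__":
    d = hamming_distance("ACGTAC", "ACCTAA")  # mismatches at positions 2 and 5
    print(d, histogram_bin(d))
```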
