BMC Genomics. 2016 Sep 5;17(1):708.

doi: 10.1186/s12864-016-3030-6.

Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler

Samuel S Shepard¹, Sarah Meno², Justin Bahl³, Malania M Wilson²,⁴, John Barnes², Elizabeth Neuhaus⁵

Affiliations

¹ Influenza Division, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, GA, 30329, USA. vfn4@cdc.gov.
² Influenza Division, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, GA, 30329, USA.
³ Center for Infectious Diseases, The University of Texas School of Public Health, Houston, TX, USA.
⁴ Battelle Memorial Research Institute, 1600 Clifton Road, Atlanta, GA, 30329, USA.
⁵ Influenza Division, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, GA, 30329, USA. ebn9@cdc.gov.

PMID: 27595578
PMCID: PMC5011931
DOI: 10.1186/s12864-016-3030-6

Samuel S Shepard et al. BMC Genomics. 2016.

Erratum in

Erratum to: Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler.
Shepard SS, Meno S, Bahl J, Wilson MM, Barnes J, Neuhaus E. BMC Genomics. 2016 Oct 13;17(1):801. doi: 10.1186/s12864-016-3138-8. PMID: 27737640. Free PMC article. No abstract available.

Abstract

Background: Deep sequencing makes it possible to observe low-frequency viral variants and sub-populations with greater accuracy and sensitivity than ever before. Existing platforms can be used to multiplex a large number of samples; however, analysis of the resulting data is complex and involves separating barcoded samples and various read manipulation processes ending in final assembly. Many assembly tools were designed with larger genomes and higher fidelity polymerases in mind and do not perform well with reads derived from highly variable viral genomes. Reference-based assemblers may leave gaps in viral assemblies while de novo assemblers may struggle to assemble unique genomes.

Results: The IRMA (iterative refinement meta-assembler) pipeline solves the problem of viral variation by the iterative optimization of read gathering and assembly. As with all reference-based assembly, reads are included in assembly when they match consensus template sets; however, IRMA provides for on-the-fly reference editing, correction, and optional elongation without the need for additional reference selection. This increases both read depth and breadth. IRMA also focuses on quality control, error correction, indel reporting, variant calling and variant phasing. In fact, IRMA's ability to detect and phase minor variants is one of its most distinguishing features. We have built modules for influenza and ebolavirus. We demonstrate usage and provide calibration data from mixture experiments. Methods for variant calling, phasing, and error estimation/correction have been redesigned to meet the needs of viral genomic sequencing.

Conclusion: IRMA provides a robust next-generation sequencing assembly solution that is adapted to the needs and characteristics of viral genomes. The software solves issues related to the genetic diversity of viruses while providing customized variant calling, phasing, and quality control. IRMA is freely available for non-commercial use on Linux and Mac OS X and has been parallelized for high-throughput computing.

Keywords: Deep sequencing; Ebola; High throughput; Influenza; NGS; Public health; Surveillance.
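The iterative idea summarized in the Results can be illustrated with a toy sketch. This is not IRMA's implementation: it assumes ungapped placement of reads by Hamming distance and a simple majority-vote consensus, and the names `refine`, `best_offset`, `max_mismatch`, and `max_rounds` are hypothetical. It only shows how editing the reference between rounds can let later rounds gather reads that the starting reference missed.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_offset(read, ref, max_mismatch):
    """Return (offset, mismatches) of the best ungapped placement, or None."""
    best = None
    for off in range(len(ref) - len(read) + 1):
        d = hamming(read, ref[off:off + len(read)])
        if d <= max_mismatch and (best is None or d < best[1]):
            best = (off, d)
    return best


def refine(reads, ref, max_mismatch=1, max_rounds=4):
    """Toy loop: gather matching reads, rebuild the consensus, repeat.

    Returns the final reference and the number of reads gathered per round.
    """
    history = []
    for _ in range(max_rounds):
        columns = [Counter() for _ in ref]
        gathered = 0
        for read in reads:
            hit = best_offset(read, ref, max_mismatch)
            if hit is None:
                continue  # read is too divergent from the current reference
            gathered += 1
            for i, base in enumerate(read):
                columns[hit[0] + i][base] += 1
        history.append(gathered)
        # Majority vote per column; keep the old base where nothing mapped.
        new_ref = "".join(
            col.most_common(1)[0][0] if col else old
            for col, old in zip(columns, ref)
        )
        if new_ref == ref:
            break  # the reference stopped changing
        ref = new_ref
    return ref, history


# Made-up data: the sample differs from the starting reference at two sites.
sample = "GATTACAGGCTTCA"
start_ref = "GATTAGATGCTTCA"
reads = [sample[i:i + 6] for i in range(0, 9, 2)]

consensus, history = refine(reads, start_ref)
print(consensus, history)  # GATTACAGGCTTCA [3, 5]
```

In this toy case only three of the five reads match the starting reference, but the edited reference from round 1 lets all five be gathered in round 2.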

Figures

**Fig. 1**
Iterative refinement meta-assembler (IRMA) workflow: the influenza module. (a) The general process of sequencing a segmented RNA virus and assembling with IRMA. (b) Diagram of IRMA steps 1 through 9, showing the iterative processes involved. Steps in (b) are also labeled under the steps of (a) where they correspond

**Fig. 2**
Genetic diversity of 1097 influenza A(H3) hemagglutinins collected in 2012. a Upper triangle of the host group-to-group average pairwise distance matrix plotted on and expressed as the number of estimated mutations per 150 nucleotides. b Density plot for the upper triangle of the pairwise distance matrix over all sequences, plotted and expressed on the same scale. c Maximum likelihood phylogenetic tree of H3 HA sequences with labeling by host group color

**Fig. 3**
Sensitivity to influenza biological diversity. For each influenza type, subtype, and gene segment (39 alignments in all), randomly chosen subsequences of fixed length (150 nucleotides) were matched against alignment consensuses with the programs shown. Hamming distance is the number of mismatching nucleotides between subsequences and references. a Histograms, with binning width 5, give the count of the subsequences matched to a flu consensus sequence or not. b Line plots show normalized frequencies at each un-binned Hamming distance and further characterize matched fragments into the proportion of misclassified fragments (matched to the wrong influenza consensus). Tabular summaries represent the total count or proportion for each method across all 205,873 subsequences. The dashed vertical lines represent the general limit of detection for non-statistical approaches. c The minimum identity parameter (p) for BLAT was varied on the same dataset with summary proportions and counts shown for each parameter value

**Fig. 4**
Hemagglutinin cumulative coverage depth by IRMA maximum (read-gather) rounds. A randomly primed A/equine/Detroit/3/64-like sample is assembled against influenza A H7 HA global consensus, and three other full CDS references from the H7 tree using IRMA with max rounds set to 1, 2, 3 and 4. Non-iterative assembly (Bowtie2, local, very-sensitive) to the same references is in *light gray*

**Fig. 5**
Neuraminidase cumulative coverage depth by IRMA maximum (read-gather) rounds. A randomly primed A/equine/Detroit/3/64-like sample is assembled against influenza A N7 NA global consensus, and three other full CDS references from the N7 tree using IRMA with max rounds set to 1, 2, 3 and 4. Non-iterative assembly (Bowtie2, local, very-sensitive) to the same references is in *light gray*

**Fig. 6**
Assembled consensus differences to baseline. Known baseline consensus sequences correspond to an A/equine/Detroit/3/64-like sample. Differences versus the baselines are shown for (a) H7 HA and (b) N7 NA starting references and for the progression of assembled consensus sequences corresponding to each maximum IRMA round. A non-iterative assembly (Bowtie2, local, very sensitive) using each starting reference is shown for comparison. Match, mismatch, deletion, and insertion states are relative to the baseline in blue, red, yellow, and green respectively while white is used for a baseline gap in the alignment created by non-baseline insertions. Percent identity versus the baseline sequence is shown to the right of each graphed sequence. The phylogenetic trees depict approximate placement of the starting references on our H7 and N7 datasets, with the baseline labeled as “+”

**Fig. 7**
The artificial mixture, variant calling, and phasing of variants for an influenza A(H3N2) M gene. Donor viruses (1) are mixed (2) in a 99:1 ratio with new variants called (3) for the mixed virus and pairwise tested for phasing and visualization (4) by heat map. Consensus and minority phases are colored *red* and *blue* corresponding to the consensus alleles of each parent donor virus. Single nucleotide variants are shown with a triangle and colored according to their phase. Independent phase SNVs—without linkage to other minority variants—have *green* and *gray* triangles
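The Hamming distance defined in the Fig. 3 caption, and the notion of a fragment being matched to the right or wrong consensus, can be sketched as follows. This is a minimal illustration under stated assumptions: the fragment is already aligned to each consensus window, and the function names, consensus labels, and short sequences are made up for the example (the study used 150-nucleotide subsequences).

```python
def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_consensus(fragment, consensuses):
    """Return (label, distance) of the consensus window nearest the fragment.

    `consensuses` maps a hypothetical label to a window of the same length
    as the fragment.
    """
    return min(
        ((label, hamming_distance(fragment, window))
         for label, window in consensuses.items()),
        key=lambda pair: pair[1],
    )


# Made-up 10-nucleotide example.
fragment = "ACGTACGTAC"
consensuses = {
    "A_HA": "ACGTTCGTAC",  # differs from the fragment at one position
    "B_HA": "TCGAACGGAC",  # differs from the fragment at three positions
}

label, distance = closest_consensus(fragment, consensuses)
print(label, distance)  # A_HA 1
```

A fragment would count as misclassified in this toy setting if the returned label differed from the alignment the fragment was drawn from.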

Grants and funding

HHSN272201400006C/AI/NIAID NIH HHS/United States
