Sci Rep. 2019 May 8;9(1):7081.

doi: 10.1038/s41598-019-43524-9.

Illumina and Nanopore methods for whole genome sequencing of hepatitis B virus (HBV)

Anna L McNaughton¹, Hannah E Roberts², David Bonsall^{1,3,4}, Mariateresa de Cesare², Jolynne Mokaya¹, Sheila F Lumley^{1,3}, Tanya Golubchik^{2,4}, Paolo Piazza⁵, Jacqueline B Martin⁶, Catherine de Lara¹, Anthony Brown¹, M Azim Ansari¹, Rory Bowden², Eleanor Barnes^{1,7,8}, Philippa C Matthews^{9,10,11}

Affiliations

¹ Nuffield Department of Medicine, Medawar Building, University of Oxford, South Parks Road, Oxford, OX1 3SY, UK.
² Wellcome Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK.
³ Department of Infectious Diseases and Microbiology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK.
⁴ Big Data Institute, Old Road, Oxford, OX3 7FZ, UK.
⁵ Imperial BRC Genomics Facility, Imperial College, London, UK.
⁶ Gastroenterology and Hepatology Clinical Trials Facility, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, OX3 9DU, UK.
⁷ Department of Hepatology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, OX3 9DU, UK.
⁸ NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, OX3 9DU, UK.
⁹ Nuffield Department of Medicine, Medawar Building, University of Oxford, South Parks Road, Oxford, OX1 3SY, UK. philippa.matthews@ndm.ox.ac.uk.
¹⁰ Department of Infectious Diseases and Microbiology, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK. philippa.matthews@ndm.ox.ac.uk.
¹¹ NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, OX3 9DU, UK. philippa.matthews@ndm.ox.ac.uk.

PMID: 31068626
PMCID: PMC6506499
DOI: 10.1038/s41598-019-43524-9

Anna L McNaughton et al. Sci Rep. 2019.

Abstract

Advancing interventions to tackle the huge global burden of hepatitis B virus (HBV) infection depends on improved insights into virus epidemiology, transmission, within-host diversity, drug resistance and pathogenesis, all of which can be advanced through the large-scale generation of full-length virus genome data. Here we describe advances to a protocol that exploits the circular HBV genome structure, using isothermal rolling-circle amplification to enrich HBV DNA, generating concatemeric amplicons containing multiple successive copies of the same genome. We show that this product is suitable for Nanopore sequencing as single reads, as well as for generating short-read Illumina sequences. Nanopore reads can be used to implement a straightforward method for error correction that reduces the per-read error rate, by comparing multiple genome copies combined into a single concatemer and by analysing reads generated from plus and minus strands. With this approach, we can achieve an improved consensus sequencing accuracy of 99.7% and resolve intra-sample sequence variants to form whole-genome haplotypes. Thus while Illumina sequencing may still be the most accurate way to capture within-sample diversity, Nanopore data can contribute to an understanding of linkage between polymorphisms within individual virions. The combination of isothermal amplification and Nanopore sequencing also offers appealing potential to develop point-of-care tests for HBV, and for other viruses.
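
The idea of comparing multiple genome copies within one concatemer can be illustrated with a minimal sketch. It assumes the copies have already been aligned to equal length and uses a simple per-position majority vote, which is a stand-in for illustration and not the statistical procedure the authors describe. The function name `concatemer_consensus` and the toy sequences are hypothetical.

```python
from collections import Counter


def concatemer_consensus(copies):
    """Majority-vote consensus across aligned genome copies from one concatemer.

    Assumption: all copies are pre-aligned strings of equal length.
    Ties are resolved in favour of the base seen first at that position.
    """
    if not copies:
        raise ValueError("at least one genome copy is required")
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must be aligned to the same length")
    consensus = []
    for column in zip(*copies):
        # most_common(1) returns the most frequent base in this column
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three toy copies of the same genome, each carrying one isolated error
copies = ["ACGTAC", "ACGAAC", "TCGTAC"]
consensus = concatemer_consensus(copies)
```

Because each toy error appears in only one of the three copies, the vote removes it. A variant present in every copy of a concatemer would survive, which is the property that lets genuine variants be separated from random sequencing errors.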

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Schematic diagrams to show the pipeline for HBV sample processing. (A) (i) HBV genomes comprise partially double-stranded DNA in human plasma samples; (ii) completion-ligation (CL) derives a completely double-stranded DNA molecule; (iii) the complete dsDNA molecule is denatured and primers (red) bind; (iv) rolling circle amplification (RCA) generates genome concatemers, containing multiple end-to-end copies of the HBV genome (shown in orange). Amplification may also arise de novo due to priming along the length of the concatemer, creating a branched structure (primers shown in red). (B) Flow diagram to illustrate sample processing from plasma through to HBV genome sequencing on Nanopore (yellow) and Illumina (red and green) platforms. This workflow allowed us to compare Illumina data generated with RCA vs. without RCA, and to compare Illumina vs. Nanopore sequencing following RCA. Comparison of Nanopore with RCA vs. without RCA was not possible due to the requirement for amplification of HBV DNA prior to Nanopore sequencing (as shown in Table 2). (C) The sequence dataset derived from Nanopore consists of concatemeric reads, each containing multiple copies of the same HBV genome (shown in orange). As indicated, concatemers containing three full-length genomes also contain first and last segments that are partial (<3.2 kb). Other HBV genomes from among the quasispecies are represented by other individual concatemers (shown in blue, green, purple).

**Figure 2**
Comparison between HBV sequence coverage and diversity in Illumina sequences generated by completion/ligation (CL) alone, versus CL followed by Phi 29 rolling circle amplification (RCA). (A) Read depth across the length of the HBV genome for samples 1331, 1332 and 1348 by CL alone (solid lines) and by CL + RCA (dashed lines); (B) Average insert size across the HBV genome for sample 1348; (C) Variation detected in sequences based on CL alone, vs. CL + RCA. Each point represents a genome position with read depth >100. For each of these positions, variation is measured as the proportion of non-consensus base calls, and plotted for both sample types. The red dotted line indicates y = x. In all plots points are coloured by patient as follows: 1331 = orange, 1332 = grey, 1348 = blue.

**Figure 3**
HBV sequence data generated by Nanopore sequencing following completion/ligation (CL) of the genome and rolling circle amplification (RCA). (A) Read length and template length of all reads generated from sample 1331. ‘Template length’ refers to the length of the primary alignment of the read, based on a concatenated reference genome. Template length is capped at 3.3 kb. Reads with alignments ≥3.2 kb in length are considered ‘full length’ concatemers; these are shown in dark purple. (B) Plot to show the number of repeat segments in ‘full length’ concatemers. This is equal to the number of segments that a read is chopped into based on the repeated location of an anchor sequence (see methods for details). Reads with ≥5 repeat segments will contain ≥3 full length copies of the HBV genome, as shown in Fig. 1C. These are taken forward for error correction and further analysis.
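
The relationship between repeat segments and full-length genome copies in panel B can be sketched as follows. This is an illustration only: it assumes exact string matching of an anchor that occurs once per genome, whereas real Nanopore reads contain errors and would need alignment-based anchor detection. The function `chop_at_anchor`, the 8-base toy genome and the anchor `GTT` are all hypothetical.

```python
def chop_at_anchor(read, anchor):
    """Split a concatemeric read at every occurrence of an anchor sequence.

    Assumption: the anchor occurs exactly once per genome copy and is
    matched exactly (no sequencing errors).
    """
    positions = []
    start = read.find(anchor)
    while start != -1:
        positions.append(start)
        start = read.find(anchor, start + 1)
    # Cut points: start of read, each anchor start, end of read
    cuts = [0] + positions + [len(read)]
    return [read[i:j] for i, j in zip(cuts, cuts[1:])]


genome = "ACGTTGCA"   # toy circular genome, 8 bases
anchor = "GTT"        # toy anchor, found once per copy
# A read that starts and ends part-way through a genome copy
read = genome[5:] + genome * 4 + genome[:1]

segments = chop_at_anchor(read, anchor)
# First and last segments are partial; interior segments span anchor to anchor
full_length_segments = segments[1:-1]
```

Here four anchor hits give five segments, of which the three interior ones each span exactly one genome length. This mirrors the rule in the caption that reads with ≥5 repeat segments contain ≥3 full-length copies.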

**Figure 4**
Error correction in Nanopore HBV sequence dataset. Schematic to depict the identification and removal of basecaller errors. (i) 6 concatemers containing at least three full length HBV reads (plus two partial genome reads) are illustrated. The same 6 colours are used throughout this figure to indicate the concatemer of origin. (ii) Concatemers are shown chopped into full and partial genome reads, partitioned according to whether they align to the forward (LHS) or reverse (RHS) strand of the reference. (iii) Each position is considered independently. Aligned bases for the position in question are collected and grouped by concatemer, as shown by the coloured list of bases. (iv) Fisher’s Exact test is conducted to determine the strength of association between base and concatemer within each read set. In the example contingency table on the left for the forward read set, guanine is found consistently in the dark purple concatemer but not in the other two concatemers. (v) The example contingency table illustrates conducting a Chi-squared test to see whether concatemers containing the variant, guanine, are significantly more common in one of the two read sets (forward or reverse). Significance criteria for the tests in (iv) and (v) are shown on the flow diagram, with significant results highlighted in green and non-significant results highlighted in red. (vi) The corrected concatemer sequence for this position of interest is illustrated, for the case where concatemers are corrected using the whole sample consensus base (right), and for the case where concatemers are corrected using the within-concatemer consensus base (left). Note that the p-values from step (iv) are also used to assign a quality score to each variant, as described in the methods and reported in Suppl Table 3.
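
The association test in step (iv) can be illustrated with a small, self-contained Fisher's exact test. As a simplifying assumption, the base-by-concatemer table is collapsed here to 2x2 (variant base vs. other bases, one concatemer vs. all others); the caption does not state the table dimensions or the significance thresholds, so neither is reproduced. The counts are hypothetical and `fisher_exact_2x2` is an illustrative helper, not the authors' code.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of x counts in the top-left cell given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical forward read set: guanine in all 3 copies of one concatemer,
# and in none of the 6 copies from the other two concatemers.
#                  G   not G
# concatemer X     3     0
# other two        0     6
p_value = fisher_exact_2x2(3, 0, 0, 6)
```

A small p-value indicates that the base is tied to the concatemer of origin rather than scattered randomly across copies, which is the pattern expected of a genuine variant rather than a sequencing error.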

**Figure 5**
Comparison of HBV sequence data generated by Nanopore vs Illumina platforms, using completion/ligation (CL) and rolling circle amplification (RCA). (A) Proportion of non-consensus calls at each position in the genome based on Nanopore (y-axis) vs Illumina (x-axis), for samples 1331 (orange), 1332 (grey) and 1348 (blue). Note that the ‘proportion of non-consensus calls’ represents a slightly different quantity in the two data sets: in the Illumina data, an individual concatemer may give rise to multiple reads covering a position, whereas in the Nanopore data each concatemer results in only one base call. The two sites with 100% variation in Nanopore data are positions 1741–1742 in sample 1332. These lie adjacent to a homopolymer repeat and the high error rate is the result of misalignment when the homopolymer length is miscalled. Positions that are only ever called as ambiguous in the Nanopore data are omitted from this plot (totalling 5 in both 1331 and 1348). Otherwise, sites called as ambiguous (‘N’) or gaps (‘−’) are considered ‘non-consensus’. (B) As for panel A, but sites called as ambiguous or gaps are no longer considered ‘non-consensus’; only alternate bases (A, C, G, T) are included in the ‘non-consensus’ total. (C) Phylogenetic tree of consensus sequences for samples 1331 (orange), 1332 (grey) and 1348 (blue) generated by Illumina following CL, Illumina following CL + RCA, and Nanopore following CL + RCA sequencing, together with reference sequences for Genotypes A-H. Bootstrap values ≥80% are indicated. Scale bar shows substitutions per site.
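
The difference between the quantities plotted in panels A and B can be made concrete with a short sketch. It assumes the calls at one genome position are available as a string of characters, and that the denominator in both panels is the total number of calls; the caption does not specify the denominator, so this is an assumption. The function name and the toy counts are hypothetical.

```python
from collections import Counter

UNAMBIGUOUS = frozenset("ACGT")


def non_consensus_proportion(calls, count_ambiguous=True):
    """Proportion of non-consensus calls at a single genome position.

    calls: iterable of single-character calls ('A', 'C', 'G', 'T', 'N', '-').
    count_ambiguous=True  -> 'N' and '-' count as non-consensus (panel A).
    count_ambiguous=False -> only alternate bases count (panel B).
    Returns None when no unambiguous base was ever called, since such
    positions are omitted from the plot.
    """
    counts = Counter(calls)
    base_counts = {b: n for b, n in counts.items() if b in UNAMBIGUOUS}
    if not base_counts:
        return None
    consensus = max(base_counts, key=base_counts.get)
    total = sum(counts.values())  # assumed denominator: all calls
    if count_ambiguous:
        non_consensus = total - base_counts[consensus]
    else:
        non_consensus = sum(base_counts.values()) - base_counts[consensus]
    return non_consensus / total


# Toy position: 90 consensus calls, 4 alternate bases, 4 ambiguous, 2 gaps
calls = "A" * 90 + "G" * 4 + "N" * 4 + "-" * 2
panel_a = non_consensus_proportion(calls, count_ambiguous=True)
panel_b = non_consensus_proportion(calls, count_ambiguous=False)
```

In this toy case the panel A definition reports 10% non-consensus calls while the panel B definition reports 4%, showing how ambiguous calls and gaps inflate the apparent variation.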

**Figure 6**
Maximum parsimony trees showing haplotypes called using corrected Nanopore concatemers. For each of samples 1331 and 1348, the high quality variant calls (as listed in Suppl Table 3) were used as a definitive set of variant sites. For each corrected concatemer, the haplotype was called according to the corrected bases at these variant sites. Haplotypes that occurred at >1% frequency within the sample are shown here, with the additional exclusion of one haplotype in sample 1331 that occurred at much lower frequency than those shown (only 3 occurrences) and did not allow for construction of a maximum parsimony tree without homoplasy. Counts of haplotypes are recorded on the left hand side, while the frequency of the variants in the Illumina and Nanopore data is indicated in bar charts along the top of each diagram. Variants (bases differing from the consensus) are indicated with a red bar on the horizontal lines that represent the whole-genome haplotypes. A potential method for assigning quality scores to haplotype calls, based on the length and number of the concatemers supporting the call, is presented in Suppl Methods 3. Based on these calculations, all haplotypes with ≥ 3 concatemers supporting them have a phred-based quality score of >30.
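
Calling haplotypes from corrected concatemers and applying the >1% frequency filter can be sketched as below, together with the standard phred relationship Q = −10·log10(p) that links a quality score above 30 to an error probability below 0.001. The sketch assumes corrected concatemers are equal-length strings and that variant sites are 0-based indices; all names and toy data are hypothetical, and the haplotype quality calculation from Suppl Methods 3 is not reproduced.

```python
from collections import Counter


def call_haplotypes(corrected_concatemers, variant_sites, min_freq=0.01):
    """Count haplotypes defined by the bases at a fixed set of variant sites.

    Keeps only haplotypes whose within-sample frequency exceeds min_freq.
    """
    counts = Counter(
        "".join(seq[i] for i in variant_sites) for seq in corrected_concatemers
    )
    total = sum(counts.values())
    return {hap: n for hap, n in counts.items() if n / total > min_freq}


def phred_to_error_probability(q):
    """Convert a phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


# Toy sample: 200 corrected concatemers, two variant sites (indices 1 and 4)
concatemers = ["ACGTAC"] * 120 + ["AGGTTC"] * 79 + ["AGGTAC"] * 1
haplotypes = call_haplotypes(concatemers, variant_sites=[1, 4])
error_at_q30 = phred_to_error_probability(30)
```

The haplotype seen in only 1 of 200 concatemers (0.5%) falls below the threshold and is dropped, leaving the two common haplotypes with their counts.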
