Genome Biol. 2014;15(11):517.

doi: 10.1186/PREACCEPT-6768001251451949.

BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads

Lewis Z Hong, Shuzhen Hong, Han Teng Wong, Pauline P K Aw, Yan Cheng, Andreas Wilm, Paola F de Sessions, Seng Gee Lim, Niranjan Nagarajan, Martin L Hibberd, Stephen R Quake, William F Burkholder

PMID: 25406369
PMCID: PMC4269956
DOI: 10.1186/PREACCEPT-6768001251451949

Lewis Z Hong et al. Genome Biol. 2014.

Abstract

We present a method for obtaining long haplotypes, of over 3 kb in length, using a short-read sequencer, Barcode-directed Assembly for Extra-long Sequences (BAsE-Seq). BAsE-Seq relies on transposing a template-specific barcode onto random segments of the template molecule and assembling the barcoded short reads into complete haplotypes. We applied BAsE-Seq on mixed clones of hepatitis B virus and accurately identified haplotypes occurring at frequencies greater than or equal to 0.4%, with >99.9% specificity. Applying BAsE-Seq to a clinical sample, we obtained over 9,000 viral haplotypes, which provided an unprecedented view of hepatitis B virus population structure during chronic infection. BAsE-Seq is readily applicable for monitoring quasispecies evolution in viral diseases.

Trial registration: ClinicalTrials.gov NCT00962871.

Figures

**Figure 1**
**Outline of BAsE-Seq methodology. (a)** The goal of library preparation is to attach unique barcodes to full-length HBV genomes, and then juxtapose the assigned barcode to random overlapping fragments of the viral genome. A unique barcode is first assigned to each HBV genome using PCR. The two barcode assignment primers contain HBV-specific sequences on their 3′ ends, universal sequences (green) on their 5′ ends, and one of the primers also contains a random barcode (blue). Subsequently, barcode-tagged genomes are clonally amplified by PCR using primers that anneal to Uni-A and Uni-B and that add a biotin label (Bio) to the barcode-proximal end. The barcode-distal end is digested with exonuclease to obtain a broad size distribution of nested deletion fragments. Barcode-containing fragments are purified using Dynabeads, and intramolecular ligation of these fragments yields a library of circular molecules in which different regions of each HBV genome are juxtaposed to its assigned barcode. The circularized molecules are used as a template for random fragmentation and adapter tagging following the Nextera protocol. During PCR enrichment, a set of primers is used to incorporate Illumina-specific paired-end adapters and enrich for barcode-tagged molecules during sequencing. **(b)** Bioinformatics workflow. Barcode-containing read pairs are used to obtain a 'bulk consensus' genome by iterative alignment of read pairs against a GenBank sequence. Aligned read pairs are de-multiplexed into individual genomes based on barcode identity. Consensus base calls are extracted to obtain 'individual consensus' genomes and SNVs are identified in each genome to construct haplotypes.
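The bioinformatics workflow in panel (b) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the read representation (`barcode`, `start`, `bases`), the function names, and the `min_support` threshold are all hypothetical, and it assumes reads are already aligned to the bulk consensus without indels. It groups reads by barcode, takes a majority base call per position to form an 'individual consensus', and reports the SNVs relative to the bulk consensus as a haplotype.

```python
from collections import Counter, defaultdict


def demultiplex(reads):
    """Group aligned reads by barcode. Each read is (barcode, start, bases)."""
    by_barcode = defaultdict(list)
    for barcode, start, bases in reads:
        by_barcode[barcode].append((start, bases))
    return by_barcode


def individual_consensus(aligned, min_support=2):
    """Majority base call per covered position for one barcode group.

    A position is kept only if its most common base is seen in at least
    `min_support` reads (a hypothetical threshold, not from the paper).
    """
    piles = defaultdict(Counter)
    for start, bases in aligned:
        for offset, base in enumerate(bases):
            piles[start + offset][base] += 1
    consensus = {}
    for pos, counts in piles.items():
        base, support = counts.most_common(1)[0]
        if support >= min_support:
            consensus[pos] = base
    return consensus


def haplotype(consensus, bulk):
    """SNVs relative to the bulk consensus, as a sorted tuple of (pos, base)."""
    return tuple(sorted(
        (pos, base) for pos, base in consensus.items()
        if pos < len(bulk) and bulk[pos] != base
    ))


def haplotype_frequencies(reads, bulk, min_support=2):
    """Fraction of barcoded genomes carrying each haplotype."""
    counts = Counter(
        haplotype(individual_consensus(aligned, min_support), bulk)
        for aligned in demultiplex(reads).values()
    )
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}
```

In this simplified form, positions without enough supporting reads are skipped rather than causing the genome to be rejected; a real analysis would presumably require adequate coverage across the genome before calling a haplotype.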

**Figure 2**
**SNVs in BAsE-Seq and Deep-Seq libraries. (a-d)** SNVs in BAsE-Seq libraries Lib_1:9 and Lib_1:99 were identified as true SNVs (red diamonds) or errors (blue dots) using the 'bulk' approach **(a,c)** or the 'individual' approach **(b,d)**. The frequency of each SNV (y-axis) is plotted against base position in the consensus sequence (x-axis). Additional information is also provided in Tables 1 and 3. **(e,f)** SNVs from S7.1 were identified using Deep-Seq and BAsE-Seq. The BAsE-Seq library contained an internal standard that was used to calculate the error-free frequency cutoff for the library; hence, only error-free SNVs are shown in the BAsE-Seq analysis of S7.1. **(g)** The frequency of SNVs detected in the BAsE-Seq library (y-axis) is plotted against the frequency of SNVs detected in the Deep-Seq library (x-axis). All 68 error-free SNVs identified by BAsE-Seq were also identified by Deep-Seq (Pearson correlation coefficient = 0.94).
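The comparison in panel (g) amounts to checking that every SNV called in one library is also present in the other, and computing a Pearson correlation over the shared SNV frequencies. The sketch below shows one way to do this; the dictionary layout (`(position, base)` mapped to frequency) and the function names are assumptions for illustration, and no values from the paper are reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def compare_snv_calls(base_seq, deep_seq):
    """Compare two SNV call sets given as {(pos, base): frequency}.

    Returns the SNVs found only in `base_seq` and the Pearson correlation
    of frequencies over the SNVs found in both sets.
    """
    shared = sorted(set(base_seq) & set(deep_seq))
    missing = sorted(set(base_seq) - set(deep_seq))
    r = pearson([base_seq[k] for k in shared], [deep_seq[k] for k in shared])
    return missing, r
```

An empty `missing` list corresponds to the situation described in the caption, where all SNVs from one library were also identified in the other.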

**Figure 3**
**Phylogenetic analysis of intra-host viral quasispecies.** A phylogenetic analysis of HBV haplotypes identified by BAsE-Seq identified six distinct clades (numbered 1 to 6) in S7.1. The black scale bar represents the expected number of substitutions per site and the blue scale bar represents the frequency at which a particular haplotype was identified in the sample. Amino acid changes that are found in ≥70% of clade members are listed within each clade. Amino acid changes that are unique to each clade are listed with an asterisk. Five out of six clades contain at least one amino acid change (red) that is likely to confer the ability to escape immune detection.

Cited by

Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations.
Salk JJ, Schmitt MW, Loeb LA. Nat Rev Genet. 2018 May;19(5):269-285. doi: 10.1038/nrg.2017.117. Epub 2018 Mar 26. PMID: 29576615. Free PMC article. Review.
IFDlong: an isoform and fusion detector for accurate annotation and quantification of long-read RNA-seq data.
Wang W, Li Y, Ko S, Feng N, Zhang M, Liu JJ, Zheng S, Ren B, Yu YP, Luo JH, Tseng GC, Liu S. bioRxiv [Preprint]. 2024 May 14:2024.05.11.593690. doi: 10.1101/2024.05.11.593690. PMID: 38798496. Free PMC article. Preprint.
Genomic approaches for understanding dengue: insights from the virus, vector, and host.
Sim S, Hibberd ML. Genome Biol. 2016 Mar 2;17:38. doi: 10.1186/s13059-016-0907-2. PMID: 26931545. Free PMC article. Review.
Targeted transcriptome analysis using synthetic long read sequencing uncovers isoform reprograming in the progression of colon cancer.
Liu S, Wu I, Yu YP, Balamotis M, Ren B, Ben Yehezkel T, Luo JH. Commun Biol. 2021 Apr 27;4(1):506. doi: 10.1038/s42003-021-02024-1. PMID: 33907296. Free PMC article.
MrHAMER yields highly accurate single molecule viral sequences enabling analysis of intra-host evolution.
Gallardo CM, Wang S, Montiel-Garcia DJ, Little SJ, Smith DM, Routh AL, Torbett BE. Nucleic Acids Res. 2021 Jul 9;49(12):e70. doi: 10.1093/nar/gkab231. PMID: 33849057. Free PMC article.
