Ariadne: synthetic long read deconvolution using assembly graphs

Lauren Mak et al. Genome Biol. 2023 Aug 28;24(1):197.
doi: 10.1186/s13059-023-03033-5.

Abstract

Synthetic long read sequencing techniques such as UST's TELL-Seq and Loop Genomics' LoopSeq combine 3′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3′ unique molecular identifier (several fragments can carry the same identifier) confounds the assignment of linkage between short reads. We introduce Ariadne, a novel assembly graph-based synthetic long read deconvolution algorithm that can be used to extract single-species read clouds from synthetic long read datasets to improve the taxonomic classification and de novo assembly of complex populations, such as metagenomes.

Keywords: Assembly graphs; Barcode deconvolution; Metagenomics; Synthetic long read.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The size-weighted purity of SLR read clouds increases after applying deconvolution methods. All graphs were generated from 40,000 randomly sampled clouds from the dataset. Top row: no deconvolution. Middle row: reference deconvolution, in which reads are aligned to species references and then grouped into 200 kbp regions. Bottom row: Ariadne deconvolution with a search distance of 5 kbp and a minimum cloud size cutoff of 6.
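The caption does not define size-weighted purity. The sketch below is a minimal illustration under two assumptions of our own: a cloud's purity is the fraction of its reads assigned to its most abundant species, and the size-weighted purity averages those values with each cloud weighted by its read count. The function names and example species are hypothetical.

```python
from collections import Counter


def cloud_purity(species_labels):
    """Fraction of a cloud's reads assigned to its most common species."""
    counts = Counter(species_labels)
    return max(counts.values()) / len(species_labels)


def size_weighted_purity(clouds):
    """Mean per-cloud purity, weighting each cloud by its number of reads."""
    total_reads = sum(len(cloud) for cloud in clouds)
    weighted_sum = sum(cloud_purity(cloud) * len(cloud) for cloud in clouds)
    return weighted_sum / total_reads


# Hypothetical example: one mixed cloud (3 E. coli reads + 1 B. subtilis read)
# and one pure cloud of 6 reads.
clouds = [["E_coli", "E_coli", "E_coli", "B_subtilis"], ["S_aureus"] * 6]
print(size_weighted_purity(clouds))  # prints 0.9
```

Under this definition, splitting the mixed cloud into its two single-species parts would raise the score to 1.0, which is the direction of change the figure reports after deconvolution.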
Fig. 2
Read cloud deconvolution improves metagenomic assembly compared with raw SLR data. We compared assemblies built from raw linked reads (no deconvolution) with assemblies built from reads deconvolved using two methods: reference deconvolution, which maps reads to reference genomes, and Ariadne deconvolution. Top row: NA50 of the assemblies for each species in each sample, comparing deconvolved and raw reads; larger values indicate better performance. Middle row: largest alignment of the assemblies for each species in each sample, comparing deconvolved and raw reads; larger values indicate better performance. Bottom row: the proportion of misassembled bases, $p_{\text{miss}}$, defined as the number of bases in misassembled contigs divided by the total number of assembled bases.
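To make the two assembly metrics concrete, here is a minimal sketch. $p_{\text{miss}}$ follows the caption's definition directly. For NA50 we assume the QUAST-style meaning: N50 computed over aligned blocks (contigs broken at misassembly breakpoints, unaligned bases removed) against half of the total assembly length. The block lengths would normally come from an assembly evaluator; here they are supplied by hand, and all names and numbers are hypothetical.

```python
def p_miss(misassembled_contig_lengths, all_contig_lengths):
    """Bases in misassembled contigs divided by all assembled bases."""
    return sum(misassembled_contig_lengths) / sum(all_contig_lengths)


def na50(aligned_block_lengths, total_assembly_length):
    """N50 over aligned blocks: the block length L such that blocks of
    length >= L together cover at least half of the assembly.
    Returns None if the aligned blocks never reach that half."""
    covered = 0
    for length in sorted(aligned_block_lengths, reverse=True):
        covered += length
        if 2 * covered >= total_assembly_length:
            return length
    return None


# Hypothetical assembly: three contigs (100, 50 and 30 kbp); the 100 kbp contig
# carries one misassembly and is split into 60 and 40 kbp aligned blocks.
contigs = [100_000, 50_000, 30_000]
blocks = [60_000, 40_000, 50_000, 30_000]
print(na50(blocks, sum(contigs)))             # prints 50000
print(round(p_miss([100_000], contigs), 3))   # prints 0.556
```

Note that a single misassembly both lowers NA50 (the 100 kbp contig no longer counts as one block) and flags the whole contig's bases in $p_{\text{miss}}$.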
Fig. 3
Halving and doubling the maximum fragment length does not meaningfully change the quality of de novo assembly using reference-deconvolved linked reads. Shown here are the NA50, largest alignments, and relative misassembly rate of the MOCK5 LoopSeq reference-deconvolved assemblies. As before, we compared assemblies built from raw linked reads (no deconvolution) with assemblies built from reads deconvolved using two methods: reference deconvolution with maximum fragment lengths set to 100 kbp (Ref_100), 200 kbp (Ref_200), and 400 kbp (Ref_400), and deconvolution using Ariadne.
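Reference deconvolution is described only as aligning reads to species and grouping them into regions of the maximum fragment length. A minimal sketch of that idea, assuming fixed, non-overlapping windows (our simplification) and a hypothetical tuple layout for alignments, shows how the window size controls how finely a barcode's reads are split:

```python
from collections import defaultdict


def reference_deconvolve(alignments, max_fragment_length=200_000):
    """Split read clouds into sub-clouds keyed by (barcode, species, window).

    alignments: iterable of (read_id, barcode, species, position) tuples, where
    position is the 0-based alignment start on that species' reference.
    Unaligned reads are assumed to have been filtered out beforehand.
    """
    sub_clouds = defaultdict(list)
    for read_id, barcode, species, position in alignments:
        window = position // max_fragment_length  # fixed, non-overlapping window
        sub_clouds[(barcode, species, window)].append(read_id)
    return dict(sub_clouds)


# Hypothetical reads sharing barcode BC1.
alignments = [
    ("r1", "BC1", "E_coli", 1_000),
    ("r2", "BC1", "E_coli", 150_000),
    ("r3", "BC1", "E_coli", 250_000),
    ("r4", "BC1", "B_subtilis", 5_000),
]
for length in (100_000, 200_000, 400_000):
    # prints 100000 4, then 200000 3, then 400000 2
    print(length, len(reference_deconvolve(alignments, length)))
```

A fixed grid can split one true fragment that straddles a window boundary; a sliding or gap-based grouping would avoid that, at the cost of a slightly more involved implementation.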
Fig. 4
Read cloud deconvolution improves the specificity of short-read taxonomic classification, especially for reads promoted from high ranks such as root (R), kingdom/domain (D), and phylum (P) to low ranks such as species (S). A MOCK5 LoopSeq. B MOCK20 TELL-Seq, for which the y-axis has been truncated at 120,000 promoted reads for display purposes. There were 683,635 domain-to-species promotions with reference-based deconvolution.
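A "promotion" here means a read whose assigned rank becomes more specific after deconvolution. The sketch below tallies promotions by comparing per-read ranks from two classification runs. The rank codes R, D, P and S come from the caption; the intermediate ranks and the dict-based input format are illustrative assumptions, not the paper's pipeline.

```python
from collections import Counter

# Ranks ordered from least to most specific. R, D, P and S follow the figure
# caption; the intermediate ranks are an illustrative assumption.
RANK_ORDER = ["R", "D", "P", "C", "O", "F", "G", "S"]


def count_promotions(before, after):
    """Count reads whose assigned rank became more specific.

    before, after: dicts mapping read_id -> rank code for the same reads,
    classified without and with read-cloud deconvolution.
    Returns a Counter keyed by (old_rank, new_rank).
    """
    promotions = Counter()
    for read_id, old_rank in before.items():
        new_rank = after.get(read_id)
        if new_rank is None:
            continue  # read missing from the deconvolved classification
        if RANK_ORDER.index(new_rank) > RANK_ORDER.index(old_rank):
            promotions[(old_rank, new_rank)] += 1
    return promotions


# Hypothetical classifications of four reads.
before = {"r1": "D", "r2": "P", "r3": "S", "r4": "G"}
after = {"r1": "S", "r2": "P", "r3": "S", "r4": "S"}
print(count_promotions(before, after))
```

Keying the counter by (old, new) rank pairs gives exactly the kind of breakdown the figure plots, such as the domain-to-species (D, S) count.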
Fig. 5
Graphical description of the Ariadne deconvolution process. A Reads with the same 3′ UMI belong to the same read cloud. Blue and red reads originate from different fragments. B The de Bruijn assembly graph is generated by cloudSPAdes, and C a focal read i is mapped to one of its edges. D From the read's 3′-terminal vertex, a Dijkstra graph (indicated by a large black circle) is created from all edges and vertices within the maximum search distance of the 3′-terminal vertex. These vertices and edges (within the black circle) comprise read i's search-distance-limited connected subgraph within the whole assembly graph. Reads aligning to edges in this connected subgraph are added to read i's connected set. E Reads originating from different fragments are likely to coincide with vertices outside the subgraph. F Connected read sets with at least one intersection (i.e., at least one read in common) are output together as an enhanced read cloud.
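The steps in this caption can be illustrated with a toy reconstruction. This is not Ariadne's implementation: it treats the assembly graph as undirected, takes a read's 3′-terminal vertex to be a designated endpoint of the edge it aligns to, counts an edge as part of the subgraph only if it lies entirely within the search radius, and merges intersecting connected sets with union-find. All function and parameter names are hypothetical. The default `max_distance` of 5,000 bp mirrors the 5 kbp search distance in Fig. 1, while `min_size` simply drops smaller groups, which may not match how Ariadne handles clouds below its size cutoff.

```python
import heapq
from collections import defaultdict


def edges_within(adjacency, start, max_distance):
    """Bounded Dijkstra: edges whose far end lies within max_distance of start.

    adjacency: vertex -> list of (neighbor, edge_id, edge_length).
    """
    dist = {start: 0}
    heap = [(0, start)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, edge_id, length in adjacency.get(u, []):
            nd = d + length
            if nd > max_distance:
                continue  # edge reaches past the search radius
            edges.add(edge_id)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


def deconvolve_cloud(read_edges, terminal_vertex, adjacency,
                     max_distance=5_000, min_size=1):
    """Split one barcode's read cloud into enhanced read clouds.

    read_edges: read_id -> edge_id the read aligns to.
    terminal_vertex: edge_id -> vertex treated as the read's 3'-terminal vertex.
    """
    reads_on_edge = defaultdict(set)
    for read, edge in read_edges.items():
        reads_on_edge[edge].add(read)

    parent = {read: read for read in read_edges}  # union-find over reads

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for read, edge in read_edges.items():
        # Connected set: reads on the read's own edge plus every edge in its
        # search-distance-limited subgraph.
        subgraph = edges_within(adjacency, terminal_vertex[edge], max_distance)
        for e in subgraph | {edge}:
            for other in reads_on_edge.get(e, ()):
                parent[find(other)] = find(read)  # merge overlapping sets

    clouds = defaultdict(set)
    for read in read_edges:
        clouds[find(read)].add(read)
    return [c for c in clouds.values() if len(c) >= min_size]


# Hypothetical graph: a-b-c is one fragment's path, x-y another fragment,
# and e4 leads to a vertex d far beyond the 5 kbp search radius.
adjacency = {
    "a": [("b", "e1", 1_000)],
    "b": [("a", "e1", 1_000), ("c", "e2", 1_000)],
    "c": [("b", "e2", 1_000), ("d", "e4", 10_000)],
    "d": [("c", "e4", 10_000)],
    "x": [("y", "e3", 1_000)],
    "y": [("x", "e3", 1_000)],
}
terminal_vertex = {"e1": "b", "e2": "c", "e3": "y", "e4": "d"}
read_edges = {"r1": "e1", "r2": "e2", "r3": "e3", "r4": "e3", "r5": "e4"}
clouds = deconvolve_cloud(read_edges, terminal_vertex, adjacency)
print(sorted(sorted(c) for c in clouds))  # [['r1', 'r2'], ['r3', 'r4'], ['r5']]
```

Union-find is a natural fit for step F: merging every set that shares a read is the same as computing connected components over "appears in the same connected set" links, and it avoids repeatedly comparing sets pairwise.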


