Microbiome. 2022 Oct 19;10(1):176.
doi: 10.1186/s40168-022-01365-1.

LotuS2: an ultrafast and highly accurate tool for amplicon sequencing analysis

Ezgi Özkurt et al. Microbiome.

Abstract

Background: Amplicon sequencing is an established and cost-efficient method for profiling microbiomes. However, many of the tools available to process these data require both bioinformatics skills and high computational power when applied to big datasets. Furthermore, only a few tools allow for long-read amplicon data analysis. To bridge this gap, we developed the LotuS2 (less OTU scripts 2) pipeline, enabling user-friendly, resource-friendly, and versatile analysis of raw amplicon sequences.

Results: In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow for flexible data analysis by both experts, who can fully adjust parameters, and novices, for whom defaults are provided for different scenarios. We benchmarked three independent gut and soil datasets, where LotuS2 was on average 29 times faster than other pipelines, yet better reproduced the alpha- and beta-diversity of technical replicate samples. Further benchmarking on a mock community with known taxon composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified taxa and a higher fraction of reads assigned to true taxa (48% and 57% at species level; 83% and 98% at genus level, respectively). At ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reported 16S sequences.

Conclusion: LotuS2 is a lightweight and user-friendly pipeline that is fast, precise, and streamlined, using extensive pre- and post-ASV/OTU clustering steps to further increase data quality. High data usage rates and reliability enable high-throughput microbiome analysis in minutes.

Availability: LotuS2 is available from GitHub, conda, or via a Galaxy web interface, and is documented at http://lotus2.earlham.ac.uk/.

Keywords: 16S rRNA; Amplicon data analysis; Amplicon sequencing; ITS; Long read; Microbiome; Short read.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Workflow of the LotuS2 pipeline. (A) LotuS2 can be installed either through (i) Bioconda, (ii) GitHub with the provided autoInstaller script, or (iii) using a Docker image. Alternatively, (iv) Galaxy web servers can also run LotuS2 (e.g., https://usegalaxy.eu/). (B) LotuS2 accepts amplicon reads from different sequencing platforms, along with a map file that describes barcodes, file locations, sample IDs, and other information. After demultiplexing and quality filtering, high-quality reads are clustered into either ASVs or OTUs. The optimal sequence representing each OTU/ASV is calculated in the seed extension step, where read pairs are also merged. Mid-quality reads are subsequently mapped onto these sequence clusters to increase cluster representation in abundance matrices. From OTU/ASV sequences, a phylogenetic tree is constructed, and each cluster is taxonomically assigned. These results are made available in multiple standard formats, such as tab-delimited files, .biom, or phyloseq objects, to enable downstream analysis. Options new in LotuS2 are shown in black for each step, whereas options in grey were already available in LotuS.
Fig. 2
Computational performance of amplicon sequencing pipelines. 16S rRNA amplicon MiSeq data from (A) gut-16S, (B) soil-16S, and (C) soil-ITS samples were processed to benchmark resource usage of each pipeline, run on the same system under equal conditions (12 cores, max 150 Gb memory). In all pipelines, OTUs/ASVs were classified by similarity comparisons to SILVA 138.1. In LotuS2, Lambda was used to align sequences for all clustering algorithms. Pipeline runs were separated by common steps (pre-processing, sequence clustering, taxonomic classification, and phylogenetic tree construction and/or off-target removal). Because native DADA2 cannot demultiplex reads, we used the average demultiplexing time of QIIME 2 and LotuS2 (LotuS2 demultiplexed, unfiltered reads were provided to DADA2). Since phylogenetic trees based on ITS sequences may lead to erroneous phylogenies [55], we did not include the phylogenetic tree construction step in the analysis of the soil-ITS dataset. LotuS2 runs are labelled in red. (D–F) Data usage efficiency of each tested pipeline, comparing the number of sequence clusters (OTUs or ASVs) to the retrieved read counts in the final output matrix of each pipeline. Note that mothur results for soil-16S are not shown, because the pipeline rejected all sequences at the default parameters.
Fig. 3
Reproducibility of different amplicon sequence data analysis pipelines. Three independent datasets were used to represent different biomes and amplicon technologies: (A, D) human faecal samples (16S rRNA gene, N = 40 replicates), (B, E) soil samples (16S rRNA gene, N = 50 replicates), and (C, F) soil samples (ITS2, N = 50 replicates). (A–C) Bray-Curtis distances (BCd) among technical replicate samples were used to assess the reproducibility of community compositions by different pipelines. The pipeline with the lowest BCd in each subfigure is denoted with a star (*). The significance of pairwise comparisons between pipelines was calculated using Tukey's HSD test (Supplementary Table S2). (D–F) Further, the fraction of technical replicates being closest to each other (by BCd) was calculated to simulate identifying technical replicates without additional knowledge. Numbers above bars rank the best-performing pipelines. Lower Bray-Curtis distances between technical replicates and a higher fraction of correct technical replicates indicate better reproducibility. LotuS2 runs are labelled in red.
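The two reproducibility measures described in this caption can be illustrated with a minimal Python sketch. It assumes raw count vectors over a shared set of OTUs/ASVs and uses the standard Bray-Curtis formula; the sample names, counts, and the helper `fraction_nearest_is_replicate` are hypothetical and are not taken from the paper or from LotuS2, and the authors' actual normalisation steps are not reproduced here.

```python
from typing import Dict, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def fraction_nearest_is_replicate(
    profiles: Dict[str, Sequence[float]],
    replicate_group: Dict[str, str],
) -> float:
    """Fraction of samples whose nearest neighbour (lowest BCd) is a technical replicate."""
    names = list(profiles)
    hits = 0
    for sample in names:
        others = [t for t in names if t != sample]
        nearest = min(others, key=lambda t: bray_curtis(profiles[sample], profiles[t]))
        if replicate_group[nearest] == replicate_group[sample]:
            hits += 1
    return hits / len(names)


# Hypothetical toy data: two source samples (A, B), each with two technical replicates.
profiles = {
    "A_rep1": [10, 0, 5],
    "A_rep2": [9, 1, 5],
    "B_rep1": [0, 12, 3],
    "B_rep2": [1, 11, 3],
}
groups = {"A_rep1": "A", "A_rep2": "A", "B_rep1": "B", "B_rep2": "B"}

bcd_within_a = bray_curtis(profiles["A_rep1"], profiles["A_rep2"])
nearest_fraction = fraction_nearest_is_replicate(profiles, groups)
```

In this toy example every sample's nearest neighbour is its own replicate, so the fraction is 1.0; a pipeline that introduces more noise between replicates would push the within-replicate distance up and this fraction down.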
Fig. 4
Benchmarking the performance of amplicon sequence data analysis pipelines using a mock community with known species composition. (A) Accuracy of each pipeline in predicting the mock community composition at genus level. For benchmarking, we compared the fraction of reads assigned to true genera and both correctly and erroneously recovered genera. Precision, recall, and F-score were calculated based on the true positive, false positive, and false negative taxa identified. At species level, LotuS2 also excelled in these statistics (Supplementary Figure S9). (B) Percentage of true positive ASVs/OTUs having a nucleotide identity at or above the indicated thresholds to 16S rRNA gene sequences of genomes from the mock community. The pipeline(s) showing the highest performance in each comparison are denoted with a star (*). TP, true positive; ASV, amplicon sequencing variant; OTU, operational taxonomic unit. LotuS2-UPARSE and LotuS2-VSEARCH had the same result; therefore, their colors are overlaid.
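As an illustration of the genus-level statistics named in the Fig. 4 caption, the following Python sketch computes precision, recall, and F-score from the sets of expected and observed taxa, plus the fraction of reads assigned to true genera. It assumes the F-score is the usual F1 (harmonic mean of precision and recall), which the abstract does not state explicitly; the genus names and read counts are hypothetical and do not come from the paper's mock community.

```python
from typing import Dict, Set


def taxon_metrics(expected: Set[str], observed: Set[str]) -> Dict[str, float]:
    """Precision, recall, and F1 from true positive, false positive, and false negative taxa."""
    tp = len(expected & observed)   # taxa correctly recovered
    fp = len(observed - expected)   # taxa reported but not in the mock community
    fn = len(expected - observed)   # mock community taxa that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


def fraction_reads_true(reads_per_taxon: Dict[str, int], expected: Set[str]) -> float:
    """Fraction of all assigned reads that fall into taxa present in the mock community."""
    total = sum(reads_per_taxon.values())
    if total == 0:
        return 0.0
    true_reads = sum(n for taxon, n in reads_per_taxon.items() if taxon in expected)
    return true_reads / total


# Hypothetical mock community and pipeline output (illustrative values only).
expected_genera = {"Bacillus", "Escherichia", "Lactobacillus", "Pseudomonas"}
reads = {"Bacillus": 50, "Escherichia": 30, "Lactobacillus": 15, "Staphylococcus": 5}

metrics = taxon_metrics(expected_genera, set(reads))
read_fraction = fraction_reads_true(reads, expected_genera)
```

Here three of four expected genera are recovered and one spurious genus is reported, so precision, recall, and F-score are all 0.75, while 95 of 100 reads land in true genera. The two views can diverge: a pipeline can assign most reads correctly while still reporting many low-abundance false positive taxa.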

References

    1. Bahram M, Hildebrand F, Forslund SK, Anderson JL, Soudzilovskaia NA, Bodegom PM, et al. Structure and function of the global topsoil microbiome. Nature. 2018;560:233–237.
    2. Özkurt E, Hassani MA, Sesiz U, Künzel S, Dagan T, Özkan H, et al. Seed-derived microbial colonization of wild emmer and domesticated bread wheat (Triticum dicoccoides and T. aestivum) seedlings shows pronounced differences in overall diversity and composition. mBio. 2020;e02637–20.
    3. Bedarf JR, Beraza N, Khazneh H, Özkurt E, Baker D, Borger V, et al. Much ado about nothing? Off-target amplification can lead to false-positive bacterial brain microbiome detection in healthy and Parkinson’s disease individuals. Microbiome. 2021;9:75.
    4. Tedersoo L, Anslan S, Bahram M, Põlme S, Riit T, Liiv I, et al. Shotgun metagenomes and multiple primer pair-barcode combinations of amplicons reveal biases in metabarcoding analyses of fungi. MycoKeys. 2015;10:1–43.
    5. Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 2011;21:494–504.
