Microbiome. 2022 Oct 19;10(1):176.

doi: 10.1186/s40168-022-01365-1.

LotuS2: an ultrafast and highly accurate tool for amplicon sequencing analysis

Ezgi Özkurt¹ ², Joachim Fritscher¹ ², Nicola Soranzo², Duncan Y K Ng¹, Robert P Davey², Mohammad Bahram³ ⁴, Falk Hildebrand⁵ ⁶

Affiliations

¹ Gut Microbes & Health, Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, NR4 7UQ, UK.
² Earlham Institute, Norwich Research Park, Norwich, Norfolk, NR4 7UZ, UK.
³ Department of Ecology, Swedish University of Agricultural Sciences, Ulls väg 16, 756 51, Uppsala, Sweden.
⁴ Institute of Ecology and Earth Sciences, University of Tartu, Lai St, 40, Tartu, Estonia.
⁵ Gut Microbes & Health, Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, NR4 7UQ, UK. falk.hildebrand@quadram.ac.uk.
⁶ Earlham Institute, Norwich Research Park, Norwich, Norfolk, NR4 7UZ, UK. falk.hildebrand@quadram.ac.uk.

PMID: 36258257
PMCID: PMC9580208
DOI: 10.1186/s40168-022-01365-1

Ezgi Özkurt et al. Microbiome. 2022.

Abstract

Background: Amplicon sequencing is an established and cost-efficient method for profiling microbiomes. However, many of the tools available for processing these data require both bioinformatics skills and high computational power when applied to large datasets. Furthermore, only a few tools support the analysis of long-read amplicon data. To bridge this gap, we developed the LotuS2 (less OTU scripts 2) pipeline, enabling user-friendly, resource-friendly, and versatile analysis of raw amplicon sequences.

Results: In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow for flexible data analysis by both experts, who can fully adjust parameters, and novices, for whom defaults are provided for different scenarios. We benchmarked LotuS2 on three independent gut and soil datasets, where it was on average 29 times faster than other pipelines, yet could better reproduce the alpha- and beta-diversity of technical replicate samples. Further benchmarking on a mock community with known taxon composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified taxa and a higher fraction of reads assigned to true taxa (48% and 57% at species; 83% and 98% at genus level, respectively). At ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reported 16S sequences.

Conclusion: LotuS2 is a lightweight and user-friendly pipeline that is fast, precise, and streamlined, using extensive pre- and post-ASV/OTU clustering steps to further increase data quality. High data usage rates and reliability enable high-throughput microbiome analysis in minutes.

Availability: LotuS2 is available from GitHub, conda, or via a Galaxy web interface, documented at http://lotus2.earlham.ac.uk/.

Keywords: 16S rRNA; Amplicon data analysis; Amplicon sequencing; ITS; Long read; Microbiome; Short read.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Workflow of the LotuS2 pipeline. (A) LotuS2 can be installed either through (i) Bioconda, (ii) GitHub with the provided autoInstaller script, or (iii) using a Docker image. Alternatively, (iv) Galaxy web servers can also run LotuS2 (e.g., https://usegalaxy.eu/). (B) LotuS2 accepts amplicon reads from different sequencing platforms, along with a map file that describes barcodes, file locations, sample IDs, and other information. After demultiplexing and quality filtering, high-quality reads are clustered into either ASVs or OTUs. The optimal sequence representing each OTU/ASV is calculated in the seed extension step, where read pairs are also merged. Mid-quality reads are subsequently mapped onto these sequence clusters to increase cluster representation in abundance matrices. From OTU/ASV sequences, a phylogenetic tree is constructed, and each cluster is taxonomically assigned. These results are made available in multiple standard formats, such as tab-delimited files, .biom, or phyloseq objects, to enable downstream analysis. Options new in LotuS2 are shown in black font for each step, whereas options in grey font were already available in LotuS.

**Fig. 2**
Computational performance of amplicon sequencing pipelines. 16S rRNA amplicon MiSeq data from (A) gut-16S, (B) soil-16S, and (C) soil-ITS samples were processed to benchmark resource usage of each pipeline, run on the same system under equal conditions (12 cores, max 150 Gb memory). In all pipelines, OTUs/ASVs were classified by similarity comparisons to SILVA 138.1. In LotuS2, Lambda was used to align sequences for all clustering algorithms. Pipeline runs were separated by common steps (pre-processing, sequence clustering, taxonomic classification, and phylogenetic tree construction and/or off-target removal). Because native DADA2 cannot demultiplex reads, we used the average demultiplexing time of QIIME 2 and LotuS2 (LotuS2 demultiplexed, unfiltered reads were provided to DADA2). Since phylogenetic trees based on ITS sequences may lead to erroneous phylogenies [55], we did not include the phylogenetic tree construction step in the analysis of the soil-ITS dataset. LotuS2 runs are labelled in red. (D, E, F) Data usage efficiency of each tested pipeline, comparing the number of sequence clusters (OTUs or ASVs) to retrieved read counts in the final output matrix of each pipeline. Note that mothur results for soil-16S are not shown, because the pipeline rejected all sequences at the default parameters.

**Fig. 3**
Reproducibility of different amplicon sequence data analysis pipelines. Three independent datasets were used to represent different biomes and amplicon technologies: (A, D) human faecal samples (16S rRNA gene, N = 40 replicates), (B, E) soil samples (16S rRNA gene, N = 50 replicates), and (C, F) soil samples (ITS 2, N = 50 replicates). (A–C) Bray-Curtis distances (BCd) among technical replicate samples were used to assess how reproducibly each pipeline recovered community compositions. The pipeline with the lowest BCd in each subfigure is denoted with a star (*). The significance of pairwise comparisons between pipelines was calculated using Tukey's HSD test (Supplementary Table S2). (D–F) Further, the fraction of technical replicates being closest to each other (by BCd) was calculated to simulate identifying technical replicates without additional knowledge. Numbers above bars rank the pipelines from best to worst performance. Lower Bray-Curtis distances between technical replicates and a higher fraction of correct technical replicates indicate better reproducibility. LotuS2 runs are labelled in red.
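
The two reproducibility measures in this caption can be illustrated with a minimal Python sketch. It assumes each sample is a vector of taxon abundances and that every sample has exactly one known technical replicate; the function names, the toy profiles, and the one-to-one replicate pairing are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def fraction_replicates_closest(
    profiles: Dict[str, Sequence[float]],
    replicate_of: Dict[str, str],
) -> float:
    """Fraction of samples whose nearest neighbour (lowest BCd) is their own replicate."""
    hits = 0
    for name, profile in profiles.items():
        others = [s for s in profiles if s != name]
        nearest = min(others, key=lambda s: bray_curtis(profile, profiles[s]))
        if nearest == replicate_of[name]:
            hits += 1
    return hits / len(profiles)


# Hypothetical toy data: two samples, each sequenced twice (suffixes _a and _b).
profiles = {
    "s1_a": [10, 0, 5],
    "s1_b": [9, 1, 5],
    "s2_a": [0, 12, 3],
    "s2_b": [1, 11, 3],
}
replicate_of = {"s1_a": "s1_b", "s1_b": "s1_a", "s2_a": "s2_b", "s2_b": "s2_a"}

bcd_s1 = bray_curtis(profiles["s1_a"], profiles["s1_b"])
closest_fraction = fraction_replicates_closest(profiles, replicate_of)
print(f"BCd between s1 replicates: {bcd_s1:.3f}")
print(f"Fraction of replicates closest to each other: {closest_fraction:.2f}")
```

A lower `bcd_s1` and a `closest_fraction` nearer to 1 correspond to better reproducibility in the sense used in panels A–C and D–F, respectively.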

**Fig. 4**
Benchmarking the performance of amplicon sequence data analysis pipelines using a mock community with known species composition. (A) Accuracy of each pipeline in predicting the mock community composition at genus level. For benchmarking, we compared the fraction of reads assigned to true genera and both correctly and erroneously recovered genera. Precision, Recall, and F-score were calculated based on the true positive, false positive, and false negative taxa identified. At species level, LotuS2 also excelled in these statistics (Supplementary Figure S9). (B) Percentage of true positive ASVs/OTUs having a nucleotide identity ≥ indicated thresholds to 16S rRNA gene sequences of genomes from the mock community. The pipeline(s) showing the highest performance in each comparison are denoted with a star (*). TP, true positive; ASV, amplicon sequencing variant; OTU, operational taxonomic unit. LotuS2-UPARSE and LotuS2-VSEARCH had the same result, therefore their colours are overlaid.
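
As a rough illustration of the panel A statistics, the following Python sketch computes precision, recall, and F-score from the set of expected taxa and the taxa a pipeline reports, plus the fraction of reads assigned to true taxa. It assumes the F-score is the balanced F1 measure (the caption does not say which variant was used); the genus names and read counts are hypothetical toy values, not results from the paper.

```python
from typing import Dict, Set, Tuple


def taxon_metrics(expected: Set[str], observed: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from expected vs. observed taxon sets."""
    tp = len(expected & observed)  # true positives: expected and reported
    fp = len(observed - expected)  # false positives: reported but not in the mock
    fn = len(expected - observed)  # false negatives: in the mock but not reported
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def fraction_reads_true(read_counts: Dict[str, int], expected: Set[str]) -> float:
    """Fraction of all assigned reads that fall into taxa present in the mock."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    true_reads = sum(n for taxon, n in read_counts.items() if taxon in expected)
    return true_reads / total


# Hypothetical mock community and pipeline output (genus level).
expected_genera = {"Bacteroides", "Escherichia", "Lactobacillus", "Clostridium"}
read_counts = {
    "Bacteroides": 500,
    "Escherichia": 300,
    "Lactobacillus": 150,
    "Pseudomonas": 50,  # not part of the mock: a false positive
}

precision, recall, f1 = taxon_metrics(expected_genera, set(read_counts))
true_read_fraction = fraction_reads_true(read_counts, expected_genera)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
print(f"fraction of reads in true genera={true_read_fraction:.2f}")
```

The same two functions apply unchanged at species level if the sets and the count table hold species names instead of genera.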

Cited by

Host dispersal relaxes selective pressures in rafting microbiomes and triggers successional changes.
Pearman WS, Duffy GA, Smith RO, Currie KI, Gemmell NJ, Morales SE, Fraser CI. Nat Commun. 2024 Dec 30;15(1):10759. doi: 10.1038/s41467-024-54954-z. PMID: 39737966. Free PMC article.

A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses.
Hakimzadeh A, Abdala Asbun A, Albanese D, Bernard M, Buchner D, Callahan B, Caporaso JG, Curd E, Djemiel C, Brandström Durling M, Elbrecht V, Gold Z, Gweon HS, Hajibabaei M, Hildebrand F, Mikryukov V, Normandeau E, Özkurt E, M Palmer J, Pascal G, Porter TM, Straub D, Vasar M, Větrovský T, Zafeiropoulos H, Anslan S. Mol Ecol Resour. 2024 Jul;24(5):e13847. doi: 10.1111/1755-0998.13847. Epub 2023 Aug 7. PMID: 37548515. Free PMC article. Review.

Global trends in research of high-throughput sequencing technology associated with chronic wounds from 2002 to 2022: A bibliometric and visualized study.
Meng H, Peng Y, Li P, Su J, Jiang Y, Fu X. Front Surg. 2023 Feb 22;10:1089203. doi: 10.3389/fsurg.2023.1089203. eCollection 2023. PMID: 36911623. Free PMC article.

Intragenomic diversity of the V9 hypervariable domain in eukaryotes has little effect on metabarcoding.
Flegontova O, Lukeš J, Horák A. iScience. 2023 Jul 12;26(8):107291. doi: 10.1016/j.isci.2023.107291. eCollection 2023 Aug 18. PMID: 37554448. Free PMC article.

Patterns in soil microbial diversity across Europe.
Labouyrie M, Ballabio C, Romero F, Panagos P, Jones A, Schmid MW, Mikryukov V, Dulya O, Tedersoo L, Bahram M, Lugato E, van der Heijden MGA, Orgiazzi A. Nat Commun. 2023 Jun 8;14(1):3311. doi: 10.1038/s41467-023-37937-4. PMID: 37291086. Free PMC article.

References

1. Bahram M, Hildebrand F, Forslund SK, Anderson JL, Soudzilovskaia NA, Bodegom PM, et al. Structure and function of the global topsoil microbiome. Nature. 2018;560:233–237.
2. Özkurt E, Hassani MA, Sesiz U, Künzel S, Dagan T, Özkan H, et al. Seed-derived microbial colonization of wild emmer and domesticated bread wheat (Triticum dicoccoides and T. aestivum) seedlings shows pronounced differences in overall diversity and composition. mBio. 2020;e02637–20.
3. Bedarf JR, Beraza N, Khazneh H, Özkurt E, Baker D, Borger V, et al. Much ado about nothing? Off-target amplification can lead to false-positive bacterial brain microbiome detection in healthy and Parkinson’s disease individuals. Microbiome. 2021;9:75.
4. Tedersoo L, Anslan S, Bahram M, Põlme S, Riit T, Liiv I, et al. Shotgun metagenomes and multiple primer pair-barcode combinations of amplicons reveal biases in metabarcoding analyses of fungi. MycoKeys. 2015;10:1–43.
5. Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 2011;21:494–504.
