Review

Nat Protoc. 2022 Dec;17(12):2815-2839.

doi: 10.1038/s41596-022-00738-y. Epub 2022 Sep 28.

Metagenome analysis using the Kraken software suite

Jennifer Lu^#¹,², Natalia Rincon^#³,⁴, Derrick E Wood⁴,⁵, Florian P Breitwieser⁴, Christopher Pockrandt⁴, Ben Langmead⁵, Steven L Salzberg³,⁴,⁵,⁶, Martin Steinegger⁷

Affiliations

¹ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA. jennifer.lu717@gmail.com.
² Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA. jennifer.lu717@gmail.com.
³ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
⁴ Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
⁵ Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
⁶ Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
⁷ School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea. martin.steinegger@snu.ac.kr.

^# Contributed equally.

PMID: 36171387
PMCID: PMC9725748
DOI: 10.1038/s41596-022-00738-y

Jennifer Lu et al. Nat Protoc. 2022 Dec.

Erratum in

Author Correction: Metagenome analysis using the Kraken software suite.
Lu J, Rincon N, Wood DE, Breitwieser FP, Pockrandt C, Langmead B, Salzberg SL, Steinegger M. Nat Protoc. 2024 Aug 29. doi: 10.1038/s41596-024-01064-1. Online ahead of print. PMID: 39210095.

Abstract

Metagenomic experiments expose the wide range of microscopic organisms in any microbial environment through high-throughput DNA sequencing. The computational analysis of the sequencing data is critical for the accurate and complete characterization of the microbial community. To facilitate efficient and reproducible metagenomic analysis, we introduce a step-by-step protocol for the Kraken suite, an end-to-end pipeline for the classification, quantification and visualization of metagenomic datasets. Our protocol describes the execution of the Kraken programs, via a sequence of easy-to-use scripts, in two scenarios: (1) quantification of the species in a given metagenomics sample; and (2) detection of a pathogenic agent from a clinical sample taken from a human patient. The protocol, which is executed within 1-2 h, is targeted to biologists and clinicians working in microbiome or metagenomics analysis who are familiar with the Unix command-line environment.

Conflict of interest statement

Competing interests

The authors declare no competing financial interests.

Figures

**Fig 1. Protocol workflow.**
Overview of two workflows: (1) pathogen identification and (2) microbiome analysis. In workflow (1), we try to detect an infectious agent using NGS reads. We start with a sample from the infection site and (ideally) a negative control. As a first step, host DNA is removed by excluding all reads that align to the host genome using Bowtie 2. This step usually removes a large fraction of the reads. The remaining reads are then classified by Kraken2Uniq against a reference database, and the taxonomic reports are compared using Pavian. Pavian can distinguish large abundance changes between controls and infected samples using z-statistics. For all potential pathogen candidates, reads can be extracted using extract_kraken_reads.py. In workflow (2), we try to estimate the abundance of species in microbiome samples and compute the diversity changes between them. In the protocol, we start with multiple sets of reads from a microbiome before and after fecal transfer. All samples are classified using Kraken 2. Bracken takes the classified read counts and estimates the abundance of each taxon in the sample. Pavian can be used to explore and visualize the samples and spot the differences between them. Additionally, alpha_diversity.py can be used to quantify the diversity in a sample, and beta_diversity.py can be used to compare diversity across samples.
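The order of the command-line steps in the two workflows can be sketched as follows. This is a minimal illustration, not the protocol's own scripts: the helper function names, file paths, thread counts, read length and threshold values are all hypothetical, and the code only assembles argument lists without running any tool. The flags shown (`--un-conc` for Bowtie 2, `--report-minimizer-data` for the Kraken2Uniq mode of Kraken 2, and `-d/-i/-o/-r/-l/-t` for Bracken) should be checked against the installed versions.

```python
import shlex


def host_removal_cmd(index, r1, r2, unaligned_path, threads=4):
    """Bowtie 2 call that writes pairs failing to align concordantly
    to the host index (the reads kept for classification)."""
    return ["bowtie2", "-p", str(threads), "-x", index,
            "-1", r1, "-2", r2,
            "--un-conc", unaligned_path, "-S", "/dev/null"]


def kraken2_cmd(db, r1, r2, report, output, threads=4, minimizer_data=False):
    """Kraken 2 call; minimizer_data=True adds the unique-minimizer
    columns used for Kraken2Uniq-style reports."""
    cmd = ["kraken2", "--db", db, "--threads", str(threads),
           "--report", report, "--output", output]
    if minimizer_data:
        cmd.append("--report-minimizer-data")
    return cmd + ["--paired", r1, r2]


def bracken_cmd(db, report, out, read_len=150, level="S", threshold=10):
    """Bracken call that re-estimates abundance from a Kraken 2 report."""
    return ["bracken", "-d", db, "-i", report, "-o", out,
            "-r", str(read_len), "-l", level, "-t", str(threshold)]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    steps = [
        host_removal_cmd("host_index", "s_1.fq", "s_2.fq", "s.nonhost.fq"),
        kraken2_cmd("krakendb", "s.nonhost.1.fq", "s.nonhost.2.fq",
                    "s.k2report", "s.kraken2", minimizer_data=True),
        bracken_cmd("krakendb", "s.k2report", "s.bracken"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Building the commands as lists keeps each step inspectable before anything is executed; a list can be passed directly to `subprocess.run` once the paths are real.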

**Fig 2. Pavian Output for Hierarchical Visualization.**
Upon (1) opening the Pavian app, users should (2) upload the microbiome sample files. (3) Choose “Sample” to view classification visualization results. (4) Select sample from the drop-down menu. (5) Select plot settings to customize visualization. (6) Save image of network.

**Fig 3. Pavian Output for Pathogen Identification.**
Upon (1) opening the Pavian app, users should (2) upload the pathogen sample files. (3) Choose “Comparison” to view the table of read counts per sample per taxon. (4) Select ‘Species’ and ‘Z-score (reads)’ to filter the table and calculate z-scores. (5) Finally, sort by max z-scores to focus on species that are most likely pathogen candidates.

**Fig 4. Pavian Alignment Viewer.**
A) shows the graphical interface of Pavian’s alignment viewer. Users should upload the .bam and .bai files to the alignment viewer. B) and C) show two example coverage plots for the pathogen identification samples. Pavian displays each coverage plot along with summary coverage statistics.

**Fig 5. Alpha and Beta diversity results.**
In subplot A, we can see the computed results for alpha diversity. In the equations, $p$ is the vector of proportions $p_{i}$, where $p_{i} = \frac{\text{number of individuals in the } i^{\text{th}} \text{ species}}{\text{total number of individuals}}$ for each species $i$, i.e. $p_{i} = \frac{n_{i}}{N}$, and $D = \frac{\sum_{i} n_{i} (n_{i} - 1)}{N (N - 1)}$. In subplot B, we can see a heatmap of the 3 samples from 3 different time points in patient T11’s treatment. The sample taken on day 12 was taken while T11 was taking antibiotics and is marked with an ‘A’. The samples taken on days -9 and -2 were taken before the commencement of antibiotic treatment and are marked with an ‘N’. Here we use beta_diversity.py to compare diversity across samples; the heatmap shows the Bray-Curtis dissimilarity matrix. We see that the two samples taken before commencement of treatment are more similar to each other than either is to sample A, from day 12.
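The quantities in this caption can be sketched in a few lines of Python. This is an illustrative sketch, not the code of alpha_diversity.py or beta_diversity.py: the function names are hypothetical, Simpson's $D$ follows the formula given above, Shannon's index ($-\sum_i p_i \ln p_i$) is included as an assumed example of a measure built on $p_i$, and Bray-Curtis dissimilarity is computed as $\sum_t |a_t - b_t| / \sum_t (a_t + b_t)$ over per-taxon counts.

```python
import math


def simpson_d(counts):
    """Simpson's D = sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two individuals")
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))


def shannon_index(counts):
    """Shannon's index = -sum(p_i * ln(p_i)) with p_i = n_i / N."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {taxon: count} dicts.

    Returns 0.0 for identical profiles and 1.0 when no taxon is shared.
    """
    taxa = set(sample_a) | set(sample_b)
    diff = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    return diff / total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    before = {"taxon_1": 60, "taxon_2": 40}
    during = {"taxon_1": 40, "taxon_2": 60}
    print(simpson_d(before.values()), shannon_index(before.values()))
    print(bray_curtis(before, during))
```

A larger $D$ in this form means a few taxa dominate the sample, which is the pattern the caption describes for the sample taken during antibiotic treatment.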

**Fig 6. Microbiome Plots.**
In subplots A–C we see the Pavian visualizations for samples 1–3. Samples 1 and 2 have a similar taxonomic breakdown (A & B), corresponding to T11’s normal microbiome diversity. Subplot C shows that sample 3 is dominated by a few bacteria when T11 is taking antibiotics. On the right are Krona plots generated from samples 2 and 3. The plot on top (2) shows the diversity of patient T11 before receiving any antibiotic treatment (sample 2), and the plot below it (3) shows how depleted his microbiome diversity is while he is taking antibiotics (sample 3).

**Fig 7. Pathogen Identification Results.**
The above plot summarizes the Kraken2Uniq results across the 10 corneal samples. The number of reads, the number of k-mers, and the z-scores reveal the most likely pathogen for each sample. For example, *Acanthamoeba quina* has high read and k-mer counts in S88 alone, *Staphylococcus aureus* is prevalent in S90, and *Human alphaherpesvirus 1* is likely to infect S83. For each sample, the true pathogen is the taxon with the highest z-score for that particular sample.
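The idea of ranking taxa by z-score within one sample can be illustrated with a short sketch. It assumes a plain standard score, $(x - \text{mean}) / \text{sd}$, computed for each taxon across all samples; Pavian's exact statistic may differ. The function names, taxon labels and read counts below are invented for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores of one taxon's counts across samples.

    Returns all zeros when the counts do not vary.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def top_candidate(table, sample_index):
    """Taxon whose z-score is highest in the chosen sample.

    table maps taxon name -> list of read counts, one per sample.
    """
    return max(table, key=lambda taxon: z_scores(table[taxon])[sample_index])


if __name__ == "__main__":
    # Invented counts: taxon_A spikes only in the last sample,
    # taxon_B is present at a similar level everywhere.
    counts = {
        "taxon_A": [0, 0, 0, 500],
        "taxon_B": [100, 110, 90, 105],
    }
    print(top_candidate(counts, sample_index=3))
```

A taxon that is abundant in every sample, such as a common contaminant, gets a low z-score everywhere, so it does not outrank an organism that appears in only one sample.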


