Review
Nat Protoc. 2022 Dec;17(12):2815-2839.
doi: 10.1038/s41596-022-00738-y. Epub 2022 Sep 28.

Metagenome analysis using the Kraken software suite

Jennifer Lu et al. Nat Protoc. 2022 Dec.

Erratum in

Abstract

Metagenomic experiments expose the wide range of microscopic organisms in any microbial environment through high-throughput DNA sequencing. The computational analysis of the sequencing data is critical for the accurate and complete characterization of the microbial community. To facilitate efficient and reproducible metagenomic analysis, we introduce a step-by-step protocol for the Kraken suite, an end-to-end pipeline for the classification, quantification and visualization of metagenomic datasets. Our protocol describes the execution of the Kraken programs, via a sequence of easy-to-use scripts, in two scenarios: (1) quantification of the species in a given metagenomics sample; and (2) detection of a pathogenic agent from a clinical sample taken from a human patient. The protocol, which is executed within 1-2 h, is targeted to biologists and clinicians working in microbiome or metagenomics analysis who are familiar with the Unix command-line environment.

Conflict of interest statement

Competing interests

The authors declare no competing financial interests.

Figures

Fig 1. Protocol workflow.
Overview of two workflows: (1) pathogen identification and (2) microbiome analysis. In workflow (1), we try to detect an infectious agent using NGS reads. We start with a sample from the infection site and (ideally) a negative control. As a first step, host DNA is removed by excluding all reads that align to the host genome using Bowtie 2. This step usually removes a large fraction of the reads. The remaining reads are then classified by Kraken2Uniq against a reference database, and the taxonomic reports are compared using Pavian. Pavian can distinguish large abundance changes between controls and infected samples using z-statistics. For all potential pathogen candidates, reads can be extracted using extract_kraken_reads.py. In workflow (2), we try to estimate the abundance of species in microbiome samples and compute the diversity changes between them. In the protocol, we start with multiple sets of reads from a microbiome before and after fecal transfer. All samples are classified using Kraken 2. Bracken takes the classified read counts and estimates the abundance of each taxon in the sample. Pavian can be used to explore and visualize the samples and spot differences between them. Additionally, alpha_diversity.py can be used to quantify the diversity within a sample, and beta_diversity.py can be used to compare diversity across samples.
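As a rough illustration of the two workflows, the following Python sketch builds the command lines for the host-removal, classification and abundance-estimation steps as argument lists, without running anything. All file names, database paths and index names are hypothetical placeholders, the helper function names are invented for this example, and the read length of 100 is an arbitrary default. The flags are ones the tools are understood to accept, but they should be checked against the help output of the installed versions.

```python
from typing import List


def host_removal_cmd(index: str, r1: str, r2: str, out_prefix: str) -> List[str]:
    """Bowtie 2 call that writes read pairs NOT aligning concordantly to the host."""
    return [
        "bowtie2", "-x", index, "-1", r1, "-2", r2,
        "--un-conc", out_prefix,  # pairs that did not align concordantly
        "-S", "/dev/null",        # discard the host alignments themselves
    ]


def kraken2_cmd(db: str, r1: str, r2: str, report: str, output: str,
                unique_kmers: bool = False) -> List[str]:
    """Kraken 2 call on paired reads; unique_kmers adds minimizer counts to the report."""
    cmd = ["kraken2", "--db", db, "--report", report, "--output", output]
    if unique_kmers:
        cmd.append("--report-minimizer-data")
    return cmd + ["--paired", r1, r2]


def bracken_cmd(db: str, report: str, output: str,
                read_len: int = 100, level: str = "S") -> List[str]:
    """Bracken call that re-estimates abundance at one taxonomic level (S = species)."""
    return ["bracken", "-d", db, "-i", report, "-o", output,
            "-r", str(read_len), "-l", level]


# Workflow (1): host removal, then classification with unique k-mer counts.
pathogen_steps = [
    host_removal_cmd("host_index", "s_1.fq", "s_2.fq", "s_nonhost.fq"),
    kraken2_cmd("k2_db", "s_nonhost.1.fq", "s_nonhost.2.fq",
                "s.k2report", "s.kraken2", unique_kmers=True),
]

# Workflow (2): classification, then abundance estimation.
microbiome_steps = [
    kraken2_cmd("k2_db", "m_1.fq", "m_2.fq", "m.k2report", "m.kraken2"),
    bracken_cmd("k2_db", "m.k2report", "m.bracken"),
]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real files.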
Fig 2. Pavian Output for Hierarchical Visualization.
Upon (1) opening the Pavian app, users should (2) upload the microbiome sample files, (3) choose “Sample” to view the classification visualization results, (4) select a sample from the drop-down menu, (5) select plot settings to customize the visualization and (6) save an image of the network.
Fig 3. Pavian Output for Pathogen Identification.
Upon (1) opening the Pavian app, users should (2) upload the pathogen sample files. (3) Choose “Comparison” to view the table of read counts per sample per taxon. (4) Select ‘Species’ and ‘Z-score (reads)’ to filter the table and calculate z-scores. (5) Finally, sort by max z-scores to focus on the species that are the most likely pathogen candidates.
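To make the z-score step concrete, here is a minimal Python sketch of a per-taxon z-score across samples, followed by a ranking on the maximum z-score. It is a plain textbook z-score and is not claimed to match Pavian's implementation. The taxon names, sample names and read counts are made-up illustrative values, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Counts = Dict[str, float]  # sample name -> read count for one taxon


def z_scores(counts: Counts) -> Dict[str, float]:
    """Z-score of one taxon's read count in each sample, relative to all samples."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # The taxon has identical counts everywhere, so no sample stands out.
        return {sample: 0.0 for sample in counts}
    return {sample: (n - mu) / sigma for sample, n in counts.items()}


def rank_by_max_z(table: Dict[str, Counts]) -> List[Tuple[str, str, float]]:
    """Return (taxon, sample with the highest z-score, that z-score), highest first."""
    ranked = []
    for taxon, counts in table.items():
        z = z_scores(counts)
        top_sample = max(z, key=z.get)
        ranked.append((taxon, top_sample, z[top_sample]))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Made-up counts: taxon_x is abundant in one sample only, taxon_y is flat.
table = {
    "taxon_x": {"s1": 0, "s2": 0, "s3": 0, "s4": 100},
    "taxon_y": {"s1": 10, "s2": 10, "s3": 10, "s4": 10},
}
ranking = rank_by_max_z(table)
```

A taxon that is abundant in a single sample and near zero elsewhere rises to the top of the ranking, whereas a taxon present at similar levels in every sample, as a background contaminant might be, receives a z-score near zero.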
Fig 4. Pavian Alignment Viewer.
(A) The graphical interface of Pavian’s alignment viewer. Users should upload the .bam and .bai files to the alignment viewer. (B, C) Two example coverage plots for the pathogen identification samples. Pavian displays the coverage plot along with summary coverage statistics.
Fig 5. Alpha and Beta diversity results.
In subplot A, we can see the computed results for alpha diversity. In the equations, $p$ is a vector of $p_i$, where $p_i = \frac{\text{number of individuals in the } i\text{th species}}{\text{total number of individuals}}$ for all species $i$, i.e. $p_i = \frac{n_i}{N}$, and $D = \frac{\sum_i n_i (n_i - 1)}{N (N - 1)}$. In subplot B, we can see a heatmap of the three samples from three different time points in patient T11’s treatment. The sample taken on day 12 was taken while T11 was taking antibiotics and is marked with an ‘A’. The samples from days -9 and -2 were taken before the commencement of antibiotic treatment and are marked with an ‘N’. Here we use beta_diversity.py to compare diversity across samples; the heatmap shows the resulting Bray-Curtis dissimilarity matrix. The two samples taken before the commencement of treatment are more similar to each other than either is to sample A, from day 12.
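The quantities in this caption can be illustrated with a short Python sketch. It computes the proportions p_i = n_i / N, Simpson's index D = sum of n_i(n_i - 1) divided by N(N - 1), and the Bray-Curtis dissimilarity between two samples. Shannon's index, H = -sum of p_i ln p_i, is included as a standard alpha-diversity measure built on the same p_i, although the caption does not spell it out. This is a minimal illustration and not the code of alpha_diversity.py or beta_diversity.py; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def proportions(counts: Sequence[float]) -> List[float]:
    """p_i = n_i / N for every species with a non-zero count."""
    total = sum(counts)
    return [n / total for n in counts if n > 0]


def shannon(counts: Sequence[float]) -> float:
    """Shannon's index H = -sum(p_i * ln(p_i))."""
    return -sum(p * math.log(p) for p in proportions(counts))


def simpson_d(counts: Sequence[float]) -> float:
    """Simpson's index D = sum(n_i * (n_i - 1)) / (N * (N - 1)); needs N >= 2."""
    total = sum(counts)
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity for two count vectors over the same species order.

    0 means identical composition, 1 means no shared species.
    """
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


def dissimilarity_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Bray-Curtis matrix, the kind of table a heatmap would display."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]
```

For two samples with counts `[6, 4]` and `[2, 8]`, the shared abundance is 2 + 4 = 6 out of 20 total counts, so the dissimilarity is 1 - 12/20 = 0.4.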
Fig 6. Microbiome Plots.
In subplots A–C we see Pavian visualizations for samples 1–3. Samples 1 and 2 have a similar taxonomic breakdown (A and B), corresponding to T11’s normal microbiome diversity. Subplot C shows that sample 3, taken while T11 is on antibiotics, is dominated by a few bacteria. On the right are Krona plots generated from samples 2 and 3. The top plot (2) shows the diversity of patient T11 before receiving any antibiotic treatment (sample 2), and the plot below it (3) shows how depleted his microbiome diversity is while he is taking antibiotics (sample 3).
Fig 7. Pathogen Identification Results.
The plot summarizes the Kraken2Uniq results across the 10 corneal samples. The number of reads, the number of k-mers and the z-scores reveal the most likely pathogen for each sample. For example, Acanthamoeba quina has high read and k-mer counts in S88 alone, Staphylococcus aureus is prevalent in S90, and Human alphaherpesvirus 1 is the likely infectious agent in S83. For each sample, the true pathogen is the one with the highest z-score in that sample.
