J Open Source Softw. 2021;6(57):2906. doi: 10.21105/joss.02906. Epub 2021 Jan 7.

Augur: a bioinformatics toolkit for phylogenetic analyses of human pathogens

John Huddleston et al. J Open Source Softw. 2021.

Abstract

The analysis of human pathogens requires a diverse collection of bioinformatics tools. These tools include standard genomic and phylogenetic software and custom software developed to handle the relatively numerous and short genomes of viruses and bacteria. Researchers increasingly depend on the outputs of these tools to infer transmission dynamics of human diseases and make actionable recommendations to public health officials (Black et al., 2020; Gardy et al., 2015). In order to enable real-time analyses of pathogen evolution, bioinformatics tools must scale rapidly with the number of samples and be flexible enough to adapt to a variety of questions and organisms. To meet these needs, we developed Augur, a bioinformatics toolkit designed for phylogenetic analyses of human pathogens.

Figures

Figure 1:
Example workflows composed with Snakemake from Augur commands for A) Zika virus, B) tuberculosis, C) a BEAST analysis, and D) the Nextstrain SARS-CoV-2 pipeline as of 2020-11-27. Each node in the workflow graph represents a command that performs a specific part of the analysis (e.g., aligning sequences, building a tree, etc.) with Augur commands in black, external software in red, and custom scripts in blue. A typical workflow starts by filtering sequences and metadata to a desired subset for analysis followed by inference of a phylogeny, annotation of that phylogeny, and export of the annotated phylogeny to a JSON that can be viewed on Nextstrain. Workflows for viral (A) and bacterial (B) pathogens follow a similar structure but also support custom pathogen-specific steps. Augur’s modularity enables workflows that build on outputs from other tools in the field like BEAST (C) as well as more complicated analyses such as that behind Nextstrain’s daily SARS-CoV-2 builds (D) which often require custom scripts to perform analysis-specific steps. Multiple outgoing edges from a single node represent opportunities to run the workflow in parallel. See the full workflows behind A, B, and D at https://github.com/nextstrain/zika-tutorial, https://github.com/nextstrain/tb, and https://github.com/nextstrain/ncov.
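
The workflow structure described in the caption can be sketched as a small dependency graph. The sketch below is illustrative only: it does not call Augur or Snakemake, the step names and edges are assumptions loosely modeled on the filter, phylogeny, annotation, and export stages named in the caption, and `parallel_batches` is a hypothetical helper. It uses Python's standard `graphlib` module (Python 3.9 or later) to group steps into batches whose members have no unmet dependencies and could therefore run at the same time.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a step, each value is the set of
# steps it depends on. Names are illustrative, not an actual Augur build.
WORKFLOW = {
    "filter": set(),
    "align": {"filter"},
    "tree": {"align"},
    "refine": {"tree"},
    "ancestral": {"refine"},  # annotation step
    "traits": {"refine"},     # annotation step, independent of "ancestral"
    "export": {"ancestral", "traits"},
}


def parallel_batches(dependencies):
    """Return a list of batches; steps within a batch can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # get_ready() yields every step whose dependencies are all done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(WORKFLOW), start=1):
        print(number, ", ".join(batch))
```

In this sketch the `refine` node has two outgoing edges, so `ancestral` and `traits` land in the same batch. This mirrors the caption's point that multiple outgoing edges from a single node are opportunities for parallel execution.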

References

    1. Alm E, Broberg EK, Connor T, Hodcroft EB, Komissarov AB, Maurer-Stroh S, Melidou A, Neher RA, O’Toole Á, Pereyaslov D, & The WHO European Region Sequencing Laboratories and GISAID EpiCoV Group. (2020). Geographical and temporal distribution of SARS-CoV-2 clades in the WHO European Region, January to June 2020. Eurosurveillance, 25(32), 2001410. doi: 10.2807/1560-7917.ES.2020.25.32.2001410
    2. Bedford T, Greninger AL, Roychoudhury P, Starita LM, Famulare M, Huang M, Nalla A, Pepper G, Reinhardt A, Xie H, Shrestha L, Nguyen TN, Adler A, Brandstetter E, Cho S, Giroux D, Han PD, Fay K, Frazar CD, … Jerome KR (2020). Cryptic transmission of SARS-CoV-2 in Washington state. Science. doi: 10.1126/science.abc0523
    3. Black A, MacCannell DR, Sibley TR, & Bedford T (2020). Ten recommendations for supporting open pathogen genomic analysis in public health. Nature Medicine, 26(6), 832–841. doi: 10.1038/s41591-020-0935-z
    4. Gardy J, Loman NJ, & Rambaut A (2015). Real-time digital pathogen surveillance — the time is now. Genome Biology, 16, 155. doi: 10.1186/s13059-015-0726-x
    5. Griffiths EJ, Timme RE, Page AJ, Alikhan N-F, Fornika D, Maguire F, Mendes CI, Tausch SH, Black A, Connor TR, Tyson GH, Aanensen DM, Alcock B, Campos J, Christoffels A, da Silva AG, Hodcroft E, Hsiao WWL, Katz LS, … MacCannell DR (2020). The PHA4GE SARS-CoV-2 Contextual Data Specification for Open Genomic Epidemiology. doi: 10.20944/preprints202008.0220.v1