Clin Infect Dis. 2021 Dec 1;73(Suppl_4):S267-S274.
doi: 10.1093/cid/ciab785.

Overcoming Data Bottlenecks in Genomic Pathogen Surveillance

Ayorinde O Afolayan et al. Clin Infect Dis. 2021.

Abstract

Performing whole genome sequencing (WGS) for the surveillance of antimicrobial resistance makes it possible to determine not only the antimicrobials to which resistance is increasing, but also the evolutionary mechanisms and transmission routes driving that increase at local, national, and global scales. Deriving WGS-based outputs requires a series of processes, beginning with sample and metadata collection, followed by nucleic acid extraction, library preparation, sequencing, and analysis. Throughout this pathway, many data-related operations (informatics) are required alongside more biologically focused procedures (bioinformatics). For a laboratory aiming to implement pathogen genomics, the informatics and bioinformatics activities can be a barrier to getting started; for a laboratory that has already started, these activities may become overwhelming. Here we describe these data bottlenecks and how they have been addressed in laboratories in India, Colombia, Nigeria, and the Philippines as part of the National Institute for Health Research Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance. The approaches taken include reproducible data parsing pipelines and genome sequence analysis workflows using technologies such as Data-flo, the Nextflow workflow manager, and containerization of software dependencies. By overcoming barriers to WGS implementation in countries where genome sampling for some species may be underrepresented, a body of evidence can be built to determine the concordance between antimicrobial sensitivity testing and genome-derived resistance predictions, and novel high-risk clones and previously unknown resistance mechanisms can be discovered.

Keywords: WGS; antimicrobial resistance; bioinformatics; metadata; whole genome sequencing.
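As a rough illustration of the first data bottleneck (sample metadata cleaning and validation), the Python sketch below normalizes column names and whitespace, then flags missing required fields, malformed or impossible collection dates, and duplicate sample IDs. The field names in `REQUIRED_FIELDS` and the YYYY-MM-DD date rule are hypothetical assumptions chosen for the example. The paper describes Data-flo pipelines for this kind of data parsing; this is a plain-Python stand-in, not a reproduction of the authors' workflow.

```python
import re
from datetime import date

# Hypothetical schema: each surveillance project defines its own required fields.
REQUIRED_FIELDS = ("sample_id", "country", "collection_date", "organism")
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def clean_record(raw):
    """Lower-case and trim keys, trim string values, and turn blanks into None."""
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip() or None
        cleaned[key.strip().lower()] = value
    return cleaned


def validate_record(record):
    """Return a list of problems found in one cleaned metadata record."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if record.get(field) is None]
    raw_date = record.get("collection_date")
    if raw_date is not None:
        if not ISO_DATE.match(raw_date):
            problems.append(f"collection_date not YYYY-MM-DD: {raw_date!r}")
        else:
            try:
                if date.fromisoformat(raw_date) > date.today():
                    problems.append(f"collection_date in the future: {raw_date!r}")
            except ValueError:  # e.g. 2019-02-30 matches the pattern but is not a real date
                problems.append(f"invalid collection_date: {raw_date!r}")
    return problems


def validate_batch(raw_records):
    """Clean every record, collect its problems, and flag repeated sample IDs."""
    results = []
    seen_ids = set()
    for raw in raw_records:
        record = clean_record(raw)
        problems = validate_record(record)
        sample_id = record.get("sample_id")
        if sample_id is not None:
            if sample_id in seen_ids:
                problems.append("duplicate sample_id")
            seen_ids.add(sample_id)
        results.append((record, problems))
    return results


# Toy batch with made-up values, for illustration only.
batch = [
    {" Sample_ID ": " S001 ", "country": "Nigeria", "collection_date": "2019-03-14", "organism": "Klebsiella pneumoniae"},
    {"sample_id": "S002", "country": "", "collection_date": "14/03/2019", "organism": "Escherichia coli"},
    {"sample_id": "S001", "country": "India", "collection_date": "2019-02-30", "organism": "Escherichia coli"},
]
for record, problems in validate_batch(batch):
    print(record["sample_id"], problems or "OK")
```

Returning a list of problems per record, rather than stopping at the first error, lets a laboratory review a whole submission batch in one pass and send a single report back to the submitting site.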

Figures

Figure 1.
An overview of one potential pathway from sample to phenotypic and genomic outputs. The bottleneck icons mark steps in the process that can pose particular implementation challenges. ① Sample metadata cleaning and validation. ② Conversion of antimicrobial sensitivity testing minimum inhibitory concentration data into standardized formats for downstream processing and interpretation. ③ Quantitative quality assessment of raw reads and assemblies. ④ Processing raw reads to detect the presence or absence of genetic loci, genes, specific nonsynonymous mutations, and variants. ⑤ Aggregating results to produce human-readable reports. Abbreviations: AMR, antimicrobial resistance; AST, antimicrobial sensitivity testing; MLST, multilocus sequence typing; ST, sequence type.
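Bottleneck ② involves converting minimum inhibitory concentration (MIC) readings into a standardized form that can be interpreted. A minimal sketch, assuming instrument output strings such as `<=0.25` or `>32` and a breakpoint rule of the form "S if MIC ≤ S breakpoint, R if MIC > R breakpoint" (the style used in EUCAST tables), might look like the following. The drug names and values in `BREAKPOINTS` are invented placeholders, not clinical breakpoints; real interpretation must use the current EUCAST or CLSI tables for the relevant organism and antimicrobial.

```python
import re
from typing import Optional, Tuple

# PLACEHOLDER breakpoints in mg/L, invented for this example only.
# Each entry: (susceptible if MIC <= first value, resistant if MIC > second value).
BREAKPOINTS = {
    "drug_a": (0.25, 0.5),
    "drug_b": (2.0, 8.0),
}

# Optional qualifier followed by a number, e.g. "<=0.25", ">32", "4".
MIC_PATTERN = re.compile(r"^\s*(<=|>=|<|>|=)?\s*(\d+(?:\.\d+)?)\s*$")


def parse_mic(raw: str) -> Tuple[str, float]:
    """Split an MIC string into (qualifier, value); '=' means an exact reading."""
    text = raw.replace("≤", "<=").replace("≥", ">=")
    match = MIC_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unparseable MIC value: {raw!r}")
    return match.group(1) or "=", float(match.group(2))


def interpret_mic(drug: str, raw_mic: str) -> Optional[str]:
    """Return 'S', 'I' or 'R', or None if a censored MIC cannot be categorized."""
    s_max, r_above = BREAKPOINTS[drug]
    qualifier, value = parse_mic(raw_mic)
    if qualifier == "=":
        if value <= s_max:
            return "S"
        return "R" if value > r_above else "I"
    if qualifier in ("<=", "<"):
        # The true MIC is at most `value`, so it is S only if that bound is within S.
        return "S" if value <= s_max else None
    # Remaining qualifiers are '>' and '>=': the true MIC is at least `value`.
    if value > r_above or (qualifier == ">" and value == r_above):
        return "R"
    return None


for drug, mic in [("drug_a", "0.125"), ("drug_a", "0.5"), ("drug_b", ">8"), ("drug_b", "<=4")]:
    print(drug, mic, interpret_mic(drug, mic))
```

Censored readings such as `<=4` against a susceptible breakpoint of 2 return `None` instead of a guessed category, because the true MIC could lie on either side of the breakpoint; flagging these for review is safer than forcing them into S, I, or R.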
Figure 2.
Diagram showing the flow of data from sample receipt to final outputs, highlighting the solution used at each step. The numbers refer to the data bottlenecks described in Figure 1. The flow begins when a bacterial sample is submitted together with its associated metadata. The sample undergoes traditional phenotypic antimicrobial sensitivity testing to produce minimum inhibitory concentration data. In parallel, genomic DNA from the sample is extracted and sequenced, and the whole genome sequencing data are processed through reproducible bioinformatics pipelines to produce outputs such as the multilocus sequence type, predicted antimicrobial resistance determinants, and single-nucleotide polymorphism–based phylogenies. These data are aggregated using Data-flo and stored in Google Sheets, where they can be combined and manipulated by downstream processes such as R scripts or Data-flo pipelines to produce final visualizations or reports. Abbreviations: AMR, antimicrobial resistance; AST, antimicrobial sensitivity testing; MIC, minimum inhibitory concentration; MLST, multilocus sequence typing; QC, quality control; RIS, Resistant, Intermediate, Susceptible; SNP, single-nucleotide polymorphism; WGS, whole genome sequencing.
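The aggregation step, which the authors perform with Data-flo before storing results in Google Sheets, essentially joins the outputs of several tools on a shared sample identifier. The Python sketch below shows that join under stated assumptions: each table is a list of dictionaries with a `sample_id` key, the column names (`qc`, `st`, `amr_genes`) are hypothetical, and a later table's value overwrites an earlier one when both define the same column. It is not Data-flo and does not use any Data-flo or Google Sheets API.

```python
import csv
import io


def aggregate_by_sample(*tables):
    """Join several lists of per-sample dicts into one merged row per sample_id.

    Rows are merged in argument order, so a later table overwrites an earlier
    one if both define the same column for the same sample.
    """
    merged = {}
    for table in tables:
        for row in table:
            sample_id = row["sample_id"]
            merged.setdefault(sample_id, {"sample_id": sample_id}).update(row)
    return [merged[sample_id] for sample_id in sorted(merged)]


def to_csv(rows):
    """Render merged rows as CSV text; missing values become empty cells."""
    columns = ["sample_id"]
    for row in rows:
        for column in row:
            if column not in columns:
                columns.append(column)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up outputs from three hypothetical pipeline steps.
qc = [{"sample_id": "S002", "qc": "PASS"}, {"sample_id": "S001", "qc": "PASS"}]
mlst = [{"sample_id": "S001", "st": "ST1"}, {"sample_id": "S002", "st": "ST2"}]
amr = [{"sample_id": "S001", "amr_genes": "geneA;geneB"}]

print(to_csv(aggregate_by_sample(qc, mlst, amr)))
```

Keeping one row per sample with empty cells for missing results means the resulting CSV can be imported into a spreadsheet or read by an R script without any reshaping, which mirrors the downstream use described in the caption.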
Figure 3.
Implementation vignettes. Abbreviations: AMR, antimicrobial resistance; KIMS, Kempegowda Institute of Medical Sciences; SOP, standard operating procedure; WGS, whole genome sequencing.