Proficiency Testing of Metagenomics-Based Detection of Food-Borne Pathogens Using a Complex Artificial Sequencing Dataset

Dirk Höper¹, Josephine Grützke², Annika Brinkmann³, Joël Mossong⁴, Sébastien Matamoros⁵, Richard J Ellis⁶, Carlus Deneke², Simon H Tausch², Isabel Cuesta⁷, Sara Monzón⁷, Miguel Juliá⁷, Thomas Nordahl Petersen⁸, Rene S Hendriksen⁸, Sünje J Pamp⁸, Mikael Leijon⁹, Mikhayil Hakhverdyan⁹, Aaron M Walsh¹⁰, Paul D Cotter¹⁰, Lakshmi Chandrasekaran¹¹, Moon Y F Tay¹¹, Joergen Schlundt¹¹, Claudia Sala¹², Alessandra De Cesare¹³, Andreas Nitsche³, Martin Beer¹, Claudia Wylezich¹

Affiliations

¹ Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany.
² Department of Biological Safety, German Federal Institute for Risk Assessment, Berlin, Germany.
³ Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany.
⁴ Département de Microbiologie, Laboratoire National de Santé, Dudelange, Luxembourg.
⁵ Department of Medical Microbiology, Amsterdam UMC University of Amsterdam, Amsterdam, Netherlands.
⁶ Animal and Plant Health Agency, Addlestone, United Kingdom.
⁷ Bioinformatics Unit, Institute of Health Carlos III (ISCIII), Madrid, Spain.
⁸ Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Lyngby, Denmark.
⁹ Department of Microbiology, National Veterinary Institute (SVA), Uppsala, Sweden.
¹⁰ Teagasc Food Research Centre, APC Microbiome Ireland and Vistamilk, Moorepark, Ireland.
¹¹ Nanyang Technological University Food Technology Centre (NAFTEC), Nanyang Technological University (NTU), Singapore, Singapore.
¹² Department of Physics and Astronomy, University of Bologna, Bologna, Italy.
¹³ Department of Veterinary Medical Sciences, University of Bologna, Bologna, Italy.

PMID: 33250869
PMCID: PMC7672002
DOI: 10.3389/fmicb.2020.575377

Dirk Höper et al. Front Microbiol. 2020 Nov 4;11:575377. doi: 10.3389/fmicb.2020.575377. eCollection 2020.

Abstract

Metagenomics-based high-throughput sequencing (HTS) enables comprehensive detection of all species present in a sample with a single assay and is becoming a standard method for outbreak investigation. However, unlike real-time PCR or serological assays, HTS datasets generated for pathogen detection do not readily provide yes/no answers. Rather, trained personnel must assess the results of the taxonomic read assignment to draw conclusions from them. Proficiency tests are important instruments of validation, harmonization, and standardization. Within the European Union-funded project COMPARE [COllaborative Management Platform for detection and Analyses of (Re-) emerging and foodborne outbreaks in Europe], we conducted a proficiency test to scrutinize the ability to assess diagnostic metagenomics data. An artificial dataset resembling shotgun sequencing of RNA from a sample of contaminated trout was provided to 12 participants, who were asked to provide a table with per-read taxonomic assignments at species level and a report summarizing and assessing their findings, considering categories such as pathogen, background, or contamination. Analysis of the read assignment tables showed that, overall, the software used classified the reads reliably. However, the use of incomplete reference databases or inappropriate data pre-processing caused difficulties. From the combination of the participants' reports with their read assignments, we conclude that, although most species were detected, a number of important taxa were either not categorized or categorized incorrectly. This implies that knowledge and awareness of potentially dangerous species and contaminations need to be improved; hence, capacity building for the interpretation of diagnostic metagenomics datasets is necessary.

Keywords: background contamination; diagnostic assessment; high-throughput sequencing; metagenomics; pathogen; proficiency test; training.

Copyright © 2020 Höper, Grützke, Brinkmann, Mossong, Matamoros, Ellis, Deneke, Tausch, Cuesta, Monzón, Juliá, Petersen, Hendriksen, Pamp, Leijon, Hakhverdyan, Walsh, Cotter, Chandrasekaran, Tay, Schlundt, Sala, De Cesare, Nitsche, Beer and Wylezich.

Figures

**FIGURE 1**
Summary of the read assignments at the superkingdom-level for 11 participants that provided the requested read assignment table with the read to species assignments (except P10). Only five read assignment tables (P1, P3, P4, P8, and P11) contained an assignment for all reads of the dataset. Only the compositions reported by P4, P8, and P11 fit the known actual composition (actual; upper left) of the dataset.

**FIGURE 2**
Positive predictive values of read assignments calculated from the complete read set, based on the species assignments. Sequences of the taxa labeled gray (*Brugia malayi*, *Caenorhabditis remanei*, *Danio rerio*, and *Scomber japonicus*) were downloaded unintentionally as part of the *Anisakis* sequence dataset.

**FIGURE 3**
Sensitivity of read assignments calculated from the complete read set. **(A)** Sensitivities calculated based on the species assignments. **(B)** Sensitivities calculated based on the genus assignments. Sequences of the taxa labeled gray (*Brugia malayi*, *Caenorhabditis remanei*, *Danio rerio*, and *Scomber japonicus*) were downloaded unintentionally as part of the *Anisakis* sequence dataset.
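The per-taxon metrics shown in Figures 2 and 3 can be illustrated with a minimal sketch. It assumes the standard definitions (sensitivity = correctly assigned reads of a taxon / all reads truly from that taxon; positive predictive value = correctly assigned reads of a taxon / all reads assigned to that taxon); the abstract does not spell out the exact formulas used in the study. The function name `per_taxon_metrics`, the read IDs, and the taxon labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_taxon_metrics(truth, assigned):
    """Compute per-taxon sensitivity and PPV from per-read assignments.

    truth:    dict mapping read ID -> true taxon (known for an artificial dataset)
    assigned: dict mapping read ID -> taxon reported by a participant;
              reads that are missing or mapped to None count as unassigned.
    """
    true_positives = Counter()
    truth_total = Counter(truth.values())   # reads truly belonging to each taxon
    assigned_total = Counter()              # reads assigned to each taxon

    for read_id, true_taxon in truth.items():
        call = assigned.get(read_id)
        if call is None:
            continue  # unassigned read: lowers sensitivity, does not affect PPV
        assigned_total[call] += 1
        if call == true_taxon:
            true_positives[true_taxon] += 1

    metrics = {}
    for taxon in set(truth_total) | set(assigned_total):
        n_true = truth_total[taxon]
        n_called = assigned_total[taxon]
        metrics[taxon] = {
            # None marks an undefined ratio (taxon absent or never called)
            "sensitivity": true_positives[taxon] / n_true if n_true else None,
            "ppv": true_positives[taxon] / n_called if n_called else None,
        }
    return metrics


# Toy example with hypothetical reads and taxa (not data from the study).
truth = {"r1": "species_A", "r2": "species_A", "r3": "species_B", "r4": "species_B"}
assigned = {"r1": "species_A", "r2": "species_B", "r3": "species_B"}  # r4 unassigned
result = per_taxon_metrics(truth, assigned)
```

Running the same calculation on labels collapsed to genus would give genus-level values of the kind shown in panel (B) of Figure 3, where a read assigned to the wrong species of the right genus still counts as correct.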

**FIGURE 4**
Summary of the read-assignment tables **(A)**, and assessment and interpretation of the assignments **(B)**. Heatmap showing the positive (comprised species detected; green) and negative (comprised species NOT detected; red) results of the software analyses. The results shown for participants are based on their uploaded read assignment tables, except for P10, for which the results are derived from their summary table and assessment **(A)**. Heatmap summarizing assessments of the detected species by the participants **(B)**. Sequences of the taxa labeled gray (*Brugia malayi*, *Caenorhabditis remanei*, *Danio rerio*, and *Scomber japonicus*) were downloaded unintentionally as part of the *Anisakis* sequence dataset.
