Front Microbiol. 2020 Nov 4;11:575377.
doi: 10.3389/fmicb.2020.575377. eCollection 2020.

Proficiency Testing of Metagenomics-Based Detection of Food-Borne Pathogens Using a Complex Artificial Sequencing Dataset

Dirk Höper et al. Front Microbiol.

Abstract

Metagenomics-based high-throughput sequencing (HTS) enables comprehensive detection of all species present in a sample with a single assay and is becoming a standard method for outbreak investigation. However, unlike real-time PCR or serological assays, HTS datasets generated for pathogen detection do not readily provide yes/no answers. Instead, trained personnel must assess the results of the taxonomic read assignment to extract diagnostic information. Proficiency tests are important instruments of validation, harmonization, and standardization. Within the European Union-funded project COMPARE [COllaborative Management Platform for detection and Analyses of (Re-) emerging and foodborne outbreaks in Europe], we conducted a proficiency test to scrutinize participants' ability to assess diagnostic metagenomics data. An artificial dataset resembling shotgun sequencing of RNA from a sample of contaminated trout was provided to 12 participants, who were asked to provide a table with per-read taxonomic assignments at species level and a report summarizing and assessing their findings, considering categories such as pathogen, background, or contamination. Analysis of the read assignment tables showed that, overall, the software used classified the reads reliably. However, incomplete reference databases or inappropriate data pre-processing caused difficulties. From the combination of the participants' reports with their read assignments, we conclude that, although most species were detected, a number of important taxa were either not categorized or categorized incorrectly. This implies that knowledge and awareness of potentially dangerous species and contaminations need to be improved; hence, capacity building for the interpretation of diagnostic metagenomics datasets is necessary.

Keywords: background contamination; diagnostic assessment; high-throughput sequencing; metagenomics; pathogen; proficiency test; training.

Figures

FIGURE 1
Summary of the read assignments at the superkingdom level for the 11 participants who provided the requested read assignment table with read-to-species assignments (except P10). Only five read assignment tables (P1, P3, P4, P8, and P11) contained an assignment for all reads of the dataset. Only the compositions reported by P4, P8, and P11 fit the known actual composition (actual; upper left) of the dataset.
FIGURE 2
Positive predictive values of read assignments, calculated from the complete read set based on the species assignments. Sequences of the taxa labeled gray (Brugia malayi, Caenorhabditis remanei, Danio rerio, and Scomber japonicus) were downloaded unintentionally as part of the Anisakis sequence dataset.
FIGURE 3
Sensitivity of read assignments calculated from the complete read set. (A) Sensitivities calculated based on the species assignments. (B) Sensitivities calculated based on the genus assignments. Sequences of the taxa labeled gray (Brugia malayi, Caenorhabditis remanei, Danio rerio, and Scomber japonicus) were downloaded unintentionally as part of the Anisakis sequence dataset.
FIGURE 4
Summary of the read-assignment tables (A), and assessment and interpretation of the assignments (B). Heatmap showing the positive (comprised species detected; green) and negative (comprised species NOT detected; red) results of the software analyses. The results shown for participants are based on their uploaded read assignment tables, except for P10, for which the results are derived from their summary table and assessment (A). Heatmap summarizing assessments of the detected species by the participants (B). Sequences of the taxa labeled gray (Brugia malayi, Caenorhabditis remanei, Danio rerio, and Scomber japonicus) were downloaded unintentionally as part of the Anisakis sequence dataset.
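
The positive predictive values and sensitivities referred to in Figures 2 and 3 can be illustrated with a minimal sketch. It assumes the usual per-taxon definitions (PPV = correct calls / all calls for a taxon; sensitivity = correct calls / all reads truly from that taxon) and is not taken from the paper's own analysis code. The function names, the dictionary layout, and the toy reads below are hypothetical, and the genus is derived by taking the first word of a binomial species name, which is a simplification.

```python
from collections import Counter


def per_taxon_metrics(truth, predicted):
    """Compute per-taxon PPV and sensitivity from per-read assignments.

    truth:     dict mapping read ID -> true taxon (known for an artificial dataset)
    predicted: dict mapping read ID -> assigned taxon; reads that are missing
               or mapped to None count as unassigned
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for read_id, true_taxon in truth.items():
        call = predicted.get(read_id)
        if call == true_taxon:
            tp[true_taxon] += 1
        else:
            fn[true_taxon] += 1      # read was missed for its true taxon
            if call is not None:
                fp[call] += 1        # and wrongly credited to another taxon
    metrics = {}
    for taxon in set(tp) | set(fp) | set(fn):
        called = tp[taxon] + fp[taxon]
        actual = tp[taxon] + fn[taxon]
        metrics[taxon] = {
            "ppv": tp[taxon] / called if called else None,
            "sensitivity": tp[taxon] / actual if actual else None,
        }
    return metrics


def to_genus(assignments):
    """Collapse species names to genus (assumption: first word of the binomial)."""
    return {r: (t.split()[0] if t else None) for r, t in assignments.items()}


# Hypothetical toy data: four reads, one misassigned within the genus, one unassigned.
truth = {
    "r1": "Anisakis simplex",
    "r2": "Anisakis simplex",
    "r3": "Oncorhynchus mykiss",
    "r4": "Oncorhynchus mykiss",
}
predicted = {
    "r1": "Anisakis simplex",
    "r2": "Anisakis pegreffii",
    "r3": "Oncorhynchus mykiss",
}

species_metrics = per_taxon_metrics(truth, predicted)
genus_metrics = per_taxon_metrics(to_genus(truth), to_genus(predicted))
```

In this toy case the misassigned read lowers species-level sensitivity for Anisakis simplex but is counted as correct at genus level, which is the kind of difference a species-versus-genus comparison is meant to expose.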
