NGS read classification using AI
- PMID: 34936673
- PMCID: PMC8694450
- DOI: 10.1371/journal.pone.0261548
Erratum in
- Correction: NGS read classification using AI. PLoS One. 2024 Apr 1;19(4):e0301793. doi: 10.1371/journal.pone.0301793. eCollection 2024. PMID: 38557766.
Abstract
Clinical metagenomics is a powerful diagnostic tool because it offers an untargeted view of all DNA in a patient's sample. This allows the detection of pathogens that would be missed by classical targeted assays. However, because metagenomic sequencing is untargeted, it generates large amounts of non-specific data, and the diagnosis only takes place at the data analysis stage, where the relevant sequences are identified. Typically, this is done by comparison to reference databases. This approach has been optimized over the past years and works well for detecting pathogens that are represented in the databases used. A common challenge arises, however, when no pathogen sequences are found in a metagenomic patient sample: how can one determine whether the data truly contain no evidence of a pathogen, or whether the pathogen's genome is simply absent from the database, so that its sequences could not be classified? Here, we present a novel approach to detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on coding sequences labeled by taxonomic domain and use it to predict the taxonomic classification of sequences that cannot be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.
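The abstract does not say how reads are turned into protein segments or how those segments are encoded for the network. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes six-frame translation with the standard genetic code (NCBI table 1), splitting at stop codons, and one-hot encoding of the 20 standard amino acids. The function names and the `min_len`/`max_len` parameters are hypothetical.

```python
# Sketch (assumption, not the paper's pipeline): turn a DNA read into
# protein segments and one-hot encode them as neural-network input.
from itertools import product
from typing import List

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
AA_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    # Translate complete codons only; codons with ambiguous bases become 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(read: str) -> List[str]:
    # Three forward frames followed by three reverse-complement frames.
    read = read.upper()
    rc = reverse_complement(read)
    return [translate(strand[f:]) for strand in (read, rc) for f in range(3)]


def protein_segments(read: str, min_len: int = 10) -> List[str]:
    # Split each frame at stop codons and keep stretches of at least
    # min_len residues (hypothetical threshold).
    segments: List[str] = []
    for frame in six_frame_translation(read):
        segments.extend(s for s in frame.split("*") if len(s) >= min_len)
    return segments


def one_hot(segment: str, max_len: int = 50) -> List[List[int]]:
    # Rows are positions (padded/truncated to max_len), columns are amino
    # acids; unknown residues such as 'X' and padding give all-zero rows.
    matrix = [[0] * len(AA_ALPHABET) for _ in range(max_len)]
    for pos, aa in enumerate(segment[:max_len]):
        idx = AA_ALPHABET.find(aa)
        if idx >= 0:
            matrix[pos][idx] = 1
    return matrix


if __name__ == "__main__":
    segs = protein_segments("ATGAAACCCGGGTTT", min_len=5)
    print(segs)  # ['MKPGF', 'KPGFH']
    features = [one_hot(s, max_len=20) for s in segs]
```

In such a setup, only reads left unclassified by the reference-database comparison would be passed through these steps. Each encoded segment would then be scored by a network trained on coding sequences labeled by taxonomic domain. Translating all six frames avoids having to know the read's strand or reading frame, at the cost of feeding the classifier some non-coding segments.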
Conflict of interest statement
The authors have declared that no competing interests exist.