Microbiome. 2018 Feb 1;6(1):23.

doi: 10.1186/s40168-018-0401-z.

DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data

Gustavo Arango-Argoty¹, Emily Garner², Amy Pruden², Lenwood S Heath¹, Peter Vikesland², Liqing Zhang³

Affiliations

¹ Department of Computer Science, Virginia Tech, Blacksburg, VA, USA.
² Department of Civil and Environmental Engineering, Virginia Tech, Blacksburg, VA, USA.
³ Department of Computer Science, Virginia Tech, Blacksburg, VA, USA. lqzhang@cs.vt.edu.

PMID: 29391044
PMCID: PMC5796597
DOI: 10.1186/s40168-018-0401-z

Gustavo Arango-Argoty et al. Microbiome. 2018.

Abstract

Background: Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Improved methods for monitoring environmental media (e.g., wastewater, agricultural waste, food, and water) are especially needed to identify potential sources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access to, and profiling of, the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach that takes into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively.
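The "best hit" strategy mentioned in the Background can be illustrated with a minimal sketch. It assumes alignments in the 12-column tab-separated layout written by BLAST or DIAMOND (query, subject, percent identity, ..., e-value, bit score). The function names, the toy rows, and the 80% identity cutoff are illustrative assumptions, not values taken from the paper. The point is only to show how a strict cutoff can discard a read that does have a hit, which is one way false negatives arise.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity (column 3)
    evalue: float    # column 11
    bitscore: float  # column 12


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column tab-separated alignment rows, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, for each query, the alignment with the highest bit score."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def annotate(best: Dict[str, Hit], min_identity: float = 80.0) -> Dict[str, str]:
    """Assign the best hit's subject only if its identity passes the cutoff.

    The 80% default is an assumed, illustrative cutoff.
    """
    return {q: h.subject for q, h in best.items() if h.identity >= min_identity}


# Toy alignment rows (hypothetical reads and scores).
rows = [
    "read1\tblaTEM-1\t98.0\t50\t1\t0\t1\t50\t1\t50\t1e-25\t101.0",
    "read1\tblaSHV-1\t70.0\t50\t15\t0\t1\t50\t1\t50\t1e-10\t60.0",
    "read2\ttetM\t55.0\t48\t21\t1\t1\t48\t3\t50\t1e-08\t52.0",
]
best = best_hits(parse_tabular(rows))
calls = annotate(best)
print(calls)  # read2 has a best hit (tetM) but is dropped by the cutoff
```

In this toy input, `read2` is left unannotated even though it aligns to a reference gene; whether such a read is a true ARG is exactly what a fixed cutoff cannot decide.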

Results: Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories.
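For readers less familiar with the metrics quoted in the Results, the following sketch shows how precision, recall, F1-score, and the false negative rate relate to one another. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def false_negative_rate(tp: int, fn: int) -> float:
    """Fraction of true ARGs that were missed; equals 1 - recall."""
    return fn / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 90 ARGs found, 2 wrong calls, 10 ARGs missed.
p, r, f1 = precision_recall_f1(tp=90, fp=2, fn=10)
fnr = false_negative_rate(tp=90, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.4f} fnr={fnr:.3f}")
```

Because the false negative rate is one minus recall, a method that misses fewer true ARGs necessarily shows higher recall, which is the relationship the Results rely on.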

Conclusions: The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg.

Keywords: Antibiotic resistance; Deep learning; Machine learning; Metagenomics.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
Bit score vs. identity distribution, illustrating the relationship between the UNIPROT genes against the CARD and ARDB genes in terms of the percentage identity, bit score, and e-value. Colors depict the exponent of the e-value (e-values below 1e-200 are represented by gray dots)

**Fig. 2**
Preprocessing and UNIPROT ARGs annotation. Antibiotic resistance genes from CARD, ARDB, and UNIPROT were merged and clustered to remove duplicates. Sequences from UNIPROT were then annotated using the matches between the metadata and the names of antibiotic categories from ARDB and CARD

**Fig. 3**
Validation of UNIPROT annotations. UNIPROT genes were aligned against the CARD and ARDB databases. The alignment with the highest bit score was selected for each UNI-gene (best hit) and a set of filters was applied to determine the UNI-gene annotation factor (AnnFactor)

**Fig. 4**
Classification framework. UNIPROT genes were used for validation and training whereas the CARD and ARDB databases were used as features. The distance between genes from UNIPROT to ARGs databases is computed using the sequence alignment bit score. Alignments are done using DIAMOND with permissive cutoffs allowing a high number of hits for each UNIPROT gene. This distribution is used to train and validate the deep learning models (The panel in the figure provides additional description on the training of the models)

**Fig. 5**
(a) Distribution of the number of sequences in the 30 antibiotic categories in DeepARG-DB. (b) The relative contribution of ARG categories in the ARDB, CARD, and UNIPROT databases

**Fig. 6**
(a) Performance comparison of the DeepARG models with the best hit approach using precision, recall, and F1-score as metrics for the training and testing datasets. The MEGARes bars correspond to the performance of DeepARG-LS using the genes from the MEGARes database. (b) Precision and recall of DeepARG models against the best hit approach for each individual category in the testing dataset. *UNIPROT genes are used for testing and not all the ARG categories have genes from the UNIPROT database

**Fig. 7**
(a) Identity distribution of 76 novel beta lactamase genes against the DeepARG database (DeepARG-DB). Each dot corresponds to the best hit of each novel gene, where color indicates the E-value (< 1e-10) and size depicts the alignment coverage (> 40%). (b) Pairwise identity distribution of the beta lactamase genes in the DeepARG database

**Fig. 8**
Prediction result using the DeepARG-SS model to classify ARGs for the spike-in dataset. Results for nonARG reads (eukaryotic reads) are not shown because DeepARG-SS was able to remove them during the alignment step using DIAMOND

**Fig. 9**
Distribution of DeepARG classification probability and the best hit identity. Each point indicates the alignment of each “partial” negative ARG against the DeepARG database. The horizontal line indicates the default setting for DeepARG predictions, i.e., the predictions with a probability higher than 0.8 are considered by DeepARG as high-quality classifications
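
Two steps named in the captions for Fig. 4 and Fig. 9 can be sketched in a few lines: using alignment bit scores against reference ARGs as features, and keeping only predictions whose probability is higher than 0.8. This is a rough sketch under stated assumptions, not the DeepARG implementation. The function names, the toy genes, and the example probabilities are hypothetical, raw bit scores are used without any normalization, and the classifier itself is not shown.

```python
from typing import Dict, List, Tuple


def bitscore_features(
    hits: List[Tuple[str, str, float]],
    reference_genes: List[str],
) -> Dict[str, List[float]]:
    """Build one vector per query: its bit score against each reference gene.

    Positions with no alignment stay at 0.0. If a query hits the same
    reference more than once, the highest bit score is kept.
    """
    index = {gene: i for i, gene in enumerate(reference_genes)}
    features: Dict[str, List[float]] = {}
    for query, subject, bitscore in hits:
        if subject not in index:
            continue  # ignore hits to genes outside the reference list
        vec = features.setdefault(query, [0.0] * len(reference_genes))
        vec[index[subject]] = max(vec[index[subject]], bitscore)
    return features


def high_quality(
    predictions: Dict[str, Tuple[str, float]],
    min_probability: float = 0.8,
) -> Dict[str, Tuple[str, float]]:
    """Keep predictions with probability strictly higher than the cutoff."""
    return {q: (cat, p) for q, (cat, p) in predictions.items() if p > min_probability}


# Toy data (hypothetical): permissive alignments of two reads to three reference genes.
reference_genes = ["blaTEM-1", "blaSHV-1", "tetM"]
hits = [
    ("read1", "blaTEM-1", 101.0),
    ("read1", "blaSHV-1", 60.0),
    ("read2", "tetM", 52.0),
]
features = bitscore_features(hits, reference_genes)

# Stand-in for model output: (predicted category, probability) per read.
predictions = {"read1": ("beta_lactam", 0.97), "read2": ("tetracycline", 0.65)}
kept = high_quality(predictions)
print(features)
print(kept)
```

Unlike a single best hit, the feature vector retains every alignment for a read, so the pattern of similarity across reference genes is available to whatever classifier consumes it.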
