Brief Bioinform. 2022 Jul 18;23(4):bbac233.
doi: 10.1093/bib/bbac233.

Comparative analysis of machine learning algorithms on the microbial strain-specific AMP prediction


Boris Vishnepolsky et al. Brief Bioinform.

Abstract

The evolution of drug-resistant pathogenic microbial species is a major global health concern. Naturally occurring antimicrobial peptides (AMPs) are considered promising candidates for addressing antibiotic resistance. A variety of computational methods have been developed to predict AMPs accurately. Most of these methods are not microbial strain specific (MSS): they can predict whether a given peptide is active against some microbe, but cannot accurately determine whether the peptide would be active against a particular microbial strain (MS). Because data are insufficient for most MS, only a few MSS predictive models have been developed so far. To overcome this problem, we developed a novel approach for improving MSS predictive models (MSSPM) that combines properties computed for AMP sequences with genome characteristics computed for the target MS. The new models can predict AMP activity against MS for which no peptides have yet been tested. We tested various types of feature engineering as well as different machine learning (ML) algorithms and compared the predictive abilities of the resulting models. Among the ML algorithms, Random Forest and AdaBoost performed best. Using genome characteristics as additional features improved the performance of all models relative to models relying only on AMP sequence-based properties. Our novel MSS AMP predictor is freely accessible as part of the DBAASP database resource at http://dbaasp.org/prediction/genome.
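The abstract describes models whose inputs combine peptide properties with genome characteristics of the target strain. Below is a minimal sketch, not the authors' implementation, of how one (peptide, strain) feature vector could be assembled. It assumes amino-acid composition as a stand-in for the peptide-side descriptors (the actual descriptors are not listed here) and uses the mono+di nucleotide composition named in the Figure 2 caption for the genome side. All function names are hypothetical.

```python
from itertools import product

# Assumed alphabets; ambiguous symbols (e.g. N, X) are simply skipped.
NUCLEOTIDES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def mono_di_composition(genome: str) -> list[float]:
    """4 mononucleotide + 16 dinucleotide relative frequencies (overlapping windows)."""
    seq = genome.upper()
    mono = {n: 0 for n in NUCLEOTIDES}
    for base in seq:
        if base in mono:
            mono[base] += 1
    di = {d: 0 for d in DINUCLEOTIDES}
    for a, b in zip(seq, seq[1:]):  # every adjacent pair, so "AAA" yields AA twice
        if a + b in di:
            di[a + b] += 1
    mono_total = sum(mono.values()) or 1  # avoid division by zero on empty input
    di_total = sum(di.values()) or 1
    return [mono[n] / mono_total for n in NUCLEOTIDES] + [
        di[d] / di_total for d in DINUCLEOTIDES
    ]


def amino_acid_composition(peptide: str) -> list[float]:
    """20 amino-acid relative frequencies (hypothetical stand-in for peptide descriptors)."""
    seq = peptide.upper()
    total = sum(seq.count(aa) for aa in AMINO_ACIDS) or 1
    return [seq.count(aa) / total for aa in AMINO_ACIDS]


def peptide_strain_features(peptide: str, genome: str) -> list[float]:
    """Concatenate peptide-side and genome-side features for one (peptide, strain) pair."""
    return amino_acid_composition(peptide) + mono_di_composition(genome)


if __name__ == "__main__":
    x = peptide_strain_features("GIGKFLKKAKKFGKAFVKILKK", "ATGAAACGCATTAGCACC")
    print(len(x))  # 20 amino-acid + 4 mono + 16 di = 40 features
```

Because the genome part depends only on the strain, the same peptide yields different feature vectors for different strains, which is what lets a single model score strains it has no peptide data for.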

Keywords: AMP prediction; antimicrobial peptides; machine learning.


Figures

Figure 1
Flowchart of MSS prediction algorithms.
Figure 2
Balanced accuracies of the models from three subgroups of the MSSPM G2 group: (a) G25, (b) G28 and (c) G29 (abbreviations are described in Section 3.2). Test sets $\mathrm{SQTS}_{ij}$ were created from the data corresponding to a particular pair $(i, j)$: the $i$-th strain and the $j$-th GF. The first index takes the values $i = 1, \dots, 8$, with $i = 1$ corresponding to Escherichia coli ATCC 25922, $i = 2$ to Pseudomonas aeruginosa ATCC 27853, $i = 3$ to Klebsiella pneumoniae ATCC 700603, $i = 4$ to Salmonella typhimurium ATCC 14028, $i = 5$ to Acinetobacter baumannii ATCC 19606, $i = 6$ to Staphylococcus aureus ATCC 25923, $i = 7$ to Enterococcus faecalis ATCC 29212, and $i = 8$ to Bacillus subtilis ATCC 6633. The second index takes the values $j = 5, 8, 9$, where $j = 5$ corresponds to mono+di nucleotide compositions and $j = 8, 9$ correspond to SF: $j = 8$ to the genome similarity index dDDH and $j = 9$ to an index based on the similarity between gyrB genes. The training sets are $\mathrm{SQTS}_{cj} = \mathrm{SQTS}_{1j} \cup \mathrm{SQTS}_{2j} \cup \dots \cup \mathrm{SQTS}_{8j}$. BAC was evaluated using 10-fold cross-validation.
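The caption reports balanced accuracy (BAC) under 10-fold cross-validation. Below is a minimal, dependency-free sketch of both pieces. The binary active/inactive labelling, the hypothetical `fit` callable interface, and averaging per-fold scores (rather than pooling out-of-fold predictions) are assumptions, not details taken from the paper. Any classifier, for example a Random Forest or AdaBoost model wrapped to match the `fit` signature, could be plugged in.

```python
import random
from typing import Callable, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes present in y_true.

    For binary labels this equals (sensitivity + specificity) / 2.
    """
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls) if recalls else 0.0


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal the indices round-robin into k disjoint folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]


# Hypothetical interface: fit(X_train, y_train) returns a predict(x) -> label function.
Fit = Callable[[list, list], Callable[[list], int]]


def cross_validated_bac(fit: Fit, X: list, y: list, k: int = 10, seed: int = 0) -> float:
    """Average BAC over k folds; each fold is scored by a model trained on the others."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        if not test_idx:  # happens only when there are fewer samples than folds
            continue
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(balanced_accuracy([y[i] for i in test_idx], y_pred))
    return sum(scores) / len(scores) if scores else 0.0
```

BAC is preferred over plain accuracy here because active and inactive peptides are rarely balanced per strain; a model that always predicts the majority class scores only 0.5.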
Figure 3
Balanced accuracies of the models from three subgroups of the MSSPM G3 group: (a) G35, (b) G38 and (c) G39 (abbreviations are described in Section 3.2). Test sets $\mathrm{SQTS}_{ij}$ were created from the data corresponding to a particular pair $(i, j)$: the $i$-th strain and the $j$-th GF. The first index takes the values $i = 1, \dots, 8$, with $i = 1$ corresponding to Escherichia coli ATCC 25922, $i = 2$ to Pseudomonas aeruginosa ATCC 27853, $i = 3$ to Klebsiella pneumoniae ATCC 700603, $i = 4$ to Salmonella typhimurium ATCC 14028, $i = 5$ to Acinetobacter baumannii ATCC 19606, $i = 6$ to Staphylococcus aureus ATCC 25923, $i = 7$ to Enterococcus faecalis ATCC 29212, and $i = 8$ to Bacillus subtilis ATCC 6633. The second index takes the values $j = 5, 8, 9$, where $j = 5$ corresponds to mono+di nucleotide compositions and $j = 8, 9$ correspond to SF: $j = 8$ to the genome similarity index dDDH and $j = 9$ to an index based on the similarity between gyrB genes. The training sets are $i\text{-}\mathrm{SQTS}_{cj} = \mathrm{SQTS}_{cj} \setminus \mathrm{SQTS}_{ij}$. BAC was evaluated using 10-fold cross-validation.
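One reading of this caption is a leave-one-strain-out scheme: for each strain $i$, the model is trained on the pooled data with strain $i$'s rows removed and then tested on strain $i$, mimicking prediction for a strain with no tested peptides. Below is a minimal sketch under that assumption. The strain keys, the (features, label) row layout and the `fit` interface are hypothetical, and how the 10-fold cross-validation mentioned in the caption is combined with this split is not specified here.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Tuple[list, int]  # (feature vector, activity label); assumed layout


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(1 for i in idx if y_pred[i] == cls) / len(idx))
    return sum(recalls) / len(recalls) if recalls else 0.0


def leave_one_strain_out(
    per_strain: Dict[str, List[Row]],
) -> Iterator[Tuple[str, List[Row], List[Row]]]:
    """Yield (held-out strain, rows of all other strains, rows of the held-out strain)."""
    for held_out, test_rows in per_strain.items():
        train_rows = [
            row for strain, rows in per_strain.items() if strain != held_out for row in rows
        ]
        yield held_out, train_rows, list(test_rows)


def evaluate_unseen_strains(
    per_strain: Dict[str, List[Row]], fit: Callable
) -> Dict[str, float]:
    """BAC on each strain for a model that never saw data measured on that strain."""
    results = {}
    for strain, train, test in leave_one_strain_out(per_strain):
        predict = fit([x for x, _ in train], [label for _, label in train])
        y_true = [label for _, label in test]
        y_pred = [predict(x) for x, _ in test]
        results[strain] = balanced_accuracy(y_true, y_pred)
    return results
```

For this split to be meaningful, each row must already carry the genome features of its own strain (as in the per-pair feature sketch for the abstract); otherwise the held-out strain would be indistinguishable from the training strains.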
Figure 4
Figure 4
Average accuracies of the best proposed predictive models under the Y-scrambling test, with different percentages of the true activity labels shuffled.
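Y-scrambling checks that a model's accuracy reflects real sequence/genome-activity signal: a percentage of the activity labels is shuffled, the model is re-evaluated, and accuracy is expected to fall toward chance as that percentage grows. Below is a minimal sketch, assuming labels are permuted among a randomly chosen subset of positions and that `evaluate` is any scoring routine (for instance a cross-validated accuracy). The function names, default fractions and repeat count are hypothetical, not values from the paper.

```python
import random
from typing import Callable, Dict, Iterable, List, Sequence


def scramble_labels(y: Sequence[int], fraction: float, seed: int = 0) -> List[int]:
    """Copy y and permute the labels at a randomly chosen `fraction` of positions.

    Permuting (rather than redrawing) keeps the overall class balance unchanged.
    """
    rng = random.Random(seed)
    scrambled = list(y)
    n = round(fraction * len(scrambled))
    positions = rng.sample(range(len(scrambled)), n)
    values = [scrambled[i] for i in positions]
    rng.shuffle(values)
    for i, v in zip(positions, values):
        scrambled[i] = v
    return scrambled


def y_scrambling(
    evaluate: Callable[[list, List[int]], float],
    X: list,
    y: Sequence[int],
    fractions: Iterable[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
    repeats: int = 10,
    seed: int = 0,
) -> Dict[float, float]:
    """Average score of `evaluate` over `repeats` scrambles for each shuffling fraction."""
    curve = {}
    for fraction in fractions:
        scores = [evaluate(X, scramble_labels(y, fraction, seed + r)) for r in range(repeats)]
        curve[fraction] = sum(scores) / len(scores)
    return curve
```

A robust model shows its highest score at fraction 0.0 and a steady decline as the fraction rises; a score that stays high with fully shuffled labels would point to leakage or overfitting.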
