BMC Bioinformatics. 2022 May 28;23(1):197.
doi: 10.1186/s12859-022-04727-6.

PDAUG: a Galaxy based toolset for peptide library analysis, visualization, and machine learning modeling

Jayadev Joshi et al. BMC Bioinformatics. 2022.

Abstract

Background: Computational methods for the initial screening and prediction of peptides with desired functions have proven to be effective alternatives to the lengthy and expensive biochemical experiments traditionally used in peptide research, saving time and effort. However, for many researchers, a lack of programming expertise, limited access to computational resources, and the absence of flexible pipelines are major hurdles to adopting these advanced methods.

Results: To address these barriers, we have implemented the peptide design and analysis under Galaxy (PDAUG) package, a Galaxy-based, Python-powered collection of tools, workflows, and datasets for rapid in silico peptide library analysis. In contrast to existing options such as standard programming libraries or rigid single-function web-based tools, PDAUG offers an integrated GUI-based toolset that makes it possible to build and distribute reproducible pipelines and workflows without programming expertise. Finally, we demonstrate the usability of PDAUG by predicting the anticancer properties of peptides using four different feature sets and assessing the suitability of various machine learning (ML) algorithms.

Conclusion: PDAUG offers tools for peptide library generation, data visualization, peptide sequence retrieval from built-in and public databases, peptide feature calculation, and machine learning (ML) modeling. Additionally, the toolset enables researchers to combine PDAUG with hundreds of compatible existing Galaxy tools, supporting an open-ended range of analytic strategies.
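As a rough illustration of what "peptide feature calculation" means in its simplest form, the sketch below turns each peptide into a 20-dimensional amino acid composition vector and lays the results out as one row per peptide. The function names `aa_composition` and `feature_table` are hypothetical and are not PDAUG's API; composition is only one of many possible descriptors.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every peptide maps
# to the same feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Return the fraction of each standard amino acid in `seq`.

    Non-standard letters are skipped, so fractions are taken over
    standard residues only.
    """
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def feature_table(peptides: dict[str, str]) -> list[list]:
    """Build rows of [peptide_id, f_A, ..., f_Y], one row per peptide."""
    return [[pid, *aa_composition(seq)] for pid, seq in peptides.items()]
```

A fixed residue order matters more than it looks: downstream ML tools compare feature columns by position, so every row must use the same ordering.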

Conflict of interest statement

Daniel Blankenberg has a significant financial interest in GalaxyWorks, a company that may have a commercial interest in the results of this research and technology. This potential conflict of interest has been reviewed and is managed by the Cleveland Clinic.

Figures

Fig. 1
Extending peptide library analysis with the PDAUG toolset inside Galaxy. A Tools are created with Python libraries. B Galaxy tool wrappers and tests are implemented for each tool. C The PDAUG toolset comprises 24 individual tools. D Reusable workflows are implemented using PDAUG
Fig. 2
Sequence length distributions for anticancer peptides (ACPs) and non-anticancer peptides (non-ACPs). The mean lengths of anticancer and non-anticancer peptides are 40.06 and 32.25 amino acids (AA), respectively, and the anticancer peptides show less variability in length
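A length distribution like this is usually summarized per class by its mean and spread. The minimal sketch below computes both, assuming each class is a plain list of sequence strings; `length_summary` is a hypothetical helper, and a smaller standard deviation corresponds to the narrower length spread described for the anticancer peptides.

```python
from statistics import mean, stdev


def length_summary(sequences: list[str]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of peptide lengths in residues.

    Requires at least two sequences, because the sample standard
    deviation is undefined for a single value.
    """
    lengths = [len(s) for s in sequences]
    return mean(lengths), stdev(lengths)
```

Calling it once on the ACP list and once on the non-ACP list gives the two pairs of numbers a caption like this one reports.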
Fig. 3
Sequence similarity network of the ACPs and non-ACPs. Compared with the non-ACPs, the ACPs form two compact clusters, indicating relatively high sequence similarity, whereas the non-ACPs form relatively scattered networks
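A sequence similarity network treats each peptide as a node and connects two peptides when their similarity exceeds a cutoff, so compact clusters appear as densely connected components. The sketch below is a minimal version under stated assumptions: `difflib.SequenceMatcher.ratio` is swapped in for the alignment-based identity scores such networks normally use, the 0.6 cutoff is arbitrary, and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1] from difflib's matching blocks.

    Stand-in for alignment-based percent identity, which a real
    sequence similarity network would normally use.
    """
    return SequenceMatcher(None, a, b).ratio()


def similarity_network(seqs: dict[str, str], threshold: float = 0.6) -> dict[str, set[str]]:
    """Adjacency sets: an edge joins two peptides whose similarity >= threshold."""
    graph = {pid: set() for pid in seqs}
    for (id_a, a), (id_b, b) in combinations(seqs.items(), 2):
        if similarity(a, b) >= threshold:
            graph[id_a].add(id_b)
            graph[id_b].add(id_a)
    return graph


def connected_components(graph: dict[str, set[str]]) -> list[set[str]]:
    """Clusters of the network, found by iterative depth-first search."""
    seen: set[str] = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(graph[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters
```

Raising the threshold splits the network into more, smaller components; the cutoff therefore shapes how "compact" or "scattered" a class looks.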
Fig. 4
The ACP and non-ACP datasets compared in a summary plot. A The AA frequency distribution plot shows a significant difference in the frequencies of G, I, K, and L between ACPs and non-ACPs. B The global charge distribution shows a higher positive charge among ACPs, whereas non-ACP sequences carry an overall higher negative charge. C No significant difference is observed in the length distributions of ACPs and non-ACPs, apart from a few outliers. D ACPs and non-ACPs differ in global hydrophobicity. E Non-ACPs show a relatively smaller hydrophobic moment than ACPs. F A 3D scatter plot of global hydrophobicity, global hydrophobic moment, and global charge shows separation between ACPs and non-ACPs
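Panels B, D, E, and F rest on three global descriptors: net charge, mean hydrophobicity, and hydrophobic moment. The sketch below shows one common way to compute them. The caption does not state the scale, pH, or normalization, so the Eisenberg consensus scale, the simplified integer charge, the 100° helical angle, and the division by length are all assumptions here, not PDAUG's documented settings.

```python
import math

# Eisenberg consensus hydrophobicity scale (normalized values as commonly
# tabulated). Assumption: the caption does not name the scale used.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def net_charge(seq: str) -> int:
    """Approximate net charge near neutral pH: +1 per K/R, -1 per D/E.

    Simplification: ignores histidine and the termini; pKa-based
    calculations at a given pH are more precise.
    """
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")


def global_hydrophobicity(seq: str) -> float:
    """Mean per-residue hydrophobicity on the Eisenberg scale."""
    return sum(EISENBERG[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Eisenberg-style hydrophobic moment, divided by sequence length.

    Residue i contributes a vector of length h_i pointing at i * angle
    around the helix axis (100 degrees per residue for an ideal
    alpha-helix); the moment is the length of the vector sum.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)
```

The moment is large when hydrophobic residues cluster on one face of the helix and small when they are spread evenly around it, which is why it separates classes that mean hydrophobicity alone might not.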
Fig. 5
Feature space visualization of ACPs and non-ACPs, each represented by its mean hydropathy and AA volume. Sequences containing larger hydrophobic amino acids are more frequent among ACPs than among non-ACPs
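In this plot each peptide is reduced to two numbers. The sketch below shows that mapping, assuming the Kyte-Doolittle hydropathy index and Zamyatnin residue volumes (in Å³, as commonly tabulated). The caption names neither scale, so both tables are assumptions, and `feature_point` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy index (assumed scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue volumes in cubic angstroms (Zamyatnin values; assumed table).
VOLUME = {
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5,
    "Q": 143.8, "E": 138.4, "G": 60.1, "H": 153.2, "I": 166.7,
    "L": 166.7, "K": 168.6, "M": 162.9, "F": 189.9, "P": 112.7,
    "S": 89.0, "T": 116.1, "W": 227.8, "Y": 193.6, "V": 140.0,
}


def feature_point(seq: str) -> tuple[float, float]:
    """Map a peptide to (mean hydropathy, mean residue volume) for a 2D scatter."""
    n = len(seq)
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    volume = sum(VOLUME[aa] for aa in seq) / n
    return hydropathy, volume
```

Peptides rich in bulky hydrophobic residues such as I, L, and F land toward the upper right of this space (high hydropathy, high volume).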
Fig. 6
Assessment of ML algorithms trained on four descriptor sets. Accuracy, precision, recall, F1 score, and mean AUC were calculated for six different algorithms with and without z-scaling normalization. The results suggest that models trained on the word vector descriptors outperform models trained on the other descriptors
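The caption reports z-scaling plus five performance measures. The library-free sketch below shows what those quantities compute; it is not the paper's evaluation code. Labels are assumed to be 1 for ACP and 0 for non-ACP, and all function names are hypothetical.

```python
import math


def fit_z_scaler(rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-feature mean and population standard deviation of the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(columns, means)]
    return means, stds


def z_transform(rows, means, stds):
    """Apply (x - mean) / std per feature; constant features (std == 0) become 0."""
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = ACP, 0 = non-ACP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the chance a random ACP outscores a random non-ACP (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In a cross-validated comparison, the scaling statistics would be fitted on each training fold and only applied to the matching test fold, which keeps test data from leaking into the normalization. A "mean AUC" would then plausibly be the average of `roc_auc` over folds, though the caption does not say how it was averaged.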

