Sci Rep. 2018 Oct 24;8(1):15688.
doi: 10.1038/s41598-018-33911-z.

Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection


Jose Liñares Blanco et al. Sci Rep.

Abstract

Screening and in silico modeling are critical activities for reducing experimental costs. They also speed up research notably and strengthen the theoretical framework, allowing researchers to quantify numerically the importance of a particular subset of information. For example, in fields such as cancer and other highly prevalent diseases, having a reliable prediction method is crucial. The objective of this paper is to classify peptide sequences according to their anti-angiogenic activity and to understand the underlying principles via machine learning. First, the peptide sequences were converted into three types of numerical molecular descriptors based on the amino acid composition. We performed different experiments with the descriptors and with their combination to obtain baseline results for the performance of the models, particularly for each molecular descriptor subset. A feature selection process was applied to reduce the dimensionality of the problem and remove noisy features, which are common in biological problems. After a robust machine learning experimental design under equal conditions (nested resampling, cross-validation, hyperparameter tuning and different runs), a generalized linear model fitted via coordinate descent (glmnet) outperformed the best previously published anti-angiogenic model with statistical significance, achieving a mean AUC value greater than 0.96 and an accuracy of 0.86 with 200 molecular descriptors mixed from the three groups. A final analysis with the top-40 discriminative anti-angiogenic activity peptides is presented, along with a discussion of the feature selection process and the individual importance of each molecular descriptor. According to our findings, anti-angiogenic activity peptides are strongly associated with amino acid sequences SP, LSL, PF, DIT, PC, GH, RQ, QD, TC, SC, AS, CLD, ST, MF, GRE, IQ, CQ and HG.
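
The abstract does not spell out how the composition-based descriptors are computed. As a rough illustration only, the sketch below assumes the conventional definitions of amino acid, dipeptide and tripeptide composition (the relative frequency of each 1-, 2- and 3-residue window). It is written in Python for brevity and is not the authors' implementation; the function names and the example peptide are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence: str, k: int) -> dict[str, float]:
    """Relative frequency of every possible k-mer over the standard residues.

    Assumption: frequency = count / number of overlapping windows.
    Windows containing non-standard characters still count toward the
    total but produce no feature of their own.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    features = {}
    for letters in product(AMINO_ACIDS, repeat=k):
        kmer = "".join(letters)
        features[kmer] = counts[kmer] / total if total else 0.0
    return features


def describe(sequence: str) -> dict[str, float]:
    """Merge the 1-mer, 2-mer and 3-mer compositions into one feature vector."""
    features = {}
    for k in (1, 2, 3):
        features.update(kmer_composition(sequence, k))
    return features


# Hypothetical peptide, chosen only to contain some motifs named in the abstract.
features = describe("SPLSLPF")
print(len(features))    # 20 + 400 + 8000 = 8420 candidate features
print(features["SP"])   # 1 of 6 dipeptide windows
print(features["LSL"])  # 1 of 5 tripeptide windows
```

Under this scheme a feature name is simply the residue string it counts, which is why a selected descriptor such as SP or LSL can be read directly as a sequence motif.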

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Flowchart of this study. The authors thank Bioconductor and R (https://www.r-project.org/logo/) for the provision of the logos under CC-BY and CC-BY-SA open access licenses.
Figure 2
Results obtained with the original datasets AAC, TC and DC and their combination. (a) Summary of the performance of the four algorithms (AUC), (b) boxplot of the behavior of each model across experiments (AUC), (c) summary of the performance of the four algorithms (accuracy), and (d) boxplot of the behavior of each model across experiments (accuracy).
Figure 3
Results obtained with the RF and SVM algorithms using AAC and the novel parallel correlation pseudo-amino-acid composition and series correlation pseudo-amino-acid composition.
Figure 4
Results obtained in the feature selection process. (a) Summary of the performance of the four algorithms (AUC), (b) boxplot of the behavior of each model across experiments (AUC), (c) summary of the performance of the four algorithms (accuracy), and (d) boxplot of the behavior of each model across experiments (accuracy). The red line represents the best previously published value in the literature, by Ramaprasad et al.
Figure 5
Percentage of the variables of each descriptor in the best-performing dataset before (3058 variables) and after (200 variables) the feature selection approach. An increase in the relative quantity of the AAC and DC descriptors is observed.
Figure 6
Relative proportion of the discarded variables (in blue) of the descriptor after applying the FS approach in the best-performing dataset.
Figure 7
Variable importance of 200 features of the glmnet algorithm.
