J Cheminform. 2013 Sep 24;5(1):42. doi: 10.1186/1758-2946-5-42.

Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets


Gerard JP van Westen et al. J Cheminform. 2013.

Abstract

Background: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, and a novel protein descriptor set (termed ProtFP (4 variants)); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks comprising Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants.
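
To make the proteochemometric setup concrete: each training example pairs a ligand descriptor with a protein descriptor built from per-residue amino acid descriptor values. Below is a minimal sketch in Python; the three-component Z-scale values (shown for only three residues) and the toy fingerprint are illustrative placeholders, not the actual descriptors or data used in the paper.

    import numpy as np

    # Illustrative 3-component Z-scale values per residue (placeholder
    # numbers; real Z-scale tables cover all 20 amino acids).
    Z3 = {
        "A": [0.07, -1.73, 0.09],
        "G": [2.23, -5.36, 0.30],
        "L": [-4.19, -1.03, -0.98],
    }

    def protein_descriptor(sequence):
        """Concatenate per-residue descriptor values into one vector."""
        return np.concatenate([Z3[aa] for aa in sequence])

    def pcm_example(ligand_fp, sequence):
        """A PCM example = ligand descriptor joined with protein descriptor."""
        return np.concatenate([ligand_fp, protein_descriptor(sequence)])

    # Toy 4-bit compound fingerprint against a 3-residue binding site.
    x = pcm_example(np.array([1, 0, 1, 1]), "GAL")
    print(x.shape)  # (13,) = 4 ligand features + 3 residues x 3 Z-scales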

Results: The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-scales (3) combined with an average Z-scale value for each target, while ProtFP (PCA8), ST-scales, and ProtFP (Feature) rank last.
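
For reference, the RMSE and MCC differences quoted above can be computed for any pair of models with standard formulas. A minimal sketch, assuming scikit-learn is available; the observed and predicted values below are made up for illustration:

    import numpy as np
    from sklearn.metrics import matthews_corrcoef, mean_squared_error

    # Hypothetical observed vs. predicted pActivity values for models
    # built from two different descriptor sets.
    y_true = np.array([6.2, 7.1, 5.8, 8.0])
    y_pred_a = np.array([6.0, 7.3, 5.9, 7.6])
    y_pred_b = np.array([6.5, 6.8, 6.2, 7.5])

    # Regression view: RMSE in log units.
    rmse_a = np.sqrt(mean_squared_error(y_true, y_pred_a))
    rmse_b = np.sqrt(mean_squared_error(y_true, y_pred_b))
    print(f"RMSE difference: {abs(rmse_a - rmse_b):.3f} log units")

    # Classification view (e.g. "active" above a cutoff): MCC.
    cutoff = 6.5
    mcc_a = matthews_corrcoef(y_true >= cutoff, y_pred_a >= cutoff)
    mcc_b = matthews_corrcoef(y_true >= cutoff, y_pred_b >= cutoff)
    print(f"MCC difference: {abs(mcc_a - mcc_b):.3f}")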

Conclusions: While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still - on average - surprisingly similar. Nonetheless, combining sets that describe complementary information leads to small but consistent improvements in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, on both the ligand and the protein side.


Figures

Figure 1
Mean performance of the benchmarked descriptor sets in the ACE inhibitors 70–30 validation experiments. The mean is calculated over ten different experiments and the error bars represent the standard deviation. Shown are the R0^2 (A) and the RMSE (B). It can be seen that Z-scales (Binned), and ProtFP (PCA3) combined with Z-scales (Binned), performed the best on this data set, followed by Z-scales (5). The ProtFP (Feature) descriptor set showed the worst performance in this case.
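
A sketch of the validation loop behind this figure: ten independent random 70/30 splits, reporting the mean and standard deviation over splits. R0^2 is implemented here as the coefficient of determination for the regression of observed on predicted values forced through the origin (one common definition, cf. the Golbraikh-Tropsha criteria); the paper's exact variant may differ, and the data below are random placeholders.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    def r2_0(y_true, y_pred):
        # Slope of the best-fit line through the origin, then R^2 against it.
        k = np.dot(y_true, y_pred) / np.dot(y_pred, y_pred)
        ss_res = np.sum((y_true - k * y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    X, y = np.random.rand(100, 13), np.random.rand(100)  # placeholder data
    scores = []
    for seed in range(10):  # ten independent 70-30 experiments
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed)
        pred = RandomForestRegressor(random_state=seed).fit(
            X_tr, y_tr).predict(X_te)
        scores.append((r2_0(y_te, pred), np.sqrt(np.mean((y_te - pred) ** 2))))
    mean, std = np.mean(scores, axis=0), np.std(scores, axis=0)
    print(f"R0^2 {mean[0]:.2f}+/-{std[0]:.2f}, RMSE {mean[1]:.2f}+/-{std[1]:.2f}")
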
Figure 2
PCA plot of ACE inhibitor similarity. Shown are (A) the best performing descriptor set in the ACE inhibitor experiment (Z-scales (Binned)) and (B) the worst performing descriptor set (ProtFP (Feature)). From the plots the reasons for their respective performance become apparent: panel (A) shows a clear distribution in space correlating with the activity (indicated by the color), which is less the case in (B).
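
Plots of this kind can be reproduced by projecting the descriptor vectors onto their first two principal components and coloring each point by activity. A minimal sketch with scikit-learn; random placeholder vectors stand in for the real dipeptide descriptors:

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    descriptors = np.random.rand(60, 10)  # one vector per dipeptide
    activity = np.random.rand(60)         # placeholder pActivity values

    # Scale, then project onto the first two principal components.
    pcs = PCA(n_components=2).fit_transform(
        StandardScaler().fit_transform(descriptors))

    # A descriptor set whose PCA space correlates with activity shows a
    # smooth color gradient, as in panel (A) of the figure.
    plt.scatter(pcs[:, 0], pcs[:, 1], c=activity)
    plt.xlabel("PC1"); plt.ylabel("PC2"); plt.colorbar(label="pActivity")
    plt.show()
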
Figure 3
Mean performance of the benchmarked descriptor sets in the GPCR 70–30 validation experiments. The mean is calculated over all 32 receptors (performed 10 times) and the error bars represent the standard deviation. Shown are the MCC (A) and the sensitivity (B). The differences between individual descriptor sets are smaller (MCC difference < 0.030, sensitivity difference < 0.020) than in the ACE inhibitor experiments, likely because the models are based on both chemical and protein similarity. For individual receptors larger performance differences occur (mean MCC difference 0.712, mean sensitivity difference 0.231) (see Additional file 1: Figure S4 for details). Z-scales (3) perform the best on this data set, while ProtFP (Feature) performs the worst.
Figure 4
Mean performance of the benchmarked descriptor sets in the GPCR LOSO validation experiments. The mean is calculated over all 32 receptors and the error bars represent the standard deviation. Shown are the MCC (A) and the sensitivity (B). Note that the error bars are large due to differing performance between models trained on different GPCRs, not between repeats of the individual models. Here extrapolation takes place on the target side, as the test set contains unseen targets. The differences between individual descriptor sets are small. Again, for individual receptors larger performance differences occur (see main text and Additional file 1: Figure S11 for details). In this case, the combination of Z-scales (3) and Z-scales (Avg) exhibits the best performance, while ProtFP (Feature) performs the worst.
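
The leave-one-sequence-out (LOSO) protocol can be sketched as follows: all data points of one target are held out in turn, a model is trained on the remaining targets, and performance is measured on the held-out target. The sketch below assumes one integer group label per data point; features, labels, and counts are placeholders.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import matthews_corrcoef
    from sklearn.model_selection import LeaveOneGroupOut

    X = np.random.rand(320, 20)             # placeholder PCM feature matrix
    y = np.random.randint(0, 2, 320)        # active / inactive labels
    groups = np.repeat(np.arange(32), 10)   # 32 GPCRs, 10 points each

    per_target_mcc = []
    for tr, te in LeaveOneGroupOut().split(X, y, groups=groups):
        clf = RandomForestClassifier(random_state=0).fit(X[tr], y[tr])
        per_target_mcc.append(matthews_corrcoef(y[te], clf.predict(X[te])))

    # Large error bars reflect the spread across held-out targets.
    print(f"MCC {np.mean(per_target_mcc):.2f} +/- {np.std(per_target_mcc):.2f}")
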
Figure 5
PCA plot of GPCR data set target space. Shown are (A) the best performing descriptor (Z-scales (3) combined with Z-scales (Avg)) and (B) the worst performing descriptor (ProtFP (Feature)). Panel (A) shows more explicit clustering than ProtFP (Feature) in (B). The red circles indicate the histamine receptors and the black circles the muscarinic acetylcholine receptors. The lower performance of the histamine receptor family can be rationalized in both cases, as no clear clustering is apparent for this family. Conversely, both plots demonstrate clustering for the ACM receptors, which might explain their good performance.
Figure 6
Mean performance of the benchmarked descriptor sets in the NNRTIs 70–30 validation experiments. The mean is calculated over all 14 mutants (performed 10 times) and the error bars represent the standard deviation. Shown are the R0^2 (A) and the RMSE (B) (see Additional file 1: Figure S15 for details). Slightly more variance is seen compared to the GPCR experiments. In this case BLOSUM performs the worst among all descriptor sets considered, while ProtFP (Feature) performs the best.
Figure 7
Mean performance of the benchmarked descriptor sets in the NNRTIs LOSO validation experiments. The mean is calculated over all 14 mutants and the error bars represent the standard deviation. Shown are the R0^2 (A) and the RMSE (B). Note that the error bars are large due to differing performance between models trained on different mutants, not between repeats of the individual models. Extrapolation takes place on the target side, as the test set contains unseen targets. The differences between individual descriptor sets are still small, but the spread of the standard deviation increases. Again, for individual mutants larger performance differences occur (see main text and Additional file 1: Figure S10 for details). In this part of the study ProtFP (Feature) shows very good performance, which indicates that a simplified representation on the protein side is favorable for this data set.
Figure 8
PCA plots of the best and worst performing descriptor sets on the NNRTI benchmark. (A) The simplified representation of ProtFP (Feature) proves to be an advantage on this congeneric data set, as the distance in PCA space better correlates with the distance in bioactivity space. (B) The ST-scales, on the other hand, perform the least well; it can be hypothesized that the tight clustering in one part of the plot does not correlate with bioactivity space.
Figure 9
Mean performance of the benchmarked descriptor sets in the PIs 70–30 validation experiments. The mean is calculated over all repeats (performed 10 times) and the error bars represent the standard deviation. Shown are the R0^2 (A) and the RMSE (B). Slightly more variance between descriptor sets is seen compared to the GPCR and NNRTI experiments. In this case ProtFP (Feature) performs the worst among all descriptor sets considered, while BLOSUM performs the best.
Figure 10
Mean performance of the benchmarked descriptor sets in the PIs LOSO validation experiments. The mean is calculated over all mutants (leaving out 10% at a time) and the error bars represent the standard deviation. Shown are the R0^2 (A) and the RMSE (B). Again, for individual targets larger performance differences occur (see main text for details). In this part of the study ProtFP (Feature) performs poorly on its own, while it performs very well when paired with Z-scales (3). The best performance is achieved by Z-scales (3).
Figure 11
PCA plots of target similarity of the protease mutants. Shown are (A) the best and (B) the worst performing descriptor sets. The feature-based descriptor codes only for the presence or absence of features. This leads to points scattered over a smaller area in PCA space and could explain the decreased performance (B). However, the information is shown to have a synergistic effect when combined with a physicochemical property-based descriptor (A).
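
The synergy noted here amounts to concatenating the binary feature descriptor with a physicochemical one, so the model sees both views of each residue. A minimal sketch with placeholder values; neither vector reflects the actual ProtFP or Z-scale encodings:

    import numpy as np

    # One mutated position, two views: binary presence/absence features
    # (cf. ProtFP (Feature)) and continuous physicochemical values
    # (cf. Z-scales (3)). Both vectors are placeholders.
    feature_bits = np.array([1, 0, 0, 1, 0])
    z_scales = np.array([-2.85, 0.22, 0.47])

    # Combined descriptor: both representations side by side.
    combined = np.concatenate([feature_bits, z_scales])
    print(combined.shape)  # (8,) = 5 binary + 3 physicochemical features
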
Figure 12
Median rank of the descriptor sets in the bioactivity benchmarks. The median is calculated over all 14 ranks (1 rank per data set, per experiment, per validation type); also shown is the median absolute deviation (MAD). The best three descriptor sets have a median rank < 5, among which the combinations of Z-scales (3) with other descriptors perform the best. The worst performance is by BLOSUM, ProtFP (PCA8), ST-scales, and ProtFP (Feature), with a median rank > 11. BLOSUM has a large MAD due to its inconsistent performance.
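
The ranking summarized in this figure can be sketched as: rank all descriptor sets within each of the 14 benchmark/validation combinations, then report each set's median rank and median absolute deviation (MAD). A minimal sketch with made-up scores:

    import numpy as np

    # Placeholder ranks: rows are 14 experiments, columns are 13
    # descriptor sets; real ranks come from the per-experiment metrics.
    rng = np.random.default_rng(0)
    ranks = rng.random((14, 13)).argsort(axis=1).argsort(axis=1) + 1

    median_rank = np.median(ranks, axis=0)
    mad = np.median(np.abs(ranks - median_rank), axis=0)
    for i, (m, d) in enumerate(zip(median_rank, mad)):
        print(f"descriptor set {i}: median rank {m:.1f} (MAD {d:.1f})")
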
