PLoS One. 2012;7(7):e40654.
doi: 10.1371/journal.pone.0040654. Epub 2012 Jul 16.

Systems biological approach of molecular descriptors connectivity: optimal descriptors for oral bioavailability prediction


Shiek S S J Ahmed et al. PLoS One. 2012.

Abstract

Background: Poor oral bioavailability is a major cause of drug candidate failure: approximately 50% of drugs in development fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) from physicochemical properties is therefore highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low, with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties.

Results: The models were developed from 247 computationally derived physicochemical descriptors for 2279 molecules, of which 969, 605 and 705 molecules correspond to the oral bioavailability, human intestinal absorption (HIA) and Caco-2 permeability data sets, respectively. Partial least squares discriminant analysis (PLS-DA) showed that 49 descriptors for HIA and 50 descriptors for Caco-2 are the major contributors to classification into groups. Of these, 47 descriptors were common to HIA and Caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using the bioavailability data set of 969 molecules with these 47 descriptors. Each molecule in the data set was represented by the 47 physicochemical properties and labeled as +bioavailability or -bioavailability to indicate good or poor bioavailability, respectively. The best-performing algorithm was the logistic algorithm. The correlation-based feature selection (CFS) algorithm was also applied, and it confirmed that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction.
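The counts in the Results can be cross-checked with a little set arithmetic. The sketch below is illustrative only: the descriptor names are hypothetical placeholders (the abstract does not list them), and only the sizes 49, 50 and 47 and the three data set sizes are taken from the text.

```python
# Data set sizes as reported in the abstract.
dataset_sizes = {"oral_bioavailability": 969, "HIA": 605, "Caco-2": 705}
total_molecules = sum(dataset_sizes.values())

# Hypothetical placeholder names: the abstract gives only the counts.
common = {f"common_{i}" for i in range(47)}
hia_selected = common | {f"hia_only_{i}" for i in range(2)}      # 49 descriptors
caco2_selected = common | {f"caco2_only_{i}" for i in range(3)}  # 50 descriptors

# Descriptors shared by both data sets, and those unique to each.
shared = hia_selected & caco2_selected
hia_unique = hia_selected - caco2_selected
caco2_unique = caco2_selected - hia_selected

print(total_molecules, len(shared), len(hia_unique), len(caco2_unique))
# prints: 2279 47 2 3
```

Under these counts, 2 of the selected descriptors would be specific to HIA and 3 to Caco-2, and the three data sets sum to the stated 2279 molecules.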

Conclusion: The logistic algorithm with the 47 selected descriptors predicted oral bioavailability with an accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors, which can be used together to facilitate prediction of oral bioavailability.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Systems biological framework for developing molecular descriptors connectivity maps.
The framework consists of five major components: (i) compilation and curation of data sets, (ii) generation of descriptors, (iii) multivariate analysis, (iv) machine learning and (v) statistical analysis. The first component takes inputs from the literature and outputs the curated data sets. The second component takes the curated data sets as input and generates molecular descriptors using E-dragon software. In the third component, the HIA and Caco-2 permeability data sets are subjected to multivariate analysis to obtain the descriptors that contribute most to the classification of groups in each data set. In the fourth component, the contributing descriptors are subjected to machine learning to determine the predictive accuracy of the models. In the final component, statistical analysis of the descriptors associated between data sets shows the interdependence between the descriptors.
Figure 2. Analyzed descriptors.
Four major descriptors (red) and their respective descriptor sub-classes were analyzed in this study.
Figure 3. Multivariate analysis.
PLS-DA plots for HIA (panel A) and Caco-2 permeability (panel B) showing a significant differentiation (p≤0.01 by permutation test) between the groups. The observations are coded according to class membership: black = positive; gray = negative. The descriptors with a VIP score ≥1 were selected (colored blue) as the most contributing descriptors for HIA (panel C) and Caco-2 (panel D). Heat maps of descriptors between positive and negative instances of HIA (panel E) and Caco-2 (panel F) depict high (red) and low (yellow) relative levels of descriptor variation.
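For reference, one commonly used definition of the VIP (variable importance in projection) score for descriptor j, in a PLS model with p descriptors and A components, is shown below. The caption does not state which formula or software was used, so the notation here (w_{ja} for the weight of descriptor j on component a, SS_a for the response variance explained by component a) is an assumption rather than the authors' stated method.

```latex
\mathrm{VIP}_j = \sqrt{\, p \,
  \frac{\sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}
       {\sum_{a=1}^{A} SS_a} }
```

Under this definition the squared VIP scores average to 1 across the p descriptors, which is why VIP ≥1 is a conventional cut-off for an above-average contribution.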
Figure 4. Machine learning algorithm.
The performance of 21 machine learning algorithms for the prediction of the HIA (panel A) and Caco-2 (panel B) data sets was measured as the average accuracy of 10-fold cross-validation (the algorithm with the highest predictive accuracy is indicated in blue). The predictive accuracy of the logistic algorithm based on individual descriptors was compared with that based on the combined descriptors for the HIA (panel C) and Caco-2 (panel D) data sets.
Figure 5. Correlation-based feature selection (CFS).
Network of the descriptors obtained using the CFS algorithm, showing common (pink) and unique (blue) descriptors between the data sets.
Figure 6. Comparative performance of the logistic model.
Bar diagram (panel A) comparing the performance of the logistic model for the descriptors selected using the PLS-DA and CFS algorithms. The histograms (panels B, C and D) show the accuracy distribution for smaller data sets of HIA (m = 681), Caco-2 (m = 710) and oral bioavailability (m = 741).
Figure 7. Descriptors interaction analysis.
Descriptor interaction map showing the unique (pink) and common (blue) descriptors between the HIA and Caco-2 data sets. The 47 descriptors common to both data sets (blue) were considered the most contributing descriptors for oral bioavailability prediction.
Figure 8. Oral bioavailability models.
The performance of the machine learning algorithms for the prediction of oral bioavailability (panel A) was measured as the average accuracy of 10-fold cross-validation (the algorithm with the highest predictive accuracy is indicated in blue). The heat map of descriptors between positive and negative instances of oral bioavailability (panel B) shows significant differences in descriptors and depicts high (red) and low (yellow) relative levels of descriptor variation. The predictive accuracy of the logistic algorithm based on individual descriptors was compared with that based on the 47 combined descriptors of the oral bioavailability data set (panel C).
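The captions for Figures 4 and 8 report accuracy averaged over 10-fold cross-validation. A minimal sketch of that evaluation scheme follows, using only the Python standard library. It is not the authors' pipeline: the helper names (`k_fold_indices`, `cross_validated_accuracy`, `fit_threshold`, `predict_threshold`) are hypothetical, the data are synthetic, and a toy one-feature threshold classifier stands in for the logistic algorithm named in the abstract.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_accuracy(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Toy stand-in classifier (assumption, not the paper's logistic model):
# threshold at the midpoint between the two class means.
def fit_threshold(X_train, y_train):
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

# Synthetic single-descriptor data: label 1 for values of 50 and above.
X = [float(i) for i in range(100)]
y = [1 if x >= 50 else 0 for x in X]

folds = k_fold_indices(len(X), k=10)
acc = cross_validated_accuracy(X, y, fit_threshold, predict_threshold, k=10)
print(f"mean 10-fold accuracy: {acc:.2f}")
```

Each sample is held out exactly once, so the averaged figure reflects performance on data the model did not see during fitting.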
