Molecules. 2021 Dec 7;26(24):7428.
doi: 10.3390/molecules26247428.

Prediction of Blood-Brain Barrier Penetration (BBBP) Based on Molecular Descriptors of the Free-Form and In-Blood-Form Datasets

Hiroshi Sakiyama et al. Molecules. 2021.

Abstract

The blood-brain barrier (BBB) controls the entry of chemicals from the blood into the brain. Since drugs that act on the brain need to penetrate the BBB, rapid and reliable prediction of BBB penetration (BBBP) is helpful for drug development. In this study, free-form and in-blood-form datasets were prepared by modifying the original BBBP dataset, and the effects of the data modification were investigated. For each dataset, molecular descriptors were generated and used for BBBP prediction by machine learning (ML). For ML, each dataset was split into training, validation, and test data by the scaffold split algorithm used in MoleculeNet. This algorithm creates an unbalanced split and makes the prediction difficult; however, we decided to use it to evaluate the predictive performance for unknown compounds dissimilar to existing ones. The highest prediction score was obtained by the random forest model using 212 descriptors from the free-form dataset, and this score was higher than the existing best score obtained with the same split algorithm without using any external database. Furthermore, using a deep neural network, a comparable result was obtained with only 11 descriptors from the free-form dataset, and the resulting descriptors suggested the importance of recognizing glucose-like characteristics in BBBP prediction.

Keywords: blood-brain barrier penetration (BBBP); deep neural network (DNN); forward search; free-form dataset; in-blood-form dataset; machine learning (ML); molecular descriptor; random forest (RF).
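
The scaffold split mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes each compound already has a scaffold string (for example a Murcko scaffold computed with a cheminformatics toolkit), and the 80/10/10 fractions, the function name scaffold_split, and the toy scaffold labels are assumptions made for illustration. Compounds are grouped by scaffold, the groups are sorted from largest to smallest, and each whole group is assigned to the training, validation, or test subset in turn, so no scaffold appears in more than one subset.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split compound indices so that no scaffold spans two subsets.

    `scaffolds` holds one precomputed scaffold string per compound
    (assumption: scaffold generation happens elsewhere).
    The default fractions are illustrative assumptions.
    """
    # Group compound indices by their scaffold string.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties are broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)   # group still fits in the training budget
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)   # otherwise try the validation budget
        else:
            test.extend(group)    # everything else goes to the test set
    return train, valid, test


# Toy data: 25 compounds with four hypothetical scaffold labels.
toy_scaffolds = ["A"] * 14 + ["B"] * 5 + ["C"] * 3 + ["D"] * 3
train, valid, test = scaffold_split(toy_scaffolds)
print(len(train), len(valid), len(test))
```

Because common scaffolds fill the training set first, the validation and test sets end up holding the rarer scaffolds. This is consistent with the abstract's remark that the split is unbalanced and the prediction is harder, while testing performance on compounds dissimilar to the training data.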

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Chemical forms of an example compound: a free form (A), a frequently used formal salt form (B), an actual salt form (C), and a protonated form in aqueous solution (D).
Figure 2
ROC-AUC scores for the three datasets (intact, free-form, and in-blood-form), obtained using a DNN model (a) and an RF model (b) with 200 molecular descriptors.
Figure 3
Importance of the top 50 molecular descriptors obtained by the RF method for the free-form dataset (a) and the in-blood-form dataset (b).
Figure 4
Distribution of positive and negative BBBP properties in the free-form dataset with respect to the number of hydrogen bond donors (a), the number of hydrogen bond acceptors (b), the molecular weight (c), and the MolLogP value (d).
Figure 5
ROC-AUC scores of the prediction for the free-form and in-blood-form datasets obtained by DNN and ensemble methods.
Figure 6
BBBP ratio with respect to the number of aliphatic heterocycles (n) and the number of aliphatic hydroxy groups excluding tertiary alcohol OH, showing BBBP positive (blue) and negative (pink) for n = 0 (a), 1 (b), 2 (c), 3 (d), 4 (e), and 5 (f).
Figure 7
Chemical structures of β-glucose (A), salicin (B), amikacin (C), and plicamycin (D).
Figure 8
ROC-AUC scores of the prediction for the free-form and in-blood-form datasets obtained by the RF method. The descriptor set is given in parentheses.
Figure 9
ROC curves for RF:Free(Large212) (a), CB:Free(Large212) (b), DNN:Free(FreeV11) (c), and DNN:Blood(BloodTV11) (d).
Figure 10
ROC-AUC scores of the top six prediction results and ensemble results.
Figure 11
Distribution of ROC-AUC scores for RF:Free(Large212) with scaffold split.
Figure 12
Distributions of ROC-AUC scores for the test set of 31 compounds (a) and for the test set of 95 compounds (b).
