Review
Comput Struct Biotechnol J. 2021 Aug 12;19:4538-4558.
doi: 10.1016/j.csbj.2021.08.011. eCollection 2021.

A review on machine learning approaches and trends in drug discovery

Paula Carracedo-Reboredo et al. Comput Struct Biotechnol J.

Abstract

Drug discovery aims to find new compounds with specific chemical properties for the treatment of diseases. In recent years, computer science has become an important component of this search, driven by the rapid rise of machine learning techniques and their growing accessibility. Given the objectives set by the Precision Medicine initiative and the new challenges they generate, robust, standardized and reproducible computational methodologies are needed to meet them. Predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies, a stage that drastically reduces costs and research times in the discovery of new drugs. This review examines how these methodologies have been used in recent research. Analyzing the state of the art in this field indicates where cheminformatics is likely to develop in the short term, the limitations it presents and the positive results it has achieved. The review focuses mainly on the methods used to model molecular data, the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.

Keywords: ADMET, Absorption, distribution, metabolism, elimination and toxicity; ADR, Adverse Drug Reaction; AI, Artificial Intelligence; ANN, Artificial Neural Networks; APFP, Atom Pairs 2d FingerPrint; AUC, Area under the Curve; BBB, Blood–Brain barrier; CDK, Chemistry Development Kit; CNN, Convolutional Neural Networks; CNS, Central Nervous System; CPI, Compound-protein interaction; CV, Cross Validation; Cheminformatics; DL, Deep Learning; DNA, Deoxyribonucleic acid; Deep Learning; Drug Discovery; ECFP, Extended Connectivity Fingerprints; FDA, Food and Drug Administration; FNN, Fully Connected Neural Networks; FP, Fingerprints; FS, Feature Selection; GCN, Graph Convolutional Networks; GEO, Gene Expression Omnibus; GNN, Graph Neural Networks; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; MACCS, Molecular ACCess System; MCC, Matthews correlation coefficient; MD, Molecular Descriptors; MKL, Multiple Kernel Learning; ML, Machine Learning; Machine Learning; Molecular Descriptors; NB, Naive Bayes; OOB, Out of Bag; PCA, Principal Component Analysis; QSAR; QSAR, Quantitative structure–activity relationship; RF, Random Forest; RNA, Ribonucleic Acid; SMILES, simplified molecular-input line-entry system; SVM, Support Vector Machines; TCGA, The Cancer Genome Atlas; WHO, World Health Organization; t-SNE, t-Distributed Stochastic Neighbor Embedding.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1. Stages in the discovery of new drugs in the context of precision medicine.
Fig. 2. Machine Learning methodology commonly used for drug discovery.
Fig. 3. Representation of the information coded by the different molecular descriptors according to their dimensions.
Fig. 4. The number of identified items that have used the most common fingerprints is represented.
Fig. 5. Counting of identified articles classified according to the biological problem addressed. The sampling of selected articles was during the period from 2016–2020.
Fig. 6. Timeline of Machine Learning main events in drug discovery field. The figure represents the main events of Machine Learning in drug discovery field. In addition, a line plot was drawn to show the paper counts along the time. Each algorithm is represented by a color line. The y-axis represents the number of papers published in PubMed.
