Int J Mol Sci. 2025 Mar 17;26(6):2680.
doi: 10.3390/ijms26062680.

Enhancing HCV NS3 Inhibitor Classification with Optimized Molecular Fingerprints Using Random Forest

Sema Atasever. Int J Mol Sci.

Abstract

The classification of Hepatitis C virus (HCV) NS3 inhibitors is essential for identifying potential antiviral agents through computational methods. This study aims to develop an optimized machine learning (ML) model using random forest (RF) and molecular fingerprints to accurately classify HCV NS3 inhibitors. A dataset of 965 molecules was retrieved from the ChEMBL database, and 290 bioactive compounds were selected for model training. Twelve molecular fingerprint descriptors were tested, and the CDK graph-only fingerprint yielded the best performance. In addition to RF, performance comparisons of other classifiers such as instance-based k-nearest neighbor (IBk), logistic regression (LR), AdaBoost, and OneR were conducted using WEKA with various molecular fingerprint descriptors. The optimized RF model achieved an accuracy of 89.6552%, a mean absolute error (MAE) of 0.2114, a root mean square error (RMSE) of 0.3304, and a Matthews correlation coefficient (MCC) of 0.7950 on the test set. These results highlight the effectiveness of optimized molecular fingerprints in enhancing virtual screening (VS) for HCV inhibitors. This approach offers a data-driven method for drug discovery.
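
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, MCC, MAE, and RMSE can be computed for a binary active/inactive classifier. The function names and the example counts are hypothetical and are not taken from the study. The sketch also assumes that MAE and RMSE are measured between the 0/1 class label and the predicted probability of the active class, which is a simplification of how a toolkit such as WEKA reports them.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified compounds."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mae(y_true, p_active):
    """Mean absolute error between 0/1 labels and predicted probabilities."""
    return sum(abs(y - p) for y, p in zip(y_true, p_active)) / len(y_true)


def rmse(y_true, p_active):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(y_true, p_active)) / len(y_true)
    )


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")

    # Hypothetical labels (1 = active) and predicted probabilities.
    labels = [1, 0, 1, 0]
    probs = [0.8, 0.3, 0.6, 0.1]
    print(f"MAE      = {mae(labels, probs):.4f}")
    print(f"RMSE     = {rmse(labels, probs):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the active and inactive classes are imbalanced.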

Keywords: HCV NS3 inhibitors; QSAR; computational drug design; machine learning; molecular descriptor optimization.

Conflict of interest statement

The author declares no conflicts of interest.

Figures

Figure 1
(a) Frequency plot of bioactivity classes. (b) Scatter plot of MW vs. LogP. Active and inactive compounds are shown in blue and orange, respectively.
Figure 2
(a–e) Box plots comparing the bioactivity classes (active vs. inactive compounds).
Figure 3
A schematic overview of the workflow for QSAR modeling.
Figure 4
(a,b) Distribution of IC50 and pIC50 values.
Figure 5
Main steps of EDA.
Figure 6
Overview of the RF method.
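
Figure 4 refers to both IC50 and pIC50 values. As a minimal sketch of the standard relationship between the two, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L. The function name below is hypothetical, and the input unit of nanomolar is an assumption (it is a common unit for ChEMBL bioactivity records) rather than something stated on this page.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 mol/L.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_nm * 1e-9)


if __name__ == "__main__":
    # Illustrative concentrations only; not values from the study.
    for ic50 in (1.0, 100.0, 10_000.0):
        print(f"IC50 = {ic50:>8.1f} nM -> pIC50 = {pic50_from_nm(ic50):.2f}")
```

Because the transform is logarithmic, a tenfold drop in IC50 raises pIC50 by one unit, which compresses the typically skewed IC50 distribution into a more even scale.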
