Sci Rep. 2025 May 22;15(1):17851.
doi: 10.1038/s41598-025-00873-y.

Optimizing credit card fraud detection with random forests and SMOTE

P Sundaravadivel et al. Sci Rep. 2025.

Abstract

Credit card fraud is a growing concern in the banking sector, necessitating efficient detection methods to minimize financial losses. Credit card usage is increasing steadily, and with it the default rate that banks encounter. Although much research has investigated the efficacy of conventional Machine Learning (ML) models, relatively little emphasis has been placed on Deep Learning (DL) techniques. This article presents a machine learning-based system that detects fraudulent transactions using a publicly available dataset of credit card transactions. The dataset is highly imbalanced, with fraudulent transactions representing less than 0.2% of the total, and was processed with techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) to handle the class imbalance. To predict credit card default, this study evaluates the efficacy of a DL model and compares it with other ML models, such as Decision Tree (DT) and AdaBoost. The objective is to identify the specific DL parameters that contribute to the observed improvements in the accuracy of credit card default prediction. The credit card default dataset is obtained from the UCI ML repository. The raw data are pre-processed with various techniques, and the outcomes are presented visually through exploratory data analysis (EDA). The algorithms are then hyperparameter-tuned to evaluate the improvement in prediction. All models are evaluated with standard evaluation metrics. The evaluation indicates that AdaBoost and DT exhibit the highest accuracy, 82%, in predicting credit card default, surpassing the 78% accuracy of the Artificial Neural Network (ANN) model. Several classification algorithms, including Logistic Regression, Random Forest, and Neural Networks, were evaluated to determine their effectiveness in identifying fraudulent activities.
The Random Forest model emerged as the best-performing algorithm, with an accuracy of 99.5% and a high recall score, indicating its robustness in detecting fraudulent transactions. The system can be deployed in real-time financial systems to strengthen fraud prevention and help secure financial transactions.
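
The oversampling step named in the abstract can be illustrated with a minimal, dependency-free sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its k nearest minority neighbours. This is not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration, and a library implementation (for example `SMOTE` in imbalanced-learn) would normally be used in practice.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    minority: list of equal-length feature lists (minority class only).
    Simplified sketch: brute-force neighbour search, numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For each sample, find the indices of its k nearest minority neighbours.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a real minority sample
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # position along the segment
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical toy data: four "fraud" samples with two features each.
fraud = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(fraud, n_synthetic=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is an interpolation between two real minority samples, it stays within the range spanned by the original minority data. Oversampling of this kind is usually applied to the training split only, so that evaluation data contains real transactions exclusively.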

Keywords: Anomaly detection; Classification algorithms; Credit card fraud detection; Financial security; Imbalanced data; Machine learning; Random forest; SMOTE.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Fraud detection system workflow.
Fig. 2. Fraud detection decision flowchart.
Fig. 3. Machine learning-based fraud detection flowchart.
Fig. 4. Dataset class distribution.
Fig. 5. Sigmoid function in logistic regression.
Fig. 6. Performance contribution of logistic regression and random forest.
Algorithm 1. Credit card fraud detection using random forest & SMOTE.
Fig. 7. Web interface for an AI fraud detection system.
Fig. 8. Fraud detection results page.
Fig. 9. Performance metrics of different machine learning models.
Fig. 10. Model performance comparison chart.
Fig. 11. Confusion matrix and ROC curve for fraud detection model.
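
The performance metrics and confusion matrix named in the captions of Figs. 9 to 11 are related by simple arithmetic. The sketch below shows how accuracy, precision, recall, and F1 follow from confusion-matrix counts. All counts are hypothetical and chosen only to mirror a fraud rate of about 0.2%; they are not results from the paper, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts (fraud = positive)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 100,000 transactions, 200 of them fraudulent (0.2%).
# Baseline that labels every transaction as legitimate.
baseline = classification_metrics(tp=0, fp=0, tn=99_800, fn=200)
# Hypothetical classifier that catches 170 of the 200 frauds with 30 false alarms.
model = classification_metrics(tp=170, fp=30, tn=99_770, fn=30)

for name, m in (("baseline", baseline), ("model", model)):
    print(name, {key: round(value, 4) for key, value in m.items()})
```

With classes this imbalanced, the always-legitimate baseline already reaches high accuracy while its recall is zero, which is why recall is reported alongside accuracy for fraud detection.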
