Sci Rep. 2025 May 22;15(1):17851.

doi: 10.1038/s41598-025-00873-y.

Optimizing credit card fraud detection with random forests and SMOTE

P Sundaravadivel¹, R Augustian Isaac², D Elangovan², D KrishnaRaj², V V Lokesh Rahul², R Raja²

Affiliations

¹ Saveetha Engineering College, Chennai, 602105, Tamilnadu, India. sundar.me2009@gmail.com.
² Saveetha Engineering College, Chennai, 602105, Tamilnadu, India.

PMID: 40404766
PMCID: PMC12098799
DOI: 10.1038/s41598-025-00873-y

P Sundaravadivel et al. Sci Rep. 2025.

Abstract

Credit card fraud is a growing concern in the banking sector, necessitating efficient detection methods to minimize financial losses. Credit card usage is increasing steadily, and with it the default rate that banks encounter. Although much research has investigated the efficacy of conventional Machine Learning (ML) models, Deep Learning (DL) techniques have received relatively less emphasis. This article presents a machine learning-based system to detect fraudulent transactions using a publicly available dataset of credit card transactions. The dataset is highly imbalanced, with fraudulent transactions representing less than 0.2% of the total, and was processed using the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance. To predict credit card default, this study evaluates the efficacy of a DL model and compares it with other ML models, such as Decision Tree (DT) and AdaBoost. The objective of this research is to identify the specific DL parameters that contribute to the observed enhancements in the accuracy of credit card default prediction. This research uses the credit card defaulted customer dataset from the UCI ML repository. The raw data are then pre-processed with various techniques, and the outcomes are presented visually through exploratory data analysis (EDA). Furthermore, the algorithms' hyperparameters are tuned to evaluate the improvement in prediction. All models were evaluated using standard evaluation metrics. The evaluation indicates that AdaBoost and DT exhibit the highest accuracy, 82%, in predicting credit card default, surpassing the 78% accuracy of the artificial neural network (ANN) model. Several classification algorithms, including Logistic Regression, Random Forest, and Neural Networks, were evaluated to determine their effectiveness in identifying fraudulent activities. The Random Forest model emerged as the best-performing algorithm, with an accuracy of 99.5% and a high recall score, indicating its robustness in detecting fraudulent transactions. This system can be deployed in real-time financial systems to enhance fraud prevention mechanisms and ensure secure financial transactions.
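
The abstract names SMOTE as the technique used to handle class imbalance but does not describe its mechanics. The following is a minimal sketch of the core idea from Chawla et al. (2002): each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: 3 fraud rows vs. 12 legitimate rows, 2 features each.
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
legit = [[float(x), float(y)] for x in range(4) for y in range(3)]

# Oversample the minority class until both classes have the same size.
new_fraud = smote(fraud, n_new=len(legit) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))  # prints: 12 12
```

In practice, oversampling of this kind is normally applied to the training split only, so that the evaluation data keep the original class ratio.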

Keywords: Anomaly detection; Classification algorithms; Credit card fraud detection; Financial security; Imbalanced data; Machine learning; Random forest; SMOTE.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Fraud detection system workflow.

**Fig. 2**
Fraud detection decision flowchart.

**Fig. 3**
Machine learning-based fraud detection flowchart.

**Fig. 5**
Sigmoid function in logistic regression.

**Fig. 6**
Performance contribution of logistic regression and random forest.

**Algorithm 1**
Credit card fraud detection using random forest & SMOTE.

**Fig. 7**
Web interface for an AI fraud detection system.

**Fig. 8**
Fraud detection results page.

**Fig. 9**
Performance metrics of different machine learning models.

**Fig. 10**
Model performance comparison chart.

**Fig. 11**
Confusion matrix and ROC curve for fraud detection model.
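
The abstract reports both accuracy and recall, and the figure captions above mention a confusion matrix. The sketch below shows how these metrics follow from confusion-matrix counts, and why accuracy alone says little when fraud is very rare. All counts are hypothetical and are not results from the paper. The 0.2% fraud rate is an illustrative round figure based on the "less than 0.2%" stated in the abstract, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 100,000 transactions, 200 of them fraudulent (0.2%).
# A model that labels every transaction "legitimate" detects no fraud at all.
always_legit = classification_metrics(tp=0, fp=0, tn=99_800, fn=200)

# A hypothetical detector that catches 170 of the 200 frauds with 30 false alarms.
detector = classification_metrics(tp=170, fp=30, tn=99_770, fn=30)

print(always_legit["accuracy"], always_legit["recall"])
print(detector["accuracy"], detector["recall"])
```

With these counts the trivial model reaches an accuracy of about 0.998 despite a recall of zero, which is why recall, precision, and F1 are usually reported alongside accuracy for imbalanced fraud data.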
