RESEARCH ARTICLES
Published 2025-04-20
Keywords
- Healthcare fraud
- Imbalanced data
- Machine learning
- Synthetic Minority Oversampling Technique (SMOTE)
- Area Under the Precision-Recall Curve (AUPRC)
How to Cite
A Krishnapriya, S Arshiya, M Shabnam, D L Deekshith, S M D Rasheed, & R Manikanta Reddy. (2025). Machine Learning For Medicare Fraud Detection: Tackling Class Imbalance With SMOTE-ENN. International Journal of Computational Learning & Intelligence, 4(4), 716–724. https://doi.org/10.5281/zenodo.15251088
Copyright (c) 2025 A Krishnapriya, S Arshiya, M Shabnam, D L Deekshith, S M D Rasheed, R Manikanta Reddy

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Healthcare fraud detection continues to evolve and faces substantial obstacles, chief among them class imbalance. Earlier research concentrated on standard machine learning (ML) methods, which often struggle with imbalanced data. The common resampling remedies each carry a drawback: Random Oversampling (ROS) risks overfitting, the Synthetic Minority Oversampling Technique (SMOTE) can introduce noise, and Random Undersampling (RUS) can discard vital information. Enhancing model performance, examining hybrid resampling techniques, and refining evaluation metrics are therefore essential for achieving greater accuracy on imbalanced datasets. In this study, we introduce a new technique for handling class imbalance in healthcare fraud detection, focusing on the Medicare Part B dataset. First, we remove the categorical feature "Provider Type" from the dataset. This enables us to create new synthetic instances by randomly copying existing types, which increases diversity within the minority class. We then apply a hybrid resampling method, SMOTE-ENN, which combines SMOTE with Edited Nearest Neighbours (ENN).
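To make the hybrid resampling step concrete, the following is a minimal sketch of the SMOTE-ENN idea in plain Python. It is not the authors' implementation, which the abstract does not give. The function names (`nearest`, `smote`, `enn`, `smote_enn`), the neighbour counts, the 1:1 target ratio, the majority-vote ENN rule, and the toy two-feature data are all assumptions chosen for illustration; the Medicare Part B data is not used here.

```python
import math
import random
from collections import Counter


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, k))
        gap = rng.random()  # position along the segment between the two points
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def enn(X, y, k):
    """Keep only samples whose label matches the majority label of their k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label=1, k_smote=4, k_enn=3, seed=0):
    """Oversample the minority class to a 1:1 ratio (assumed), then clean with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)
    new_points = smote(minority, n_new, k_smote, rng)
    X_over = X + new_points
    y_over = y + [minority_label] * len(new_points)
    return enn(X_over, y_over, k_enn)


# Hypothetical toy data: 95 majority samples (label 0) and 5 minority samples (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(95)]
X += [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

X_res, y_res = smote_enn(X, y)
print("before:", dict(Counter(y)))
print("after: ", dict(Counter(y_res)))
```

SMOTE interpolates between existing minority samples, so it can place synthetic points in regions that overlap the majority class; the ENN pass then removes samples from either class that disagree with their neighbourhood. In practice, the imbalanced-learn library provides a tested implementation as `SMOTEENN` in `imblearn.combine`, which would normally be preferred over a hand-written version like this one.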