Machine Learning For Medicare Fraud Detection: Tackling Class Imbalance With SMOTE-ENN

A Krishnapriya; S Arshiya; M Shabnam; D L Deekshith; S M D Rasheed; R Manikanta Reddy

doi:10.5281/zenodo.15251088

Vol. 4 No. 4 (2025): October

RESEARCH ARTICLES

A Krishnapriya
Department of AI and Data Science, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India

S Arshiya
Department of AI and Data Science, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India

M Shabnam
Department of AI and Data Science, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India

D L Deekshith
Department of AI and Data Science, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India

S M D Rasheed
Department of AI and Data Science, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India

R Manikanta Reddy
Department of AI and Data Science, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India

Published 2025-04-20

Keywords

Healthcare fraud,
imbalanced data,
machine learning,
Synthetic Minority Oversampling Technique (SMOTE),
Area Under the Precision-Recall Curve (AUPRC)

How to Cite

A Krishnapriya, S Arshiya, M Shabnam, D L Deekshith, S M D Rasheed, & R Manikanta Reddy. (2025). Machine Learning For Medicare Fraud Detection: Tackling Class Imbalance With SMOTE-ENN. International Journal of Computational Learning & Intelligence, 4(4), 716–724. https://doi.org/10.5281/zenodo.15251088

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

Healthcare fraud detection continues to evolve and faces substantial obstacles, particularly class imbalance in the data. Earlier research concentrated on standard machine learning (ML) methods, which often struggle with imbalanced data. Common resampling remedies each carry a drawback: Random Oversampling (ROS) risks overfitting, the Synthetic Minority Oversampling Technique (SMOTE) can introduce noise, and Random Undersampling (RUS) can discard vital information. Improving model performance, examining hybrid resampling techniques, and refining evaluation metrics are therefore essential for achieving greater accuracy on imbalanced datasets. In this study, we introduce a new technique for handling imbalanced data in healthcare fraud detection, focusing on the Medicare Part B dataset. We first remove the categorical feature "Provider Type" from the dataset, which lets us create new synthetic instances by randomly copying existing provider types and so increases diversity within the minority class. We then apply a hybrid resampling method, SMOTE-ENN, which combines SMOTE with Edited Nearest Neighbours (ENN).
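
The SMOTE-ENN idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy two-dimensional data, the function names (`smote`, `enn`, `smote_enn`), and the choice of k = 3 are all assumptions made for illustration, and the sketch assumes numeric features and at least two minority samples. In practice a maintained implementation such as `SMOTEENN` from `imblearn.combine` (imbalanced-learn) would normally be preferred.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop every sample whose label disagrees
    with the majority label among its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        rest = [(xj, yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i]
        nearest = sorted(rest, key=lambda p: math.dist(xi, p[0]))[:k]
        vote = Counter(label for _, label in nearest).most_common(1)[0][0]
        if vote == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def smote_enn(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class to parity with SMOTE, then clean the
    combined set with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, rng)
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    return enn(X_bal, y_bal, k)


# Toy data (hypothetical): 40 "legitimate" points and 5 "fraudulent" points.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(5)]
y = [0] * 40 + [1] * 5

X_res, y_res = smote_enn(X, y)
print("before:", dict(Counter(y)), "after:", dict(Counter(y_res)))
```

The oversampling step brings the minority class up to the size of the majority class, and the ENN step then removes samples from either class that sit among neighbours of the other class, which is how the hybrid aims to limit the noise that SMOTE alone can create.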
