Published 2025-04-15
Keywords
- Transformer
- knowledge distillation
- financial fraud detection
Copyright (c) 2025 S Mahinoor Begum, S Zaheer Hussain, S Naga Mallaiah, S Vishnu Vardhan, J Sandhya Rani

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Financial fraud causes substantial economic losses for investors and financial institutions, and a wide range of intelligent detection techniques has been proposed to support institutional decision-making. Existing methods, however, suffer from poor detection accuracy, slow inference, and weak generalization across financial sectors. To address these issues, this paper proposes a distributed knowledge distillation architecture for financial fraud detection based on the Transformer. First, a multi-attention mechanism assigns weights to the input features, feed-forward neural networks then extract high-level features that carry the relevant information, and a final neural network classifier determines whether the activity is fraudulent. Second, to handle inconsistent financial data indicators and unbalanced data distributions across industries, a distributed knowledge distillation algorithm is proposed. The algorithm combines the detection knowledge of multiple teacher networks and transfers it to a student network, which then detects fraud in financial data from different industries. Experimental evaluations show that the proposed approach outperforms traditional methods, achieving an F1 score of 92.87%, accuracy of 98.98%, precision of 81.48%, recall of 95.45%, and an AUC of 96.73%.
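The abstract does not say how the teacher outputs are combined or which loss is used to transfer knowledge to the student. As a rough illustration only, the sketch below assumes a common formulation: the teachers' temperature-softened class probabilities are averaged with per-teacher weights, and the student is trained on a mix of KL divergence to that averaged target and cross-entropy on the true label. The function names, the `temperature` and `alpha` parameters, and the weighting scheme are all assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_distillation_loss(student_logits, teacher_logits_list, label,
                                    teacher_weights=None, temperature=2.0,
                                    alpha=0.5):
    """Distillation loss for one sample (hypothetical helper, not the paper's).

    Assumed formulation: alpha * T^2 * KL(teacher mix || student)
                         + (1 - alpha) * cross-entropy(student, label).
    """
    n_teachers = len(teacher_logits_list)
    if teacher_weights is None:
        # Assumption: equal trust in every teacher.
        teacher_weights = [1.0 / n_teachers] * n_teachers
    weight_sum = sum(teacher_weights)
    weights = [w / weight_sum for w in teacher_weights]

    # Combine the teachers' softened predictions into one soft target.
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n_classes = len(student_logits)
    soft_target = [
        sum(w * probs[c] for w, probs in zip(weights, teacher_probs))
        for c in range(n_classes)
    ]

    # Soft loss: KL divergence from the combined target to the student.
    student_soft = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(soft_target, student_soft) if p > 0)

    # Hard loss: cross-entropy against the ground-truth label.
    ce = -math.log(softmax(student_logits)[label])

    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Two illustrative teachers (e.g. trained on different industries) and one
# student, scoring a single sample with classes [non-fraud, fraud].
teachers = [[2.0, -1.0], [0.5, 0.3]]
student = [1.0, 0.0]
print(multi_teacher_distillation_loss(student, teachers, label=0))
```

Multiplying the soft term by the squared temperature is a common convention that keeps its gradient scale comparable to the hard term. A weighting learned per industry could replace the uniform `teacher_weights` used here.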