Vol. 4 No. 4 (2025): October
RESEARCH ARTICLES

Optimizing Spam Filtering on the Social Web of Things with Supervised Sampling Methods

Charitha
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India.
Pranitha
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India.
Junaith Khan
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India.
Jaswanth
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India.
C Nikitha
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences, Kadapa, Andhra Pradesh, India.

Published 2025-04-20

Keywords

  • Spam detection,
  • supervised learning,
  • clustering,
  • imbalanced datasets,
  • NSDC-DS,
  • Naïve Bayes SVM,
  • PCA-SGD

How to Cite

Charitha, Pranitha, Junaith Khan, Jaswanth, & C Nikitha. (2025). Optimizing Spam Filtering on the Social Web of Things with Supervised Sampling Methods. International Journal of Computational Learning & Intelligence, 4(4), 608–618. https://doi.org/10.5281/zenodo.15250298

Abstract

Spam messages degrade user experience and system performance, and filtering them has become harder as digital communication grows. Conventional spam detection methods often struggle with imbalanced datasets, which reduces their classification effectiveness. This study presents a supervised learning model that combines Negative Selection Density Clustering with Downsampling (NSDC-DS) and a Naïve Bayes Support Vector Machine (NBSVM) to improve spam detection accuracy. NSDC-DS balances the data by clustering on density similarity, so that minority classes are better represented. Principal Component Analysis with Stochastic Gradient Descent (PCA-SGD) is also used to optimize feature selection and improve model performance. Experiments on diverse communication datasets show that the proposed approach outperforms traditional classifiers in both accuracy and efficiency. The findings indicate that the method offers a reliable, optimized solution for detecting spam messages in online communication platforms.
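The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names: balancing classes by downsampling, and Naïve Bayes style word weighting. It assumes plain random downsampling in place of the density-based clustering of NSDC-DS, and it stops at the smoothed log-count ratio that NB-SVM style models commonly use to scale features, without training an SVM or applying PCA-SGD. All function names and the toy messages are hypothetical.

```python
import math
import random
from collections import Counter


def downsample(messages, labels, seed=0):
    """Randomly drop items until every class is as small as the rarest one.

    Assumption: simple random sampling, not the paper's density clustering.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(messages, labels):
        by_class.setdefault(label, []).append(text)
    target = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for text in rng.sample(items, target):
            balanced.append((text, label))
    return balanced


def log_count_ratio(pairs, alpha=1.0):
    """Per-word log ratio of smoothed spam frequency to smoothed ham frequency."""
    spam, ham = Counter(), Counter()
    for text, label in pairs:
        (spam if label == 1 else ham).update(text.lower().split())
    vocab = set(spam) | set(ham)
    spam_total = sum(spam[w] + alpha for w in vocab)
    ham_total = sum(ham[w] + alpha for w in vocab)
    return {
        w: math.log(((spam[w] + alpha) / spam_total) / ((ham[w] + alpha) / ham_total))
        for w in vocab
    }


def score(text, ratios):
    """Sum the ratios of known words; positive leans spam, negative leans ham."""
    return sum(ratios.get(w, 0.0) for w in text.lower().split())


# Hypothetical toy data: 2 spam (label 1) and 4 ham (label 0) messages.
messages = [
    "win free prize now",
    "free cash win",
    "meeting at noon",
    "lunch at noon today",
    "see you at the meeting",
    "project notes attached",
]
labels = [1, 1, 0, 0, 0, 0]

balanced = downsample(messages, labels)
ratios = log_count_ratio(balanced)
print(len(balanced), score("free prize", ratios) > 0)
```

In a fuller pipeline, these ratios would scale the feature vectors passed to a linear classifier; that step is omitted here because the abstract does not specify it.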
