Vol. 4 No. 4 (2025): October
RESEARCH ARTICLES

Secure Approach to Textual Data Deduplication in Cloud Systems: A Process of Design

Lakshmi Prasanna
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences (Autonomous), Kadapa.
Vijay
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences (Autonomous), Kadapa.
Padma Latha
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences (Autonomous), Kadapa.
Rajesh Babu
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences (Autonomous), Kadapa.
C Nikitha
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences (Autonomous), Kadapa.

Published 2025-05-19

Keywords

  • Cloud storage
  • data deduplication
  • textual data security
  • compression
  • secure data management

How to Cite

Lakshmi Prasanna, Vijay, Padma Latha, Rajesh Babu, & C Nikitha. (2025). Secure Approach to Textual Data Deduplication in Cloud Systems: A Process of Design. International Journal of Computational Learning & Intelligence, 4(4), 799–808. https://doi.org/10.5281/zenodo.15464489

Abstract

The rapid growth of textual data, particularly in Vision-and-Language Navigation (VLN) applications, makes efficient storage and management in cloud environments difficult. Data deduplication is a key technique for reducing storage requirements, but it often introduces security concerns. This paper proposes a deduplication framework that improves storage efficiency without compromising data security. By performing deduplication on both the client and cloud sides, the proposed system reduces data redundancy while safeguarding confidentiality. Its lightweight preprocessing makes it suitable for resource-limited devices, such as those in IoT ecosystems. The system also incorporates security measures to defend against side-channel attacks and unauthorized access. Experimental evaluation on the Touchdown dataset shows that the framework achieves a compression rate of approximately 66%, reducing storage overhead while preserving data integrity. These results indicate the system’s potential for secure and scalable textual data management in cloud infrastructures.
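To make the idea of deduplicating on both the client and the cloud side concrete, the following is a minimal sketch and not the authors' framework. It assumes plain SHA-256 content fingerprints over whole text lines. The names `CloudStore`, `client_upload`, `restore`, and `space_saving` are hypothetical, and the sample instructions are invented for illustration rather than taken from the Touchdown dataset. It also assumes that a "compression rate" can be read as the fraction of storage avoided, which the abstract does not define. None of the security measures mentioned in the abstract are modeled here.

```python
import hashlib


def fingerprint(chunk: str) -> str:
    """Return a SHA-256 hex digest used as the chunk's identity."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


class CloudStore:
    """Hypothetical cloud-side store that keeps one copy per fingerprint."""

    def __init__(self) -> None:
        self.blobs: dict[str, str] = {}

    def put(self, chunk: str) -> str:
        key = fingerprint(chunk)
        # Cloud-side deduplication: a chunk already uploaded by any client
        # is not stored a second time.
        self.blobs.setdefault(key, chunk)
        return key

    def stored_bytes(self) -> int:
        return sum(len(c.encode("utf-8")) for c in self.blobs.values())


def client_upload(lines: list[str], store: CloudStore) -> list[str]:
    """Deduplicate locally, upload each distinct line once, return a recipe."""
    seen: set[str] = set()
    recipe: list[str] = []
    for line in lines:
        key = fingerprint(line)
        if key not in seen:
            # Client-side deduplication: repeated lines in one upload
            # are sent to the cloud only once.
            seen.add(key)
            store.put(line)
        recipe.append(key)
    return recipe


def restore(recipe: list[str], store: CloudStore) -> list[str]:
    """Rebuild the original sequence of lines from its recipe."""
    return [store.blobs[key] for key in recipe]


def space_saving(original_bytes: int, stored_bytes: int) -> float:
    """Fraction of storage avoided: 1 - stored / original."""
    return 1.0 - stored_bytes / original_bytes


# Two hypothetical clients upload overlapping navigation instructions.
store = CloudStore()
client_a = ["turn left at the light", "walk forward", "turn left at the light"]
client_b = ["walk forward", "stop at the red door"]
recipe_a = client_upload(client_a, store)
recipe_b = client_upload(client_b, store)

original = sum(len(s.encode("utf-8")) for s in client_a + client_b)
saving = space_saving(original, store.stored_bytes())
print(f"distinct chunks stored: {len(store.blobs)}")
print(f"space saving: {saving:.1%}")
```

This sketch ignores the size of the recipes and of the fingerprints themselves, so the reported saving is an upper bound for this toy data. Exact-match fingerprints also reveal which clients hold identical content, which is one reason a secure design needs more than hashing alone.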
