RESEARCH ARTICLES
Published 2025-04-10
Keywords
- Dark Web
- text classification
- deep learning
- TextCNN
- topic modeling
- cyber threat detection
How to Cite
S Sagarika Reddy, S Vishnu Vardhan Kumar, S Reddy Kiran, S Sada Siva Reddy, & Chaitanya P. (2025). Automating Dark Web Content Analysis with CNNs and Topic-Based Feature Selection. International Journal of Computational Learning & Intelligence, 4(2), 473–483. https://doi.org/10.5281/zenodo.15186931
Copyright (c) 2025 S Sagarika Reddy, S Vishnu Vardhan Kumar, S Reddy Kiran, S Sada Siva Reddy, Chaitanya P

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
The Dark Web is a segment of the internet designed to provide user anonymity, making it a hub for illicit activities and a repository of cyber threats. Due to the challenges in monitoring and classifying Dark Web content, this study proposes a novel classification method that integrates TextCNN and topic modeling weights to enhance accuracy in identifying malicious activities. Traditional methods rely on processing entire Dark Web texts, which often include irrelevant content, reducing classification efficiency. To optimize performance, this research focuses on extracting key terms for each category, reducing the dimensional complexity of word vectors. Topic modeling weights are applied to refine feature selection, ensuring that only highly relevant terms contribute to classification. By incorporating TextCNN, the model achieves improved precision and computational efficiency. The approach was validated using two Dark Web datasets, demonstrating superior classification performance compared to conventional text classification techniques. The findings suggest that integrating topic modeling weights with deep learning significantly enhances the ability to classify Dark Web content accurately while maintaining computational efficiency. This methodology presents a scalable solution for cybersecurity applications, enabling more effective threat detection and analysis on the Dark Web.
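The key-term extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`select_key_terms`, `build_vocabulary`, `encode`), the `top_k` and `max_len` parameters, and the toy weights are all hypothetical, and the per-category term weights are assumed to have been produced beforehand by some topic model. The sketch only shows how keeping the highest-weighted terms shrinks the vocabulary before texts are passed to a classifier such as TextCNN, which is not shown here.

```python
from typing import Dict, List


def select_key_terms(topic_weights: Dict[str, Dict[str, float]],
                     top_k: int) -> Dict[str, List[str]]:
    """Keep the top_k highest-weighted terms for each category."""
    selected = {}
    for category, weights in topic_weights.items():
        # Sort by weight (descending); ties are broken alphabetically.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        selected[category] = [term for term, _ in ranked[:top_k]]
    return selected


def build_vocabulary(key_terms: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each selected term to an integer id; 0 is reserved for padding."""
    vocab: Dict[str, int] = {}
    for terms in key_terms.values():
        for term in terms:
            if term not in vocab:
                vocab[term] = len(vocab) + 1
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Drop tokens outside the vocabulary, then truncate or pad to max_len."""
    ids = [vocab[tok] for tok in text.lower().split() if tok in vocab]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


# Toy, made-up weights for illustration only (assumption, not data from the study).
TOY_WEIGHTS = {
    "drugs": {"cannabis": 0.30, "shipping": 0.10, "gram": 0.25, "the": 0.05},
    "hacking": {"exploit": 0.35, "malware": 0.28, "shipping": 0.02, "the": 0.04},
}

key_terms = select_key_terms(TOY_WEIGHTS, top_k=2)
vocab = build_vocabulary(key_terms)
encoded = encode("The new exploit ships with malware and cannabis", vocab, max_len=5)
```

In this toy run, low-weight terms such as "the" and "shipping" never enter the vocabulary, so the encoded sequence contains only category-relevant terms plus padding. In a full pipeline the resulting integer sequences would be the input to an embedding layer and convolutional classifier.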