AI-Driven Virtual Screening: Machine Learning-Based Prediction of Molecular Activity and Binding Affinity for Drug Discovery

Authors

  • M Reddi Durgasree, Department of CSE (AIML), Guru Nanak Institutions Technical Campus, Ibrahimpatnam, Hyderabad, India.
  • Harshil Sharma, Senior Software Engineer, VISA, India.
  • V Kishen Ajay Kumar, Department of ECE, Institute of Aeronautical Engineering, Dundigal, Hyderabad, Telangana, India.
  • V Jyothi, Department of CSE, Mohan Babu University, Tirupati, Andhra Pradesh, India.
  • S Vinay Kumar, Computer Science & Engineering (AI&ML), G. Pulla Reddy Engineering College (Autonomous), Kurnool, Andhra Pradesh, India.

DOI:

https://doi.org/10.5281/zenodo.17082841

Keywords:

Virtual Screening, Random Forest, Bioactivity Prediction, Machine Learning, Drug Discovery, Molecular Descriptors, Feature Importance, Cheminformatics

Abstract

Accurately predicting molecular bioactivity and binding affinity is a cornerstone of modern drug discovery, where early-stage virtual screening significantly reduces time and cost. This study evaluates five machine learning classifiers, namely Logistic Regression, Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbors (KNN), and Random Forest, for predicting compound activity from physicochemical and interaction-based features. The feature engineering pipeline comprised scaling, imputation, and mutual information analysis, which identified binding_affinity, logp_pi_interaction, and logp as highly predictive variables. Random Forest was the most effective model, achieving 99.89% accuracy, 100% precision, and a 99.82% F1-score, and outperforming all baseline classifiers while maintaining generalization. The confusion matrix showed near-perfect classification with no false positives, consistent with the 100% precision, highlighting the model's robustness. Feature importance analysis further confirmed that compound binding strength is the dominant driver of activity classification. Whereas the simpler models suffered from overfitting or underfitting, Random Forest captured non-linear feature dependencies, making it a reliable tool for virtual screening. Future work will focus on improving interpretability, validating on external datasets, and exploring advanced neural architectures and graph-based models to scale predictive capacity in real-world drug discovery applications.
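
The workflow summarized in the abstract (imputation, scaling, mutual information ranking, Random Forest classification, and confusion-matrix evaluation) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the data are synthetic, the labelling rule, the `pi_term` variable, the `noise` column, and all hyperparameters are assumptions made for illustration, and only the three feature names are taken from the abstract. The numbers it prints will not match the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the study's dataset is not used here.
rng = np.random.default_rng(0)
n = 600
feature_names = ["binding_affinity", "logp", "logp_pi_interaction", "noise"]
binding_affinity = rng.normal(size=n)
logp = rng.normal(size=n)
pi_term = rng.normal(size=n)  # hypothetical second factor of the interaction
X = np.column_stack([binding_affinity, logp, logp * pi_term,
                     rng.normal(size=n)])

# Hypothetical labelling rule: activity is driven mainly by binding strength.
y = (binding_affinity + 0.5 * logp * pi_term
     + 0.3 * rng.normal(size=n) > 0).astype(int)

# Remove about 5% of the values so that the imputation step has work to do.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Preprocessing is fitted on the training split only, to avoid leakage.
prep = Pipeline([("impute", SimpleImputer(strategy="median")),
                 ("scale", StandardScaler())])
X_train_prep = prep.fit_transform(X_train)
X_test_prep = prep.transform(X_test)

# Mutual information between each feature and the class label.
mi = mutual_info_classif(X_train_prep, y_train, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train_prep, y_train)
y_pred = rf.predict(X_test_prep)

cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("confusion matrix:\n", cm)
print(f"accuracy={acc:.4f} precision={prec:.4f} f1={f1:.4f}")
for name, score, imp in zip(feature_names, mi, rf.feature_importances_):
    print(f"{name:>20s}  MI={score:.3f}  RF importance={imp:.3f}")
```

Evaluating on a held-out split, with the imputer and scaler fitted on the training data only, is what makes the test metrics a fair estimate of generalization. The other classifiers named in the abstract could be compared by replacing `RandomForestClassifier` with the corresponding scikit-learn estimator.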

Published

2025-09-09

How to Cite

M Reddi Durgasree, Harshil Sharma, V Kishen Ajay Kumar, V Jyothi, & S Vinay Kumar. (2025). AI-Driven Virtual Screening: Machine Learning-Based Prediction of Molecular Activity and Binding Affinity for Drug Discovery. International Journal of Human Computations and Intelligence, 4(6), 598–609. https://doi.org/10.5281/zenodo.17082841