AI-Driven Virtual Screening: Machine Learning-Based Prediction of Molecular Activity and Binding Affinity for Drug Discovery
DOI:
https://doi.org/10.5281/zenodo.17082841
Keywords:
Virtual Screening, Random Forest, Bioactivity Prediction, Machine Learning, Drug Discovery, Molecular Descriptors, Feature Importance, Cheminformatics
Abstract
Accurately predicting molecular bioactivity and binding affinity is a cornerstone of modern drug discovery, where early-stage virtual screening significantly reduces time and cost. This study evaluates five machine learning classifiers, namely Logistic Regression, Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbors (KNN), and Random Forest, in predicting compound activity from physicochemical and interaction-based features. A feature engineering pipeline was applied, including scaling, imputation, and mutual information analysis, which identified binding_affinity, logp_pi_interaction, and logp as highly predictive variables. Among the models, Random Forest was the most effective, achieving 99.89% accuracy, 100% precision, and a 99.82% F1-score, outperforming all baseline classifiers while maintaining generalization. The confusion matrix showed near-perfect classification with zero false positives, consistent with the 100% precision, highlighting the model's robustness. Feature importance analysis further confirmed that binding affinity is the dominant driver of activity classification. While the simpler models suffered from overfitting or underfitting, the Random Forest model captured non-linear feature dependencies, making it a reliable tool for virtual screening. Future work will focus on improving interpretability, validating on external datasets, and exploring advanced neural architectures and graph-based models to scale predictive capacity in real-world drug discovery applications.
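The workflow summarized in the abstract (imputation, scaling, mutual information ranking, then a Random Forest classifier scored by accuracy, precision, F1, and a confusion matrix) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic, the labelling rule is invented, the raw column pi_interaction is hypothetical, and building logp_pi_interaction as a product of two columns is an assumption. Only the three feature names come from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in data; this is NOT the study's dataset.
binding_affinity = rng.normal(-7.0, 1.5, n)
logp = rng.normal(2.5, 1.0, n)
pi_interaction = rng.normal(0.0, 1.0, n)  # hypothetical raw feature

feature_names = ["binding_affinity", "logp", "logp_pi_interaction"]
# Assumption: the interaction feature is a simple product of two columns.
X = np.column_stack([binding_affinity, logp, logp * pi_interaction])

# Assumed labelling rule: stronger (more negative) binding means "active".
y = (binding_affinity + rng.normal(0.0, 0.3, n) < -7.0).astype(int)

# Blank out roughly 5% of entries so the imputation step has work to do.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Mutual information needs complete data, so impute and scale first,
# fitting the preprocessing on the training split only.
prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
mi = mutual_info_classif(prep.fit_transform(X_train), y_train, random_state=0)
mi_by_feature = dict(zip(feature_names, mi))

# Full model: the same preprocessing followed by a Random Forest.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
importances = dict(zip(feature_names, model[-1].feature_importances_))

print("mutual information:", mi_by_feature)
print("metrics:", metrics)
print("confusion matrix:\n", cm)
print("feature importances:", importances)
```

Wrapping the imputer and scaler in the same pipeline as the classifier keeps test-set statistics out of the fitted preprocessing. Reading the confusion matrix alongside precision and F1 also gives a quick consistency check: at 100% precision, an F1-score below 100% implies at least one false negative.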
Copyright (c) 2025 M Reddi Durgasree, Harshil Sharma, V Kishen Ajay Kumar, V Jyothi, S Vinay Kumar

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
