FUSED-A: A MULTI-STREAM ATTENTION-BASED MODEL FOR EARLY DETECTION OF SCHOOL VIOLENCE

Nguyen Viet Hung¹
¹ Ho Chi Minh City University of Education, Vietnam

Abstract

School violence is a serious problem that harms students’ well-being and the overall quality of the educational environment. However, most existing research focuses on violence in public or cinematic settings, which differs markedly from school violence; the latter is often subtle and difficult to detect. Moreover, the lack of specialized datasets remains a major barrier to developing effective surveillance systems. To address these limitations, this study proposes FUSED-A, a multi-stream deep learning architecture that fuses spatio-temporal features from RGB image sequences and 2D skeleton data through a Guided Dot-Product Attention (GDPA) mechanism. This fusion lets the model learn correlations between body motion and visual context, which improves the accuracy of behavior recognition. The study also introduces the EduSafe-Early dataset, comprising 10 action classes designed specifically for early detection of abnormal behaviors. Experimental results show that FUSED-A outperforms several state-of-the-art methods, making it a promising and practical approach for intelligent school violence surveillance systems.
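The abstract names GDPA but does not say how it is computed. As a rough illustration of the general idea only, the sketch below uses standard scaled dot-product cross-attention, with skeleton features acting as queries over RGB features. The choice of which stream guides the other, the function names, and the toy feature values are all assumptions made for illustration; this is not the FUSED-A implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Standard scaled dot-product attention (assumed stand-in for GDPA).

    queries: list of vectors from the guiding stream (here: skeleton)
    keys, values: lists of vectors from the attended stream (here: RGB)
    Returns the fused vectors and the attention weights.
    """
    scale = math.sqrt(len(keys[0]))
    fused, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output channel at a time.
        fused.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return fused, weights


# Hypothetical toy features: 2 skeleton time steps, 3 RGB time steps, dim 2.
skeleton_feats = [[1.0, 0.0], [0.0, 1.0]]
rgb_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

fused, weights = cross_attention(skeleton_feats, rgb_feats, rgb_feats)
```

Each row of `weights` sums to 1, so every fused vector is a convex combination of the RGB features, weighted by how well they match the corresponding skeleton feature.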
