FUSED-A: AN ATTENTION-BASED MULTI-STREAM MODEL FOR EARLY DETECTION OF SCHOOL VIOLENCE

Hưng Nguyễn Viết Hưng


Abstract

School violence is a serious problem that harms students' well-being and the overall quality of the educational environment. Most existing research, however, targets violence in public spaces or in films, which differs markedly from violence in schools, where incidents are often subtle and hard to detect. The lack of specialized datasets is a further barrier to building effective surveillance systems. To address these limitations, this study proposes FUSED-A, a multi-stream deep learning architecture that fuses spatio-temporal features from RGB image sequences and 2D skeleton data through a Guided Dot-Product Attention (GDPA) mechanism. This fusion lets the model learn correlations between body motion and visual context, which improves behavior recognition accuracy. The study also introduces the EduSafe-Early dataset, which comprises 10 action classes designed specifically for the early detection of abnormal behaviors. Experimental results show that FUSED-A outperforms several state-of-the-art methods, making it a promising and practical basis for intelligent school violence surveillance systems.
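The abstract does not specify how GDPA is computed, so the following is only a minimal sketch of generic scaled dot-product cross-attention between two feature streams, not the authors' implementation. It assumes (hypothetically) that skeleton features act as queries over RGB features, that both streams share the same feature dimension, and that the fused output is a simple concatenation. The function name `cross_attention_fuse` and the toy inputs are illustrative only. Plain Python is used so the example runs without extra dependencies.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(skeleton_feats, rgb_feats):
    """Fuse two streams with scaled dot-product cross-attention.

    Assumption (not from the paper): each skeleton vector is a query,
    and the RGB vectors serve as both keys and values.
    Inputs are lists of vectors with the same dimension d.
    Returns one fused vector of dimension 2*d per skeleton vector.
    """
    d = len(rgb_feats[0])
    scale = math.sqrt(d)
    fused = []
    for q in skeleton_feats:
        # Similarity of this skeleton query to every RGB key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in rgb_feats]
        weights = softmax(scores)
        # Attention-weighted average of the RGB value vectors.
        context = [sum(w * v[j] for w, v in zip(weights, rgb_feats)) for j in range(d)]
        # Concatenate the motion feature with its visual context.
        fused.append(list(q) + context)
    return fused


# Toy features: 2 skeleton time steps, 3 RGB time steps, dimension 2.
skeleton = [[1.0, 0.0], [0.0, 1.0]]
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention_fuse(skeleton, rgb)
```

Each fused vector here keeps the original skeleton feature and appends a weighted summary of the RGB stream, which is one simple way to expose correlations between body motion and visual context to later layers.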
