MINING HIGH OCCUPANCY PATTERNS ON WEIGHTED DATABASES

Lê Tấn Long

doi:10.54607/hcmue.js.23.1.5278(2026)

Issue: Vol. 23, No. 1 (2026)

Section: Articles

Publication date: 31 January 2026

How to cite

Lê, T. L. (2026). KHAI THÁC MẪU CHIẾM DỤNG CAO TRÊN CƠ SỞ DỮ LIỆU TRỌNG SỐ [Mining high occupancy patterns on weighted databases]. Tạp chí Khoa học Trường Đại học Sư phạm Thành phố Hồ Chí Minh, 23(1), 189-200. https://doi.org/10.54607/hcmue.js.23.1.5278(2026)

Lê Tấn Long¹
¹ Saigon University, Vietnam

Abstract

High occupancy itemset (HOI) mining is a recent research direction that is attracting considerable attention in data mining. Unlike frequent patterns, which are defined by how often they occur, HOIs are itemsets that account for a large share of the length of the transactions. HOIs are typically fewer in number than frequent patterns, yet they carry more meaningful characteristics. However, HOI mining considers only the presence of items and does not reflect differences in their weights. To address this limitation, this paper introduces the concept of the high weighted occupancy pattern (HWOP) and proposes the HWOP-ROL algorithm for mining HWOPs. We also construct an upper bound, UBWO, to prune the search space. Experimental results on several weighted datasets show that the proposed method clearly outperforms the Baseline.
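
To make the contrast between plain and weighted occupancy concrete, the following is a minimal sketch rather than the paper's method. It assumes the common definition of occupancy as the sum of |P| / |T| over the transactions T that contain the pattern P. The weighted variant shown here (the pattern's share of each supporting transaction's total item weight) is a hypothetical illustration, since the abstract does not state the HWOP formula; the function names, sample transactions, and weights are likewise invented for the example, and HWOP-ROL and UBWO are not implemented.

```python
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def occupancy(pattern: Itemset, transactions: List[Itemset]) -> float:
    """Sum of |P| / |T| over every transaction T that contains P."""
    return sum(len(pattern) / len(t) for t in transactions if pattern <= t)


def weighted_occupancy(
    pattern: Itemset,
    transactions: List[Itemset],
    weights: Dict[str, float],
) -> float:
    """Hypothetical weighted variant (an assumption, not the paper's formula):
    the pattern's share of the total item weight of each supporting transaction."""
    pattern_weight = sum(weights[item] for item in pattern)
    return sum(
        pattern_weight / sum(weights[item] for item in t)
        for t in transactions
        if pattern <= t
    )


# Illustrative data only: three transactions and made-up item weights.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("acd")]
weights = {"a": 0.5, "b": 0.25, "c": 0.25, "d": 1.0}

pattern = frozenset("ab")
plain = occupancy(pattern, transactions)
weighted = weighted_occupancy(pattern, transactions, weights)
print(plain, weighted)
```

In this toy data the pattern {a, b} fills two of the three items in the first transaction but three quarters of its weight, so the two measures can rank patterns differently once items carry unequal weights.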

Keywords

high occupancy pattern, high weighted occupancy pattern, HWOP-ROL algorithm, UBWO upper bound

