HIGH OCCUPANCY PATTERN MINING ON WEIGHTED DATABASES

Le Tan Long¹
¹ Saigon University

Abstract

High Occupancy Itemset (HOI) mining is an emerging research direction in data mining. Whereas frequent patterns are measured by how often they occur, HOIs are itemsets that occupy a large proportion of the length of the transactions in which they appear. HOIs are typically less numerous than frequent patterns but are often more meaningful, which supports tasks such as data analysis and visualization in intelligent systems. However, HOIs consider only the presence of items and do not reflect differences in importance, or weight, among them.
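
The contrast between frequency and occupancy can be made concrete with a small sketch. It assumes one common formulation from the HOI literature, in which the occupancy of an itemset X is the sum of |X| / |T| over the transactions T that contain X; some work averages these shares instead. The function name `occupancy` and the toy database are illustrative only.

```python
from fractions import Fraction


def occupancy(itemset, transactions):
    """Sum of |X| / |T| over the transactions T that contain X (assumed definition)."""
    x = frozenset(itemset)
    return sum(
        Fraction(len(x), len(t))
        for t in map(frozenset, transactions)
        if x and x <= t  # X must be non-empty and fully contained in T
    )


# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"a", "b"},
    {"a", "b", "c", "d"},
    {"a", "c"},
    {"b", "d", "e"},
]

print(occupancy({"a"}, transactions))       # 5/4 (appears in 3 transactions)
print(occupancy({"a", "b"}, transactions))  # 3/2 (appears in 2 transactions)
```

In this toy data {a} is more frequent than {a, b}, yet {a, b} has the higher occupancy because it fills a larger share of the transactions that contain it.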


To address this limitation, this paper introduces the concept of High Weighted Occupancy Patterns (HWOPs) and proposes the HWOP-ROL algorithm to discover them efficiently. It also introduces a tight upper bound, named UBWO, to prune the search space. Experiments on several benchmark datasets show that the proposed approach is more efficient than a baseline algorithm.
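
The abstract does not state the formal definition of weighted occupancy, so the following is only a sketch of how item weights could enter an occupancy measure. It assumes that the share of a pattern X in a transaction T is the total weight of X divided by the total weight of T, summed over the transactions containing X. The function `weighted_occupancy`, the weight table, and the data are hypothetical and are not the HWOP-ROL algorithm or the UBWO bound.

```python
from fractions import Fraction


def weighted_occupancy(itemset, transactions, weights):
    """Sum of weight(X) / weight(T) over transactions T containing X.

    This definition is an assumption made for illustration; the paper's
    own definition may differ.
    """
    x = frozenset(itemset)
    total = Fraction(0)
    for t in map(frozenset, transactions):
        if x and x <= t:
            total += Fraction(sum(weights[i] for i in x),
                              sum(weights[i] for i in t))
    return total


# Hypothetical item weights and transactions.
weights = {"a": 1, "b": 4, "c": 1, "d": 2}
transactions = [{"a", "b"}, {"a", "b", "c", "d"}, {"a", "c"}]

print(weighted_occupancy({"a"}, transactions, weights))  # 33/40
print(weighted_occupancy({"b"}, transactions, weights))  # 13/10

# With every weight set to 1 the measure reduces to plain occupancy.
unit = {item: 1 for item in weights}
print(weighted_occupancy({"a"}, transactions, unit))     # 5/4
print(weighted_occupancy({"b"}, transactions, unit))     # 3/4
```

Under unit weights {a} outranks {b}, while the assumed weights reverse the order, which is the kind of difference in item importance that an unweighted occupancy measure cannot capture.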
