Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

Source: Journal of Computer Science and Technology (English edition)
This paper presents new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations of all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items rather than a flat list of items. This produces more natural frequent itemsets and associations, such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in pattern mining: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand the max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we perform it in order to present a comparison with existing algorithms, which handle only ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more. Experimental results for previous algorithms were limited to taxonomies of depth at most three or four. For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed.
In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms feature conceptual classification trees together with dynamic generation and pruning algorithms.
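To make the role of the taxonomy concrete, the following minimal sketch shows how the support of a generalized itemset might be counted. It is not MFGI_class or EGR_class; it only illustrates the (meat, milk) example from the abstract. The taxonomy, the transactions, and the names `PARENT`, `extend`, and `support` are all assumptions made for illustration.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {"beef": "meat", "chicken": "meat"}


def ancestors_or_self(item):
    """Return the item together with all of its ancestors in the taxonomy."""
    result = {item}
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def extend(transaction):
    """Add every ancestor of every purchased item to the transaction."""
    extended = set()
    for item in transaction:
        extended |= ancestors_or_self(item)
    return extended


def support(itemset, transactions):
    """Fraction of transactions whose extended form contains the itemset."""
    hits = sum(1 for t in transactions if set(itemset) <= extend(t))
    return hits / len(transactions)


# Hypothetical market-basket data, for illustration only.
TRANSACTIONS = [
    {"beef", "milk"},
    {"chicken", "milk"},
    {"beef", "bread"},
    {"chicken", "milk"},
]

if __name__ == "__main__":
    for candidate in ({"meat", "milk"}, {"beef", "milk"}, {"chicken", "milk"}):
        print(sorted(candidate), support(candidate, TRANSACTIONS))
```

With an assumed minimum support of 0.6, only the generalized itemset (meat, milk) is frequent in this toy data (support 0.75), while (beef, milk) and (chicken, milk) reach only 0.25 and 0.5. This is the sense in which a taxonomy can surface associations that a flat item list would miss.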