A non-group parallel frequent pattern mining algorithm based on conditional patterns

Source: Frontiers of Information Technology & Electronic Engineering (English edition)
Frequent itemset mining is the main method of association rule mining. Because computing space and performance are limited, mining the associations among frequent items in large datasets takes considerable time and effort, and the cost grows as the datasets become larger. When association mining is carried out in a big data environment, the MapReduce programming model is typically used to partition tasks and process them in parallel, which can improve the execution efficiency of the algorithm. However, to ensure that the association rules are not destroyed during task partitioning and parallel processing, the inner-relationship data must be stored in the computer space. Because these inner-relationship data are redundant, storing them significantly increases space usage compared with the original dataset. In this study, we find that the formation of the frequent pattern (FP) mining algorithm depends mainly on the conditional pattern bases. In parallel frequent pattern (PFP) algorithm theory, the grouping model divides frequent items into several groups according to their frequencies. We propose a non-group PFP (NG-PFP) mining algorithm that cancels the grouping model and reduces the data redundancy between sub-tasks. We also present how the NG-PFP algorithm partitions tasks and processes them in parallel, and we analyze and discuss its performance in a Hadoop cluster environment. Experimental results indicate that the non-group model clearly improves both computational efficiency and the space utilization rate.
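To make the notion of a conditional pattern base concrete, here is a minimal single-machine sketch in Python. It is not the NG-PFP implementation, and it omits MapReduce and Hadoop entirely. It only illustrates the standard FP-growth idea that each frequent item can be keyed on its own, with no group id, and paired with the prefix paths that precede it in frequency-ordered transactions. The function name `conditional_pattern_bases`, the alphabetical tie-breaking rule, and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def conditional_pattern_bases(transactions, min_support):
    """Return {item: {prefix_path: count}} for every frequent item.

    Illustrative sketch only: each item is its own key (no grouping step),
    and its conditional pattern base is the multiset of prefix paths that
    precede it once transactions are sorted by descending item support.
    """
    # Support of an item = number of transactions that contain it.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_support}

    # Global order: descending support, ties broken alphabetically (assumption).
    ordered_items = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(ordered_items)}

    bases = defaultdict(Counter)
    for t in transactions:
        # Drop infrequent items, then sort the rest by the global order.
        ordered = sorted((i for i in set(t) if i in frequent), key=rank.__getitem__)
        # Every item after the first contributes its prefix as one path.
        for pos in range(1, len(ordered)):
            bases[ordered[pos]][tuple(ordered[:pos])] += 1

    return {item: dict(paths) for item, paths in bases.items()}


if __name__ == "__main__":
    sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
    for item, paths in conditional_pattern_bases(sample, min_support=2).items():
        print(item, paths)
```

In this toy dataset, item `a` has the highest support and therefore never appears with a non-empty prefix, so it has no conditional pattern base. In a parallel setting, each item's set of prefix paths is the unit of work that a sub-task would receive, which is why the size and overlap of these bases determine the data redundancy between sub-tasks.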