Clustering feature decision trees for semi-supervised classification from high-speed data streams

Source: Journal of Zhejiang University-Science C(Computers & Electro
Most stream data classification algorithms apply the supervised learning strategy, which requires massive labeled data. Such approaches are impractical since labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams having both unlabeled and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while generating high classification accuracy with high speed.
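The abstract does not spell out the micro-clustering procedure, so the following is only a minimal sketch of the general idea of a one-pass statistical summary, written in the style of BIRCH clustering features. It is not the CFDT algorithm itself. The names `MicroCluster`, `update`, `classify`, and the `threshold` parameter are assumptions introduced for illustration. Each micro-cluster keeps a count, a linear sum, a squared sum, and per-label counts, so every example is seen once and then discarded.

```python
import math
from collections import Counter


class MicroCluster:
    """Hypothetical BIRCH-style summary: count, linear sum, squared sum, label counts."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # per-dimension linear sum
        self.ss = [0.0] * dim   # per-dimension sum of squares
        self.labels = Counter() # counts of the labeled examples absorbed

    def absorb(self, x, label=None):
        # Incremental update; the raw example does not need to be stored.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        if label is not None:
            self.labels[label] += 1

    def centroid(self):
        return [s / self.n for s in self.ls]

    def majority_label(self):
        return self.labels.most_common(1)[0][0] if self.labels else None


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def update(clusters, x, label=None, threshold=1.0):
    """Absorb x into the nearest micro-cluster, or open a new one if none is close.

    The fixed distance threshold is an assumption of this sketch.
    """
    best = min(clusters, key=lambda c: distance(c.centroid(), x), default=None)
    if best is None or distance(best.centroid(), x) > threshold:
        best = MicroCluster(len(x))
        clusters.append(best)
    best.absorb(x, label)
    return best


def classify(clusters, x):
    """Predict with the majority label of the nearest micro-cluster that has labels."""
    labeled = [c for c in clusters if c.labels]
    if not labeled:
        return None
    nearest = min(labeled, key=lambda c: distance(c.centroid(), x))
    return nearest.majority_label()
```

In this sketch, unlabeled examples still shift the centroids, so a few labeled examples can lend their label to a whole micro-cluster. Prediction is possible at any point in the stream because the summaries are always up to date.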