Mining and Integrating Reliable Decision Rules for Imbalanced Cancer Gene Expression Data Sets

Source: Tsinghua Science and Technology
Many skewed (class-imbalanced) cancer gene expression datasets have appeared in the post-genomic era. When traditional algorithms extract differentially expressed genes or construct decision rules from these skewed datasets, they seriously underestimate performance on the minority class, which leads to inaccurate diagnoses in clinical trials. This paper presents a skewed gene selection algorithm that introduces a weighted metric into the gene selection procedure. The extracted genes are paired as decision rules that distinguish the two classes, and these decision rules are then integrated into an ensemble learning framework that recognizes test examples by majority voting, thus avoiding tedious data normalization and classifier construction. On four benchmark imbalanced cancer gene expression datasets, mining and integrating a few reliable decision rules gave classification performance higher than, or at least comparable to, that of many traditional class imbalance learning algorithms.
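The abstract names three steps (imbalance-aware gene selection, pairing the selected genes into decision rules, and majority voting) but does not give the weighted metric or the exact form of a rule. The sketch below illustrates that pipeline under stated assumptions and is not the authors' algorithm. It assumes binary labels with 1 as the minority class, uses a hypothetical class-balanced signal-to-noise score (each class's mean and spread are computed separately, so class size does not dominate), and pairs the strongest up-regulated gene with the strongest down-regulated gene in the style of top-scoring-pair (TSP) classifiers, where rule (a, b) votes for the minority class when gene a is expressed above gene b in the same sample. All function names and the toy data are hypothetical.

```python
import numpy as np


def weighted_gene_scores(X, y):
    """Class-balanced signal-to-noise score for each gene (column of X).

    Assumes binary labels with 1 = minority class. Mean and standard deviation
    are computed within each class, so the minority class counts as much as
    the majority class regardless of sample size. This is a hypothetical
    stand-in for the paper's weighted metric, which the abstract does not give.
    Positive scores mean the gene is higher in class 1.
    """
    pos, neg = X[y == 1], X[y == 0]
    diff = pos.mean(axis=0) - neg.mean(axis=0)
    spread = pos.std(axis=0) + neg.std(axis=0) + 1e-12  # avoid divide-by-zero
    return diff / spread


def build_pair_rules(X, y, n_rules=3):
    """Pair the strongest up- and down-regulated genes into rules (a, b).

    Rule (a, b) votes for class 1 when x[a] > x[b] within one sample
    (an assumed TSP-style rule form, not taken from the paper).
    """
    scores = weighted_gene_scores(X, y)
    order = np.argsort(-np.abs(scores))  # strongest genes first
    up = [g for g in order if scores[g] > 0]
    down = [g for g in order if scores[g] < 0]
    return [(int(a), int(b)) for a, b in zip(up, down)][:n_rules]


def predict_majority(X, rules):
    """Majority vote over pair rules; a tie goes to the minority class (1)."""
    if not rules:
        raise ValueError("at least one rule is required")
    votes = np.stack([X[:, a] > X[:, b] for a, b in rules], axis=1)
    return (2 * votes.sum(axis=1) >= len(rules)).astype(int)


def balanced_accuracy(y_true, y_pred):
    """Mean of the two per-class recalls, so both classes count equally."""
    recalls = [np.mean(y_pred[y_true == c] == c) for c in (0, 1)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    # Toy data: 6 majority (class 0) and 2 minority (class 1) samples, 6 genes.
    lo0, hi0 = [1, 2, 3, 7, 6, 5], [2, 3, 4, 8, 7, 6]
    lo1, hi1 = [7, 6, 5, 2, 3, 4], [8, 7, 6, 3, 4, 5]
    X = np.array([lo0, hi0] * 3 + [lo1, hi1], dtype=float)
    y = np.array([0] * 6 + [1] * 2)
    rules = build_pair_rules(X, y, n_rules=3)
    print(rules)  # [(0, 3), (1, 4), (2, 5)]
    print(balanced_accuracy(y, predict_majority(X, rules)))  # 1.0
```

Because each rule compares two genes within a single sample, any strictly increasing rescaling of that sample's whole profile leaves its prediction unchanged, which is one way pairwise rules can sidestep cross-sample normalization. Using an odd number of rules avoids voting ties; the tie-break toward the minority class is a further assumption of this sketch.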