A feature selection approach based on a similarity measure for software defect prediction

Source: Frontiers of Information Technology & Electronic Engineering (English edition)
Software defect prediction aims to find potential defects based on historical data and software features. Software features reflect the characteristics of software modules. However, some of these features may be highly relevant to the class (defective or non-defective), whereas others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and feature subsets are selected from the ranking list in sequence. Finally, all feature subsets are evaluated with a k-nearest neighbor (KNN) model, and classification performance is measured by the area under the curve (AUC) metric. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than, or comparably to, the other feature selection approaches considered in terms of classification performance.
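The three steps described in the abstract (weight, rank, evaluate nested subsets) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the SM weight-update formula, so a Relief-style update (nearest same-class versus nearest other-class sample) is swapped in as a stand-in. The function names, the min-max scaling, the leave-one-out evaluation, and the default `k=3` are likewise hypothetical choices and are not taken from the paper.

```python
def min_max_scale(X):
    """Scale every feature to [0, 1] so distances are comparable across features."""
    d = len(X[0])
    lo = [min(r[j] for r in X) for j in range(d)]
    hi = [max(r[j] for r in X) for j in range(d)]
    return [[(r[j] - lo[j]) / ((hi[j] - lo[j]) or 1.0) for j in range(d)] for r in X]


def relief_style_weights(Z, y):
    """Stand-in for the SM weight update (assumption): a feature gains weight when it
    separates a sample from its nearest other-class sample, and loses weight when it
    separates the sample from its nearest same-class sample."""
    n, d = len(Z), len(Z[0])
    dist = lambda a, b: sum(abs(p - q) for p, q in zip(a, b))
    w = [0.0] * d
    for i in range(n):
        hits = [t for t in range(n) if t != i and y[t] == y[i]]
        misses = [t for t in range(n) if y[t] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda t: dist(Z[i], Z[t]))
        m = min(misses, key=lambda t: dist(Z[i], Z[t]))
        for j in range(d):
            w[j] += (abs(Z[i][j] - Z[m][j]) - abs(Z[i][j] - Z[h][j])) / n
    return w


def knn_scores(Z, y, features, k=3):
    """Leave-one-out KNN: the score is the fraction of defective (label 1) neighbors."""
    scores = []
    for i in range(len(Z)):
        neighbors = sorted(
            (sum((Z[i][j] - Z[t][j]) ** 2 for j in features), t)
            for t in range(len(Z)) if t != i
        )
        scores.append(sum(y[t] for _, t in neighbors[:k]) / k)
    return scores


def auc(y, scores):
    """AUC as the probability that a defective sample outscores a non-defective one."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def select_features(X, y, k=3):
    """Rank features by weight (descending), then evaluate the top-1, top-2, ... subsets."""
    Z = min_max_scale(X)
    w = relief_style_weights(Z, y)
    ranking = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    best, best_auc = None, -1.0
    for m in range(1, len(ranking) + 1):
        subset = ranking[:m]
        score = auc(y, knn_scores(Z, y, subset, k))
        if score > best_auc:  # keep the smallest subset among ties
            best, best_auc = subset, score
    return ranking, best, best_auc
```

Calling `select_features(X, y)` with `X` as a list of numeric feature rows and `y` as 0/1 labels returns the feature ranking, the best-scoring prefix of that ranking, and its AUC. Evaluating only nested prefixes of the ranking keeps the search linear in the number of features, rather than exponential as it would be over all possible subsets.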