Evaluations of noise detection algorithms in data mining mostly take real data from the UCI repository as benchmark datasets, add simulated random noise, and use the improvement in the mining algorithm's performance after the noise is removed as the measure of detection effectiveness. The unknown internal structure of real data, the uncertainty of the random noise level, and the reliance on a single evaluation metric leave the evaluation of noise detection algorithms without a standard and make side-by-side comparison of algorithms difficult. To address this, this paper first analyzes existing methods for evaluating noise detection algorithms and proposes an evaluation framework, together with its components, based on an artificial data generator. It then designs a rule-based standard data generator and a method for introducing a random noise model, and provides specific evaluation metrics. Finally, the soundness of the framework is analyzed.
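The abstract does not spell out the generator's rules, the noise model, or the metrics, so the following is only a minimal sketch of the general idea under stated assumptions: labels are produced by a single known rule (`x1 + x2 > 1.0`), noise is injected by flipping the labels of a uniformly chosen fraction of samples, and a detector is scored by precision and recall against the known set of corrupted indices. All function names, the rule, and the choice of metrics are hypothetical and are not taken from the paper.

```python
import random


def rule_label(x1, x2):
    """Hypothetical ground-truth rule; the paper's actual rules are not given."""
    return 1 if x1 + x2 > 1.0 else 0


def generate_rule_based(n, seed=0):
    """Generate n clean samples (x1, x2, label) whose labels follow the known rule."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        data.append((x1, x2, rule_label(x1, x2)))
    return data


def inject_label_noise(data, noise_rate, seed=0):
    """Flip the labels of a random fraction of samples (assumed noise model).

    Returns the noisy data and the set of indices that were corrupted,
    so the true noise positions are known exactly.
    """
    rng = random.Random(seed)
    k = round(noise_rate * len(data))
    flipped = set(rng.sample(range(len(data)), k))
    noisy = [
        (x1, x2, 1 - y) if i in flipped else (x1, x2, y)
        for i, (x1, x2, y) in enumerate(data)
    ]
    return noisy, flipped


def score_detector(flagged, true_noise):
    """Precision and recall of flagged indices against the known noisy indices."""
    tp = len(flagged & true_noise)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(true_noise) if true_noise else 0.0
    return precision, recall


# Usage: a placeholder "detector" that flags every sample violating the rule.
# A real detector under evaluation would not have access to the rule.
clean = generate_rule_based(1000, seed=1)
noisy, truth = inject_label_noise(clean, noise_rate=0.1, seed=2)
flagged = {i for i, (x1, x2, y) in enumerate(noisy) if y != rule_label(x1, x2)}
print(score_detector(flagged, truth))
```

Because the data structure and the exact noise positions are both known, different detectors can be scored on identical inputs at a controlled noise rate, which is the kind of comparison that benchmarks built on real data with unknown structure make difficult.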