Feature selection is an important problem in machine learning and pattern recognition. In practical applications, however, feature selection is usually treated separately from feature extraction: only the classification performance of the features is considered, while the cost of extracting them is ignored, which lowers the efficiency of the recognition system. From the perspective of practical application, this paper proposes a new feature selection criterion that considers the classification performance of a feature together with its extraction cost. A comprehensive evaluation function combining information gain and feature extraction cost serves as the selection criterion, and a heuristic algorithm, ECFS, is given. The algorithm is applied to learning problems from a real-world domain and compared with the decision tree algorithm ID3 and a BP neural network. Experimental results show that ECFS greatly reduces the time spent on feature extraction and increases recognition speed while maintaining recognition accuracy.
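The abstract does not state the exact form of the comprehensive evaluation function, so the following is only an illustrative sketch of the general idea, not the ECFS algorithm itself. It assumes, purely for illustration, that a feature's score is its information gain divided by its extraction cost; the function names, the toy data, and the cost values are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the data on a discrete feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, costs):
    """Rank features by an ASSUMED score: information gain / extraction cost.

    The ratio is a stand-in; the paper's actual evaluation function
    is not given in the abstract.
    """
    scores = {f: information_gain(rows, labels, f) / costs[f] for f in costs}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: "texture" separates the classes perfectly but is
# expensive to extract; "shape" is weaker but cheap.
rows = [
    {"texture": "a", "shape": "x"},
    {"texture": "a", "shape": "x"},
    {"texture": "b", "shape": "x"},
    {"texture": "b", "shape": "y"},
]
labels = ["pos", "pos", "neg", "neg"]
costs = {"texture": 10.0, "shape": 1.0}  # hypothetical extraction costs

ranking = rank_features(rows, labels, costs)
```

Under this assumed ratio, the cheap but less informative feature can outrank the expensive, perfectly discriminating one, which is the kind of trade-off the criterion is meant to expose. A different combination of gain and cost (for example, a weighted difference) would produce a different ranking.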