The neighborhood rough computing model has been widely applied because it can handle information systems in which nominal attributes and character attributes coexist. Existing neighborhood rough computing methods, however, address only complete information systems, whereas data in practical applications are often incomplete. To address this problem, we first propose a tolerance neighborhood entropy that can be used to measure incomplete information systems. We then derive a series of related definitions and properties and prove that the tolerance neighborhood entropy is a natural generalization of Shannon entropy to incomplete information systems. Finally, we give an attribute selection algorithm based on the tolerance neighborhood entropy. Experimental results show that the proposed algorithm avoids the redundant information introduced by excessive data preprocessing, keeps classification accuracy high for samples in the feature space the algorithm selects, and handles incomplete information systems better.
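The abstract does not state the formula for the tolerance neighborhood entropy, so the following is only a minimal sketch under assumptions of our own, not the authors' definition. It assumes that two samples are tolerant on an attribute set when, for every attribute, at least one value is missing, or the values are equal (symbolic), or they differ by at most a radius `delta` (numeric). It also assumes the common neighborhood-entropy form H = -(1/n) Σ log2(|N(x_i)|/n). The names `MISSING`, `tolerant`, and `tolerance_neighborhood_entropy` are hypothetical.

```python
import math

MISSING = None  # assumed marker for an unknown attribute value


def tolerant(x, y, attrs, delta):
    """Return True if samples x and y cannot be told apart on attrs.

    Assumed tolerance rule: a missing value matches anything; numeric
    values match when they differ by at most delta; other values must
    be equal.
    """
    for a in attrs:
        u, v = x[a], y[a]
        if u is MISSING or v is MISSING:
            continue
        if isinstance(u, (int, float)) and isinstance(v, (int, float)):
            if abs(u - v) > delta:
                return False
        elif u != v:
            return False
    return True


def tolerance_neighborhood_entropy(data, attrs, delta):
    """Assumed form: H = -(1/n) * sum_i log2(|N(x_i)| / n),
    where N(x_i) is the set of samples tolerant with x_i on attrs."""
    n = len(data)
    total = 0.0
    for x in data:
        size = sum(1 for y in data if tolerant(x, y, attrs, delta))
        total += math.log2(size / n)
    return -total / n


# Complete symbolic data: neighborhoods are equivalence classes.
complete = [("a",), ("a",), ("b",), ("b",)]
h_complete = tolerance_neighborhood_entropy(complete, [0], 0.0)

# Incomplete data: the sample with a missing value is tolerant with all.
incomplete = [("a",), (MISSING,), ("b",)]
h_incomplete = tolerance_neighborhood_entropy(incomplete, [0], 0.0)
```

Under these assumptions, on complete symbolic data with `delta = 0` the neighborhoods coincide with equivalence classes and the value equals the Shannon entropy of the induced partition (1 bit for `complete`). This agrees with the generalization property stated in the abstract. Missing values enlarge neighborhoods and therefore lower the entropy.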