Mobile phone user datasets are severely imbalanced between users who keep their phones and users who replace them. Traditional data mining methods pursue overall accuracy when handling imbalanced data, which leads to low prediction accuracy for the phone-replacing users. To address this problem, a phone-replacement prediction method based on a hierarchical cost-sensitive decision tree is proposed. First, rough sets are used to perform attribute reduction on the original dataset and to compute the importance of each attribute. Then, the attributes are partitioned into blocks according to their importance to establish a hierarchical structure. Finally, a cost-sensitive decision tree is built with the Gini coefficient and the misclassification cost as the splitting criteria, and it serves as the base classifier at each level. Three simulation experiments on customer data from a telecom operator show that the hierarchical cost-sensitive decision tree performs well on both the original imbalanced user dataset and the balanced user dataset obtained by undersampling.
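To make the splitting criterion more concrete, the following is a minimal sketch of how a Gini term and a misclassification-cost term might be combined when scoring a candidate split. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function names, the weight `alpha`, the linear combination of the two terms, and the example cost matrix are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def gini(counts: Sequence[int]) -> float:
    """Gini impurity of a node, given its per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def node_cost(counts: Sequence[int], cost: Sequence[Sequence[float]]) -> float:
    """Lowest total misclassification cost when the node gets one label.

    cost[i][j] is the cost of predicting class j for a sample whose
    true class is i (hypothetical cost matrix, chosen by the analyst).
    """
    n_classes = len(counts)
    return min(
        sum(counts[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    )


def split_score(
    left: Sequence[int],
    right: Sequence[int],
    cost: Sequence[Sequence[float]],
    alpha: float = 0.5,
) -> float:
    """Score a binary split; lower is better.

    Assumed combination: alpha * size-weighted Gini of the children
    plus (1 - alpha) * child cost relative to the parent cost.
    """
    n = sum(left) + sum(right)
    weighted_gini = (sum(left) * gini(left) + sum(right) * gini(right)) / n
    parent = [l + r for l, r in zip(left, right)]
    parent_cost = node_cost(parent, cost)
    child_cost = node_cost(left, cost) + node_cost(right, cost)
    relative_cost = child_cost / parent_cost if parent_cost > 0 else 0.0
    return alpha * weighted_gini + (1 - alpha) * relative_cost


# Class 0 = user keeps the phone, class 1 = user replaces the phone.
# Assumed costs: missing a replacing user is 5x worse than a false alarm.
COST = [[0.0, 1.0], [5.0, 0.0]]

informative = split_score([85, 2], [5, 8], COST)
uninformative = split_score([45, 5], [45, 5], COST)
print(informative < uninformative)
```

In this sketch, a split that isolates most of the minority (phone-replacing) users scores lower, and is therefore preferred, over a split that leaves the class ratio unchanged in both children. Raising the cost of a missed replacing user pushes the tree toward splits that favour the minority class, which is the general motivation for cost-sensitive splitting on imbalanced data.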