This paper uses the 2013 China General Social Survey from CNSDA. We first select objective factors that may help predict income, then handle outliers and merge categories. We then build BP neural networks with a single hidden layer and compare models with different numbers of hidden nodes; the network with 4 hidden nodes gives the best overall performance across the training and test sets. Next, guided by the model's variable-importance analysis, we remove a few redundant variables and refit the model, which improves accuracy on the test set. Finally, we introduce the ensemble methods Boosting and Bagging and find that, although they improve prediction accuracy on the training set, they perform poorly on the test set.
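The hidden-node comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic stand-ins rather than the survey variables, and the function names (`make_data`, `train_bp`, `compare_hidden_sizes`), the learning rate, the epoch count, and the candidate sizes are all assumptions made for illustration. The sketch trains a single-hidden-layer network by backpropagation for several hidden-node counts and reports training and test accuracy for each.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow for large |z|.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n, rng):
    """Synthetic stand-in data (NOT the survey data): two features, binary label."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x1 * x2 > 0 else 0  # XOR-like pattern, needs a hidden layer
        rows.append(([x1, x2], label))
    return rows


def train_bp(data, n_hidden, epochs=100, lr=0.5, seed=0):
    """Train a single-hidden-layer network with stochastic backpropagation."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Each weight row carries a trailing bias weight.
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    data = list(data)  # shallow copy so shuffling does not reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            xb = x + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
            hb = h + [1.0]
            p = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
            # Sigmoid output with cross-entropy loss: the output delta is (p - y).
            d_out = p - y
            # Hidden deltas are computed before the output weights are updated.
            d_hid = [d_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden + 1):
                w_out[j] -= lr * d_out * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hid[j][i] -= lr * d_hid[j] * xb[i]
    return w_hid, w_out


def predict(model, x):
    w_hid, w_out = model
    xb = x + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid] + [1.0]
    return 1 if sigmoid(sum(w * v for w, v in zip(w_out, hb))) >= 0.5 else 0


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


def compare_hidden_sizes(sizes=(2, 4, 8), seed=42):
    """Return {n_hidden: (train_accuracy, test_accuracy)} on the synthetic data."""
    rng = random.Random(seed)
    train, test = make_data(200, rng), make_data(100, rng)
    results = {}
    for n_hidden in sizes:
        model = train_bp(train, n_hidden)
        results[n_hidden] = (accuracy(model, train), accuracy(model, test))
    return results


if __name__ == "__main__":
    for n_hidden, (acc_train, acc_test) in compare_hidden_sizes().items():
        print(f"hidden={n_hidden}: train={acc_train:.3f}, test={acc_test:.3f}")
```

Reading the two accuracies side by side is what makes a choice such as "4 hidden nodes" defensible: a size that keeps improving on the training set while stalling or dropping on the test set is overfitting, which is the same pattern the abstract reports for Boosting and Bagging.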