Stocks that go on to announce a high stock dividend and transfer (高送转, "high-transfer") plan earn significant positive cumulative returns in the period before the announcement, so predicting which stocks will do so matters for investment. Predicting high-transfer stocks is a classification problem. Using listed companies' third-quarter financial report data, this paper builds models with three ensemble learning algorithms: a "combination" model constructed from the K-nearest neighbors algorithm, a decision tree, and logistic regression with a lasso penalty; and two classic ensemble algorithms, AdaBoost and random forest. Accuracy and G-mean serve as the evaluation criteria. The results show that the "combination" model has the highest accuracy, while random forest and the "combination" model perform comparably on G-mean and both outperform AdaBoost. Because high-transfer stocks make up less than 50% of stocks each year, the data can be treated as imbalanced. To improve the "combination" model's poor recall, the paper applies an undersampling method based on K-Means clustering to the "combination" model, with significant effect. Finally, an investment portfolio is constructed from the stocks predicted by each of the three models, with the HS300 index as the benchmark. The results show that the portfolio of high-transfer stocks predicted by the "combination" model outperforms those of the other two ensemble learning models.
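To illustrate why G-mean is reported alongside accuracy on imbalanced data, here is a minimal sketch in plain Python. It assumes the usual definition of G-mean as the square root of recall times specificity; the abstract does not spell out its formula. The labels and predictions are made-up toy values, not data from the paper, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def g_mean(y_true, y_pred):
    """Assumed definition: sqrt(recall * specificity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


# Toy data (illustrative only): 2 positives out of 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                   # never predicts the minority class
some_recall = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # finds 1 positive, 2 false alarms

for name, y_pred in [("always negative", always_negative),
                     ("some recall", some_recall)]:
    print(name, accuracy(y_true, y_pred), round(g_mean(y_true, y_pred), 4))
```

In this toy case the classifier that never predicts the minority class has the higher accuracy but a G-mean of zero, which is the failure mode that makes G-mean (and recall) informative when positive samples are scarce.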