Purpose. Data-driven machine learning is an important part of modern intelligent technology. It looks for regularities in observed data (samples) and uses them to predict future or unobservable data. The support vector machine (SVM) is a machine-learning pattern-recognition method based on the structural risk minimization principle of statistical learning theory. In this study, SVM was used to predict the survival status of patients with nasopharyngeal carcinoma (NPC) 5 years after treatment. The aim was to explore a new approach to prognostic research in cancer patients and to support individualized treatment in the clinic.

Methods. Prediction models for the 5-year survival status of NPC patients were built with SVM and with logistic regression. Each model was then applied to test data that had not been used in model building, and the predictions of the two models were compared by receiver operating characteristic (ROC) curve analysis.

Results. SVM model 1 (SVM1), built on the 25 original, unscreened input variables, predicted death with a sensitivity of 79.2%, a specificity of 94.5%, and an area under the ROC curve (AUC) of 0.868. SVM model 2 (SVM2), built on 9 input variables screened from the 25 original variables, had a sensitivity of 79.2%, a specificity of 95.6%, and an AUC of 0.874. The logistic regression model had a sensitivity of 66.7%, a specificity of 83.5%, and an AUC of 0.751.

Conclusion. For this data set, SVM and logistic regression performed similarly, but overall SVM showed better predictive performance. This suggests that SVM can predict the prognosis of individual patients and support individualized clinical treatment decisions.
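Each model above is summarized by sensitivity, specificity, and area under the ROC curve on the held-out test set. The following is a minimal Python sketch of one common way to compute these three quantities from a model's output scores, with death treated as the positive class. It is an illustration under stated assumptions, not the authors' code. The function names (`classify`, `sensitivity_specificity`, `roc_auc`) are hypothetical. The 0.5 cut-off is assumed, because the abstract does not say how SVM decision values or logistic probabilities were thresholded. The AUC uses the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve when tied scores count one half.

```python
from typing import List, Sequence, Tuple


def classify(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn continuous scores into 0/1 predictions; the 0.5 cut-off is an assumption."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Label 1 (death within 5 years) is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a random
    negative case (Mann-Whitney form); tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration (not data from the study).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    model_a = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.2, 0.05]
    model_b = [0.7, 0.4, 0.6, 0.6, 0.3, 0.2, 0.5, 0.1]
    for name, scores in [("model A", model_a), ("model B", model_b)]:
        sens, spec = sensitivity_specificity(y_true, classify(scores))
        auc = roc_auc(y_true, scores)
        print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

With these made-up scores, both toy models reach the same sensitivity (2/3) at the 0.5 cut-off. Model A has higher specificity (1.000 versus 0.600) and a higher AUC (0.933 versus 0.833). Reporting the threshold-free AUC alongside sensitivity and specificity at a single cut-off gives a fuller comparison of two models.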