Objective: To examine the efficacy of a machine learning based, nonparametric diagnostic model, built from patient clinical information and CT features, for determining the nature of solid nodules in patients with multiple pulmonary nodules.

Methods: Data were collected retrospectively on 446 solid nodules resected from 287 patients with multiple pulmonary nodules treated at the Department of Thoracic Surgery, Peking University People's Hospital, from January 2010 to December 2018. There were 117 males and 170 females, aged (61.4±9.9) years (range: 33 to 84 years). The nodules were randomly divided into a training set (228 patients, 357 nodules) and a test set (59 patients, 89 nodules) at a ratio of 4:1. Several machine learning algorithms were compared, and the best performing one, extreme gradient boosting (XGBoost), was used to build the predictive model (PKU-ML model) on the training set. The model's accuracy was verified on the test set and compared with previously published models. Finally, an independent set of single solid nodules (155 patients, 95 males, aged (62.3±8.3) years, range: 37 to 77 years) was used to evaluate the model's accuracy in predicting the nature of single solid nodules. The area under the receiver operating characteristic curve (AUC) was used to evaluate the diagnostic performance of the models.

Results: In the training set, the AUC of the PKU-ML model was 0.883 (95% CI: 0.849 to 0.917). In the test set, the PKU-ML model (AUC=0.838, 95% CI: 0.754 to 0.921) performed better than models designed for single solid nodules (Brock model: AUC=0.709, 95% CI: 0.603 to 0.816, P=0.04; Mayo model: AUC=0.756, 95% CI: 0.656 to 0.856, P=0.01; VA model: AUC=0.674, 95% CI: 0.561 to 0.787, P<0.01) and was comparable to the PKUPH model (AUC=0.750, 95% CI: 0.649 to 0.851, P=0.07). In the independent set of single solid nodules, the PKU-ML model also performed well (AUC=0.786, 95% CI: 0.701 to 0.872).

Conclusion: The machine learning based PKU-ML model predicts the malignancy of solid nodules in multiple pulmonary nodules better than commonly used parametric (mathematical) models, and it also performs well in predicting whether single solid pulmonary nodules are benign or malignant.
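
The evaluation metric used throughout, an AUC with a 95% CI, can be illustrated with a minimal sketch. The abstract does not say how the confidence intervals were obtained, so the percentile bootstrap below is an assumption, as are the function names `auc` and `bootstrap_ci` and the toy labels and scores (they are not study data). Resampling is done per nodule here, which ignores the fact that several nodules can come from the same patient.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen malignant nodule (label 1)
    receives a higher score than a randomly chosen benign one (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method; the
    source does not state which CI procedure was used)."""
    auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples containing a single class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = malignant, 0 = benign, with made-up model scores.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

point_estimate = auc(toy_labels, toy_scores)
ci_lower, ci_upper = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC={point_estimate:.3f}, 95% CI: {ci_lower:.3f} to {ci_upper:.3f}")
```

With data like the study's, where one patient can contribute several nodules, resampling patients rather than individual nodules would be the more cautious choice.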