This paper develops an integrated variable selection method and applies it to wavelength selection in the near-infrared (NIR) spectroscopic analysis of oil and protein content in corn. An evaluation index w of variable importance is constructed from the spectral purity value and the regression coefficient of each wavelength. All wavelengths are sorted by their w values, and the optimal variable subset is then chosen by forward selection using partial least squares (PLS) cross-validation. From 700 wavelength variables, 30 and 20 characteristic wavelengths were finally selected to build the calibration models for oil and protein, respectively. For the samples in an independent test set, the correlation coefficient (R), root mean square error of prediction (RMSEP), and mean relative error (MRE) were 0.9814, 0.0329, and 0.714% for oil, and 0.9887, 0.0811, and 0.738% for protein. By comparison, models built on the full spectrum gave R, RMSEP, and MRE values of 0.9351, 0.0606, and 1.474% for oil, and 0.9709, 0.1314, and 1.246% for protein. The method therefore effectively reduces the number of variables used in modeling while improving prediction accuracy.
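The selection procedure and the three reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. The abstract does not give the formula for w, so the scores are taken as an input. The PLS cross-validation step is represented by a hypothetical callback `cv_error(subset)`. "Forward selection" is read here as evaluating nested prefixes of the w-ranked wavelength list and keeping the prefix with the lowest cross-validation error. The metric definitions (Pearson R, RMSEP, and MRE as the mean of |predicted - reference| / |reference| in percent) are the common ones and are assumed rather than confirmed by the source.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mean_relative_error(y_true, y_pred):
    """Mean relative error in percent (assumed definition; references must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n


def correlation(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def select_ranked_prefix(w, cv_error, max_vars=None):
    """Rank variables by w (largest first) and keep the best nested prefix.

    w        : importance score per variable (how w is computed is NOT shown here)
    cv_error : hypothetical callback, subset of indices -> cross-validation error;
               in the paper's setting this would wrap a PLS cross-validation
    max_vars : optional cap on the number of variables to try
    """
    order = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    limit = len(order) if max_vars is None else min(max_vars, len(order))
    best_subset, best_err = [], math.inf
    for k in range(1, limit + 1):
        subset = order[:k]
        err = cv_error(subset)
        if err < best_err:  # strict "<" keeps the smallest subset on ties
            best_subset, best_err = subset, err
    return best_subset, best_err


# Toy usage: four variables and a made-up error function that is lowest
# when exactly two variables are used (purely illustrative, not real data).
toy_w = [0.2, 0.9, 0.1, 0.7]
toy_subset, toy_err = select_ranked_prefix(
    toy_w, lambda subset: abs(len(subset) - 2) + 0.1
)
```

In the toy call the ranking is variable 1, then 3, 0, 2, so the selected subset is the first two ranked indices. With real spectra, `cv_error` would fit a PLS model on the chosen wavelength columns and return its cross-validated prediction error.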