A Monte Carlo-partial least squares regression coefficient method is proposed for variable selection in near-infrared (NIR) spectra. The method has four steps. (1) Monte Carlo sampling is used to build multiple subsets. (2) A PLS model is built on each subset, its regression coefficients are calculated, and the variables in each sub-model are ranked by the absolute values of their regression coefficients. (3) The wavelengths are ranked using frequency statistics. (4) The ranked wavelengths are added one at a time, in order, to the candidate variable subset, and cross-validation is performed at each step to select the optimal variable set. The method was applied to variable selection for NIR spectra of biological sample solutions and of tobacco samples. It selected 27 and 68 characteristic variables from the original 1234 and 1557 variables, respectively. The root mean square error of prediction (RMSEP) on the independent test sets decreased from 0.02716 and 0.06411 with the full spectra to 0.02372 and 0.03977 with the selected variables. The method is effective for variable selection in NIR spectra.
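The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a single-response PLS model fitted by NIPALS and an 80% sampling ratio per Monte Carlo subset. For the "frequency statistics" of step (3), it counts how often each wavelength falls among the `top_k` largest |b| across sub-models, and breaks ties by summed |b|. Step (4) uses 5-fold cross-validation (RMSECV). The function names, parameter values (`n_comp`, `n_iter`, `top_k`, `max_vars`) and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_coef(X, y, n_comp):
    """Fit a single-response PLS model (NIPALS); return (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centred copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                  # unit weight vector
        t = Xr @ w                              # scores
        tt = t @ t
        p = Xr.T @ t / tt                       # X loadings
        c = (yr @ t) / tt                       # y loading
        Xr = Xr - np.outer(t, p)                # deflate X
        yr = yr - c * t                         # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)      # B = W (P^T W)^-1 q
    return coef, y_mean - x_mean @ coef


def mc_rank_variables(X, y, n_comp=5, n_iter=500, frac=0.8, top_k=50, seed=0):
    """Steps (1)-(3): Monte Carlo sub-models, |b| ranking, frequency ranking."""
    rng = np.random.default_rng(seed)
    n, n_vars = X.shape
    counts = np.zeros(n_vars, dtype=int)
    abs_sum = np.zeros(n_vars)
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(frac * n), replace=False)  # (1) random subset
        coef, _ = pls1_coef(X[idx], y[idx], n_comp)              # (2) sub-model
        order = np.argsort(-np.abs(coef))                        # (2) rank by |b|
        counts[order[:top_k]] += 1          # (3) top-k frequency (assumed statistic)
        abs_sum += np.abs(coef)
    # Sort by frequency (descending); ties broken by summed |b| (an assumption).
    ranked = np.lexsort((-abs_sum, -counts))
    return ranked, counts


def rmsecv(X, y, n_comp, n_folds=5, seed=0):
    """Root-mean-square error of k-fold cross-validation."""
    rng = np.random.default_rng(seed)           # same folds on every call
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        coef, b0 = pls1_coef(X[train], y[train], n_comp)
        sq_err += np.sum((X[test] @ coef + b0 - y[test]) ** 2)
    return np.sqrt(sq_err / len(y))


def stepwise_select(X, y, ranked, n_comp=5, max_vars=100):
    """Step (4): add ranked variables one by one; keep the prefix with lowest RMSECV."""
    max_vars = min(max_vars, len(ranked))
    scores = [rmsecv(X[:, ranked[:m]], y, min(n_comp, m))
              for m in range(1, max_vars + 1)]
    best_m = int(np.argmin(scores)) + 1
    return ranked[:best_m], scores


if __name__ == "__main__":
    # Synthetic data: 100 "wavelengths", only columns 10, 40 and 70 carry signal.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(250, 100))
    y = 1.5 * (X[:, 10] - X[:, 40] + X[:, 70]) + 0.1 * rng.normal(size=250)
    Xcal, ycal, Xtest, ytest = X[:200], y[:200], X[200:], y[200:]

    ranked, _ = mc_rank_variables(Xcal, ycal, n_comp=3, n_iter=200, top_k=10)
    selected, _ = stepwise_select(Xcal, ycal, ranked, n_comp=3, max_vars=20)
    for name, cols in [("full", np.arange(X.shape[1])), ("selected", selected)]:
        coef, b0 = pls1_coef(Xcal[:, cols], ycal, min(3, len(cols)))
        rmsep = np.sqrt(np.mean((Xtest[:, cols] @ coef + b0 - ytest) ** 2))
        print(f"{name}: {len(cols)} variables, RMSEP = {rmsep:.4f}")
```

The demo fits a full-spectrum model and a selected-variable model on the calibration set and prints their RMSEP on a held-out test set, mirroring the comparison reported above. On real spectra, the number of PLS components, the number of Monte Carlo iterations and the `top_k` cut-off would all need tuning.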