A method based on Monte Carlo cross validation (MCCV) was developed for evaluating the quality of data used in partial least squares (PLS) modeling of near-infrared spectra. The method uses MCCV to compute the root mean square error of cross validation (RMSECV) and, at the same time, the prediction error of the calibration samples in each cross-validation run, denoted as the root mean square error of the calibration samples (RMSECVc). If the data contain no interfering factors such as outliers, noise, or nonlinear response, RMSECV and RMSECVc should vary consistently with the number of factors; otherwise, the two trends will differ. The trends of RMSECV and RMSECVc with the number of factors can therefore be used to evaluate data quality. The method was tested on simulated data and on 12 real-sample datasets, and the outliers in four of the real datasets were analyzed to demonstrate its effectiveness. This work provides a data quality evaluation method for PLS modeling.
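The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the PLS1 fit (NIPALS), the function names `pls1_fit` and `mccv_curves`, the 70% calibration fraction, the number of random splits, and the pooling of squared errors over all splits are assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Fit PLS1 by NIPALS (illustrative helper, not from the paper).

    Returns the calibration means and a list of regression vectors,
    one for each model size 1..n_factors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q, betas = [], [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            # No covariance left to model: reuse the previous coefficients.
            betas.append(betas[-1] if betas else np.zeros(X.shape[1]))
            continue
        w = w / norm
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
        Wm, Pm = np.array(W).T, np.array(P).T
        # Regression vector for the current number of factors.
        betas.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))
    return x_mean, y_mean, betas

def mccv_curves(X, y, max_factors=10, n_splits=200, calib_fraction=0.7, seed=0):
    """Return (RMSECV, RMSECVc) as arrays indexed by number of factors - 1.

    RMSECV pools errors on the left-out samples; RMSECVc pools errors on
    the calibration samples of the same random splits (assumed pooling).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_cal = int(round(calib_fraction * n))
    sse_val = np.zeros(max_factors)
    sse_cal = np.zeros(max_factors)
    n_val_total = n_cal_total = 0
    for _ in range(n_splits):
        perm = rng.permutation(n)     # one random Monte Carlo split
        cal, val = perm[:n_cal], perm[n_cal:]
        x_mean, y_mean, betas = pls1_fit(X[cal], y[cal], max_factors)
        for k, b in enumerate(betas):
            pred_val = (X[val] - x_mean) @ b + y_mean
            pred_cal = (X[cal] - x_mean) @ b + y_mean
            sse_val[k] += np.sum((y[val] - pred_val) ** 2)
            sse_cal[k] += np.sum((y[cal] - pred_cal) ** 2)
        n_val_total += len(val)
        n_cal_total += len(cal)
    return np.sqrt(sse_val / n_val_total), np.sqrt(sse_cal / n_cal_total)
```

Plotting the two returned curves against the number of factors gives the comparison the abstract describes: similar trends suggest clean data, while diverging trends suggest outliers, noise, or nonlinearity. How "similar" should be judged is not stated in the abstract, so this sketch leaves that step to the reader.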