Variable selection is a critical step in the analysis of near-infrared (NIR) spectroscopic data. Many recent studies have addressed variable selection, and a large number of methods have been proposed to identify the variables (wavelengths) that carry useful information. In the present study, a key-wavelength selection method named Monte Carlo sampling-recursive partial least squares (MCS-RPLS) is proposed. The method consists of three main steps: (1) Monte Carlo sampling; (2) feature selection for each subset; and (3) determination of the optimum feature set for the dataset. The method was applied to feature selection and multivariate calibration on four NIR spectroscopic datasets: corn moisture, corn protein, and HSA and γ-globulin in biological samples. The 10-fold cross-validation results were compared with those obtained by full-spectrum PLS, Moving Window Partial Least Squares (MWPLS), Monte Carlo-based Uninformative Variable Elimination (MC-UVE), and CARS. The results show that both the data dimensionality and the RMSECV values are greatly reduced when the selected variables are used, indicating that MCS-RPLS is suitable for feature selection from NIR data. In addition, the Monte Carlo strategy can enhance the robustness of the proposed method.
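The three-step structure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, since the abstract does not specify how the recursive PLS step ranks variables or how the optimum set is determined. The sketch assumes a simple stand-in for step (2), in which a PLS1 model is refitted repeatedly and the half of the variables with the smallest scaled coefficients is dropped each time, and a selection-frequency threshold for step (3). All function names and parameters (`pls1_coefficients`, `select_on_subset`, `mcs_select`, `n_runs`, `sample_ratio`, `n_keep`, `min_freq`) are hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_components=2):
    """Fit a single-response PLS model (NIPALS) on centered data and
    return the regression coefficients, one per column of X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    W, P, q = [], [], []
    for _ in range(min(n_components, X.shape[1])):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # no covariance left to model
            break
        w = w / norm
        t = Xc @ w                # scores
        tt = t @ t
        p = Xc.T @ t / tt         # X loadings
        c = yc @ t / tt           # y loading
        Xc = Xc - np.outer(t, p)  # deflate X
        yc = yc - c * t           # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def select_on_subset(X, y, n_keep, n_components=2):
    """Assumed stand-in for step (2): refit PLS and drop the weakest
    half of the remaining variables until n_keep are left."""
    idx = np.arange(X.shape[1])
    while idx.size > n_keep:
        b = pls1_coefficients(X[:, idx], y, n_components)
        score = np.abs(b) * X[:, idx].std(axis=0)
        n_next = max(n_keep, idx.size // 2)
        keep = np.argsort(score)[::-1][:n_next]
        idx = np.sort(idx[keep])
    return idx


def mcs_select(X, y, n_runs=50, sample_ratio=0.8, n_keep=10,
               min_freq=0.5, seed=0):
    """Step (1): draw random sample subsets without replacement.
    Step (2): select variables on each subset.
    Step (3), assumed rule: keep variables chosen in at least
    min_freq of the runs."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_runs):
        rows = rng.choice(n_samples, size=int(sample_ratio * n_samples),
                          replace=False)
        counts[select_on_subset(X[rows], y[rows], n_keep)] += 1
    freq = counts / n_runs
    return np.flatnonzero(freq >= min_freq), freq
```

Calling `mcs_select(X, y)` on a spectra matrix `X` (samples by wavelengths) and a reference vector `y` returns the indices of the retained wavelengths together with their selection frequencies. Aggregating over many random subsets is what makes the result less sensitive to any single split of the samples, which matches the robustness argument made above; the specific ranking and threshold rules here are illustrative choices only.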