Selecting feature genes is a key problem in mining gene expression profile data. Building on the competitive adaptive reweighted sampling (CARS) method, this paper proposes a CARS method based on data binning for feature gene selection. The basic idea is to partition the data into bins, apply CARS to the variables in each bin, merge the resulting feature gene subsets, and then apply CARS again to the merged set to select the best feature genes. The selected genes are then evaluated with a support vector machine (SVM) under leave-one-out cross-validation. Applied to a prostate cancer dataset, the method finally selected 7 feature genes, which gave a sample recognition accuracy of 99.02% under leave-one-out cross-validation with an SVM. The results indicate that the selected feature genes yield high classification accuracy and good stability, suggesting that the method is an effective way to select tumor feature genes.
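The two-stage pipeline described in the abstract (bin the variables, select within each bin, merge, select again, then score with an SVM under leave-one-out cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: CARS itself is not implemented here, and a simple ANOVA F-score ranking from scikit-learn stands in for it. The function names, the contiguous column binning, the parameters `n_bins`, `k_per_bin`, and `k_final`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC


def select_top_k(X, y, k):
    """Stand-in selector (NOT CARS): keep the k columns with the highest ANOVA F-score."""
    scores, _ = f_classif(X, y)
    scores = np.nan_to_num(scores)  # constant columns give NaN scores
    k = min(k, X.shape[1])
    return np.argsort(scores)[::-1][:k]


def binned_selection(X, y, n_bins=5, k_per_bin=3, k_final=2, selector=select_top_k):
    """Two-stage selection: per-bin selection, merge, then a final selection."""
    # Stage 1: split gene (column) indices into contiguous bins (an assumption;
    # the abstract does not say how bins are formed) and select within each bin.
    bins = np.array_split(np.arange(X.shape[1]), n_bins)
    merged = []
    for cols in bins:
        local = selector(X[:, cols], y, k_per_bin)
        merged.extend(cols[local])  # map bin-local indices back to global ones
    merged = np.array(merged)
    # Stage 2: rerun the selector on the merged candidate genes.
    final_local = selector(X[:, merged], y, k_final)
    return merged[final_local]


def loo_svm_accuracy(X, y, genes):
    """Leave-one-out accuracy of a linear SVM restricted to the selected genes."""
    scores = cross_val_score(SVC(kernel="linear"), X[:, genes], y, cv=LeaveOneOut())
    return scores.mean()


# Synthetic demo data (hypothetical): 40 samples, 50 "genes",
# with genes 7 and 33 made informative by shifting one class.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
y = np.repeat([0, 1], 20)
X[y == 1, 7] += 4.0
X[y == 1, 33] += 4.0

selected = binned_selection(X, y)
acc = loo_svm_accuracy(X, y, selected)
print("selected genes:", sorted(selected.tolist()))
print("leave-one-out accuracy:", acc)
```

Passing the selector as an argument keeps the binning logic independent of the selection method, so an actual CARS routine could be substituted for `select_top_k` without changing the rest of the sketch.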