A method for mining tumor gene expression profile data based on support vector machines (SVMs) is proposed. First, the signal-to-noise ratio (SNR) method is used to extract feature genes from leukemia, colon cancer, and lung cancer datasets, producing feature gene subsets. An SVM classification model is then trained on these subsets to classify the samples. Experimental results show that for acute leukemia and colon cancer, only 4 feature genes are needed to reach 100% classification accuracy under 10-fold cross-validation. Finally, to exclude noisy genes more effectively and thereby select feature genes with higher classification accuracy, a multi-scale wavelet thresholding method is applied to denoise the lung cancer data. After denoising, only 5 feature genes are needed to reach a classification accuracy of 96.61%.
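The SNR feature-selection step can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the Golub-style statistic (μ₀ − μ₁)/(σ₀ + σ₁), computed per gene from the two class means and sample standard deviations, with genes ranked by its absolute value. The function names `snr_scores`, `top_k_genes`, and `project`, and the toy data, are hypothetical and used only for illustration. The SVM training and cross-validation that follow in the described method are not shown.

```python
from statistics import mean, stdev


def snr_scores(X, y):
    """Return one signal-to-noise score per gene (column of X).

    X: list of samples, each a list of gene expression values.
    y: list of class labels (0 or 1), one per sample.
    Assumed score for gene g: (mu0 - mu1) / (sd0 + sd1), Golub-style SNR.
    """
    class0 = [row for row, label in zip(X, y) if label == 0]
    class1 = [row for row, label in zip(X, y) if label == 1]
    # Sample standard deviation needs at least two values per class.
    if len(class0) < 2 or len(class1) < 2:
        raise ValueError("each class needs at least two samples")
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row in class0]
        b = [row[g] for row in class1]
        denom = stdev(a) + stdev(b)
        # Guard against genes that are constant within both classes.
        if denom == 0:
            denom = 1e-12
        scores.append((mean(a) - mean(b)) / denom)
    return scores


def top_k_genes(X, y, k):
    """Indices of the k genes with the largest |SNR|, best first."""
    scores = snr_scores(X, y)
    order = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return order[:k]


def project(X, genes):
    """Keep only the selected gene columns, giving the feature gene subset."""
    return [[row[g] for g in genes] for row in X]


if __name__ == "__main__":
    # Toy data (made up): 6 samples x 4 genes, two classes of 3 samples.
    X = [
        [5.0, 1.0, 3.0, 2.0],
        [5.2, 1.1, 2.0, 2.2],
        [4.8, 0.9, 4.0, 1.8],
        [1.0, 1.0, 3.1, 3.0],
        [1.2, 1.2, 2.1, 3.2],
        [0.8, 0.8, 3.9, 2.8],
    ]
    y = [0, 0, 0, 1, 1, 1]
    genes = top_k_genes(X, y, 2)
    print(genes)  # -> [0, 3]
    print(project(X, genes))
```

In this toy data, gene 0 separates the classes with small within-class spread (SNR of 10) and gene 3 separates them less sharply (SNR of -2.5), so both rank above genes 1 and 2, whose class means nearly coincide. The projected matrix is what a classifier such as an SVM would then be trained on.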