The classification of cancer is a major research topic in bioinformatics. The high dimensionality and small sample size of gene expression data, however, make classification quite challenging. Although principal component analysis (PCA) is of particular interest for high-dimensional data, it may overemphasize some aspects and ignore other important information contained in the richly complex data, because it displays only the differences in the first two- or three-dimensional PC subspace. Based on PCA, a principal component accumulation (PCAcc) method is proposed. It employs the information contained in multiple PC subspaces and improves the class separability of cancers. The effectiveness of the method was evaluated on four commonly used gene expression datasets, and the results show that it performs well for cancer classification.
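The limitation noted above, that the first two or three PCs may capture only part of the structure in the data, can be illustrated with a minimal sketch. It is not the PCAcc method itself, whose accumulation procedure is not specified in this abstract. The data are synthetic, and all function names, sizes, and parameters are assumptions chosen for illustration. Because there are far fewer samples than genes, the sketch takes the eigenvalues of the small sample-by-sample Gram matrix, which shares its nonzero eigenvalues with the gene covariance matrix up to a constant factor.

```python
import math
import random


def center_columns(X):
    """Subtract each gene's (column's) mean across samples."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def gram_matrix(Xc):
    """n x n matrix of inner products between centered samples."""
    n = len(Xc)
    return [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)]
            for i in range(n)]


def jacobi_eigenvalues(A, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    A = [row[:] for row in A]  # work on a copy
    n = len(A)
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                # Rotation angle that zeroes A[p][q]; |theta| <= pi/4.
                theta = 0.5 * math.atan(2.0 * A[p][q] / diff) if diff != 0.0 else math.pi / 4
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted((A[i][i] for i in range(n)), reverse=True)


def explained_variance_ratio(X):
    """Fraction of total variance carried by each PC, largest first."""
    eig = [max(v, 0.0) for v in jacobi_eigenvalues(gram_matrix(center_columns(X)))]
    total = sum(eig)
    return [v / total for v in eig]


# Synthetic "expression matrix": 8 samples x 200 genes driven by 4 hidden
# factors plus noise (all values are assumptions, not real data).
rng = random.Random(0)
n_samples, n_genes, n_factors = 8, 200, 4
loadings = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_factors)]
X = []
for _ in range(n_samples):
    scores = [rng.gauss(0, 1) for _ in range(n_factors)]
    X.append([sum(s * l[j] for s, l in zip(scores, loadings)) + rng.gauss(0, 0.5)
              for j in range(n_genes)])

ratios = explained_variance_ratio(X)
cumulative = [sum(ratios[:k + 1]) for k in range(len(ratios))]
print("variance in first 2 PCs:", round(cumulative[1], 3))
print("variance in first 5 PCs:", round(cumulative[4], 3))
```

When several hidden factors drive the data, the cumulative share of the first two PCs falls short of the share reached with more components, which is the kind of information a method drawing on multiple PC subspaces could use.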