Because subtype classification of NSCLC (non-small cell lung cancer) matters for both clinical practice and biomedical research, a genome-wide signature gene identification analysis for NSCLC subtype classification was carried out using microarray data on genome-wide gene expression (GE) and methylation (ME) levels. To cope with the high noise and the ultra-high-dimensional, small-sample nature of genome-wide microarray data, an elastic orthogonal Bayesian algorithm was used to screen genome-wide genes recursively and identify the signature gene set with the best classification accuracy. Taking 490 gene expression samples and 378 methylation samples from TCGA as an example, 52 GE signature genes and 25 ME signature genes were identified, with classification accuracies of 99% and 98%, respectively. Multivariate Cox models built from the signature genes and clinical data show the important role of the signature genes in patient survival analysis: using only the corresponding gene expression and methylation data, patient samples can be correctly classified as "high risk" or "low risk", with significance levels below 0.05 in both cases. The close relationship between the metabolic pathways in which the signature genes participate and pathways important to cancer classification and development, such as p53, TGF-beta, and Wnt, further confirms the importance of the signature genes for NSCLC classification.
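The recursive screening idea described above (repeatedly discard weak genes and keep the gene set that classifies best) can be illustrated with a minimal Python sketch. This is not the elastic orthogonal Bayesian algorithm named in the abstract, whose details are not given here. As a stand-in, it ranks features by a simple standardized mean difference and evaluates each candidate set with a leave-one-out nearest-centroid classifier. The function names, the halving schedule, the `min_features` parameter, and the toy matrix are all assumptions made for illustration; the numbers are invented and are not TCGA data.

```python
from statistics import mean, pstdev

def feature_scores(X, y, features):
    """Score each feature by a standardized mean difference between the two classes."""
    scores = {}
    for j in features:
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
        scores[j] = abs(mean(a) - mean(b)) / spread
    return scores

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        distances = {}
        for label in set(y):
            # Class centroid computed without the held-out sample i.
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[j] for r in rows) for j in features]
            distances[label] = sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
        if min(distances, key=distances.get) == y[i]:
            correct += 1
    return correct / len(X)

def recursive_screen(X, y, min_features=1):
    """Halve the feature set repeatedly and keep the subset with the best accuracy.

    Simplification: features are ranked on all samples, so the accuracy is
    optimistic; a real analysis would nest the ranking inside cross-validation.
    """
    features = list(range(len(X[0])))
    best_features, best_acc = features, loo_accuracy(X, y, features)
    while len(features) > min_features:
        scores = feature_scores(X, y, features)
        ranked = sorted(features, key=lambda j: scores[j], reverse=True)
        keep = max(min_features, len(ranked) // 2)
        features = sorted(ranked[:keep])
        acc = loo_accuracy(X, y, features)
        if acc >= best_acc:  # on ties, prefer the smaller feature set
            best_features, best_acc = features, acc
    return best_features, best_acc

# Invented toy matrix: 8 samples x 6 features; only features 0 and 3 separate the classes.
X = [
    [1.0, 3.1, 0.2, 2.1, 7.0, 4.4],
    [1.2, 2.9, 0.9, 1.8, 6.1, 4.9],
    [0.9, 3.4, 0.4, 2.3, 7.7, 4.1],
    [1.1, 2.7, 0.7, 1.9, 6.5, 4.6],
    [5.0, 3.0, 0.3, 8.2, 6.9, 4.5],
    [5.3, 3.3, 0.8, 7.9, 6.3, 4.8],
    [4.8, 2.8, 0.5, 8.1, 7.5, 4.2],
    [5.1, 3.2, 0.6, 7.8, 6.6, 4.7],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, accuracy = recursive_screen(X, y, min_features=2)
print(selected, accuracy)
```

With real microarray data the same loop would run over tens of thousands of genes, which is why the choice of ranking statistic and the way accuracy is validated matter far more than they do in this toy case.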