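Classification Model for Recognizing Cancer Progression Patterns in Lung Squamous Cell Carcinoma and Identification of Feature Genes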

Source: Progress in Biochemistry and Biophysics
In this paper, we use bioinformatics methods to integrate, for the first time at the whole-genome level, three types of data (gene expression, methylation level, and copy number variation) in order to find feature genes closely associated with the onset and progression of lung squamous cell carcinoma (LUSC). The aim is to provide a deeper theoretical basis for explaining the underlying mechanisms and for developing new targeted drugs and therapies. Whole-genome data are ultra-high-dimensional, noisy, and available for only a small number of samples, which degrades the performance of machine learning algorithms. To counter this, and to avoid interference from information saturation, we combine four feature gene screening methods that assess specificity, relevance, biological function, and contribution to the tumor classification model, and we screen for true feature genes recursively through iterative dimensionality reduction. Using stage I to III LUSC patient samples from The Cancer Genome Atlas (TCGA) database as an example, we analyzed gene expression (GE), gene methylation (ME), and copy number variation (CNV) data. The screening yielded 67 GE feature genes, with an average accuracy of 86.29% in classifying the three sample classes; 70 ME feature genes, with a classification accuracy of 90.92%; and 31 CNV feature genes, with a classification accuracy of 69.16%. Analysis of these three feature gene sets with KEGG (Kyoto Encyclopedia of Genes and Genomes) and IPA (Ingenuity Pathway Analysis), at the metabolic pathway level and the gene regulatory network level, showed that the sets are closely related at the regulatory level. The results also indicate that the identified feature genes are directly related to LUSC tumor progression, which is important for understanding tumor mechanisms and for developing new targeted therapies.
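The abstract does not name the four screening methods, so the following is only a minimal sketch of the general idea of narrowing a gene set in stages using more than one criterion. It is not the authors' procedure. It assumes a between-class to within-class variance ratio as a stand-in for "specificity" and a Pearson correlation cutoff as a stand-in for redundancy pruning. The function names (`f_score`, `pearson`, `screen`), the thresholds, and the toy data are all hypothetical.

```python
from math import sqrt
from statistics import mean


def f_score(values, labels):
    """Between-class over within-class sum of squares for one gene (assumed specificity proxy)."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in sorted(set(labels)):
        group = [v for v, y in zip(values, labels) if y == cls]
        m = mean(group)
        between += len(group) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in group)
    return between / (within + 1e-12)  # small constant avoids division by zero


def pearson(a, b):
    """Pearson correlation of two equal-length lists; returns 0.0 for a constant input."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    denom = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / denom if denom else 0.0


def screen(X, labels, min_score=1.0, max_corr=0.9):
    """Stage 1: drop genes with a low class-specificity score.
    Stage 2: walk the survivors from best to worst score and skip any gene
    that is nearly collinear with a gene already kept."""
    columns = [list(col) for col in zip(*X)]  # one list of values per gene
    scores = [f_score(col, labels) for col in columns]
    ranked = sorted(
        (g for g in range(len(columns)) if scores[g] >= min_score),
        key=lambda g: scores[g],
        reverse=True,
    )
    kept = []
    for g in ranked:
        if all(abs(pearson(columns[g], columns[k])) < max_corr for k in kept):
            kept.append(g)
    return kept


# Toy data (made up): 6 samples x 4 genes, two samples per stage.
labels = ["I", "I", "II", "II", "III", "III"]
X = [
    [1.0, 1.0, 2.0, 3.0],
    [1.2, 1.4, 2.2, 5.0],
    [3.0, 3.0, 6.0, 4.0],
    [3.2, 3.4, 6.2, 2.0],
    [5.0, 5.0, 2.1, 5.0],
    [5.2, 5.4, 1.9, 3.0],
]
selected = screen(X, labels)
print(selected)
```

In this toy input, gene 1 nearly duplicates gene 0 and gene 3 carries no stage signal, so both are dropped. In a real analysis, the retained genes would then be passed to a classifier and evaluated with cross-validation, which this sketch leaves out.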