Identification of differential gene expression for microarray data using recursive random forest

Source: Chinese Medical Journal (English Edition)
Background The major difficulty in analyzing DNA microarray data is that the number of genes is large relative to the small number of samples, and the data structure is complex. Random forest has received much attention recently; its primary characteristic is that it can build a classification model from high-dimensional data. However, it cannot yield optimal results for gene selection, because the model is still affected by undifferentiated genes. We proposed recursive random forest analysis and applied it to gene selection.

Methods Recursive random forest, an improvement on random forest, obtains the optimal set of differentially expressed genes by dropping, step by step, the genes that a specified algorithm identifies as having no effect on classification. The method retains the advantages of random forest and also provides a gene importance scale. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve, which combines the information of sensitivity and specificity, is adopted as the key standard for evaluating the performance of the method. The focus of the paper is to validate the effectiveness of gene selection using recursive random forest through the analysis of five microarray datasets: colon, prostate, leukemia, breast and skin data.

Results Five microarray datasets were analyzed, and better classification results were attained using only a few genes after gene selection. The biological information of the genes selected from the breast and skin data was confirmed against the National Center for Biotechnology Information (NCBI). The results indicate that recursive random forest can effectively retain the genes associated with diseases.

Conclusions Recursive random forest can be effectively applied to microarray data analysis and gene selection. The genes retained in the optimal model provide important information for clinical diagnosis and for research into the biological mechanisms of diseases.
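To make the evaluation criterion named in the Methods concrete, the following is a minimal sketch of how an AUC can be computed from classifier scores, and how sensitivity and specificity arise at a single threshold. It is not the authors' code: the function names `roc_auc` and `sensitivity_specificity`, the toy labels and scores, and the threshold of 0.45 are all assumptions made for illustration. The AUC is computed here in its rank-based (Mann-Whitney) form, that is, as the fraction of positive/negative sample pairs in which the positive sample receives the higher score.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = diseased sample, 0 = control sample.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.45)
print(f"AUC = {auc:.3f}, sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

A single threshold gives one sensitivity/specificity pair, whereas the AUC summarizes the trade-off over all thresholds, which is why it can serve as one number for comparing models built on different gene subsets.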