Predicting the Subcellular Localization of Human Proteins Using Machine Learning and Exploratory Dat

Source: Genomics, Proteomics & Bioinformatics (English edition)
Identifying the subcellular localization of proteins is particularly helpful in the functional annotation of gene products. In this study, we use Machine Learning and Exploratory Data Analysis (EDA) techniques to examine and characterize amino acid sequences of human proteins localized in nine cellular compartments. A dataset of 3,749 human protein sequences was extracted from the SWISS-PROT database. Feature vectors were created to capture specific amino acid sequence characteristics. Compared with a Support Vector Machine, a Multi-layer Perceptron, and a Naive Bayes classifier, the C4.5 Decision Tree algorithm was the most consistent performer across all nine compartments in reliably predicting the subcellular localization of proteins from their amino acid sequences (average Precision = 0.88; average Sensitivity = 0.86). Furthermore, EDA graphics characterized essential features of the proteins in each compartment. For example, proteins localized to the plasma membrane had higher proportions of hydrophobic amino acids; cytoplasmic proteins had higher proportions of neutral amino acids; and mitochondrial proteins had higher proportions of neutral amino acids and lower proportions of polar amino acids. These data show that the C4.5 classifier and EDA tools can be effective for characterizing and predicting the subcellular localization of human proteins based on their amino acid sequences.
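As an illustration of the kind of sequence-derived feature vector and evaluation metrics described above, the following is a minimal Python sketch. It is not the authors' implementation. The three-class residue grouping (polar, neutral, hydrophobic) is an assumption borrowed from a commonly used hydrophobicity classification, since the abstract does not state which grouping was used. The function names `composition_features` and `precision_and_sensitivity` are hypothetical.

```python
from collections import Counter

# ASSUMPTION: a commonly used three-class hydrophobicity grouping of the
# 20 standard amino acids. The abstract does not specify the grouping
# the authors used, so treat this table as illustrative only.
GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}


def composition_features(sequence):
    """Return the proportion of residues falling in each assumed group.

    Characters outside the 20 standard amino acids (e.g. 'X') are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for members in GROUPS.values() for aa in members)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {
        name: sum(counts[aa] for aa in members) / total
        for name, members in GROUPS.items()
    }


def precision_and_sensitivity(tp, fp, fn):
    """Per-compartment precision and sensitivity (recall) from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    # Toy sequence, not taken from the paper's dataset.
    print(composition_features("LLVVKKGG"))
    print(precision_and_sensitivity(tp=8, fp=2, fn=2))
```

Proportions of this kind could serve as inputs to any of the classifiers named in the abstract. Averaging the per-compartment precision and sensitivity over the nine compartments would give summary figures comparable in form to those reported.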