Identification of MicroRNA Precursors with Support Vector Machine and String Kernel

Source: Genomics, Proteomics & Bioinformatics (English edition)
MicroRNAs (miRNAs) are a family of short (21-23 nt) regulatory non-coding RNAs processed from longer (70-110 nt) miRNA precursors (pre-miRNAs). Distinguishing true from false precursors plays an important role in the computational identification of miRNAs. Various numerical features have been extracted from precursor sequences and their secondary structures to suit particular classification methods; however, such features may lose useful discriminative information hidden in the sequences and structures. In this study, pre-miRNA sequences and their secondary structures are used directly to construct an exponential kernel based on the weighted Levenshtein distance between two sequences. This string kernel is then combined with a support vector machine (SVM) to detect true and false pre-miRNAs. Using 331 training samples of true and false human pre-miRNAs, two key SVM parameters are selected by 5-fold cross-validation and grid search, and 5 realizations with different 5-fold partitions are executed. On 16 independent test sets (3 human, 8 animal, 2 plant, 1 virus, and 2 artificially generated false human pre-miRNA sets), our method statistically outperforms the previous SVM-based technique on 11 sets: 3 human, 7 animal, and 1 false human set. In particular, pre-miRNAs with multiple loops, which were usually excluded in the previous work, are correctly identified in this study with an accuracy of 92.66%.
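To make the kernel idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The abstract does not say how sequence and structure are combined or how edit operations are weighted, so the sketch assumes each pre-miRNA is a pair of equal-length strings (nucleotides and dot-bracket structure), treats each position as a (nucleotide, structure symbol) pair, and uses hypothetical unit costs for insertion/deletion, nucleotide mismatch, and structure mismatch. The kernel is then K(x, y) = exp(-γ·d(x, y)), where d is the weighted Levenshtein distance. All names and default weights here are illustrative. Note that an exponential edit-distance kernel is not guaranteed to be positive semidefinite.

```python
import math

# Hypothetical cost weights (assumption): the abstract does not give the
# paper's actual weighting scheme.
INDEL_COST = 1.0
SEQ_MISMATCH_COST = 1.0
STRUCT_MISMATCH_COST = 1.0


def weighted_levenshtein(a, a_struct, b, b_struct,
                         indel=INDEL_COST,
                         seq_mis=SEQ_MISMATCH_COST,
                         struct_mis=STRUCT_MISMATCH_COST):
    """Edit distance over (nucleotide, dot-bracket symbol) position pairs."""
    if len(a) != len(a_struct) or len(b) != len(b_struct):
        raise ValueError("sequence and structure lengths must match")
    n, m = len(a), len(b)
    # prev[j] holds the distance between the first i-1 pairs of a and first j of b.
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            # Substitution cost grows with each component that differs.
            sub = 0.0
            if a[i - 1] != b[j - 1]:
                sub += seq_mis
            if a_struct[i - 1] != b_struct[j - 1]:
                sub += struct_mis
            curr[j] = min(prev[j] + indel,      # delete pair i-1 of a
                          curr[j - 1] + indel,  # insert pair j-1 of b
                          prev[j - 1] + sub)    # match or substitute
        prev = curr
    return prev[m]


def exp_kernel(x, y, gamma=0.1):
    """K(x, y) = exp(-gamma * d(x, y)) for (sequence, structure) tuples."""
    return math.exp(-gamma * weighted_levenshtein(x[0], x[1], y[0], y[1]))


def gram_matrix(samples, gamma=0.1):
    """Symmetric kernel matrix over a list of (sequence, structure) tuples."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = exp_kernel(samples[i], samples[j], gamma)
    return K


if __name__ == "__main__":
    toy = [("GGGAAACCC", "(((...)))"),
           ("GGGAAUCCC", "(((...)))"),
           ("GGGAAACCC", "((.....))")]
    for row in gram_matrix(toy):
        print([round(v, 4) for v in row])
```

A Gram matrix like this could be handed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), with the regularization constant and γ tuned by grid search under 5-fold cross-validation. Whether those are the two parameters the authors tuned is an assumption; the abstract only says "two key parameters".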