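MicroRNAs (miRNAs) are a class of small regulatory RNAs that do not encode proteins and that perform broad and important regulatory functions in eukaryotes. Because miRNA expression is spatiotemporally specific, predicting miRNAs computationally and then validating the candidates with targeted experiments is an important route to miRNA discovery. Reducing the false positive rate is a major challenge for miRNA prediction methods. In this study, we used an ensemble learning approach to build SVMbagging, a classifier for predicting miRNA precursors. Results on the training set, the test set, and an independent test set show that the method is robust, has a low false positive rate, and generalizes well. In particular, at a threshold of 0.9 the specificity reaches 99.90% while the sensitivity remains above 26%, which makes the method suitable for genome-wide prediction. Applying SVMbagging to the whole human genome at a threshold of 0.9 yielded 14,933 candidate miRNA precursors. Comparison with high-throughput small RNA sequencing data showed that 4,481 of these precursors have perfectly matching small RNA sequences, a number very close to the theoretically estimated number of true positives. Finally, 32 candidate miRNAs were tested experimentally, and 2 of them were confirmed as genuine miRNAs.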
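The trade-off between specificity and sensitivity at a high decision threshold can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the ensemble score is the fraction of base classifiers voting "precursor" and that the threshold is applied to that fraction, which the abstract does not state. The base classifiers here are hypothetical one-feature stubs standing in for trained SVMs, and the data are invented toy values.

```python
from typing import Callable, List, Sequence, Tuple

# A base classifier maps a feature value to a vote: 1 (precursor) or 0 (not).
Classifier = Callable[[float], int]


def make_stub(cutoff: float) -> Classifier:
    """Hypothetical stand-in for one trained SVM in the bagged ensemble."""
    return lambda x: 1 if x > cutoff else 0


def vote_fraction(ensemble: Sequence[Classifier], x: float) -> float:
    """Fraction of base classifiers that vote 'precursor' for sample x."""
    return sum(clf(x) for clf in ensemble) / len(ensemble)


def predict(ensemble: Sequence[Classifier], x: float, threshold: float) -> bool:
    """Call x a precursor only if the vote fraction reaches the threshold."""
    return vote_fraction(ensemble, x) >= threshold


def sensitivity_specificity(
    ensemble: Sequence[Classifier],
    samples: Sequence[Tuple[float, int]],
    threshold: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) over labelled (feature, label) pairs."""
    tp = fn = tn = fp = 0
    for x, label in samples:
        called = predict(ensemble, x, threshold)
        if label == 1:
            tp += called
            fn += not called
        else:
            fp += called
            tn += not called
    return tp / (tp + fn), tn / (tn + fp)


# Ten stub classifiers with different cutoffs (illustrative values only).
CUTOFFS: List[float] = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
ENSEMBLE: List[Classifier] = [make_stub(c) for c in CUTOFFS]

# Toy labelled data: (feature value, 1 = true precursor, 0 = background).
SAMPLES: List[Tuple[float, int]] = [
    (0.80, 1), (0.62, 1), (0.52, 1), (0.90, 1),
    (0.10, 0), (0.33, 0), (0.58, 0), (0.20, 0),
]

if __name__ == "__main__":
    for t in (0.5, 0.9):
        sens, spec = sensitivity_specificity(ENSEMBLE, SAMPLES, t)
        print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.9 lifts specificity from 0.75 to 1.00 while sensitivity falls from 1.00 to 0.50. This is the same direction of trade-off that the abstract reports for a threshold of 0.9, where a low false positive rate matters more than recall because a genome-wide scan evaluates a very large number of background candidates.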