To address the low prediction accuracy for positive samples and the poor overall classification performance caused by class imbalance in microRNA (miRNA) target gene data, this paper proposes an ensemble learning algorithm based on undersampling: the support vector machine (SVM)-embedded undersampling and weight smoothing (IUSW) ensemble learning algorithm, SVM-IUSW. The algorithm uses SVM as the base learner and AdaBoost as the ensemble framework. During each boosting iteration, clustering-based undersampling is embedded to reduce the imbalance between the negative and positive sample distributions. During adaptive sample weight adjustment, a sample weight smoothing mechanism removes outliers among the negative samples to avoid overfitting. Finally, the predictions of the weak classifiers are combined by weighted voting to give the prediction of the miRNA ensemble classifier. Experiments show that, compared with other algorithms on imbalanced datasets, SVM-IUSW not only improves the prediction accuracy for positive targets and the overall classification performance, but also strengthens the generalization ability of the miRNA target classifier.
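The abstract gives only the outline of the method, so the following is a minimal sketch of that outline under stated assumptions, not the authors' implementation. To stay dependency-free it swaps the SVM base learner for a decision stump. The clustering step is assumed to be k-means, keeping one negative per cluster. The weight smoothing rule is assumed to drop negatives whose weight exceeds `cap` times the mean negative weight. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

def train_stump(X, y, w):
    """Weak learner (stand-in for the SVM): best weighted threshold rule."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (sign if x[j] >= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    _, j, t, sign = best
    return lambda x: sign if x[j] >= t else -sign

def cluster_undersample(X, neg, k, rng, iters=10):
    """Assumed clustering step: k-means on negatives, keep the one nearest each centroid."""
    centers = [X[i] for i in rng.sample(neg, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i in neg:
            nearest = min(range(k), key=lambda c: math.dist(X[i], centers[c]))
            groups[nearest].append(i)
        # Recompute each centroid; an empty cluster keeps its previous centroid.
        centers = [tuple(sum(X[i][d] for i in g) / len(g) for d in range(len(X[0])))
                   if g else centers[c] for c, g in enumerate(groups)]
    return sorted({min(neg, key=lambda i: math.dist(X[i], ctr)) for ctr in centers})

def fit(X, y, rounds=10, cap=3.0, seed=0):
    """AdaBoost-style loop with per-round undersampling and negative-outlier removal."""
    rng = random.Random(seed)
    n = len(X)
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Train on all positives plus a cluster-based subset of the negatives.
        keep = pos + cluster_undersample(X, neg, min(len(pos), len(neg)), rng)
        h = train_stump([X[i] for i in keep], [y[i] for i in keep], [w[i] for i in keep])
        # Weighted error on the full training set, clamped to keep alpha finite.
        err = sum(w[i] for i in range(n) if h(X[i]) != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        w = [w[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(n)]
        # Assumed smoothing rule: negatives with unusually large weight are outliers.
        mean_neg = sum(w[i] for i in neg) / len(neg)
        outliers = {i for i in neg if w[i] > cap * mean_neg}
        neg = [i for i in neg if i not in outliers]
        for i in outliers:
            w[i] = 0.0
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote over the weak classifiers."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Toy imbalanced data (hypothetical): 3 positives, 20 negatives.
X = [(5.0 + 0.1 * i, float(i)) for i in range(3)] + \
    [(0.1 * i, float(i % 4)) for i in range(20)]
y = [1] * 3 + [-1] * 20
model = fit(X, y, rounds=5)
print(predict(model, (5.05, 1.0)), predict(model, (0.5, 1.0)))  # prints "1 -1"
```

Because the toy data is trivially separable, no negatives are dropped here; the smoothing branch only matters on noisy data. Replacing `train_stump` with a weighted SVM would bring the sketch closer to the algorithm described above.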