Traditional single-locus genome-wide association studies (GWAS) suffer from low reproducibility and poor interpretability, while machine-learning-based epistasis analysis faces high computational complexity and insufficient prediction accuracy. This paper proposes a new method for genome-wide epistasis analysis built on a two-stage framework comprising a feature filtering stage and an epistatic combination optimization stage. In the feature filtering stage, a multi-criteria fusion strategy evaluates genetic variant loci from several different perspectives so that susceptibility loci with weak effects are retained; a multi-criteria rank fusion strategy then removes variants that are only weakly associated with disease status. In the epistatic combination optimization stage, a greedy algorithm searches the combination space heuristically to reduce time complexity, and a support vector machine (SVM) serves as the epistasis evaluation model. In the experiments, the method is compared with the classical algorithms SNPruler and ACO under different linkage disequilibrium parameters. The results show that the proposed method effectively retains weak-effect loci and improves disease prediction accuracy to some extent.
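The two-stage framework described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the abstract does not name the filtering criteria, so chi-square and mutual information are used here; the function names (`fuse_ranks`, `greedy_search`, `table_accuracy`) are hypothetical; and the scorer is a simple majority-vote genotype table standing in for the SVM evaluation model, which in practice would be scored by cross-validation.

```python
from collections import Counter
from itertools import product
import math


def chi_square(genotypes, labels):
    """Pearson chi-square statistic of the genotype-by-label table."""
    n = len(labels)
    g_counts, y_counts = Counter(genotypes), Counter(labels)
    joint = Counter(zip(genotypes, labels))
    stat = 0.0
    for g, gc in g_counts.items():
        for y, yc in y_counts.items():
            expected = gc * yc / n
            stat += (joint[(g, y)] - expected) ** 2 / expected
    return stat


def mutual_information(genotypes, labels):
    """Mutual information (in nats) between one SNP and the label."""
    n = len(labels)
    g_counts, y_counts = Counter(genotypes), Counter(labels)
    joint = Counter(zip(genotypes, labels))
    return sum((c / n) * math.log(c * n / (g_counts[g] * y_counts[y]))
               for (g, y), c in joint.items())


def fuse_ranks(snps, labels, criteria, keep):
    """Stage 1: keep the `keep` SNP indices with the lowest summed rank."""
    rank_sum = [0] * len(snps)
    for criterion in criteria:
        scores = [criterion(column, labels) for column in snps]
        order = sorted(range(len(snps)), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return sorted(range(len(snps)), key=lambda j: rank_sum[j])[:keep]


def table_accuracy(snps, labels, subset):
    """Stand-in scorer (NOT an SVM): training accuracy of a majority vote
    within each genotype combination of the chosen SNPs."""
    cells = {}
    for i, y in enumerate(labels):
        key = tuple(snps[j][i] for j in subset)
        cells.setdefault(key, Counter())[y] += 1
    return sum(max(c.values()) for c in cells.values()) / len(labels)


def greedy_search(candidates, score, max_size):
    """Stage 2: repeatedly add the SNP that most improves the score."""
    chosen, best = [], float("-inf")
    while len(chosen) < max_size:
        gains = [(score(chosen + [j]), j) for j in candidates if j not in chosen]
        if not gains:
            break
        top_score, top_j = max(gains)
        if top_score <= best:  # no remaining SNP improves the model
            break
        chosen.append(top_j)
        best = top_score
    return chosen, best


# Toy data: every genotype combination of 4 SNPs; only SNP 0 and SNP 1
# influence the (made-up) disease label.
samples = list(product(range(3), repeat=4))
labels = [int(s[0] + s[1] >= 2) for s in samples]
snps = [list(column) for column in zip(*samples)]

kept = fuse_ranks(snps, labels, [chi_square, mutual_information], keep=3)
chosen, best = greedy_search(
    kept, lambda subset: table_accuracy(snps, labels, subset), max_size=2)
print(kept, chosen, best)
```

Summing ranks rather than raw scores keeps criteria with different scales comparable, which is one plausible reading of "rank fusion". The greedy search evaluates only a number of subsets proportional to the candidate count times `max_size`, instead of every possible combination, which is the source of the reduced time complexity the abstract refers to.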