Proteins carry out many of the essential biological activities in living organisms, and accurate annotation of protein function can greatly advance life-science research and its applications. Traditional wet-lab experiments have low throughput and can no longer keep pace with the massive number of proteins produced by high-throughput technologies, so large-scale protein function prediction based on computational models has become one of the core tasks of bioinformatics in the post-genomic era. Current machine-learning-based methods usually focus only on predicting functions for completely unannotated proteins and ignore the fact that the annotations of already-annotated proteins may themselves be incomplete, which limits their prediction accuracy. This paper combines the hierarchical structure of the Gene Ontology with protein-protein interaction network information, designs a directed hybrid graph (dHG) to represent this information, and on that basis proposes a protein function prediction method based on random walk with restart on the directed hybrid graph, also called dHG. The proposed dHG method can not only replenish missing functions of partially annotated proteins but also predict functions for proteins whose functions are completely unknown. Experimental results on yeast and human proteins show that dHG outperforms existing methods on multiple evaluation metrics and is more efficient.
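The abstract does not specify how the hybrid graph is built or weighted, so the following is only a minimal sketch of the general idea of a random walk with restart on a graph that mixes proteins and ontology terms. Everything in it is an assumption made for illustration: the function name `random_walk_with_restart`, the toy node labels, the unit edge weights, the edge directions (protein to annotated term, child term to parent term), the restart probability of 0.5, and the handling of nodes without outgoing edges. It is not the authors' implementation.

```python
def random_walk_with_restart(edges, start, restart=0.5, tol=1e-10, max_iter=1000):
    """Approximate stationary visiting probabilities of a walk restarting at `start`.

    edges: dict mapping node -> {neighbor: weight}, read as directed edges.
    """
    nodes = set(edges)
    for nbrs in edges.values():
        nodes.update(nbrs)

    # Row-normalize outgoing weights into transition probabilities.
    trans = {}
    for u in nodes:
        nbrs = edges.get(u, {})
        total = sum(nbrs.values())
        trans[u] = {v: w / total for v, w in nbrs.items()} if total > 0 else {}

    # The walker begins at the query node with probability 1.
    p = {u: 1.0 if u == start else 0.0 for u in nodes}
    for _ in range(max_iter):
        nxt = {u: 0.0 for u in nodes}
        for u, mass in p.items():
            if not trans[u]:
                # Assumption: a node with no outgoing edge (e.g. the ontology
                # root) sends the walker back to the query node.
                nxt[start] += (1 - restart) * mass
                continue
            for v, prob in trans[u].items():
                nxt[v] += (1 - restart) * mass * prob
        nxt[start] += restart  # restart step
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy graph: three proteins and two ontology terms.
# Protein-protein interactions are stored in both directions; annotation and
# ontology edges are one-way, which is what makes the graph directed.
edges = {
    "P1": {"P2": 1.0, "GO:child": 1.0},
    "P2": {"P1": 1.0, "P3": 1.0, "GO:child": 1.0},
    "P3": {"P2": 1.0},                 # P3 has no known annotation
    "GO:child": {"GO:parent": 1.0},    # child term points to its parent
    "GO:parent": {},                   # root term, no outgoing edge
}

scores = random_walk_with_restart(edges, "P3")
# Rank the ontology terms as candidate functions for the unannotated protein.
ranked_terms = sorted(
    (n for n in scores if n.startswith("GO:")), key=scores.get, reverse=True
)
```

In this sketch the unannotated protein `P3` reaches ontology terms only through its interaction partners, and the more specific term ranks above its parent because the parent receives probability mass only by way of the child. Starting the walk from a partially annotated protein instead would rank terms it is not yet annotated with, which corresponds to the replenishment setting the abstract describes.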