Identifying viruses from metagenomic data using deep learning

Source: Quantitative Biology (English edition)
Background: The recent development of metagenomic sequencing makes it possible to massively sequence microbial genomes, including viral genomes, without the need for laboratory culture. Existing reference-based and gene homology-based methods are not efficient at identifying unknown viruses or short viral sequences in metagenomic data.

Methods: Here we developed DeepVirFinder, a reference-free and alignment-free machine learning method that uses deep learning to identify viral sequences in metagenomic data.

Results: Trained on sequences from viral RefSeq discovered before May 2015 and evaluated on those discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths, achieving AUROC of 0.93, 0.95, 0.97, and 0.98 for 300, 500, 1000, and 3000 bp sequences, respectively. Enlarging the training data with millions of additional purified viral sequences from metavirome samples further improved the accuracy of identifying under-represented virus groups. Applying DeepVirFinder to real human gut metagenomic samples, we identified 51,138 viral sequences belonging to 175 bins in patients with colorectal carcinoma (CRC). Ten bins were found to be associated with cancer status, suggesting that viruses may play important roles in CRC.

Conclusions: Powered by deep learning and high-throughput metagenomic sequencing data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.
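
The AUROC values quoted in the Results summarize how well a scoring method separates viral from non-viral contigs. As a rough illustration of the metric only (this is not the authors' evaluation code), the Python sketch below computes AUROC as the fraction of viral/non-viral pairs in which the viral sequence receives the higher score, counting ties as half. The function name `auroc` and the toy scores and labels are hypothetical, made up for this example.

```python
def auroc(scores, labels):
    """Return AUROC: the probability that a randomly chosen positive
    (label 1, e.g. viral) scores higher than a randomly chosen negative
    (label 0, e.g. non-viral). Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six contigs (not data from the paper).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = viral, 0 = non-viral
toy_auc = auroc(toy_scores, toy_labels)
print(f"toy AUROC = {toy_auc:.3f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of sequences, so a rank-based implementation would be preferable for real evaluation sets.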