Gclust: Fast microbial genome-sized sequence clustering using suffix array algorithm

Source: 7th National Conference on Bioinformatics and Systems Biology | Citations: 0
  An increasing number of microbial genomes are being sequenced and deposited in public databases. Building a non-redundant reference sequence database through efficient clustering is important for handling the large number of available microbial genome-sized sequences and assembled contigs. To this end, we describe Gclust (Genome sequence clustering), a program for clustering the rapidly growing collection of complete or draft genome sequences. Using a sparse suffix array algorithm and an identity criterion for long genome sequences based on extended DNA maximal exact matches (MEMs), Gclust clusters a given set of genome sequences at a given extension-MEM identity threshold. Clustering 1,560 complete microbial genome sequences with an average length of 3.4 Mb takes less than 7 hours on an Intel(R) Xeon(R) 2.27 GHz CPU using 8 parallel threads. Gclust thus makes it feasible to cluster the rapidly growing number of complete or draft microbial genomes in the future. The program is freely available for non-commercial use at http://weizhong-lab.ucsd.edu/gclust.
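The abstract names two ingredients, a suffix array and an identity measure built from maximal exact matches (MEMs), without giving details. The Python sketch that follows illustrates the general idea under explicit assumptions; it is not Gclust's implementation. It builds a full (not sparse) suffix array naively, finds the longest exact match of each query position in a reference by narrowing the suffix-array interval one character at a time, and scores identity as the fraction of query bases covered by matches of at least `min_len` bases. Every such match lies inside a MEM of at least that length, so this equals MEM coverage on the query side, but Gclust's MEM extension step is not reproduced. Sequences are then grouped by greedy incremental clustering, longest first, in the style of CD-HIT. That strategy, the function names, and the parameters `min_len` and `threshold` are all hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by suffix text.
    # O(n^2 log n); fine for toy inputs, not for real genomes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _char_at(ref: str, sa: list[int], idx: int, offset: int) -> str:
    # Character at `offset` in suffix sa[idx]; "" if the suffix is too short
    # ("" sorts before every base, matching lexicographic suffix order).
    p = sa[idx] + offset
    return ref[p] if p < len(ref) else ""


def _bound(ref: str, sa: list[int], lo: int, hi: int, offset: int, c: str, strict: bool) -> int:
    # Binary search in [lo, hi): first index whose char at `offset` is > c
    # (strict=True) or >= c (strict=False).
    while lo < hi:
        mid = (lo + hi) // 2
        ch = _char_at(ref, sa, mid, offset)
        if ch < c or (strict and ch == c):
            lo = mid + 1
        else:
            hi = mid
    return lo


def longest_match(query: str, qpos: int, ref: str, sa: list[int]) -> int:
    # Length of the longest prefix of query[qpos:] that occurs in ref.
    # Invariant: suffixes sa[lo:hi] all start with query[qpos:qpos + length].
    lo, hi, length = 0, len(sa), 0
    while qpos + length < len(query):
        c = query[qpos + length]
        new_lo = _bound(ref, sa, lo, hi, length, c, strict=False)
        new_hi = _bound(ref, sa, new_lo, hi, length, c, strict=True)
        if new_lo == new_hi:  # no suffix in the interval continues with c
            break
        lo, hi, length = new_lo, new_hi, length + 1
    return length


def exact_match_identity(query: str, ref: str, sa: list[int], min_len: int) -> float:
    # Hypothetical identity: fraction of query bases covered by exact matches
    # to ref of length >= min_len (not Gclust's extension-MEM identity).
    covered = [False] * len(query)
    for i in range(len(query)):
        n = longest_match(query, i, ref, sa)
        if n >= min_len:
            covered[i:i + n] = [True] * n
    return sum(covered) / len(query) if query else 0.0


def greedy_cluster(seqs: dict[str, str], threshold: float, min_len: int) -> dict[str, list[str]]:
    # Greedy incremental clustering, longest sequence first (assumed strategy).
    clusters: dict[str, list[str]] = {}  # representative -> members
    index: dict[str, list[int]] = {}     # representative -> its suffix array
    for name in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep in clusters:
            if exact_match_identity(seqs[name], seqs[rep], index[rep], min_len) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no representative is similar enough: start a new cluster
            clusters[name] = [name]
            index[name] = build_suffix_array(seqs[name])
    return clusters


if __name__ == "__main__":
    demo = {"a": "ACGTACGTTTGACCA", "b": "ACGTACGTTTGAC", "c": "GGGGCCCCTTTTAAAA"}
    print(greedy_cluster(demo, threshold=0.9, min_len=5))
    # -> {'c': ['c'], 'a': ['a', 'b']}
```

A sparse suffix array, as named in the abstract, indexes only every k-th suffix to save memory at the cost of extra search work; the naive `sorted` construction used here is only suitable for toy inputs.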
Other literature
  Cancer is mainly caused by heterogeneous somatic genome alterations (SGAs). Genome-scale data from individual patients are now readily available, and it is an
Conference
  Motivation: Identifying drug-target interaction is an important task in drug discovery. To reduce heavy time and financial cost in experimental identificatio
  Background: Evidence is accumulating that extracellular microvesicles (MVs) facilitate progression and relapse in cancer. Our previous work demonstrated that
  A comprehensive exploration of common and specific plant responses to biotrophs and necrotrophs is necessary for a better understanding of plant immunity. He
  Ohnologs, paralogous gene pairs generated by whole genome duplication (WGD), are enriched for dosage sensitive genes that have a phenotype due to copy number
  It becomes a hot spot to design various disease specific target penetrating peptide in the process of drug development. Cell penetrating peptides (CPPs) is a
  Introduction: Human genomes are diploid, with the homologous chromosomes being derived from each parent, respectively. The process of resolving the diploid nat
  The analysis of host-based gene expression shows a great potential on the early diagnosis of infectious diseases, especially for influenza prediction. We have
  Human organ, as the basic structural and functional unit in human body, is made of a large community of different cell types that organically bound together. E
  Motivation: Drop-seq has recently emerged as a powerful technology to analyze gene expression from thousands of individual cells simultaneously. Currently, Dr