Gclust: Fast microbial genome-sized sequence clustering using suffix array algorithm

Source: The 7th National Conference on Bioinformatics and Systems Biology | Cited by: 0


【Authors】

李瑞琳 (Ruilin Li), 何小雨 (Xiaoyu He), 陈玮 (Wei Chen), 郎显宇 (Xianyu Lang), Weizhong Li, 牛北方 (Beifang Niu)

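【Affiliations】

Department of High Performance Computing Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190; J. Craig Ve…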

【Source】

The 7th National Conference on Bioinformatics and Systems Biology

【Publication date】

2016, issue 10

【Keywords】

Genome Sequence Clustering; Maximal Exact Matches; Sparse Suffix Array; Parallelization


【Abstract】

An increasing number of microbial genomes are being sequenced and deposited in public databases. Building a non-redundant reference sequence database through efficient clustering analysis is important for handling the large number of available microbial genome-sized sequences and assembled contigs. Toward this aim, we describe Gclust (Genome sequence clustering), a program for clustering the rapidly growing collection of complete and draft genome sequences. Gclust uses two components: a sparse suffix array algorithm, and an identity criterion for long genome sequences based on extension of DNA maximal exact matches (MEMs). With these, it clusters a given set of genome sequences at a given extension-MEM identity. Clustering 1,560 complete microbial genome sequences with an average length of 3.4 Mb takes less than 7 hours on an Intel(R) Xeon(R) 2.27 GHz CPU using 8 parallel threads. Gclust therefore makes it feasible to keep clustering the rapidly growing number of complete and draft microbial genomes in the future. The program is freely available for non-commercial use at http://weizhong-lab.ucsd.edu/gclust.
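The abstract names the ingredients (a sparse suffix array, maximal exact matches, an identity threshold) but not how they fit together. Below is a minimal Python sketch of one plausible arrangement. It is not Gclust's implementation, and it rests on these assumptions:

- The sparse suffix array indexes only every `step`-th suffix of each cluster representative.
- MEMs are found by seeding every query position and extending each hit to maximality.
- Identity is approximated as the fraction of query bases covered by MEMs. The abstract's "extension MEM" criterion is not reproduced.
- Clustering is a CD-HIT-style, longest-first greedy scheme.

All function names and parameters (`step`, `min_len`, `threshold`) are hypothetical.

```python
"""Hypothetical sketch: greedy clustering of genome sequences by MEM coverage.

Illustrates the ideas named in the abstract (sparse suffix array, maximal exact
matches, identity threshold); it is NOT Gclust's actual implementation.
"""
from typing import Dict, List, Set, Tuple


def sparse_suffix_array(ref: str, step: int) -> List[int]:
    """Sort only the suffixes starting at positions divisible by `step`."""
    return sorted(range(0, len(ref), step), key=lambda i: ref[i:])


def _bound(ref: str, sa: List[int], pattern: str, upper: bool) -> int:
    """Lower (or upper) bound of `pattern` among SA suffixes truncated to len(pattern)."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = ref[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_mems(query: str, ref: str, sa: List[int], step: int,
              min_len: int) -> Set[Tuple[int, int, int]]:
    """Return MEMs as (query_start, ref_start, length).

    Every MEM of length >= min_len + step - 1 is found: it covers an indexed
    reference position within its first `step` bases, and every query position
    is tried as a seed start.
    """
    mems: Set[Tuple[int, int, int]] = set()
    for q in range(len(query) - min_len + 1):
        seed = query[q:q + min_len]
        lo = _bound(ref, sa, seed, upper=False)
        hi = _bound(ref, sa, seed, upper=True)
        for r in sa[lo:hi]:
            qs, rs = q, r  # extend left up to the first mismatch
            while qs > 0 and rs > 0 and query[qs - 1] == ref[rs - 1]:
                qs, rs = qs - 1, rs - 1
            qe, re_ = q + min_len, r + min_len  # extend right likewise
            while qe < len(query) and re_ < len(ref) and query[qe] == ref[re_]:
                qe, re_ = qe + 1, re_ + 1
            mems.add((qs, rs, qe - qs))
    return mems


def coverage(query: str, ref: str, sa: List[int], step: int, min_len: int) -> float:
    """Assumed identity measure: fraction of query bases covered by some MEM."""
    if not query:
        return 0.0
    covered = [False] * len(query)
    for qs, _, length in find_mems(query, ref, sa, step, min_len):
        for i in range(qs, qs + length):
            covered[i] = True
    return sum(covered) / len(query)


def greedy_cluster(seqs: Dict[str, str], threshold: float, step: int = 4,
                   min_len: int = 8) -> Dict[str, List[str]]:
    """Longest-first greedy clustering against cluster representatives."""
    reps: List[Tuple[str, str, List[int]]] = []
    clusters: Dict[str, List[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        seq = seqs[name]
        for rep_name, rep_seq, rep_sa in reps:
            if coverage(seq, rep_seq, rep_sa, step, min_len) >= threshold:
                clusters[rep_name].append(name)
                break
        else:  # no representative is similar enough: start a new cluster
            reps.append((name, seq, sparse_suffix_array(seq, step)))
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    genomes = {
        "g1": "ACGTACGTTAGCCATGCATGGACTTACGATCGATCGGATC",
        "g2": "ACGTACGTTAGCCATGCATGGACTTACGATCGATCG",  # prefix of g1
        "g3": "T" * 32,
    }
    print(greedy_cluster(genomes, threshold=0.9))
    # {'g1': ['g1', 'g2'], 'g3': ['g3']}
```

Run as a script, the toy example groups `g2`, a prefix of `g1`, with `g1` and leaves the poly-T sequence `g3` in its own cluster. The trade-off of the sparse index is as follows. Sampling every `step`-th suffix shrinks it roughly `step`-fold, at the price of guaranteeing only MEMs of length at least `min_len + step - 1`.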

Other papers

From Big Data to Bedside (BD2B): Precision Oncology in a Big Data Era

Cancer is mainly caused by heterogeneous somatic genome alterations (SGAs). Genome-scale data from individual patients are now readily available, and it is an…

Conference paper

DrugE-Rank: Improving Drug-Target Interaction Prediction of New Candidate Drugs or Targets by Ensembl…

Motivation: Identifying drug-target interactions is an important task in drug discovery. To reduce the heavy time and financial costs of experimental identificatio…

Conference paper

Keywords: Drug target interaction prediction; Learning to Rank; Ensemble Learning

miR-146b-5p within BCR-ABL1-positive microvesicles promotes leukemic transformation of hematopoietic…

Background: Evidence is accumulating that extracellular microvesicles (MVs) facilitate progression and relapse in cancer. Our previous work demonstrated that…

Conference paper

Keywords: microvesicles (MVs); BCR-ABL1; miR-146b-5p; regulatory network; transformation

Network-Based Comparative Analysis of Arabidopsis Immune Responses to Golovinomyces orontii and Botr…

A comprehensive exploration of common and specific plant responses to biotrophs and necrotrophs is necessary for a better understanding of plant immunity. He…

Conference paper

Keywords: Systems biology; Co-expression; Protein-protein interaction network; Plant immunity

Spatial Colocalization of Human Ohnolog Pairs Act to Maintain Dosage-Balance

Ohnologs, paralogous gene pairs generated by whole genome duplication (WGD), are enriched for dosage-sensitive genes that have a phenotype due to copy number…

Conference paper

Keywords: Ohnologs; Dosage balance; Spatial colocalization; Copy number variation; Disease-ass…

Cell Penetrating Peptides Prediction Based on a Novel Graphical Representation

Designing disease-specific target-penetrating peptides has become a hot topic in drug development. Cell penetrating peptides (CPPs) are a…

Conference paper

Keywords: Cell Penetrating Peptides; Cylindrical Representation; Sequence Analysis

A clone-based haplotyping method by overlapping pool sequencing

Introduction: Human genomes are diploid, with the homologous chromosomes derived from each parent respectively. The process of resolving the diploid nat…

Conference paper

Keywords: Haplotyping; clone-based haplotyping; Overlapping pool sequencing

Edge-network based early-warning signals of Influenza Infection

The analysis of host-based gene expression shows great potential for the early diagnosis of infectious diseases, especially for influenza prediction. We have…

Conference paper

Keywords: Edge-network; Early-warning; Influenza Infection

Pattern Genes Suggest Functional Connectivity of Organs

A human organ, the basic structural and functional unit of the human body, is made up of a large community of different cell types that are organically bound together. E…

Conference paper

Keywords: Organ Function; Organ Connection; Pattern Genes

Dr.seq: a quality control and analysis pipeline for droplet sequencing

Motivation: Drop-seq has recently emerged as a powerful technology to analyze gene expression from thousands of individual cells simultaneously. Currently, Dr…

Conference paper

Keywords: Bioinformatics; Drop-seq; Big Data; Single Cell Analysis; Quality Control
