A New Method of Short Read Mapping

Source: The 5th National Conference on Bioinformatics and Systems Biology
Background: Next-generation sequencing (NGS) has dramatically increased the size of sequencing datasets and enabled novel biological applications such as RNA-Seq, ChIP-Seq and MeDIP-Seq. In whole-genome re-sequencing projects of mammalian genomes, NGS usually generates billions of short read sequences (short reads), and the computational cost of aligning them can be very large.

Methods: We propose a new method of aligning Illumina short reads to a reference genome. Its novelty lies in the way the reference is indexed. We transform every K-mer subsequence of the reference into a natural number and randomize all of these numbers with a fixed function. We then sort the randomized numbers to form an index table. During mapping, the K-mer subsequences of each short read are transformed into numbers and randomized in exactly the same way as those of the reference. We exploit the statistical properties of the index table to speed up locating each randomized number in the table, i.e., finding its insertion position. If a read K-mer's number is found in the index table (we call such a K-mer a seed), we then use a seed-and-extend method to produce the full alignment.

Results: We compare our method with Bowtie2 and SOAP2 on three short-read datasets. Its alignment rate is comparable to that of Bowtie2 and about 10% higher than that of SOAP2. It runs 2 to 5 times faster than Bowtie2 and at a speed comparable to SOAP2.

Conclusions: Unlike complex data structures such as hash tables or the FM-index, our index table is simply the index of the sorted, randomized natural numbers derived from the K-mer subsequences of the reference. Moreover, we can exploit the statistical properties of the index table to speed up finding exact matches. The randomized, sorted index table makes our method fast and flexible. Overall, the results show that the speed and sensitivity of our method are comparable to or better than those of Bowtie2 and SOAP2.
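The abstract describes the indexing and mapping pipeline only in outline, so the following Python sketch fills the unspecified pieces with assumptions, labeled here and in the comments. It uses a 2-bit base encoding, and a hypothetical multiply-and-xorshift bijection stands in for the unnamed "fixed function". The lookup is interpolation-style, which is our reading of "the statistical character of the index table": randomized keys are roughly uniform, so a key's position in the sorted table can be predicted. Extension is ungapped (Hamming distance) from non-overlapping seeds, on the forward strand only. The names `encode_kmer`, `randomize`, `build_index`, `find_hits`, `map_read`, the multiplier constant, and the `window` and `max_mismatches` parameters are all hypothetical and do not come from the paper.

```python
from bisect import bisect_left

# Assumed 2-bit code per base; the abstract only says K-mers become natural numbers.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Map a K-mer over ACGT to an integer in [0, 4**K); None if it contains e.g. 'N'."""
    value = 0
    for base in kmer:
        code = BASE_CODE.get(base)
        if code is None:
            return None
        value = (value << 2) | code
    return value


def randomize(value, k):
    """Hypothetical fixed bijection on [0, 4**K): odd multiply mod 2**(2K), then xorshift.
    Both steps are invertible, so distinct K-mers always get distinct numbers."""
    bits = 2 * k
    mask = (1 << bits) - 1
    value = (value * 0x9E3779B97F4A7C15) & mask  # odd multiplier: bijective mod 2**bits
    value ^= value >> (bits // 2)                # xorshift: also bijective
    return value


def build_index(reference, k):
    """Randomize every K-mer of the reference and sort the (number, position) pairs."""
    entries = []
    for pos in range(len(reference) - k + 1):
        code = encode_kmer(reference[pos:pos + k])
        if code is not None:
            entries.append((randomize(code, k), pos))
    entries.sort()
    return [key for key, _ in entries], [pos for _, pos in entries]


def find_hits(keys, positions, key, k, window=4):
    """All reference positions whose randomized K-mer equals `key`.

    Assumption: randomized keys are roughly uniform on [0, 4**K), so the rank of
    `key` is predicted as key * n / 4**K; a window around that guess is doubled
    until it provably brackets the answer, then binary-searched."""
    n = len(keys)
    if n == 0:
        return []
    guess = (key * n) >> (2 * k)  # floor(key * n / 4**K), always in [0, n)
    w = window
    while True:
        lo, hi = max(0, guess - w), min(n, guess + w)
        # The first index with keys[i] >= key lies in [lo, hi] iff both hold.
        if (lo == 0 or keys[lo - 1] < key) and (hi == n or keys[hi] >= key):
            break
        w *= 2
    i = bisect_left(keys, key, lo, hi)
    hits = []
    while i < n and keys[i] == key:  # repeated K-mers occupy consecutive slots
        hits.append(positions[i])
        i += 1
    return hits


def map_read(read, reference, keys, positions, k, max_mismatches=2):
    """Seed with non-overlapping read K-mers, extend each seed without gaps,
    and return (mismatches, start) of the best placement, or None."""
    best, tried = None, set()
    for offset in range(0, len(read) - k + 1, k):
        code = encode_kmer(read[offset:offset + k])
        if code is None:
            continue
        for ref_pos in find_hits(keys, positions, randomize(code, k), k):
            start = ref_pos - offset
            if start < 0 or start + len(read) > len(reference) or start in tried:
                continue
            tried.add(start)
            window_seq = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window_seq))
            if mismatches <= max_mismatches and (best is None or (mismatches, start) < best):
                best = (mismatches, start)
    return best


reference = "ACGTTGCAAGGCTTACGATCGGATCCATGCAAGTCGATTACGGT"
keys, positions = build_index(reference, k=5)
print(map_read(reference[10:30], reference, keys, positions, k=5))        # (0, 10)
print(map_read("GCTAACGATCGGATCCATGC", reference, keys, positions, k=5))  # (1, 10)
```

Because the randomizing function in this sketch is a bijection, two K-mers receive the same number only if they are identical, so a hit in the index table is always an exact K-mer match. A practical mapper would additionally search the reverse complement of each read and allow gapped extension.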