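【Objective】To close the gaps ("gap filling") in the genome assembly of the American strain of Bacillus Calmette-Guerin (BCG), BCG Tice, and obtain its complete genome sequence. 【Method】BCG Tice was first subjected to high-throughput sequencing, and the resulting data were assembled with the SOAPdenovo software. Because some regions of the genome had low sequencing coverage and poor sequencing quality, the assembly consisted of many contigs. Contigs whose relative positions could be determined were joined into scaffolds, and the unsequenced regions between adjacent contigs constituted the gaps. Primers were designed at the ends of the contigs for PCR amplification, yielding PCR products that span adjacent contigs, and these products were then sequenced. The gaps between contigs were closed by optimizing the primer design strategy, trying different DNA polymerases, adjusting the PCR reaction conditions, and cloning the PCR products for sequencing. 【Result】Whole-genome sequencing of BCG Tice was completed and its complete genome sequence was obtained. The sequence has been submitted to the GenBank database of the US National Center for Biotechnology Information (NCBI). 【Conclusion】BCG is a Gram-positive bacterium with high GC content: its genomic GC content is as high as 65.65%. Using gap filling of the BCG Tice genome as an example, this paper summarizes the problems encountered and the strategies adopted when closing gaps in high-GC genomes, in the hope of providing a reference for gap filling in whole-genome sequencing projects of other species with high-GC genomes.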
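The contig, scaffold and gap relationship described in the Method, and the placement of primers at the contig ends flanking a gap, can be illustrated with a minimal Python sketch. It assumes that each gap inside a scaffold is written as a run of `N` characters, treats the flank length as a free parameter, and uses hypothetical function names (`find_gaps`, `primer_templates`). It only extracts candidate primer templates; it is not the authors' primer design procedure, which in practice would also check melting temperature, GC content, self-complementarity and genomic uniqueness (for example with a tool such as Primer3).

```python
import re

# Complement table for uppercase DNA bases; N maps to N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_gaps(scaffold: str, min_len: int = 1):
    """Yield half-open (start, end) coordinates of N-runs in a scaffold.

    ASSUMPTION: gaps between ordered contigs are encoded as runs of N.
    """
    for m in re.finditer(r"N+", scaffold.upper()):
        if m.end() - m.start() >= min_len:
            yield m.start(), m.end()


def primer_templates(scaffold: str, flank: int = 25):
    """For each gap, return candidate templates for a primer pair spanning it.

    forward: last `flank` bases of the left contig, read toward the gap.
    reverse: reverse complement of the first `flank` bases of the right
             contig, so that this primer also extends toward the gap.
    """
    seq = scaffold.upper()
    results = []
    for start, end in find_gaps(seq):
        left = seq[max(0, start - flank):start]   # end of the left contig
        right = seq[end:end + flank]              # start of the right contig
        results.append({
            "gap": (start, end),
            "forward": left,
            "reverse": reverse_complement(right),
        })
    return results


if __name__ == "__main__":
    # Toy scaffold: two contigs separated by a 10-N gap.
    toy = "ATGCGGCCGCTA" + "N" * 10 + "GGCATCGGTACC"
    for hit in primer_templates(toy, flank=6):
        print(hit)
```

Because the forward template is taken on the same strand as the left contig and the reverse template is the reverse complement of the right contig's start, both primers extend toward the gap, so a successful PCR product covers the unsequenced region and can be sequenced to close it.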
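A genomic GC content of 65.65% means that primers placed at contig ends will themselves tend to be GC-rich, with high melting temperatures and GC-heavy 3' ends. The short Python sketch below computes these quantities for a candidate primer. The abstract does not say which Tm formula or thresholds were used; the Wallace rule (2 °C per A/T plus 4 °C per G/C) used here is only a rough rule of thumb for short oligonucleotides (roughly 14 to 20 nt), and the function names and the 5-base 3' window are hypothetical choices for illustration.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among the unambiguous A/C/G/T bases of `seq`."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def wallace_tm(primer: str) -> int:
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    ASSUMPTION: a crude approximation for short oligos; nearest-neighbor
    models are more accurate, especially for GC-rich primers.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def three_prime_gc(primer: str, window: int = 5) -> int:
    """Count G/C bases in the last `window` bases (the 3' end) of a primer."""
    tail = primer.upper()[-window:]
    return tail.count("G") + tail.count("C")


if __name__ == "__main__":
    primer = "ATGCGGCCGC"  # toy GC-rich primer, not from the paper
    print(gc_content(primer), wallace_tm(primer), three_prime_gc(primer))
```

For scale, a 20-nt primer at the genome-average 65% GC (13 G/C and 7 A/T) already gets a Wallace estimate of 2 × 7 + 4 × 13 = 66 °C, which illustrates why primer placement and PCR conditions need extra attention when closing gaps in high-GC genomes.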