The number, type, and distribution of microsatellite (simple sequence repeat, SSR) sequences differ greatly among species. In this study, a program for locating SSR loci in coding regions was developed in Perl, and its results were verified with the SSRHunter software, in order to analyze the distribution of SSR loci in the coding regions of the potato genome. Among the 56 218 coding sequences of potato, 2 920 sequences were found to contain a total of 3 512 SSR loci, and 2 519 of these sequences (86%) contained only one SSR locus. SSRs with trinucleotide and hexanucleotide repeat units were the most numerous, at 2 358 and 1 075 respectively, together accounting for 98% of the total, whereas other types of repeat unit occurred less frequently. The microsatellite sequences were built from 603 distinct repeat units. Hexanucleotide repeat units generally had a minimum repeat count of three, and the trinucleotide repeat unit GAA had the highest repeat count, at 193. Under natural selection, the length of SSR sequences in coding regions tends to be an integer multiple of the codon length. The SSR-containing coding sequences were functionally classified using the Pfam database, and the most common function was that of the RPW8 resistance protein. The specificity of SSR sequences can be used to screen related coding sequences in different potato species.
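As an illustration of the kind of search described above, the following is a minimal sketch of regular-expression-based SSR detection in a coding sequence. It is written in Python rather than Perl and is not the program developed in the study. The function name `find_ssrs` and the thresholds in `MIN_REPEATS` are assumptions made for this example; the abstract only states that hexanucleotide units repeat at least three times.

```python
import re

# Assumed minimum tandem-repeat counts per repeat-unit length (illustrative only;
# the study's actual cutoffs are not given, apart from three for hexanucleotides).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, unit, repeat_count) for SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{n}) captures one repeat unit; \1{m,} requires at least m
        # further tandem copies of that same unit.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are just a shorter motif repeated
            # (e.g. "AAA", "ATAT", or "GAAGAA"), so each locus is
            # reported only under its shortest repeat unit.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy coding sequence: (GAA)x6 and (AGCTGA)x3 between codons.
    cds = "ATG" + "GAA" * 6 + "CCT" + "AGCTGA" * 3 + "TAA"
    for start, unit, count in find_ssrs(cds):
        length = len(unit) * count
        # An SSR whose length is a multiple of 3 does not shift the reading frame.
        print(start, unit, count, "codon multiple:", length % 3 == 0)
```

In this toy sequence both loci have lengths that are multiples of three (18 bp each), which is the pattern the abstract reports as favoured in coding regions. In practice the same function could be applied to each record of a coding-sequence FASTA file and the hits tallied by unit length.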