Background: Knowledge of the detailed organization of nucleosomes across genomes and of the mechanisms of nucleosome positioning is critical for understanding DNA transcription, replication, repair, recombination, and disease development. Recently, many algorithms for predicting nucleosome positioning in eukaryotic model organisms have been presented. Most of them were constructed using sequence patterns such as k-mers, G+C content, and poly A. Why do algorithms that combine sequence patterns achieve higher precision in identifying nucleosome positions? Here, we tried to address this question by analyzing the average mutual information of nucleosome and linker regions in the S. cerevisiae genome.

Methods: The whole S. cerevisiae genome sequence was downloaded from the 2006 assembly of the Yeast Genome Database (http://www.yeastgenome.org/). The experimental maps of nucleosome locations in the S. cerevisiae genome were obtained from the Penn State Genome Cartography Project (http://atlas.bx.psu.edu/). A total of 54,750 147-bp fragments, each having at least three sequencing reads, were selected as the nucleosome region dataset. The 20 bp flanking the nucleosome DNA were regarded as linker DNA, and 70,488 20-bp fragments were selected as the linker region dataset. Then all nucleosome DNAs and all linker DNAs were separately joined into one sequence each and split again into 2048-bp fragments. The average mutual information (AMI) function was used to measure the information contained in nucleosome and linker regions. AMI is defined as

$$\mathrm{AMI}(k) = \sum_{i,j} p_{ij}(k) \log_2 \frac{p_{ij}(k)}{p_i \, p_j}, \qquad k = 0, 1, 2, 3, \ldots$$

where $p_i$ denotes the probability of finding the nucleotide $n_i \in \{A, G, C, T\}$ and $p_{ij}(k)$ denotes the probability of finding the pair of nucleotides $n_i$ and $n_j$ separated by a gap of length $k$.

Results: Our analysis showed that AMI(k) for k ≤ 2 in nucleosome and linker DNA regions is clearly larger than in random sequence. Furthermore, AMI(k) for k ≤ 2 is larger in nucleosome DNA than in linker DNA regions.

Conclusions: The results indicate that (i) nucleosome and linker DNA regions contain sequence information related to gene regulation and expression; (ii) sequence motifs directing nucleosome positioning are enriched in nucleosome DNA regions; and (iii) short-range correlation (k ≤ 2) is the most important characteristic of nucleosome and linker DNA regions. This helps explain why sequence patterns such as k-mers (k = 1, 2, 3, 4) and poly A are often used to construct algorithms for predicting nucleosome positioning.
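
To make the AMI definition in the Methods concrete, the following is a minimal Python sketch of how AMI(k) could be estimated for a single DNA string and how a joined sequence could be cut into 2048-bp fragments. It is an illustration under stated assumptions, not the authors' implementation: the function names `ami` and `split_fragments` are hypothetical, a gap of length k is read as comparing positions t and t + k + 1 (so k = 0 means adjacent nucleotides), the single-nucleotide probabilities are estimated from the whole fragment, and the input is assumed to contain only A, C, G, and T.

```python
from collections import Counter
from math import log2


def split_fragments(sequence, size=2048):
    """Cut a joined sequence into consecutive, non-overlapping fragments.

    A trailing piece shorter than `size` is dropped (an assumption; the
    abstract does not say how the remainder was handled).
    """
    return [sequence[s:s + size] for s in range(0, len(sequence) - size + 1, size)]


def ami(sequence, k):
    """Estimate AMI(k) in bits for one sequence.

    Assumption: "gap of length k" means the pair (position t, position t + k + 1).
    """
    step = k + 1
    n_pairs = len(sequence) - step
    if n_pairs <= 0:
        raise ValueError("sequence is too short for this gap length")

    # p_i: single-nucleotide frequencies over the whole sequence.
    single = Counter(sequence)
    total = len(sequence)

    # p_ij(k): frequencies of nucleotide pairs separated by the gap.
    pairs = Counter(zip(sequence, sequence[step:]))

    value = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n_pairs
        p_a = single[a] / total
        p_b = single[b] / total
        # Pairs that never occur contribute nothing, so only observed pairs are summed.
        value += p_ab * log2(p_ab / (p_a * p_b))
    return value
```

Under these assumptions, one could call `ami(fragment, k)` for k = 0, 1, 2, ... on every fragment returned by `split_fragments` and compare the nucleosome, linker, and shuffled (random) sets. Because the probabilities are finite-sample estimates, values for short or random sequences will be small but not exactly zero.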