Word frequency statistics play an important role in information processing. Uyghur is one of the minority languages of Xinjiang, and word frequency statistics for Uyghur, a basic task in natural language processing, has become an urgent problem. First, the input continuous Uyghur character string is segmented into words. The segmented Uyghur word sequence is then used to build a Uyghur dictionary. The dictionary stores each Uyghur word or phrase obtained in each processing pass exactly once, together with its frequency of occurrence. Statistical experiments were carried out on Uyghur text. The results reveal internal relationships among Uyghur word frequencies and verify the statistics for absolute frequency (occurrence count), relative frequency, word length, and the number of texts.
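The dictionary-building step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The segmentation method is not specified, so the hypothetical `segment` function simply treats each run of Unicode word characters as one word. The names `segment`, `WordEntry`, `build_dictionary`, and `length_distribution` are likewise hypothetical. Each distinct word is stored once with its occurrence count, relative frequency, and length, and the number of processed texts is returned alongside the dictionary.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class WordEntry:
    count: int                 # absolute frequency: number of occurrences
    relative_frequency: float  # count divided by total number of tokens
    length: int                # word length in characters


def segment(text: str) -> list[str]:
    # Hypothetical segmenter (assumption): each run of Unicode word
    # characters is taken as one word. It stands in for the paper's
    # segmentation step, whose actual method is not described.
    return re.findall(r"\w+", text)


def build_dictionary(texts: list[str]) -> tuple[dict[str, WordEntry], int]:
    """Store each distinct word once with its statistics; also return
    the number of texts processed."""
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(segment(text))
    total = sum(counts.values())
    # The comprehension is empty when total == 0, so no division by zero.
    dictionary = {
        word: WordEntry(n, n / total, len(word))
        for word, n in counts.items()
    }
    return dictionary, len(texts)


def length_distribution(dictionary: dict[str, WordEntry]) -> Counter[int]:
    # Number of distinct words observed for each word length.
    return Counter(entry.length for entry in dictionary.values())


if __name__ == "__main__":
    # Two short Uyghur sentences in Latin script, used only as sample input.
    d, n_texts = build_dictionary(["men kitab oquymen", "men mektepke barimen"])
    print(n_texts, d["men"].count, length_distribution(d))
```

Using a `Counter` keeps each word as a single key, which matches the requirement that the dictionary store every word without duplication. The relative frequency is computed only after all texts have been counted.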