This paper presents a self-learning word segmentation system that requires no manually compiled dictionary. First, a segmentation dictionary in which each entry carries a relevance score is built by statistical methods. Then, the word segmentation problem is recast as finding the maximum-weight path in a directed graph, and a new algorithm is proposed that uses the relevance information in the dictionary to segment text. Finally, we systematically analyze the quality of both the dictionary and the segmentation, and compare performance with other methods.
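The maximum-weight-path formulation can be illustrated with a small sketch. The abstract does not give the paper's actual algorithm, so everything below is an assumption made for illustration: graph nodes are the positions between characters, each dictionary word spanning positions `start` to `end` is an edge whose weight is its relevance score, path weight is the sum of edge weights, and unknown single characters get a fallback edge of weight `unknown_weight`. The function name `segment`, its parameters, and the toy lexicon with its scores are all hypothetical.

```python
from typing import Dict, List


def segment(text: str, lexicon: Dict[str, float], max_len: int = 4,
            unknown_weight: float = 0.0) -> List[str]:
    """Segment `text` along the maximum-weight path of a word graph.

    Assumption (not from the paper): edge weights are additive and every
    single character is always available as a fallback edge.
    """
    n = len(text)
    # best[i] is the largest total weight of any path from node 0 to node i.
    best = [float("-inf")] * (n + 1)
    # back[i] is the start node of the last edge on that best path.
    back = [0] * (n + 1)
    best[0] = 0.0

    # Nodes are visited left to right, which is a topological order
    # because every edge points from a smaller position to a larger one.
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in lexicon:
                weight = lexicon[piece]
            elif end - start == 1:
                weight = unknown_weight  # fallback edge for an unknown character
            else:
                continue  # no edge for this span
            if best[start] + weight > best[end]:
                best[end] = best[start] + weight
                back[end] = start

    # Walk the back-pointers from the final node to recover the words.
    words: List[str] = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Hypothetical lexicon; the scores are made up for the example.
toy_lexicon = {"研究": 2.0, "研究生": 2.5, "生命": 2.2, "命": 0.3, "起源": 2.0}
print(segment("研究生命起源", toy_lexicon))  # ['研究', '生命', '起源']
```

In this toy case the path 研究 / 生命 / 起源 has total weight 6.2, which beats 研究生 / 命 / 起源 at 4.8, so the ambiguity is resolved by the scores rather than by longest match. A real system would derive the scores from corpus statistics, as the abstract describes.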