From the standpoint of downstream processing such as Chinese syntactic analysis, the harmful effects of word segmentation errors must be taken seriously. First, a segmentation error, whether a cut made where none belongs or a cut omitted where one is needed, creates difficulties for subsequent processing. Automatic word segmentation is therefore not a pure splitting process: there should be "merging" within "splitting" and "splitting" within "merging". Second, the "ill-formedness" that segmentation errors expose in fact reflects "ill-formedness" with respect to the rules of Chinese word formation. It is therefore necessary to study segmentation errors. On this basis, the author argues that a good segmentation system cannot rely solely on a single, seemingly complete word list, but should also bring in several dictionaries with distinct functions.
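The two kinds of segmentation error described above (a cut where none belongs, and no cut where one is needed) can be illustrated with a minimal sketch. It is not taken from the paper: the function names `boundaries` and `classify_errors` and the example sentence are assumptions chosen for illustration, and the gold segmentation is simply asserted rather than produced by any system.

```python
def boundaries(words):
    """Return the set of internal cut positions (character offsets) of a segmentation."""
    cuts, pos = set(), 0
    for w in words[:-1]:  # the end of the last word is not an internal cut
        pos += len(w)
        cuts.add(pos)
    return cuts


def classify_errors(gold, pred):
    """Compare two segmentations of the same sentence (hypothetical helper).

    Returns (spurious, missing):
      spurious - cuts that pred makes but gold does not (cut where it should not be)
      missing  - cuts that gold makes but pred does not (no cut where it should be)
    """
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and pred must segment the same character string")
    g, p = boundaries(gold), boundaries(pred)
    return sorted(p - g), sorted(g - p)


# Illustrative sentence: "研究生命的起源" ("study the origin of life").
gold = ["研究", "生命", "的", "起源"]   # assumed reference segmentation
pred = ["研究生", "命", "的", "起源"]   # assumed faulty output ("graduate student" + ...)
spurious, missing = classify_errors(gold, pred)
print("spurious cuts:", spurious)
print("missing cuts:", missing)
```

In this example a single misplaced boundary produces one error of each kind at once, so repairing it requires both merging and splitting, which is the sense in which segmentation is not a pure splitting process.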