Drawing on the morphological features of Tibetan, this paper proposes, for the first time, an automatic word segmentation scheme for written Tibetan based on case particles and continuation features (BCCF, Based on Case auxiliary word and Continuous Feature). Its overall technical approach is deterministic segmentation by level-by-level positioning, supported by case particles, continuation features, a knowledge base of character properties, and a dictionary. Preliminary tests indicate that the scheme has high practical value for detecting and resolving segmentation ambiguity, handling out-of-vocabulary (unregistered) words, and thereby improving the accuracy of Tibetan word segmentation.
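To make the idea of level-by-level deterministic segmentation more concrete, here is a minimal Python sketch. It is not the BCCF algorithm itself, whose rules and knowledge bases are not given in the abstract. It only illustrates one plausible two-stage pipeline: case particles first serve as anchors that cut a sentence into blocks, and each block is then segmented by forward maximum matching against a dictionary. The particle list, the lexicon, and all function names are assumptions made for illustration, and the sketch ignores continuation features, character-property knowledge, and particle ambiguity.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)

# Assumed toy resources: a few genitive/agentive particles and one ablative form.
# Real particles can be ambiguous with ordinary syllables; this sketch ignores that.
CASE_PARTICLES = {"གི", "ཀྱི", "གྱི", "གིས", "ཀྱིས", "གྱིས", "ནས"}

# Assumed toy lexicon; each entry is a tuple of syllables.
LEXICON = {("བོད",), ("སྐད", "ཡིག")}
MAX_WORD_SYLLABLES = max(len(entry) for entry in LEXICON)


def split_on_particles(syllables):
    """Stage 1: cut the syllable sequence into blocks at case particles."""
    blocks, current = [], []
    for syl in syllables:
        if syl in CASE_PARTICLES:
            if current:
                blocks.append((current, False))
                current = []
            blocks.append(([syl], True))  # the particle is a block of its own
        else:
            current.append(syl)
    if current:
        blocks.append((current, False))
    return blocks


def match_block(block):
    """Stage 2: forward maximum matching against the lexicon inside one block."""
    words, i = [], 0
    while i < len(block):
        longest = min(MAX_WORD_SYLLABLES, len(block) - i)
        for n in range(longest, 0, -1):
            candidate = tuple(block[i:i + n])
            # Unknown single syllables are emitted as-is (crude fallback
            # for out-of-vocabulary material).
            if n == 1 or candidate in LEXICON:
                words.append(TSHEG.join(candidate))
                i += n
                break
    return words


def segment(sentence):
    """Segment a tsheg-delimited sentence into a list of words."""
    syllables = [s for s in sentence.split(TSHEG) if s]
    words = []
    for block, is_particle in split_on_particles(syllables):
        words.extend(block if is_particle else match_block(block))
    return words


if __name__ == "__main__":
    print(segment("བོད་ཀྱི་སྐད་ཡིག"))
```

Because both stages make a single left-to-right pass with no backtracking or scoring, the output is deterministic for a given particle list and lexicon, which is the property the sketch is meant to illustrate.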