Uyghur is an agglutinative language, so word-based language models are poorly suited to Uyghur large-vocabulary continuous speech recognition (LVCSR). This paper proposes a syllable-based language model suited to Uyghur and introduces a maximum-matching word segmentation algorithm to evaluate the word recognition performance of the syllable language model in LVCSR tasks. Experimental results show that the syllable-based language model outperforms the word-based language model in terms of out-of-vocabulary words and model complexity, and that it reduces the unit error rate of the recognition system by 50% compared with the word-based system. Syllables can therefore be used as the recognition unit in Uyghur speech recognition tasks.
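The abstract does not describe how maximum matching is applied, so the following is only a minimal sketch of one common variant, greedy forward maximum matching, used here to regroup a recognized syllable sequence into words. The function name `max_match`, the lexicon layout (syllable tuples mapped to word strings), and the toy Latin-script entries are all assumptions made for illustration and are not taken from the paper.

```python
def max_match(syllables, lexicon, max_len=None):
    """Greedy forward maximum matching over a syllable sequence.

    syllables: list of recognized syllable strings.
    lexicon:   dict mapping a tuple of syllables to a word (hypothetical layout).
    max_len:   longest word length in syllables; inferred from the lexicon if None.
    """
    if max_len is None:
        max_len = max((len(key) for key in lexicon), default=1)

    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate starting at position i, then shrink.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            key = tuple(syllables[i:i + n])
            if key in lexicon:
                words.append(lexicon[key])
                i += n
                break
        else:
            # No lexicon word starts here: pass the single syllable through.
            words.append(syllables[i])
            i += 1
    return words


# Toy lexicon for illustration only (not data from the paper).
toy_lexicon = {
    ("ki", "tab"): "kitab",
    ("ki", "tab", "lar"): "kitablar",
    ("mek", "tep"): "mektep",
}

segmented = max_match(["ki", "tab", "lar", "mek", "tep"], toy_lexicon)
```

Because the longest match is tried first, the three-syllable entry is preferred over its two-syllable prefix. A syllable that starts no lexicon entry is emitted unchanged, so the output length never exceeds the input length; word-level scoring could then compare the regrouped sequence against a reference transcript.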