Automatic accent labeling has been studied extensively for both Chinese and English, but comparisons between the two languages are rarely reported. Using the Chinese prosodically annotated corpus ASCCD and the English prosodically annotated Boston University Radio News Corpus, this paper compares automatic accent labeling in Chinese and English and examines how well different features generalize across corpora in different languages. Experiments on automatic accent labeling with ensembles of classification and regression trees, feature analysis, and a mutual-information-based acoustic comparison lead to the following conclusions: under the same conditions, labeling accuracy is lower for Chinese than for English; lexical and syntactic features are more important than acoustic features; different acoustic information sources play different roles, and duration-related features are important for both Chinese and English; and most features provide more mutual information in English than the corresponding features do in Chinese.
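The mutual-information comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each feature has already been discretized (for example, a duration feature binned into "long" and "short") and that accent labels are binary; the abstract does not state how the paper discretizes features or estimates probabilities, so the function name `mutual_information`, the plug-in estimator, and the toy data below are illustrative assumptions, not the authors' procedure.

```python
import math
from collections import Counter


def mutual_information(features, labels):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences.

    `features` holds one discretized feature value per syllable or word,
    `labels` holds the matching accent label (e.g. 1 = accented, 0 = not).
    Both names are hypothetical; they are not taken from the paper.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n = len(features)
    if n == 0:
        raise ValueError("at least one observation is required")

    joint = Counter(zip(features, labels))  # counts of (x, y) pairs
    count_x = Counter(features)             # marginal counts of x
    count_y = Counter(labels)               # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (invented for illustration, not from ASCCD or the Boston corpus).
informative = ["long", "long", "short", "short"]
uninformative = ["long", "short", "long", "short"]
accent = [1, 1, 0, 0]

mi_informative = mutual_information(informative, accent)
mi_uninformative = mutual_information(uninformative, accent)
```

Under this estimator, a feature that perfectly predicts a balanced binary label carries 1 bit of mutual information, and a feature that is statistically independent of the label carries 0 bits. Computing the same quantity for corresponding features on the Chinese and English corpora would give the kind of per-feature comparison the abstract describes.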