Prosodic boundary annotation plays a crucial role in corpus construction and speech synthesis, and automatic prosodic annotation avoids the inconsistency and high time cost of manual annotation. Following the manual annotation workflow, this paper uses recurrent neural networks to train separate sub-models on the text channel and the audio channel, and then applies model fusion to the sub-model outputs to obtain the best annotation. This paper extracts silence duration at the word level; compared with traditional frame-level acoustic features, this feature has a clearer physical meaning and is more closely tied to prosodic boundaries. Experimental results show that, compared with traditional acoustic features, the silence duration feature improves the performance of automatic prosodic annotation, and that, compared with fusion at the feature level, decision-level fusion combines acoustic and textual features more effectively and further improves annotation performance.
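The abstract highlights two ideas: word-level silence duration and decision-level fusion of the two sub-models. The Python sketch below illustrates both under stated assumptions. It assumes word start and end times are already available, for example from forced alignment. It also assumes each sub-model outputs per-word class posteriors over boundary labels. The names `WordSpan`, `silence_durations`, `fuse_posteriors`, and `text_weight`, the weighted-average fusion rule, and the label set are all hypothetical and not taken from the paper. The abstract does not specify the exact fusion rule, and the RNN sub-models themselves are not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class WordSpan:
    """A word with aligned start/end times in seconds (assumed to come from forced alignment)."""
    word: str
    start: float
    end: float


def silence_durations(words: Sequence[WordSpan], utterance_end: float) -> List[float]:
    """Word-level silence feature: the pause between each word's end and the next word's start.

    The last word's pause is measured up to the end of the utterance.
    Negative gaps (overlapping alignments) are clipped to zero.
    """
    durations = []
    for i, w in enumerate(words):
        next_start = words[i + 1].start if i + 1 < len(words) else utterance_end
        durations.append(max(0.0, next_start - w.end))
    return durations


def fuse_posteriors(
    p_text: Sequence[Sequence[float]],
    p_audio: Sequence[Sequence[float]],
    text_weight: float = 0.5,
) -> List[int]:
    """Decision-level fusion (hypothetical rule): weighted average of the per-word
    class posteriors from the text and audio sub-models, then argmax per word."""
    if len(p_text) != len(p_audio):
        raise ValueError("both sub-models must score the same number of words")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    labels = []
    for pt, pa in zip(p_text, p_audio):
        # Combine the two channels only at the output (decision) level.
        fused = [text_weight * a + (1.0 - text_weight) * b for a, b in zip(pt, pa)]
        labels.append(max(range(len(fused)), key=fused.__getitem__))
    return labels


if __name__ == "__main__":
    words = [WordSpan("w1", 0.00, 0.35), WordSpan("w2", 0.35, 0.70), WordSpan("w3", 0.95, 1.40)]
    print(silence_durations(words, utterance_end=1.50))  # pause after w1, w2, w3
    # Class 0 = no boundary, class 1 = boundary (hypothetical two-class label set).
    p_text = [[0.7, 0.3], [0.6, 0.4]]
    p_audio = [[0.8, 0.2], [0.1, 0.9]]
    print(fuse_posteriors(p_text, p_audio, text_weight=0.5))  # -> [0, 1]
```

Feature-level fusion would instead concatenate the silence feature with the text features and train a single model. The sketch keeps the two channels separate until their outputs are combined, which is the distinction the abstract draws. A real system would likely tune `text_weight` on held-out data rather than fixing it at 0.5.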