Predicting the fundamental frequency (F0) contour is an important part of prosody control in text-to-speech (TTS) systems. Based on decision-tree analysis, this paper proposes a prediction model that uses three control parameters to modify a set of normalized syllable F0 contour templates and thereby generate the F0 contour of continuous speech. Because the dynamic tonal target of the preceding syllable is realized in the later part of that syllable, and may even affect the onset of a closely connected following syllable, the final F0 value of the preceding syllable is included as one of the context parameters when predicting the control parameters of the current syllable. This preserves F0 continuity across adjacent syllables and also improves the overall accuracy of the predicted F0 contour. Tests show that the within-syllable standard error between predicted and actual F0 is less than 10 Hz. When the method is applied to a PSOLA speech synthesis system, the naturalness of the synthesized speech is satisfactory.
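
The abstract does not name the three control parameters or give the form of the templates, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes a template is a list of values normalized to [0, 1], and it invents three hypothetical parameters (`level_hz`, `range_hz`, `slope_hz`) that shift, stretch, and tilt the template. The predictor is passed in as a plain callable standing in for the decision tree, and the "within-syllable standard error" is assumed here to mean the root-mean-square difference between predicted and actual F0 samples of one syllable.

```python
import math


def apply_controls(template, level_hz, range_hz, slope_hz):
    """Turn a normalized template (values in [0, 1]) into an F0 contour in Hz.

    All three parameters are hypothetical stand-ins, not the paper's own:
      level_hz: F0 that a template value of 0 maps to
      range_hz: span in Hz that the template is stretched over
      slope_hz: linear tilt added between the first and last sample
    """
    n = len(template)
    contour = []
    for i, value in enumerate(template):
        position = i / (n - 1) if n > 1 else 0.0
        contour.append(level_hz + range_hz * value + slope_hz * position)
    return contour


def predict_utterance(syllables, predict_controls, initial_f0_end=None):
    """Generate one contour per syllable, left to right.

    Each syllable is a dict with a "template" list and a "context" dict.
    The final F0 value of the previous syllable is added to the context
    as "prev_f0_end" before the controls are predicted, as the abstract
    describes. `predict_controls` is a placeholder for the trained
    predictor and must return (level_hz, range_hz, slope_hz).
    """
    contours = []
    prev_end = initial_f0_end
    for syllable in syllables:
        context = dict(syllable["context"], prev_f0_end=prev_end)
        level_hz, range_hz, slope_hz = predict_controls(context)
        contour = apply_controls(syllable["template"], level_hz, range_hz, slope_hz)
        contours.append(contour)
        prev_end = contour[-1]
    return contours


def syllable_rmse(predicted, actual):
    """Root-mean-square difference in Hz over one syllable's F0 samples."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("contours must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))
```

Feeding the previous syllable's final value back into the context is what lets a predictor start the next contour near where the last one ended. Whether the paper's model uses the value in this way, or only as a splitting feature in the tree, is not stated in the abstract.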