A method based on a structured Gaussian mixture model in cepstral eigenspace is proposed for voice conversion with non-parallel corpora and without joint training. After the speakers' cepstral feature parameters are extracted, the eigenvectors of their scatter matrix are computed to construct a cepstral eigenspace, in which a Structured Gaussian Mixture Model in Eigen Space (SGMM-ES) is trained. The SGMM-ES models trained independently for the source and target speakers are matched and aligned according to the Acoustical Universal Structure (AUS) principle, finally yielding a short-time spectral conversion function in the cepstral eigenspace. Experimental results show that the average target-speaker identification rate of the converted speech reaches 95.25% and the average spectral distortion is 1.25, improvements of 0.8% and 7.3%, respectively, over the SGMM method in the original cepstral feature space, while ABX and MOS evaluations show conversion performance very close to that of conventional parallel-corpus methods. These results demonstrate that the structured Gaussian mixture model in cepstral eigenspace is effective for voice conversion under non-parallel corpus conditions.
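As an illustration of the eigenspace construction step described above, the following is a minimal numpy sketch: the scatter matrix of the cepstral feature vectors is eigendecomposed and the features are projected onto the leading eigenvectors. The function name, dimensions, and random data are illustrative assumptions, not the paper's implementation, and the SGMM-ES training and AUS alignment stages are not shown.

```python
import numpy as np

def cepstral_eigenspace(features, k):
    """Project cepstral feature vectors into a k-dimensional eigenspace.

    features: (N, D) array, one D-dimensional cepstral vector per frame.
    Returns the projected features, the eigenspace basis, and the mean.
    (Hypothetical helper for illustration only.)
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Scatter matrix (unnormalized covariance) of the cepstral features.
    scatter = centered.T @ centered
    # The scatter matrix is symmetric, so eigh gives real eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    # Keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order[:k]]        # (D, k) eigenspace basis
    projected = centered @ basis         # (N, k) eigenspace coordinates
    return projected, basis, mean

# Example: 200 frames of 13-dimensional cepstra reduced to an 8-dim eigenspace.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 13))
proj, basis, mu = cepstral_eigenspace(feats, k=8)
```

A Gaussian mixture model would then be trained on `proj` rather than on the raw cepstra; because the basis columns are orthonormal eigenvectors of the scatter matrix, the projection decorrelates the retained dimensions.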