Traditional nonlinear frequency scale transforms reflect the perceptual characteristics of the human auditory system (HAS), but they do not treat the linguistic content and the speaker-specific information in speech differently, so they represent speaker individuality inadequately. By analyzing how the short-time spectrum in different frequency bands affects speaker recognition performance, this paper proposes a nonlinear frequency scale transform derived by least-squares polynomial curve fitting. Experiments show that, under identical training and test conditions, the average misidentification rate is reduced by 70.5%, 60.8%, and 70.5% compared with the conventional Mel, Bark, and ERB frequency scale transforms, respectively. These results indicate that the proposed transform effectively enhances the speaker-specific characteristics of the short-time spectrum and can improve the performance of speaker recognition systems.
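The abstract does not say which quantity the polynomial is fitted to, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each frequency band has a non-negative score describing how much it contributes to speaker recognition, turns the cumulative share of those scores into a target warping curve, and fits that curve with a least-squares polynomial using `numpy.polyfit`. The function names `fit_warp` and `warp`, the band layout, the polynomial degree, and the score values are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def fit_warp(center_hz, band_score, degree=3):
    """Least-squares polynomial fit of a warped-frequency curve.

    center_hz  : band centre frequencies in Hz, ascending
    band_score : non-negative per-band weights (assumed input, not from the paper)
    Returns polynomial coefficients (highest power first) in f expressed in kHz.
    """
    center_hz = np.asarray(center_hz, dtype=float)
    band_score = np.asarray(band_score, dtype=float)
    # Cumulative share of the total score: bands with larger scores
    # take up a larger slice of the warped axis.
    target = np.cumsum(band_score) / np.sum(band_score)
    # Fit in kHz rather than Hz to keep the least-squares problem well scaled.
    return np.polyfit(center_hz / 1000.0, target, degree)

def warp(coeffs, hz):
    """Evaluate the fitted warping polynomial at frequencies given in Hz."""
    return np.polyval(coeffs, np.asarray(hz, dtype=float) / 1000.0)

# Made-up scores for eight 500 Hz-wide bands; illustration only.
centers = np.arange(250.0, 4000.0, 500.0)   # 250, 750, ..., 3750 Hz
scores = np.array([1.0, 1.5, 1.0, 0.5, 1.0, 2.0, 2.0, 1.0])
target = np.cumsum(scores) / np.sum(scores)

coeffs = fit_warp(centers, scores)          # cubic fit
warped = warp(coeffs, centers)              # warped positions of band centres
```

A call such as `warp(coeffs, [500, 1000, 2000])` then maps linear frequencies onto the fitted scale. A fitted polynomial is not guaranteed to be monotonic, so a practical implementation would need to check that property before using the curve to place filter banks.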