Speech is a complex nonlinear signal, which makes it difficult to further improve the performance of traditional speaker recognition techniques developed from linear system theory. This paper proposes a multifractal spectrum cluster analysis method for analyzing the nonlinear characteristics of speech signals and applies it to speaker recognition with short utterances (2 s). Simulation experiments on the Cantor set show that different scaling regions reflect the growth laws of the system at different stages. A set of continuously varying multifractal spectra can therefore characterize the fractal properties of a system hierarchically; this is the multifractal spectrum cluster analysis method. Building on the fractal characteristics of speech signals, the paper then proposes a method for extracting a multifractal spectrum cluster feature (Multifractal Spectrum Cluster Feature, MSCF) from speech. Finally, several nonlinear features are combined with short-time spectral features for speaker recognition. Experiments on 50 speakers from the TIMIT database show that the nonlinear features and the short-time spectral features are strongly complementary. In particular, combining MSCF with MFCC and LPC features reduces the system's misidentification rate to 0.8%.
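The abstract does not spell out the extraction procedure. The following is therefore only a minimal Python sketch of the idea behind a multifractal spectrum cluster, and it rests on assumptions of its own. It builds a hypothetical weighted middle-thirds Cantor measure whose splitting weight changes partway through construction: p = 0.2 for the first four levels and p = 0.4 afterwards. It then estimates the mass exponent τ(q) from the standard partition function χ(q, ε) = Σ μ_i^q ∝ ε^τ(q), fitting over a sliding window of scales, and obtains α(q) = dτ/dq and f(α) = qα − τ(q) through a Legendre transform. The function names, the weights, the window width and the moment-based estimator are all illustrative choices. They are not the paper's MSCF method.

```python
import numpy as np


def weighted_cantor_measure(weights_per_level):
    """Mass of each of the 3**N boxes after N = len(weights_per_level) steps.

    Hypothetical construction: at step k every box splits into thirds; the left
    third receives a fraction p_k of its mass, the right third 1 - p_k, and the
    middle third nothing.
    """
    mass = np.array([1.0])
    for p in weights_per_level:
        children = np.zeros((mass.size, 3))
        children[:, 0] = mass * p
        children[:, 2] = mass * (1.0 - p)
        mass = children.ravel()  # keeps the children of box i at 3i, 3i+1, 3i+2
    return mass


def partition_function(mass, n_levels, qs):
    """log chi(q, eps) for eps = 3**-n, n = 1..n_levels; shape (len(qs), n_levels)."""
    log_chi = np.empty((len(qs), n_levels))
    for n in range(1, n_levels + 1):
        boxes = mass.reshape(3 ** n, -1).sum(axis=1)  # coarse-grain to box size 3**-n
        boxes = boxes[boxes > 0]  # drop empty boxes: 0**q is undefined for q <= 0
        log_chi[:, n - 1] = np.log((boxes[None, :] ** qs[:, None]).sum(axis=1))
    return log_chi


def spectrum(log_chi, qs, lo, hi):
    """tau(q), alpha(q), f(alpha) from a least-squares fit over levels lo..hi."""
    log_eps = -np.arange(lo, hi + 1) * np.log(3.0)
    tau = np.array([np.polyfit(log_eps, row[lo - 1:hi], 1)[0] for row in log_chi])
    alpha = np.gradient(tau, qs)  # numerical derivative d tau / d q
    f = qs * alpha - tau          # Legendre transform
    return tau, alpha, f


def spectrum_cluster(log_chi, qs, width):
    """One spectrum per sliding window of `width` consecutive scale levels."""
    n_levels = log_chi.shape[1]
    return [spectrum(log_chi, qs, lo, lo + width - 1)
            for lo in range(1, n_levels - width + 2)]


qs = np.linspace(-5.0, 5.0, 41)
weights = [0.2] * 4 + [0.4] * 4  # hypothetical: the growth law changes after level 4
mass = weighted_cantor_measure(weights)
log_chi = partition_function(mass, len(weights), qs)
cluster = spectrum_cluster(log_chi, qs, width=3)
for k, (tau, alpha, f) in enumerate(cluster, start=1):
    print(f"levels {k}-{k + 2}: alpha in [{alpha.min():.3f}, {alpha.max():.3f}], "
          f"max f(alpha) = {f.max():.3f}")
```

Windows that lie entirely within one construction stage reproduce that stage's analytic exponent τ(q) = −ln(p^q + (1 − p)^q)/ln 3. A window that straddles level 4 blends the two stages. The p = 0.2 stage yields a much wider α range than the p = 0.4 stage, so the set of spectra taken over different scaling regions separates the two growth stages. The abstract describes the same effect for the Cantor set.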