The fractional Fourier transform (FrFT) is particularly well suited to non-stationary signals, especially chirp signals, and the human auditory system performs at a level that automatic speech recognition systems struggle to match. This paper therefore uses a Gammatone auditory filter bank as a front end to filter the speech signal in the time domain, and then extracts acoustic features from each sub-band output with the FrFT. Because the transform order strongly affects FrFT performance, we propose determining the order for each sub-band time-domain signal by fitting its instantaneous frequency curve, and we compare this method with one based on the ambiguity function. Speech recognition experiments on clean and noisy Mandarin isolated-digit corpora show that the proposed acoustic features significantly improve recognition accuracy over the MFCC baseline system. Compared with the ambiguity-function method, the order search based on the instantaneous frequency curve requires far less computation, and the acoustic features extracted with it achieve the highest average recognition accuracy.
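To make the idea of choosing the order from an instantaneous frequency fit more concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It estimates the instantaneous frequency of a frame from its analytic signal, fits a straight line to it, and maps the fitted chirp rate to an FrFT order. The function names (`analytic_signal`, `estimate_chirp_rate`, `frft_order_from_rate`), the `trim` parameter, and the synthetic test chirp are all hypothetical. The mapping assumes an FrFT convention in which order 1 is the ordinary Fourier transform and time is normalized by `sqrt(N) / fs`; other conventions differ in sign or by a shift of 2 in the order.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real frame, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep Nyquist once
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def estimate_chirp_rate(x, fs, trim=0.1):
    """Fit a line to the instantaneous frequency.

    Returns (rate in Hz/s, intercept in Hz). `trim` is an assumed fraction
    of samples dropped at each end, where the analytic signal is unreliable.
    """
    phase = np.unwrap(np.angle(analytic_signal(x)))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)    # Hz
    t = (np.arange(len(inst_freq)) + 0.5) / fs         # midpoints between samples
    k0 = int(trim * len(inst_freq))
    keep = slice(k0, len(inst_freq) - k0)
    rate, f0 = np.polyfit(t[keep], inst_freq[keep], 1)
    return rate, f0

def frft_order_from_rate(rate, n, fs):
    """Map a chirp rate to an FrFT order (assumed convention: order 1 = FT)."""
    normalized_rate = rate * n / fs**2   # dimensionless chirp rate
    return 1.0 + (2.0 / np.pi) * np.arctan(normalized_rate)

if __name__ == "__main__":
    # Synthetic linear chirp standing in for one sub-band frame (assumed values).
    fs, n = 8000, 800
    t = np.arange(n) / fs
    x = np.cos(2.0 * np.pi * (500.0 * t + 0.5 * 4000.0 * t**2))
    rate, f0 = estimate_chirp_rate(x, fs)
    print("estimated chirp rate (Hz/s):", rate)
    print("suggested FrFT order:", frft_order_from_rate(rate, n, fs))
```

A single line fit replaces a search over many candidate orders, which is one plausible reason an instantaneous-frequency approach can be cheaper than scanning a two-dimensional ambiguity function. A stationary tone has a chirp rate of zero and maps to order 1 under this convention.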