To address the problem of vocal tract length normalization in speaker-independent speech recognition, this paper first studies a formant spectrum recovery method based on autocorrelation estimation that removes the pitch excitation. It shows that the spectra of different speakers uttering the same vowel are scaled versions of one another, and how these differ from the spectra of different vowels uttered by the same speaker. Combining this with the Mellin transform, which is scale invariant, the paper then proposes a speech feature extraction method suited to speaker-independent recognition. In the experiments, the FFT cepstrum, Mel cepstrum, FFT-Mellin cepstrum, and the proposed Formant-Mellin cepstrum were extracted from 20 Chinese vowels collected from multiple speakers, and their performance was evaluated with an intuitive F-ratio discriminability criterion. The results show that, for both clean utterances and utterances with additive white noise, the features obtained by combining formant recovery with the Mellin transform have comparatively high discriminability.
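The scale-invariance argument in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the Gaussian-bump envelope, the formant values, the band edges, and all function names are hypothetical, and the F-ratio shown is one common definition (variance of class means divided by mean within-class variance), assumed here because the abstract does not give the exact formula. The sketch samples a spectral envelope on a log-frequency grid, where a frequency scaling becomes a shift, and takes the DFT magnitude, which approximates the Mellin transform magnitude along the imaginary axis.

```python
import cmath
import math


def toy_envelope(f, formants, bandwidth=80.0, scale=1.0):
    """Hypothetical smooth envelope: one Gaussian bump per formant (Hz).

    `scale` stretches the frequency axis, S_scale(f) = S(f / scale),
    mimicking a speaker with a different vocal tract length.
    """
    x = f / scale
    return sum(math.exp(-0.5 * ((x - fc) / bandwidth) ** 2) for fc in formants)


def mellin_magnitude(envelope, f_min=50.0, f_max=8000.0, n=512, n_coef=32):
    """Approximate |Mellin transform| along the imaginary axis.

    Sampling on a log-frequency grid turns a frequency scaling into a
    shift; the DFT magnitude is unaffected by that shift as long as the
    envelope is close to zero at both band edges.
    """
    du = math.log(f_max / f_min) / n
    samples = [envelope(f_min * math.exp(i * du)) for i in range(n)]
    coefs = []
    for k in range(n_coef):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        coefs.append(abs(acc))
    return coefs


def relative_distance(a, b):
    """Euclidean distance between two feature vectors, relative to |a|."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return diff / math.sqrt(sum(x * x for x in a))


def f_ratio(values_by_class):
    """Assumed F-ratio for one scalar feature.

    values_by_class maps a class label (e.g. a vowel) to a list of values.
    Returns variance of the class means over the mean within-class variance.
    """
    groups = list(values_by_class.values())
    means = [sum(g) / len(g) for g in groups]
    grand = sum(means) / len(means)
    between = sum((m - grand) ** 2 for m in means) / len(means)
    within = sum(sum((x - m) ** 2 for x in g) / len(g)
                 for g, m in zip(groups, means)) / len(groups)
    return between / within


if __name__ == "__main__":
    vowel = (700.0, 1200.0)        # hypothetical formant pair
    other_vowel = (400.0, 2000.0)  # a different hypothetical pair
    ref = mellin_magnitude(lambda f: toy_envelope(f, vowel))
    scaled = mellin_magnitude(lambda f: toy_envelope(f, vowel, scale=1.2))
    other = mellin_magnitude(lambda f: toy_envelope(f, other_vowel))
    print("same vowel, scaled speaker:", relative_distance(ref, scaled))
    print("different vowel:           ", relative_distance(ref, other))
```

In this toy setting the first distance should be close to zero and the second clearly larger, which is the property the abstract relies on. Real speech would first need the formant envelope recovered from the signal, a step this sketch does not attempt.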