This paper studies speech endpoint detection in noisy environments. At low signal-to-noise ratios (SNRs), speech and noise have different spectral distributions. However, the time-domain energy used by traditional speech detection algorithms does not describe how energy is distributed across frequency sub-bands, so it discriminates poorly between speech and noise. Previously proposed time-frequency energy parameters detect speech in noise by combining band-limited energy in the frequency domain with time-domain energy. They choose the frequency band according to where the speech signal has high energy, without considering how the noise energy is distributed across sub-bands. The speech detection method proposed in this paper takes the frequency-domain energy distributions of both speech and noise into account: a linear mapping projects the sub-band energy feature space of a Mel filter bank onto the one-dimensional subspace that best discriminates between noise and speech. The result is a new feature parameter, ELMBE, which is used for speech detection. Experimental results show that, in noisy environments, this linear-mapping-based energy parameter gives better speech detection performance than both time-domain energy and time-frequency-based energy.
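The abstract does not say how the "most discriminative" linear mapping is computed. One standard way to find a one-dimensional projection that best separates two classes is Fisher's linear discriminant. It maximizes the ratio of between-class to within-class scatter. The Python sketch below assumes that technique and is not the paper's published implementation. It computes per-frame log Mel sub-band energies, learns a projection vector `w` from labeled speech and noise frames, and scores each frame as `w · x`. That score is an ELMBE-like value that is compared against a threshold. The following are all hypothetical choices made for illustration:

- the function names;
- the 20-filter bank;
- 256-sample frames with a 128-sample hop;
- the log compression;
- the midpoint threshold.

The time-domain log energy is included only as the conventional baseline the abstract compares against.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filter bank, shape (n_filters, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mels = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):  # rising edge of the triangle
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[i, k] = (right - k) / (right - center)
    return fb


def mel_band_log_energy(x, sr, n_filters=20, frame_len=256, hop=128):
    """Per-frame log energy in each Mel sub-band, shape (n_frames, n_filters)."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    fb = mel_filterbank(n_filters, frame_len, sr)
    return np.log(power @ fb.T + 1e-10)  # small offset avoids log(0)


def time_domain_log_energy(x, frame_len=256, hop=128):
    """Conventional baseline: per-frame log energy summed over all frequencies."""
    frames = frame_signal(x, frame_len, hop)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)


def fisher_direction(speech_feats, noise_feats, reg=1e-6):
    """Unit vector proportional to S_w^{-1} (mu_speech - mu_noise) (Fisher LDA)."""
    mu_s = speech_feats.mean(axis=0)
    mu_n = noise_feats.mean(axis=0)
    s_w = np.cov(speech_feats, rowvar=False) + np.cov(noise_feats, rowvar=False)
    s_w += reg * np.eye(s_w.shape[0])  # keep the within-class scatter invertible
    w = np.linalg.solve(s_w, mu_s - mu_n)
    return w / np.linalg.norm(w)


def fit_detector(speech_feats, noise_feats):
    """Return (w, threshold); a frame x is labeled speech when x @ w > threshold."""
    w = fisher_direction(speech_feats, noise_feats)
    # Hypothetical rule: threshold halfway between the projected class means.
    threshold = 0.5 * (speech_feats.mean(axis=0) @ w + noise_feats.mean(axis=0) @ w)
    return w, threshold
```

To use it, assuming labeled training audio is available, call `fit_detector(mel_band_log_energy(speech, sr), mel_band_log_energy(noise, sr))`. Then mark a frame as speech when its row of `mel_band_log_energy(x, sr) @ w` exceeds `threshold`. The projection weights each sub-band by how well it separates the two classes relative to its within-class variance. Bands where the noise is strong therefore receive little weight. This is the abstract's argument against choosing bands from the speech spectrum alone.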