This paper proposes a robust voice activity detection (VAD) method for a wide range of complex noise conditions. Three acoustic parameters (energy, the dominant frequency component, and short-time spectral entropy) are combined into a three-dimensional feature; these parameters are strongly complementary across different kinds of noise. To detect active speech pulses, a K-means clustering algorithm adaptively selects features and computes the thresholds used during detection. Experiments on the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation 2008 and 2012 tasks show that the proposed method performs well in a variety of noise environments and is more robust and efficient than conventional unsupervised and supervised voice activity detection algorithms.
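The abstract does not give the exact procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It computes the three per-frame features, runs a one-dimensional K-means with K = 2 on each feature, keeps the feature whose two clusters separate best, and thresholds at the midpoint between the centroids. The function names, the separation score, the naive DFT, and the rule that the higher-energy side counts as speech are all assumptions made for illustration.

```python
import cmath
import math
import random


def frame_features(frame, sample_rate):
    """Return (log energy, dominant frequency in Hz, spectral entropy) of one frame."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    # Naive DFT power spectrum for bins 1..n/2 (DC skipped); adequate for short frames.
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power) or 1e-12
    probs = [p / total for p in power]
    entropy = -sum(q * math.log(q) for q in probs if q > 0)
    peak_bin = max(range(len(power)), key=power.__getitem__) + 1
    return (math.log(energy + 1e-12), peak_bin * sample_rate / n, entropy)


def two_means(values, iterations=20):
    """1-D K-means with K = 2. Returns (threshold, separation score)."""
    lo, hi = min(values), max(values)
    low, high = list(values), []
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:
            return lo, 0.0  # all values identical: nothing to separate
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    within = sum((v - lo) ** 2 for v in low) + sum((v - hi) ** 2 for v in high)
    # Assumed score: squared centroid distance relative to within-cluster scatter.
    return (lo + hi) / 2, (hi - lo) ** 2 * len(values) / (within + 1e-12)


def detect_speech(frames, sample_rate):
    """Label each frame True (speech) or False (non-speech)."""
    feats = [frame_features(f, sample_rate) for f in frames]
    # Adaptive feature selection: keep the dimension whose two clusters separate best.
    scored = []
    for dim in range(3):
        threshold, score = two_means([f[dim] for f in feats])
        scored.append((score, dim, threshold))
    best_score, dim, threshold = max(scored)
    if best_score == 0.0:
        return [False] * len(frames)  # no feature separates the frames
    above = [f[dim] > threshold for f in feats]

    def mean_energy(flag):
        picked = [f[0] for f, a in zip(feats, above) if a == flag]
        return sum(picked) / len(picked) if picked else float("-inf")

    # Assumption: the side with the higher mean log energy is speech.
    speech_is_above = mean_energy(True) >= mean_energy(False)
    return [a == speech_is_above for a in above]
```

Calling `detect_speech(frames, sample_rate)` on a list of equal-length sample frames returns one boolean per frame. A practical system would use an FFT, overlapping windows, and smoothing of the frame decisions, none of which are shown here.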