This paper proposes a speaker recognition system that fuses classified Gaussian mixture models (GMMs) with neural networks. The speech frames of each speaker are divided into two classes according to an energy threshold. In the resulting class subspaces, two class-specific speaker models (GMMs) are built for each speaker, together with one neural network per speaker that fuses the outputs of these two models. The recognition result is obtained by a decision over the outputs of all speakers' neural networks. In text-independent speaker recognition experiments with 100 male speakers, the strategy based on classified speaker models outperforms the conventional GMM speaker recognition system in both recognition performance and noise robustness, and back-end fusion with a neural network in turn outperforms direct fusion. As a result, good recognition performance and noise robustness can be obtained with a lower speaker-model mixture order and shorter test utterances.
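The front end described in the abstract can be illustrated with a minimal Python sketch. It covers only the energy-based frame split, the two class-specific GMM scores per speaker, and the direct-fusion baseline (summing the two scores). The per-speaker neural network is omitted because the abstract does not give its architecture. All names here (`split_by_energy`, `gmm_loglik`, `identify`), the `(energy, feature_vector)` frame layout, the diagonal covariances, and the toy parameters are assumptions made for illustration, not details taken from the paper.

```python
import math

def split_by_energy(frames, threshold):
    """Split (energy, feature_vector) frames into high- and low-energy feature lists."""
    high = [feat for energy, feat in frames if energy >= threshold]
    low = [feat for energy, feat in frames if energy < threshold]
    return high, low

def gmm_loglik(x, gmm):
    """Log-likelihood of vector x under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) components (assumed layout).
    """
    logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            ll -= 0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(ll)
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

def mean_loglik(vectors, gmm):
    """Average per-frame log-likelihood; 0.0 if the class has no frames."""
    if not vectors:
        return 0.0
    return sum(gmm_loglik(v, gmm) for v in vectors) / len(vectors)

def class_scores(frames, threshold, speaker):
    """Score the high- and low-energy frames against the speaker's two GMMs."""
    high, low = split_by_energy(frames, threshold)
    return mean_loglik(high, speaker["high"]), mean_loglik(low, speaker["low"])

def identify(frames, threshold, speakers):
    """Direct fusion: choose the speaker with the largest sum of the two class scores."""
    return max(
        speakers,
        key=lambda name: sum(class_scores(frames, threshold, speakers[name])),
    )

# Toy example with 1-D features and single-component GMMs (hypothetical values).
speakers = {
    "A": {"high": [(1.0, [0.0], [1.0])], "low": [(1.0, [1.0], [1.0])]},
    "B": {"high": [(1.0, [5.0], [1.0])], "low": [(1.0, [6.0], [1.0])]},
}
frames = [(0.9, [0.1]), (0.2, [1.2]), (0.8, [-0.3])]
print(identify(frames, 0.5, speakers))
```

In the system the abstract describes, the pair returned by `class_scores` would be fed to that speaker's fusion network instead of being summed, and the final decision would be made over the network outputs.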