In text-independent speaker (voiceprint) recognition research, the best-performing and most mature systems to date have been obtained with long training and test data; in the core test condition of the NIST evaluations, for example, the training and test speech lasts about 5 minutes. In practical applications, however, the particular nature of speaker recognition means that users are generally uncooperative, and it is usually hard to collect enough training speech. This limits classical speaker recognition systems and greatly degrades their performance. This paper addresses short-duration speaker recognition, which is directly relevant to practical applications, and proposes a nonparametric estimation method based on the Parzen window to model the target speaker's short-duration data, with the aim of improving the generalization ability of the speaker model. On the NIST SRE2006 short-duration task (10 s training and test condition), fusing the scores of this method with those of a conventional GMM-UBM system gives a relative reduction of 10.76% in equal error rate (EER) compared with the baseline system.
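To make the idea concrete, the following is a minimal sketch of Parzen window scoring followed by score fusion. It is not the authors' implementation: the abstract does not state the kernel, the bandwidth, or the fusion rule, so the isotropic Gaussian kernel, the bandwidth `h`, the function names `parzen_log_likelihood` and `fuse_scores`, and the linear fusion `weight` are all assumptions made for illustration.

```python
import numpy as np


def parzen_log_likelihood(test_frames, train_frames, h=1.0):
    """Average log-density of test frames under a Parzen window estimate.

    Assumption: an isotropic Gaussian kernel with bandwidth h (the abstract
    does not specify either). Inputs are arrays of shape (frames, dims).
    """
    test = np.atleast_2d(np.asarray(test_frames, dtype=float))
    train = np.atleast_2d(np.asarray(train_frames, dtype=float))
    n, d = train.shape

    # Squared Euclidean distance between every test and training frame.
    diff = test[:, None, :] - train[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)

    # Log of each Gaussian kernel value, including its normalising constant.
    log_kernel = -0.5 * sq_dist / h**2 - 0.5 * d * np.log(2 * np.pi * h**2)

    # log((1/n) * sum_i K(x - x_i)), computed stably with log-sum-exp.
    m = log_kernel.max(axis=1, keepdims=True)
    log_density = m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) - np.log(n)
    return float(log_density.mean())


def fuse_scores(parzen_score, gmm_ubm_score, weight=0.5):
    """Linear score fusion; the weight is a hypothetical tuning parameter."""
    return weight * parzen_score + (1.0 - weight) * gmm_ubm_score
```

In use, `train_frames` would hold the acoustic feature vectors from the target speaker's short enrollment utterance and `test_frames` those of the test utterance; the resulting score could then be combined with a GMM-UBM score through `fuse_scores`. Because every training frame contributes its own kernel, no parametric model has to be fitted to the scarce enrollment data, which is the appeal of a nonparametric estimate in the short-duration setting.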