Human speech carries three kinds of information: linguistic information, emotional state information, and speaker-specific information. Speaker-specific information is the key to speaker recognition tasks, provided it is extracted and used properly.

The underlying challenge in speaker recognition is directly related to the efficient learning of features from speech data. Until recently, traditional hand-crafted features such as Mel Frequency Cepstral Coefficients (MFCCs) have been popular for processing speech and audio data. With the development of deep learning, research has shifted to unsupervised feature learning from audio data. Deep learning has brought tremendous performance improvements on machine learning tasks such as object recognition, face recognition, handwritten character recognition, and machine translation.

In this thesis, we present our work on using deep learning techniques to learn features from audio data for speaker recognition. In particular, we explore Restricted Boltzmann Machines and Deep Belief Networks for unsupervised feature learning. We also propose and discuss a deep hybrid feature model that combines the unsupervised learned features with traditional MFCCs. We evaluate these hybrid features on a speaker recognition task. Our experimental results show that the deep hybrid features give better recognition accuracy on this task.

We also discuss a new approach to transforming audio data and training a standard Restricted Boltzmann Machine on the transformed data. We refer to the transformed data as convolutional data.

Furthermore, we present a simple late fusion approach for the i-vector paradigm. I-vectors are recently introduced features with great potential for speaker recognition. We report results on the i-vector data from the NIST i-vector challenge. The results achieved with the late fusion approach outperform the baseline score.