Research on Speaker Recognition Algorithms Based on Deep Learning Models

Cited by: 2

Author: HAZRAT ALI

Publication date: 2016, Issue 01

Keywords: Audio Data Classification; Deep Learning; i-vector; Restricted Boltzmann Machine; Sp

Excerpt from the thesis:

Human speech carries three kinds of information: linguistic information, emotional-state information, and speaker-specific information. Speaker-specific information is the key to speaker recognition, provided it is extracted and used properly. The underlying challenge in speaker recognition is directly related to learning features efficiently from speech data. Until recently, traditional hand-crafted features such as Mel Frequency Cepstral Coefficients (MFCCs) have been the popular choice for processing speech and audio data. With the development of deep learning, research has shifted toward unsupervised feature learning from audio data. Deep learning has brought large performance improvements on machine learning tasks such as object recognition, face recognition, handwritten character recognition, and machine translation.

In this thesis, we present our work on using deep learning techniques to learn features from audio data for speaker recognition. In particular, we explore Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs) for unsupervised feature learning. We also propose and discuss a deep hybrid feature model that combines the unsupervised learned features with traditional MFCCs, and we evaluate these hybrid features on a speaker recognition task. Our experimental results show that the deep hybrid features give better recognition accuracy on this task.

We also discuss a new approach to transforming audio data and training a standard RBM on the transformed data; we refer to this as convolutional data.

Furthermore, we present a simple late fusion approach for the i-vector paradigm. i-vectors are recently introduced features with great potential for speaker recognition. We report results on the i-vector data from the NIST i-vector challenge; the results achieved with the late fusion approach outperform the baseline score.
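The abstract does not give the RBM configuration, so the following is only a minimal sketch, assuming a binary (Bernoulli) RBM trained with one-step contrastive divergence (CD-1) on inputs already scaled to [0, 1]. The class `BernoulliRBM` and the helper `hybrid_features` are hypothetical names; `hybrid_features` illustrates the general idea of a hybrid feature vector by appending the RBM's hidden-unit probabilities to an MFCC frame. Layer sizes, learning rate, and the choice of RBM input are illustrative assumptions, not the thesis's settings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BernoulliRBM:
    """Minimal binary RBM trained with one-step contrastive divergence (CD-1).

    Illustrative sketch only: sizes and learning rate are assumptions,
    and visible inputs are assumed to be scaled to [0, 1].
    """

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights break the symmetry between hidden units.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij * h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a single visible vector; returns squared reconstruction error."""
        ph0 = self.hidden_probs(v0)                                # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        v1 = self.visible_probs(h0)                                # mean-field reconstruction
        ph1 = self.hidden_probs(v1)                                # negative phase
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((x - r) ** 2 for x, r in zip(v0, v1))


def hybrid_features(mfcc_frame, rbm, rbm_input):
    """Hypothetical hybrid feature: MFCC frame followed by RBM hidden-unit probabilities."""
    return list(mfcc_frame) + rbm.hidden_probs(rbm_input)


if __name__ == "__main__":
    # Toy binary "frames"; real inputs would be scaled spectral features (assumption).
    data = [[1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 0, 1]] * 10
    rbm = BernoulliRBM(n_visible=6, n_hidden=3)
    for epoch in range(50):
        err = sum(rbm.cd1_step(v) for v in data)
    print("final reconstruction error:", round(err, 3))
    print(hybrid_features([12.1, -3.4], rbm, data[0]))
```

A Deep Belief Network would stack several such RBMs, training each one greedily on the hidden activations of the layer below; the sketch stops at a single layer to keep the CD-1 update readable.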
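The abstract does not say how its late fusion is performed. Purely as an illustration, the sketch below assumes score-level fusion: each subsystem scores every trial (here with cosine similarity between enrolment and test i-vectors, a common i-vector scoring rule), each system's scores are z-normalised, and the fused score is a weighted sum. The function names and the equal default weights are hypothetical choices, not the thesis's method.

```python
import math
from statistics import mean, pstdev


def cosine_score(enrol, test):
    """Cosine similarity between an enrolment i-vector and a test i-vector."""
    dot = sum(a * b for a, b in zip(enrol, test))
    norm = math.sqrt(sum(a * a for a in enrol)) * math.sqrt(sum(b * b for b in test))
    if norm == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / norm


def z_normalise(scores):
    """Map one system's trial scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma if sigma > 0 else 0.0 for s in scores]


def late_fusion(score_lists, weights=None):
    """Hypothetical late fusion: weighted sum of z-normalised per-trial scores.

    score_lists[k][t] is system k's score for trial t; equal weights by default.
    """
    n_sys = len(score_lists)
    weights = weights or [1.0 / n_sys] * n_sys
    normalised = [z_normalise(s) for s in score_lists]
    return [sum(w * sys_scores[t] for w, sys_scores in zip(weights, normalised))
            for t in range(len(score_lists[0]))]


if __name__ == "__main__":
    # Two toy trials: a target-like pair and a non-target-like pair.
    trials = [([1.0, 0.0], [0.9, 0.1]), ([1.0, 0.0], [0.0, 1.0])]
    sys_a = [cosine_score(e, t) for e, t in trials]
    sys_b = [0.7, 0.2]  # scores from a second, hypothetical subsystem
    print(late_fusion([sys_a, sys_b]))
```

Equal weights are the simplest choice; in practice fusion weights are often tuned on a development set, for example with logistic regression.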
