Speech emotion recognition with unsupervised feature learning

Source: Frontiers of Information Technology & Electronic Engineering
Emotion-based features are critical for achieving high performance in a speech emotion recognition (SER) system. In general, it is difficult to develop these features due to the ambiguity of the ground truth. In this paper, we apply several unsupervised feature learning algorithms (including K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines), which show promise for learning task-related features from unlabeled data, to speech emotion recognition. We then evaluate the performance of the proposed approach and present a detailed analysis of the effect of two important factors in the model setup: the content window size and the number of hidden layer nodes. Experimental results show that larger content windows and more hidden nodes contribute to higher performance. We also show that a two-layer network does not clearly improve performance compared to a single-layer network.
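To make the two factors named in the abstract concrete, the following is a minimal sketch of K-means feature learning on stacked speech frames. It is not the authors' implementation. Several things in it are assumptions for illustration only: the frame features are random placeholders, and the function names (`stack_context`, `kmeans`, `encode`), the soft "triangle" encoding, and the mean pooling over an utterance are all illustrative choices. The parameter `window` plays the role of the content window size and `k` the role of the number of hidden nodes.

```python
import numpy as np

def stack_context(frames, window):
    """Concatenate `window` consecutive frames into one vector per position."""
    n_out = frames.shape[0] - window + 1
    return np.stack([frames[i:i + window].ravel() for i in range(n_out)])

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def encode(X, centroids):
    """Assumed soft encoding: max(0, mean distance - distance to centroid)."""
    d = np.sqrt(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))
    return np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)

# Placeholder data: 200 frames of 13-dimensional features (hypothetical).
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))

windows = stack_context(frames, window=5)   # content window of 5 frames
centroids = kmeans(windows, k=16)           # 16 learned "hidden nodes"
features = encode(windows, centroids)       # one 16-dim code per window
utterance_vector = features.mean(axis=0)    # assumed pooling over the utterance
```

Increasing `window` or `k` in this sketch enlarges the input context or the learned feature dictionary, which are the two settings the abstract reports as improving performance. The pooled `utterance_vector` would then be passed to a supervised classifier trained on emotion labels.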