Audio, as one of the media types that make up multimedia, carries rich auditory semantics, yet current multimedia retrieval relies mainly on visual information and largely ignores audio. To address this gap, this paper presents a prototype system for semantic audio retrieval in which the audio signal is processed hierarchically. First, features such as short-time energy, zero-crossing rate, and fundamental-frequency energy ratio are extracted, and the audio stream is coarsely segmented into four classes: silence, harmonic music, dialogue, and environmental background sound. Because environmental background sound carries a large amount of semantic content, it is then subdivided further, and each subclass is represented by a trained hidden Markov model (HMM) that is used for semantic retrieval. Experimental results show that this approach to audio query processing performs well.
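The two stages described above can be illustrated with a minimal sketch. It is not the system's implementation: the frame length, hop size, silence threshold, and the use of a discrete-observation HMM over quantized feature symbols are all assumptions made for illustration, and the function names are hypothetical. The first part computes short-time energy and zero-crossing rate per frame; the second scores an observation sequence against several per-class HMMs with the scaled forward algorithm and picks the most likely class.

```python
import math

def frames(signal, frame_len, hop):
    """Split samples into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_silence(frame, energy_threshold=1e-4):
    """Hypothetical coarse rule: low-energy frames are treated as silence."""
    return short_time_energy(frame) < energy_threshold

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs   : list of observation symbol indices (assumed quantized features)
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Accumulate the log of each scaling factor to avoid underflow.
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
    total = sum(alpha)
    return log_p + math.log(total) if total > 0.0 else float("-inf")

def classify(obs, models):
    """Return the name of the model with the highest log-likelihood for obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

In such a sketch, each environmental-sound subclass would have its own `(start, trans, emit)` triple estimated from training clips, and a query clip would be assigned to the subclass whose model scores it highest.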