Research on Key Technologies of Audio Information Retrieval (音频信息检索关键技术研究)

Source: Institute of Automation, Chinese Academy of Sciences


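【Author】: Wang Lei (王磊)

【Institution】: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)

【Source】: Institute of Automation, Chinese Academy of Sciences

【Publication date】: 2009

【Keywords】: audio information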


Excerpt from the thesis

How to retrieve vast amounts of audio information effectively and efficiently is not only a research hotspot but also a trend for industry to build new applications and find new ways to make a profit. During the three years of my Ph.D. study, I investigated the key technologies for building audio information retrieval systems. The main research work focused on the following aspects:

First, this thesis proposes a solution for building a query-by-singing/humming (QBSH) system, covering melody database construction, melody feature extraction, and melody matching. To build the melody database automatically, I propose an algorithm that extracts the main melody track from raw MIDI files, together with a melody phrase segmentation method. To extract robust features, two feature extraction methods are adopted: pitch sequence extraction and note sequence extraction. To speed up matching, a candidate set reduction step first filters out unlikely candidates using faster but less precise methods; a more accurate but slower strategy is then run on the surviving candidate set to perform a finer match. At the decision level, the scores generated during the filtering stage and the fine-matching stage are fused to obtain a more accurate result. The proposed system participated in the QBSH contest at MIREX 2008 and won first place in both sub-tasks (on Roger Jang's corpus and ThinkIT's corpus).

Second, in the area of audio template searching, this thesis examines two different methods: fingerprinting-based template searching and audio vector space model-based template searching. It proposes a novel method for assigning a weight to an audio word according to its capability to distinguish different audio files. Based on this work, I implemented an advertisement identification system and an automatic new-advertisement detection system; the experimental results show that both systems could be put into practical use.
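
To make the idea of weighting audio words by their discriminative power more concrete, here is a minimal Python sketch. The abstract does not give the actual weighting formula, so this sketch assumes an inverse-document-frequency (IDF) style weight borrowed from text retrieval; the function names and the representation of a file as a list of integer audio-word ids are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def audio_word_weights(files):
    """Assumed IDF-style weight: log(N / number of files containing the word).

    files: dict mapping a file id to a list of audio-word ids
    (hypothetical representation, e.g. quantized frame features).
    """
    n_files = len(files)
    doc_freq = Counter()
    for words in files.values():
        doc_freq.update(set(words))  # count each word once per file
    return {w: math.log(n_files / df) for w, df in doc_freq.items()}


def weighted_vector(words, weights):
    """Term-frequency vector scaled by the per-word weights."""
    counts = Counter(words)
    return {w: c * weights.get(w, 0.0) for w, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus: word 1 occurs in every file, so it cannot distinguish them.
files = {"ad1": [1, 2, 2, 3], "ad2": [1, 4, 4, 5], "ad3": [1, 6, 7, 7]}
weights = audio_word_weights(files)
query = weighted_vector([2, 2, 3], weights)
best = max(files, key=lambda f: cosine(query, weighted_vector(files[f], weights)))
print(best)
```

Under this assumed scheme, an audio word that appears in every file receives weight zero and therefore does not influence the similarity score, while rare words dominate it.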

Third, the thesis adopts a GPU-based training method for SVM audio classification to speed up the training process; the results show that GPU-based training could save 90% of the time compared with CPU-based training. Furthermore, I applied audio classification to three applications: a pre-processing module for speech recognition, music genre classification, and audio-based video retrieval.

Finally, a system for automatic news story segmentation is implemented based on audio and video processing techniques. The system uses key frame clustering, audio classification, audio template searching, and speaker change detection to locate potential segmentation points, which helps users explore news content quickly.
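
The coarse-to-fine matching and score fusion described for the query-by-singing/humming system can be illustrated with a minimal Python sketch. The abstract does not name the concrete filters or matchers, so everything here is an assumption: the coarse stage compares a few evenly spaced samples of the mean-normalized pitch sequence, the fine stage uses dynamic time warping (DTW), and the fusion is a simple weighted sum. The parameters `keep`, `k`, and `fine_weight` are hypothetical.

```python
def normalize(seq):
    """Subtract the mean pitch so the comparison ignores the singer's key."""
    mean = sum(seq) / len(seq)
    return [p - mean for p in seq]


def resample(seq, k):
    """Pick k evenly spaced samples from a sequence."""
    return [seq[int(i * len(seq) / k)] for i in range(k)]


def coarse_distance(query, candidate, k=8):
    """Fast but imprecise: mean absolute difference of k resampled points."""
    q = resample(normalize(query), k)
    c = resample(normalize(candidate), k)
    return sum(abs(x - y) for x, y in zip(q, c)) / k


def dtw_distance(a, b):
    """Slower but more accurate: dynamic time warping distance."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def search(query, database, keep=2, fine_weight=0.8):
    """Filter with the coarse distance, re-score survivors, fuse both scores."""
    coarse = {name: coarse_distance(query, mel) for name, mel in database.items()}
    survivors = sorted(coarse, key=coarse.get)[:keep]
    q = normalize(query)
    fused = {}
    for name in survivors:
        melody = database[name]
        fine = dtw_distance(q, normalize(melody)) / (len(query) + len(melody))
        fused[name] = fine_weight * fine + (1 - fine_weight) * coarse[name]
    return sorted(fused.items(), key=lambda item: item[1])


# Toy melodies as MIDI pitch numbers; the query is transposed and stretched.
database = {
    "scale_up": [60, 62, 64, 65, 67, 69, 71, 72],
    "scale_down": [72, 71, 69, 67, 65, 64, 62, 60],
    "flat": [60] * 8,
}
query = [63, 65, 65, 67, 68, 70, 72, 74, 75]
results = search(query, database)
print(results[0][0])
```

In this toy example the descending scale is discarded by the cheap filter, and only the two surviving candidates are scored with the more expensive DTW distance before the fused ranking is produced.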

