How to retrieve vast amounts of audio information effectively and efficiently is not only a hot topic for researchers but also an opportunity for industry to build new applications and find new ways to make profits. During the three years of my Ph.D. study, I investigated the key technologies for building audio information retrieval systems. The main research work focused on the following aspects:
First, this thesis proposes a solution for building a query-by-singing/humming (QBSH) system, covering melody database construction, melody feature extraction, and melody matching. To build the melody database automatically, I propose an algorithm that extracts the main melody track from raw MIDI files, together with a melody phrase segmentation method. To extract robust features, two feature extraction methods are adopted: pitch sequence extraction and note sequence extraction. To speed up the matching process, a candidate set reduction step first filters out unlikely candidates using faster but less precise methods; a more accurate but slower strategy is then executed on the surviving candidate set to perform a finer match. At the decision level, the scores generated during the filtering stage and the fine-matching stage are fused to obtain a more accurate result. The proposed system participated in the QBSH task of MIREX 2008 and won first place in both sub-tasks (on Roger Jang's corpus and ThinkIT's corpus).
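The filter-then-refine matching and score fusion described above can be illustrated with a minimal sketch. The abstract does not name the concrete algorithms, so everything here is an assumption chosen for illustration: the coarse filter compares pitch ranges, the fine matcher is plain dynamic time warping (DTW) on mean-normalised pitch sequences, and fusion is a weighted sum. The names `coarse_distance`, `dtw_distance`, `search`, `keep`, and `weight` are hypothetical, and all sequences are assumed to be non-empty lists of MIDI pitch values.

```python
from typing import Dict, List, Sequence, Tuple


def coarse_distance(query: Sequence[float], melody: Sequence[float]) -> float:
    """Fast, imprecise filter (assumed): difference between pitch ranges."""
    return abs((max(query) - min(query)) - (max(melody) - min(melody)))


def dtw_distance(query: Sequence[float], melody: Sequence[float]) -> float:
    """Slower, finer match (assumed): DTW on mean-normalised pitch sequences.

    Subtracting the mean makes the comparison insensitive to the key
    in which the user sings or hums.
    """
    q_mean = sum(query) / len(query)
    m_mean = sum(melody) / len(melody)
    q = [p - q_mean for p in query]
    m = [p - m_mean for p in melody]

    inf = float("inf")
    # prev[j] holds the best alignment cost of the first i-1 query frames
    # against the first j melody frames; only two rows are kept in memory.
    prev = [0.0] + [inf] * len(m)
    for i in range(1, len(q) + 1):
        cur = [inf] * (len(m) + 1)
        for j in range(1, len(m) + 1):
            cost = abs(q[i - 1] - m[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    # Normalise by the combined length so long melodies are not penalised.
    return prev[len(m)] / (len(q) + len(m))


def search(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    keep: int = 2,
    weight: float = 0.2,
) -> List[Tuple[str, float]]:
    """Two-stage search: filter candidates, refine survivors, fuse scores."""
    # Stage 1: cheap distance for every melody, keep only the best `keep`.
    coarse = {name: coarse_distance(query, mel) for name, mel in database.items()}
    survivors = sorted(coarse, key=lambda name: coarse[name])[:keep]

    # Stage 2 plus decision-level fusion: weighted sum of both distances.
    fused = [
        (name, weight * coarse[name] + (1.0 - weight) * dtw_distance(query, database[name]))
        for name in survivors
    ]
    return sorted(fused, key=lambda item: item[1])
```

In this sketch the expensive DTW step runs only on the `keep` survivors, which is where the speed-up comes from; the returned list is ordered from best to worst fused distance.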
Second, in the area of audio template searching, this thesis examines two different methods: fingerprinting-based template searching and audio vector space model-based template searching. It proposes a novel method for assigning a weight to each audio word according to its capability to distinguish between different audio files. Based on this work, I implemented an advertisement identification system and an automatic new-advertisement detection system; the experimental results show that both systems could be put into practical use.
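To make the idea of weighting audio words by their discriminative capability concrete, the sketch below uses inverse document frequency (IDF) as a stand-in. The abstract does not give the thesis's actual weighting formula, so IDF is an assumption borrowed from text retrieval, and the function names `audio_word_weights`, `weighted_vector`, and `cosine` are hypothetical. Audio words are assumed to be integer codebook indices.

```python
import math
from collections import Counter
from typing import Dict, List


def audio_word_weights(files: Dict[str, List[int]]) -> Dict[int, float]:
    """Assumed IDF weighting: words found in fewer files get larger weights."""
    n_files = len(files)
    doc_freq: Counter = Counter()
    for words in files.values():
        doc_freq.update(set(words))  # count each word once per file
    return {word: math.log(n_files / df) for word, df in doc_freq.items()}


def weighted_vector(words: List[int], weights: Dict[int, float]) -> Dict[int, float]:
    """Sparse vector space representation: term count times word weight."""
    counts = Counter(words)
    return {word: count * weights.get(word, 0.0) for word, count in counts.items()}


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this assumed scheme, an audio word that occurs in every file receives weight zero and therefore cannot influence the similarity between a query clip and a stored template.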
Third, this thesis adopts a GPU-based SVM training method for audio classification to speed up the training process; the results show that GPU-based training can save 90% of the time compared with CPU-based training. Furthermore, I applied audio classification to three applications: a pre-processing module for speech recognition, music genre classification, and audio-based video retrieval.
Finally, a system for automatic news story segmentation is implemented based on audio and video processing techniques. The system uses key frame clustering, audio classification, audio template searching, and speaker change detection to locate potential segmentation points, and it helps users explore news content quickly.