This paper introduces the Chinese Annotated Spontaneous Speech Corpus (CASS), a corpus of spontaneous Mandarin speech with phonetic annotation that includes sound changes. The CASS annotation has three tiers. The first is the syllable tier, which gives the canonical pinyin and tone of each syllable. The second is the initial-final tier, which uses the SAMPA-C labeling system to transcribe the actual pronunciation of initials and finals; both tonal and segmental sound changes are marked on this tier. The third is the miscellaneous tier, which marks paralinguistic and non-linguistic phenomena. The paper closes with a brief analysis of the causes of sound change and reports statistics on sound change in spoken Mandarin. Relative to the standard pronunciation, 42.2% of initials, 11.8% of finals, and 27.2% of syllables in CASS show sound change. The purpose of the CASS annotation is to support pronunciation modeling for spontaneous speech recognition; the corpus has already been used in a project at the 2000 language engineering workshop at Johns Hopkins University.
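As a rough illustration of the three annotation layers (tiers) described above, the following Python sketch models one annotated syllable and computes sound-change rates over a toy list. Everything in it is an assumption made for illustration: the class `AnnotatedSyllable`, the function `variation_rates`, the plain strings standing in for SAMPA-C symbols, and the four toy syllables are hypothetical and are not taken from CASS. The counting rule (a syllable counts as changed if its initial or final differs from the canonical form) is also assumed; the abstract does not state how its percentages were computed, and tone changes are left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotatedSyllable:
    # Tier 1 (syllable tier): canonical pinyin with a tone digit.
    pinyin: str
    # Tier 2 (initial-final tier): canonical vs. actually realized units.
    # Plain strings are placeholders, not real SAMPA-C symbols.
    canonical_initial: str
    canonical_final: str
    realized_initial: str
    realized_final: str
    # Tier 3 (miscellaneous tier): paralinguistic / non-linguistic tags.
    misc: List[str] = field(default_factory=list)

    @property
    def initial_changed(self) -> bool:
        return self.canonical_initial != self.realized_initial

    @property
    def final_changed(self) -> bool:
        return self.canonical_final != self.realized_final


def variation_rates(syllables: List[AnnotatedSyllable]) -> Dict[str, float]:
    """Fraction of syllables whose initial, final, or either one changed."""
    n = len(syllables)
    if n == 0:
        raise ValueError("need at least one syllable")
    return {
        "initial": sum(s.initial_changed for s in syllables) / n,
        "final": sum(s.final_changed for s in syllables) / n,
        # Assumed rule: a syllable is changed if either part changed.
        "syllable": sum(s.initial_changed or s.final_changed for s in syllables) / n,
    }


# Invented toy data, not drawn from the corpus.
toy = [
    AnnotatedSyllable("zhe4", "zh", "e", "z", "e"),           # initial changed
    AnnotatedSyllable("ge5", "g", "e", "g", "e", ["laugh"]),  # unchanged, tier-3 tag
    AnnotatedSyllable("shi4", "sh", "i", "sh", "i"),          # unchanged
    AnnotatedSyllable("men5", "m", "en", "m", "n"),           # final changed
]
rates = variation_rates(toy)
print(rates)
```

Keeping the canonical and realized forms side by side in one record makes each rate a simple comparison, which is the kind of evidence a pronunciation model would be estimated from.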