Personal web pages have been popular for some time, yet they have not been studied systematically. This study builds a one-million-character corpus of personal web pages, divided into a "diary-style" subcorpus and a "literary-style" subcorpus. The two subcorpora are compared on two measures: the frequency of content words and the frequency of personal pronouns. The proportion of content words and the distribution of personal pronouns can serve as features that distinguish spoken from written language: spoken language makes frequent use of first- and second-person pronouns, whereas written language more often uses third-person pronouns for objective description. The results show that content words account for 64.3% of all words in the diary-style subcorpus and 77.0% in the literary-style subcorpus; first- and second-person pronouns together account for 76.9% and 67.3% of personal pronouns, respectively, and third-person pronouns for 23.1% and 32.7%. Chi-square tests show that all three differences are significant. Through quantitative analysis, this paper verifies the claim that the proportion of content words is a distinguishing feature of spoken and written language.
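The chi-square comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract reports only percentages, not token counts, so the counts below are hypothetical: they assume 1,000 word tokens per subcorpus, chosen only to match the reported content-word proportions (64.3% and 77.0%). The function name `chi_square_2x2` is likewise an illustrative choice, and the sketch uses the plain Pearson statistic without continuity correction, which may differ from the procedure the authors used. Because the statistic grows with sample size, the value printed here does not reproduce the paper's result.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the test statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# HYPOTHETICAL counts (not from the paper): 1,000 tokens per subcorpus,
# split as (content words, other words) to match 64.3% and 77.0%.
diary = (643, 357)
literary = (770, 230)

stat, p = chi_square_2x2(*diary, *literary)
print(f"chi-square = {stat:.2f}, p = {p:.3g}")
```

The same function applies to the pronoun comparison by replacing the two rows with counts of first- and second-person pronouns versus third-person pronouns in each subcorpus.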