This study randomly divided large-scale written-language data from the British National Corpus (BNC) and the American National Corpus (ANC) into 60 experimental groups and 41 test groups, totaling 83,864 text pairs. Using computer programs, it carried out a dynamic analysis of English vocabulary repetition rates, built a mathematical model for estimating the vocabulary repetition rate, and tested the resulting formula on the 60 experimental groups. The results show that the vocabulary repetition rate curves are fairly regularly distributed, with few extreme values; that the repetition rate changes nonlinearly; and that the prediction formula has a small error and can be used to estimate the theoretical vocabulary repetition rate of authentic English texts of different lengths.
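To make the idea of a programmatic, length-dependent repetition analysis concrete, the following is a minimal Python sketch. The abstract does not give the study's definition of vocabulary repetition rate, so this sketch assumes one: the share of word types in one text that also occur in the other text of a pair. The function names `tokenize`, `repetition_rate`, and `repetition_curve`, the simple regular-expression tokenizer, and the `step` parameter are all hypothetical and are not taken from the study.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional apostrophe part)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def repetition_rate(text_a, text_b):
    """Assumed definition: share of word types in text_b that also occur in text_a."""
    types_a = set(tokenize(text_a))
    types_b = set(tokenize(text_b))
    if not types_b:
        return 0.0
    return len(types_a & types_b) / len(types_b)


def repetition_curve(text_a, text_b, step=100):
    """Repetition rate at growing text lengths.

    Both texts are truncated to their first n tokens, for n = step, 2*step, ...
    up to the length of the shorter text. Returns a list of (n, rate) pairs.
    """
    tokens_a = tokenize(text_a)
    tokens_b = tokenize(text_b)
    limit = min(len(tokens_a), len(tokens_b))
    curve = []
    for n in range(step, limit + 1, step):
        types_a = set(tokens_a[:n])
        types_b = set(tokens_b[:n])
        curve.append((n, len(types_a & types_b) / len(types_b)))
    return curve
```

Plotting the pairs returned by `repetition_curve` for many text pairs would give curves of the kind the abstract describes. Fitting a prediction formula to them is a separate step that is not sketched here, since the abstract does not state the form of the model.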