This paper uses the four most frequent Chinese characters, 的, 一, 了, and 不, to perform a multiple regression analysis on sentence length in a corpus of modern literary works of about 13 million characters. The same analysis is applied to compositions written by third-grade and sixth-grade primary school students in China. Statistical tests show that the regression equation obtained from the modern literary works differs significantly from the regression equations for the two grades' compositions, and that the equations for the two grades also differ significantly from each other. These differences indicate that writing proficiency varies across the groups. Finally, the paper takes the differences in how 的, 一, 了, and 不 are distributed across the groups' texts as an indicator, which is then transformed into a composition score. The results show that modern literary works are superior to sixth-grade compositions, and sixth-grade compositions are superior to third-grade compositions. The paper is an attempt to address automated essay scoring using the methods of quantitative linguistics.
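The regression setup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact model, so everything below is an assumption: sentence length is measured in characters, the predictors are the per-sentence counts of 的, 一, 了, and 不, sentences are split on 。！？, and the fit is ordinary least squares via NumPy. The function names `split_sentences`, `design_matrix`, and `fit_ols` are hypothetical and not taken from the paper.

```python
import re
import numpy as np

# The four high-frequency characters named in the abstract.
CHARS = ["的", "一", "了", "不"]


def split_sentences(text):
    """Split text on Chinese sentence-final punctuation (an assumed convention)."""
    parts = re.split(r"[。！？]", text)
    return [p.strip() for p in parts if p.strip()]


def design_matrix(sentences):
    """Return (X, y): per-sentence counts of each character, and sentence length.

    Assumption: length is the number of characters in the sentence,
    including any sentence-internal punctuation.
    """
    X = np.array([[s.count(c) for c in CHARS] for s in sentences], dtype=float)
    y = np.array([len(s) for s in sentences], dtype=float)
    return X, y


def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns [b0, b_的, b_一, b_了, b_不] for the model
    length = b0 + b_的 * n_的 + b_一 * n_一 + b_了 * n_了 + b_不 * n_不.
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef
```

Under these assumptions, one set of coefficients would be fitted per group (literary corpus, sixth-grade compositions, third-grade compositions) and the resulting equations compared. The significance tests and the score transformation mentioned in the abstract are not specified there and are not reproduced here.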