Quantitative comparison of register parameters based on large-scale corpora is a trend in contemporary register research. This paper builds a large register corpus by selecting source texts and filtering or extracting material from them, then measures and compares five register parameters: sentence length, sentence pattern, fragmentation, discourse markers, and part of speech. The results show that not all parameters are suitable for register classification; that words remain an important distinguishing feature of registers; and that discourse markers can be used to cluster texts by register.