This paper presents an automatic Chinese text classification model based on the statistical properties of Chinese characters and on instance-based mapping. The model represents each text as a Chinese character frequency vector. Its distinctive feature is the use of Linear Least Squares Fit (LLSF) to build the text classifier. The model learns from a training corpus whose documents have been manually indexed with categories, together with relevance judgments between texts and categories. From these it obtains a mapping function between two vector spaces, the character space and the category space, that minimizes the global error rate. This function is then used to classify test texts.
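The LLSF idea can be illustrated with a small sketch. It assumes the following setup. Each training document is a column of a character-frequency matrix A (m characters by n documents). Its manually assigned categories form the matching column of a 0/1 matrix B (k categories by n documents). The mapping F is the matrix that minimizes ||F A - B||_F^2, and a test text x is scored as F x. The function names (`char_freq_vector`, `train_llsf`, `classify`) and the toy corpus are hypothetical. The choice of relative frequencies and NumPy's minimum-norm least-squares solver are assumptions; the paper's exact weighting, feature selection, and solver may differ.

```python
import numpy as np


def char_freq_vector(text: str, vocab: list[str]) -> np.ndarray:
    """Relative character-frequency vector of `text` over a fixed vocabulary.

    Assumption: characters outside the vocabulary are simply ignored.
    """
    index = {ch: i for i, ch in enumerate(vocab)}
    v = np.zeros(len(vocab))
    for ch in text:
        i = index.get(ch)
        if i is not None:
            v[i] += 1.0
    total = v.sum()
    return v / total if total > 0 else v


def train_llsf(docs: list[str], labels: list[list[int]], n_categories: int):
    """Fit F minimizing ||F A - B||_F^2 (hypothetical helper, not the paper's code).

    A: m x n matrix, one character-frequency column per training document.
    B: k x n matrix, B[c, j] = 1 if document j was manually assigned category c.
    """
    vocab = sorted(set("".join(docs)))
    A = np.column_stack([char_freq_vector(d, vocab) for d in docs])
    B = np.zeros((n_categories, len(docs)))
    for j, cats in enumerate(labels):
        B[cats, j] = 1.0
    # Solve A^T F^T = B^T; lstsq returns the minimum-norm least-squares solution.
    F_T, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return F_T.T, vocab, A, B


def classify(text: str, F: np.ndarray, vocab: list[str]) -> np.ndarray:
    """Category relevance scores F x for a test text."""
    return F @ char_freq_vector(text, vocab)


# Hypothetical toy corpus: category 0 = 体育 (sports), 1 = 经济 (economy).
categories = ["体育", "经济"]
train_docs = ["足球比赛球队进球", "篮球比赛球员得分", "股票市场价格上涨", "银行利率经济增长"]
train_labels = [[0], [0], [1], [1]]

F, vocab, A, B = train_llsf(train_docs, train_labels, len(categories))
scores = classify("球队比赛", F, vocab)
print(categories[int(np.argmax(scores))])  # prints 体育
```

Here the toy system is underdetermined (26 characters, 4 documents), so the fit reproduces B exactly on the training set. On real corpora with far more documents, F is a genuine least-squares compromise. In practice one would usually rank categories by score or apply a threshold rather than take only the top one.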