Non-independent Term Selection for Chinese Text Categorization

Source: Tsinghua Science and Technology (清华大学学报(英文版))
Chinese text categorization differs from English text categorization in its much larger term set (of words or character n-grams), which makes modern high-performance classifiers very slow to train and to apply. This study assumes that this high-dimensionality problem is related to redundancy in the term set, a problem that traditional term selection methods cannot solve. A greedy algorithm framework named "non-independent term selection" is presented, which reduces the redundancy according to string-level correlations. Several preliminary implementations of this idea are demonstrated. Experimental results show that a good tradeoff can be reached between classification performance and the size of the term set.
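The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a greedy, redundancy-aware term selector could look like, not the authors' method. It assumes, purely for illustration, that each candidate term already has a relevance score (for example from a conventional term selection measure), that "string-level correlation" is approximated by one term being a substring of another, and that redundancy is judged by the Jaccard overlap of the documents the two terms occur in. The function name `greedy_select`, the parameter `max_overlap`, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(
    term_docs: Dict[str, Set[int]],
    scores: Dict[str, float],
    k: int,
    max_overlap: float = 0.8,
) -> List[str]:
    """Greedily pick up to k terms, skipping string-related near-duplicates.

    Assumption (not from the paper): a candidate is redundant if it is a
    substring or superstring of an already selected term AND the two terms
    occur in almost the same documents (Jaccard overlap >= max_overlap).
    """
    selected: List[str] = []
    # Visit candidates from highest to lowest score; ties broken by the term itself.
    for term in sorted(scores, key=lambda t: (-scores[t], t)):
        if len(selected) >= k:
            break
        docs = term_docs[term]
        redundant = False
        for chosen in selected:
            # String-level relation: one term contains the other.
            if term in chosen or chosen in term:
                chosen_docs = term_docs[chosen]
                union = docs | chosen_docs
                jaccard = len(docs & chosen_docs) / len(union) if union else 0.0
                if jaccard >= max_overlap:
                    redundant = True
                    break
        if not redundant:
            selected.append(term)
    return selected


# Toy data: document ids in which each character n-gram occurs (hypothetical).
term_docs = {
    "清华": {1, 2, 3},
    "清华大": {1, 2, 3},      # always co-occurs with "清华"
    "大学": {1, 2, 3, 4, 5},
    "学报": {6},
}
scores = {"清华": 3.0, "清华大": 2.9, "大学": 2.0, "学报": 1.0}

print(greedy_select(term_docs, scores, k=3))
```

In this toy run "清华大" is skipped because it contains the already selected "清华" and appears in exactly the same documents, which is the kind of redundancy an independent, score-only ranking would keep. The threshold `max_overlap` is the knob that trades term-set size against coverage in this sketch.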