Non-independent Term Selection for Chinese Text Categorization

Source: Journal of Tsinghua University (English Edition) | Times cited: 0 | Uploaded by: sven321

Authors: LI Jingyang, SUN Maosong

Institution: Department of Computer Science and Technology

Published in: Journal of Tsinghua University (English Edition)

Publication date: 2004

Keywords: Chinese text categorization; term selection; dimensionality

Funding: National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China

Abstract:

Chinese text categorization differs from English text categorization because of its much larger term set (of words or character n-grams), which makes training and running modern high-performance classifiers very slow. This study hypothesizes that the high dimensionality is related to redundancy in the term set, which traditional term selection methods cannot eliminate. A greedy algorithm framework named "non-independent term selection" is presented, which reduces this redundancy according to string-level correlations between terms. Several preliminary implementations of the idea are demonstrated. Experimental results show that a good tradeoff can be reached between classification performance and the size of the term set.
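The abstract does not say which string-level correlations the framework uses, so what follows is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes terms are character n-grams, uses document frequency as a stand-in for whatever term score is used, and treats a candidate as redundant when it is a substring or superstring of an already-selected term with nearly the same document frequency. The function names and the `ratio` threshold of 0.9 are hypothetical choices. An independent top-k baseline, which judges every term on its own score, is included for contrast.

```python
from collections import Counter


def char_ngrams(text, n_values=(1, 2, 3)):
    """Return the set of character n-grams of `text` (assumed term definition)."""
    grams = set()
    for n in n_values:
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def document_frequency(docs, n_values=(1, 2, 3)):
    """Count, for each n-gram, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(char_ngrams(doc, n_values))  # a set, so each doc counts once
    return df


def ranked(scores):
    """Terms by descending score; ties broken by string order for determinism."""
    return sorted(scores, key=lambda t: (-scores[t], t))


def independent_selection(scores, k):
    """Baseline: every term is judged on its own score; keep the top k."""
    return ranked(scores)[:k]


def non_independent_selection(scores, df, k, ratio=0.9):
    """Hypothetical greedy sketch: walk terms in score order and skip a term
    that is a substring or superstring of an already-selected term whose
    document frequency is nearly the same (lo / hi >= ratio)."""
    selected = []
    for term in ranked(scores):
        if len(selected) == k:
            break
        redundant = False
        for kept in selected:
            if term in kept or kept in term:
                lo, hi = sorted((df.get(term, 0), df.get(kept, 0)))
                if hi > 0 and lo / hi >= ratio:
                    redundant = True
                    break
        if not redundant:
            selected.append(term)
    return selected


if __name__ == "__main__":
    docs = ["清华大学", "清华大学学报", "北京大学"]
    df = document_frequency(docs, n_values=(1, 2))
    # Document frequency doubles as the term score here (an assumption).
    print(independent_selection(df, 3))          # ['大', '大学', '学']
    print(non_independent_selection(df, df, 3))  # ['大', '学', '华']
```

In this toy run the independent baseline keeps both 大 and 大学 even though they occur in exactly the same documents, while the greedy pass drops 大学 as redundant and spends that slot on a term that adds new information.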
