A Semi-Structured Document Model for Text Mining

Source: Journal of Computer Science and Technology (English edition)


Authors: Yang Jianwu (杨建武), Chen Xiaoou (陈晓鸥)

Affiliation: National Key Laboratory for Text Processing, Institute of Computer Science and Technology Peking Uni

Published in: Journal of Computer Science and Technology (English edition)

Publication date: 2002, Issue 5

Keywords: semi-structured document; XML; text mining; vector space model; structured link vecto


Excerpt from the paper

A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can be fully utilized. To take advantage of the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM), in which a vector represents a document and the vector's elements are determined by terms, document structure, and neighboring documents. For brevity and clarity, text mining based on SLVM is described through the K-means procedure, in two steps: calculating document similarity and calculating the cluster center. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model: the F value increases from 0.65-0.73 to 0.82-0.86.
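
The two K-means steps named in the abstract, document similarity and cluster center, can be illustrated with a minimal sketch. The excerpt does not give the SLVM weighting formulas, so the sketch below falls back on a plain term-frequency vector space model with cosine similarity, that is, the conventional baseline rather than SLVM itself. All function names, the toy documents, and the choice of initial centers are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text, vocabulary):
    """Plain term-frequency vector (baseline VSM, not the SLVM weighting)."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(u, v):
    """Step 1: similarity between a document vector and a cluster center."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_center(vectors):
    """Step 2: the center is the component-wise mean of the member vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, initial_centers, max_iterations=20):
    """Alternate the two steps until the assignment stops changing."""
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            max(range(len(centers)),
                key=lambda k: cosine_similarity(vec, centers[k]))
            for vec in vectors
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for k in range(len(centers)):
            members = [vec for vec, a in zip(vectors, assignment) if a == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = cluster_center(members)
    return assignment, centers


# Hypothetical toy corpus: two XML-related and two clustering-related texts.
docs = [
    "xml schema element attribute",
    "xml element tree schema",
    "cluster center similarity kmeans",
    "kmeans cluster similarity vector",
]
vocabulary = sorted({term for doc in docs for term in doc.split()})
vectors = [tf_vector(doc, vocabulary) for doc in docs]
assignment, centers = kmeans(vectors, initial_centers=[vectors[0], vectors[2]])
print(assignment)  # [0, 0, 1, 1]
```

A model in the spirit of SLVM would presumably change only how the document vectors are built, by also drawing on document structure and neighboring documents as the abstract states, while the similarity and center steps keep the same shape.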
