With the growth of the Internet, the number of academic documents available online has increased exponentially, making them difficult for researchers to use effectively. A method is therefore urgently needed to automatically collect, organize, and classify this large volume of online academic literature. Following extensive preliminary experiments, an automatic classification system for large-scale online academic literature was designed and implemented. The system is modular and comprises an automatic document crawling module, a term-document matrix processing module, an ontology integration module, and a semantics-driven classification module. Experiments show that the system can effectively crawl, process, and classify large volumes of academic documents.
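
To make the term-document matrix step concrete, the following is a minimal sketch of how such a matrix could be built from raw text. It is not the system's implementation: the function name `build_term_document_matrix`, the whitespace tokenization, and the sample documents are all assumptions made for illustration, and the paper does not state how its module tokenizes or weights terms.

```python
from collections import Counter


def build_term_document_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[i] occurs in documents[j].

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split, which is an assumption, not the paper's method.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # Sorted vocabulary gives a stable row order.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # One row per term, one column per document.
    matrix = [[doc_counts[term] for doc_counts in counts] for term in vocabulary]
    return vocabulary, matrix


# Illustrative input only; these are not documents from the paper.
docs = [
    "ontology based text classification",
    "web crawler for academic text",
]
vocab, matrix = build_term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(term, row)
```

Each row of the matrix corresponds to a term and each column to a document, so a classifier can compare documents by their columns. A real pipeline would typically add stop-word removal and a weighting scheme such as TF-IDF before classification.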