Subject indexing, classification indexing, and named entity extraction are the main forms of in-depth content processing for newspaper literature, and knowledge-base-driven automatic indexing is one way to automate that work. Drawing on the current state of research into automatic indexing of newspaper literature, this paper distills a general workflow for it and argues that building the knowledge base is a prerequisite for automatic indexing. Taking the characteristics of newspaper indexing into account, the paper proposes that the knowledge base should consist of three parts, each made up of multiple vocabularies: a subject indexing base, a classification knowledge base, and an entity indexing base. Such a knowledge base should integrate multiple vocabularies, be large in scale, be extensible, and be simple to apply. The paper also discusses key techniques in building the knowledge base, including the construction of the subject authority table, the classification-subject correspondence table, and the named entity extraction rule base.
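As a rough illustration of how the three-part knowledge base described above could drive indexing, the following is a minimal sketch and not the paper's implementation. The names `KnowledgeBase` and `index_article`, the sample vocabulary, the class codes, and the regular-expression rule are all hypothetical, and plain substring matching stands in for whatever term-matching method a real system would use.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical container for the three parts of the knowledge base."""
    # Subject authority table: variant term -> preferred subject heading.
    subject_authority: dict = field(default_factory=dict)
    # Classification-subject correspondence table: heading -> class code.
    class_mapping: dict = field(default_factory=dict)
    # Entity extraction rule base: entity type -> compiled regex rule.
    entity_rules: dict = field(default_factory=dict)


def index_article(text, kb):
    """Return subject headings, class codes, and entities found in text."""
    lowered = text.lower()
    # Subject indexing: find variant terms, normalise to preferred headings.
    subjects = sorted({
        preferred
        for variant, preferred in kb.subject_authority.items()
        if variant.lower() in lowered
    })
    # Classification indexing: look up each heading in the mapping table.
    classes = sorted({
        kb.class_mapping[s] for s in subjects if s in kb.class_mapping
    })
    # Named entity extraction: apply every rule and collect the matches.
    entities = sorted({
        (etype, match.group(0))
        for etype, pattern in kb.entity_rules.items()
        for match in pattern.finditer(text)
    })
    return {"subjects": subjects, "classes": classes, "entities": entities}


# Illustrative data only; none of these entries come from the paper.
kb = KnowledgeBase(
    subject_authority={
        "stock market": "Securities markets",
        "share market": "Securities markets",
        "flood": "Floods",
    },
    class_mapping={
        "Securities markets": "CLASS-FINANCE",
        "Floods": "CLASS-DISASTER",
    },
    entity_rules={
        "ORG": re.compile(r"\b[A-Z][a-z]+ (?:Bank|University)\b"),
    },
)

result = index_article(
    "The share market fell after Central Bank raised rates.", kb
)
print(result)
```

Keeping the three tables as separate fields mirrors the multi-vocabulary design in the abstract: each table can be extended independently without touching the indexing routine.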