Controlled vocabularies are the most important knowledge organization tools in library science and information retrieval. The Chinese Classified Thesaurus (《中国分类主题词表》) is one such traditional vocabulary. It has always been updated and maintained by hand, which limits its use in digital libraries and networked information environments. This paper presents a statistical method that extracts keywords from the titles of metadata records and locates them in the vocabulary. The method has three main steps: extracting keywords from titles; determining the specificity of each extracted keyword; and locating the highly specific domain terms in the vocabulary. Experiments on the Chinese Classified Thesaurus and on computer science and technology metadata provided by the Shanghai Library show that the method is feasible. It can be applied to automatic indexing or cataloging, and it has practical value and broad application prospects.
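The three steps described above can be illustrated with a minimal sketch. The abstract does not state which statistics the method uses, so everything concrete here is an assumption made for illustration: candidates are word n-grams counted by the number of titles they appear in, specificity is approximated with an IDF-style score, and a new term is located under the longest existing vocabulary term that forms its word-level suffix. The function names, thresholds, and sample titles are hypothetical, and real Chinese titles would first need word segmentation rather than whitespace splitting.

```python
import math
from collections import Counter


def extract_candidates(titles, max_len=3, min_count=2):
    """Step 1 (assumed): collect word n-grams and keep those found in
    at least min_count titles. Counts are per title, not per occurrence."""
    counts = Counter()
    for title in titles:
        tokens = title.lower().split()
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(" ".join(tokens[i:i + n]))
        counts.update(seen)
    return {term: c for term, c in counts.items() if c >= min_count}


def specificity(term, doc_freq, n_titles):
    """Step 2 (assumed): IDF-style score; terms found in fewer titles
    score higher and are treated as more specific."""
    return math.log(n_titles / doc_freq[term])


def locate(term, vocabulary):
    """Step 3 (assumed): return the longest vocabulary term that is a
    proper word-level suffix of the candidate, or None if there is none."""
    tokens = term.split()
    for i in range(1, len(tokens)):  # longest proper suffix first
        head = " ".join(tokens[i:])
        if head in vocabulary:
            return head
    return None


# Hypothetical sample data, not taken from the paper's experiments.
titles = [
    "a relational database index design",
    "relational database query optimization",
    "distributed database systems",
    "database systems survey",
]
vocabulary = {"database", "systems"}

candidates = extract_candidates(titles)
placements = {}
for term in candidates:
    if term in vocabulary:
        continue  # already in the vocabulary, nothing to locate
    if specificity(term, candidates, len(titles)) > 0.5:
        placements[term] = locate(term, vocabulary)

for term, parent in sorted(placements.items()):
    print(term, "->", parent)
```

In this sketch a candidate with no matching suffix is reported with `None`, which a maintainer could treat as a term needing manual placement.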