To better serve researchers with CERN ecological data resources through the CERN data management and information sharing platform, CERN needs to keep improving the platform's performance, including the efficiency and reliability with which users search CERN data resources. This paper analyzes the strengths and weaknesses of three retrieval approaches (navigational search, thematic search, and keyword search) and focuses on how introducing thesaurus techniques into keyword search can improve the recall, precision, and response speed of retrieval. It introduces the concept of a thesaurus and the method used to construct the CERN ecological thesaurus, and describes how the open-source thesaurus management system TemaTres was localized into Chinese, covering the localization of four functions: keyword browsing, keyword expansion, keyword auto-completion, and searching CERN ecological data resource metadata with the expanded keywords. Building and running the Chinese-localized TemaTres thesaurus management system has made the assignment of keywords in CERN ecological metadata more controlled and standardized, and has introduced some simple semantic relations between keywords into CERN metadata retrieval, such as hierarchical relations, equivalence relations (i.e., synonyms), and associative relations. This improves search efficiency and lays a good foundation for the next step of building an ecological ontology.
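The thesaurus-based keyword expansion summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not TemaTres code or its API: the `Term` class, the sample terms and records, and the functions `preferred`, `expand`, `search`, and `autocomplete` are all hypothetical names introduced here. The sketch assumes that a query keyword is first mapped to its preferred term, then widened with synonyms (equivalence), narrower terms (hierarchy), and directly related terms before being matched against metadata keywords.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One preferred term and its three kinds of thesaurus relations."""
    label: str
    synonyms: set[str] = field(default_factory=set)  # equivalence relation
    narrower: set[str] = field(default_factory=set)  # hierarchical relation
    related: set[str] = field(default_factory=set)   # associative relation


# Hypothetical sample entries, NOT taken from the CERN ecological thesaurus.
TERMS = [
    Term("soil moisture", synonyms={"soil water content"},
         narrower={"volumetric water content"}, related={"precipitation"}),
    Term("volumetric water content"),
    Term("precipitation", synonyms={"rainfall"}),
]
THESAURUS = {t.label: t for t in TERMS}

# Hypothetical metadata records, each carrying a list of keywords.
RECORDS = [
    {"id": "ds-1", "keywords": ["Soil water content"]},
    {"id": "ds-2", "keywords": ["volumetric water content"]},
    {"id": "ds-3", "keywords": ["precipitation"]},
    {"id": "ds-4", "keywords": ["leaf area index"]},
]


def preferred(keyword: str) -> Optional[str]:
    """Map a keyword (preferred label or synonym) to its preferred label."""
    key = keyword.strip().lower()
    if key in THESAURUS:
        return key
    for term in THESAURUS.values():
        if key in term.synonyms:
            return term.label
    return None


def expand(keyword: str) -> set[str]:
    """Widen a keyword with synonyms, all narrower terms, and related terms."""
    root = preferred(keyword)
    if root is None:
        # Unknown to the thesaurus: fall back to the literal keyword.
        return {keyword.strip().lower()}
    expanded: set[str] = set()
    visited: set[str] = set()
    stack = [root]
    while stack:  # walk down the hierarchy from the preferred term
        label = stack.pop()
        if label in visited:
            continue
        visited.add(label)
        expanded.add(label)
        term = THESAURUS.get(label)
        if term is not None:
            expanded |= term.synonyms
            stack.extend(term.narrower)
    # Related terms are added one hop from the root only, to limit noise.
    expanded |= THESAURUS[root].related
    return expanded


def search(records: list[dict], keyword: str) -> list[str]:
    """Return ids of records sharing at least one keyword with the expansion."""
    wanted = expand(keyword)
    return [r["id"] for r in records
            if wanted & {k.lower() for k in r["keywords"]}]


def autocomplete(prefix: str) -> list[str]:
    """Suggest preferred labels and synonyms that start with the prefix."""
    p = prefix.strip().lower()
    candidates = set(THESAURUS)
    for term in THESAURUS.values():
        candidates |= term.synonyms
    return sorted(c for c in candidates if c.startswith(p))


if __name__ == "__main__":
    print(sorted(expand("soil moisture")))
    print(search(RECORDS, "soil moisture"))
    print(autocomplete("so"))
```

In this toy data set, no record is tagged with the literal keyword "soil moisture", so an exact-match search would return nothing, while the expanded query reaches records tagged with a synonym, a narrower term, or a related term. Limiting related terms to one hop is a design choice made here to trade some recall for precision; a real system would need to tune this.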