[Objective] To improve data reliability in science and technology evaluation based on massive data, and to overcome the shortcomings of similarity matching and frequency statistics methods in institution name normalization. [Methods] An institution name mapping algorithm based on low lexical similarity is proposed. The algorithm combines rules and statistics to map multiple institution names to a single institutional entity, thereby normalizing institution names. [Results] Experiments show that the rule-based algorithm achieves an average F value of 55.50%, higher than the other two technical strategies. [Limitations] Identification of institution names with low lexical similarity remains inadequate. [Conclusions] The algorithm's overall performance in institution name normalization is better than that of the other two technical strategies, but its recall still needs improvement.
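The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a "rules plus statistics" mapping and its F value might look. Everything in it is an assumption made for illustration: the alias table `ALIAS_RULES`, the use of a shared context key (such as an address) as the statistical signal, the `min_share` threshold, and the choice of the balanced F measure (F1). It is not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical rule table: normalized alias -> canonical entity (illustrative only).
ALIAS_RULES = {
    "cas": "Chinese Academy of Sciences",
    "chinese acad sci": "Chinese Academy of Sciences",
}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def map_names(records, min_share=0.6):
    """Map raw names to entities from (name, context_key) records.

    Rule pass: exact lookup of the normalized name in ALIAS_RULES.
    Statistical pass (assumed design): an unresolved name inherits the entity
    that dominates its context key among the rule-resolved names.
    """
    resolved = {}
    votes_by_context = defaultdict(Counter)

    for name, context in records:
        entity = ALIAS_RULES.get(normalize(name))
        if entity is not None:
            resolved[name] = entity
            votes_by_context[context][entity] += 1

    for name, context in records:
        if name in resolved:
            continue
        votes = votes_by_context.get(context)
        if not votes:
            continue  # no evidence for this context; leave the name unmapped
        entity, count = votes.most_common(1)[0]
        if count / sum(votes.values()) >= min_share:
            resolved[name] = entity

    return resolved


def f_measure(predicted, gold):
    """Balanced F measure (F1) of predicted name-to-entity pairs against gold pairs."""
    correct = sum(1 for name, entity in predicted.items() if gold.get(name) == entity)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical records: (raw name, context key such as an address identifier).
records = [
    ("CAS", "addr-1"),
    ("Chinese Acad. Sci.", "addr-1"),
    ("Zhongguo Kexueyuan", "addr-1"),  # low lexical similarity to the aliases
    ("Unknown Lab", "addr-2"),
]
mapping = map_names(records)
```

In this toy data, the third name shares no surface form with the aliases and is resolved only through the context vote, while the fourth name stays unmapped because its context has no rule-resolved neighbours. Unmapped names of this kind lower recall, which is the kind of weakness the limitations statement describes.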