Recognizing biological entity names is important for information extraction from biomedical literature. This paper addresses the recognition of protein names. The approach is primarily dictionary-based: candidate protein names are located using an approximate matching algorithm together with first-word lookup. In addition, a classifier trained with a machine learning method filters the candidates in order to improve recognition accuracy.
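As an illustration only, the dictionary-based step could be sketched as follows. The paper does not give its algorithm in this section, so everything here is an assumption: the function names `build_first_word_index` and `find_candidates`, the similarity threshold of 0.85, and the use of Python's standard `difflib.SequenceMatcher` as a stand-in for the approximate matching algorithm are all hypothetical choices, not the authors' implementation.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_first_word_index(lexicon):
    """Group lexicon entries (as token lists) by their lowercased first token."""
    index = defaultdict(list)
    for name in lexicon:
        tokens = name.lower().split()
        if tokens:
            index[tokens[0]].append(tokens)
    return index


def find_candidates(text, index, threshold=0.85):
    """Return (start, end, matched_text, lexicon_name, score) for each candidate.

    start and end are token offsets into the whitespace-tokenized text.
    The threshold is an assumed value, not one taken from the paper.
    """
    tokens = text.split()
    # Lowercase and strip common punctuation for comparison only.
    lowered = [t.lower().strip(".,;:()") for t in tokens]
    candidates = []
    for i, tok in enumerate(lowered):
        # First-word lookup: only entries starting with this token are compared.
        for entry in index.get(tok, []):
            window = lowered[i:i + len(entry)]
            if len(window) < len(entry):
                continue
            # Approximate match between the text window and the lexicon entry.
            score = SequenceMatcher(None, " ".join(window), " ".join(entry)).ratio()
            if score >= threshold:
                matched = " ".join(tokens[i:i + len(entry)])
                candidates.append((i, i + len(entry), matched, " ".join(entry), score))
    return candidates


if __name__ == "__main__":
    idx = build_first_word_index(["protein kinase C", "tumor necrosis factor alpha"])
    for cand in find_candidates("Activation of protein kinase C was observed.", idx):
        print(cand)
```

The first-word index keeps the approximate comparison cheap, since a text position is compared only against lexicon entries that share its first token. The classifier-based filtering of candidates mentioned in the abstract is not shown here; in this sketch it would be a separate step applied to the returned candidate list.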