With the rapid development of Internet technology and the spread of mobile devices, we are surrounded by all kinds of information at every moment. How to mine valuable information from massive data has long been a research hotspot at home and abroad. Relation extraction is an important subtask of information extraction: it aims to identify the relations between entities in text and thereby recover the structured information the text contains, namely fact triples. Entity overlap and relation overlap are very common in text, but existing joint extraction models cannot handle them effectively. This paper therefore proposes a new joint extraction model that treats relation extraction as two subtasks, entity recognition and relation recognition, which are solved with sequence labeling and multi-class classification, respectively. To make fuller use of the semantic information in the text during joint extraction, part-of-speech (POS) and syntactic dependency relation (Deprel) features are added to the model's input layer. To address the long-distance dependency problem that arises as sentences grow longer, an attention mechanism is introduced into the model. Finally, relation extraction experiments on the NYT and WebNLG datasets show that the proposed model effectively handles relation overlap and achieves the best extraction performance.
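To make the overlap problem concrete, the following minimal sketch labels the triples of one sentence by overlap type. It is not the paper's model. The category names (Normal, EPO for entity-pair overlap, SEO for single-entity overlap) follow a categorization commonly used in the overlapping-triple literature and are an assumption here, since the abstract does not define them. The function name `overlap_type` and the example triples are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Tuple

# A fact triple: (subject, relation, object).
Triple = Tuple[str, str, str]


def overlap_type(triples: List[Triple]) -> str:
    """Return 'EPO', 'SEO', or 'Normal' for the triples of one sentence.

    Assumed definitions (not taken from the abstract):
      EPO    - one entity pair takes part in more than one distinct triple.
      SEO    - an entity is shared by triples whose entity pairs differ.
      Normal - no entity is shared between triples.
    A sentence that shows both kinds of overlap is reported as 'EPO'.
    """
    # Count distinct triples per unordered entity pair.
    pair_counts = Counter(frozenset((s, o)) for s, _, o in set(triples))
    if any(n > 1 for n in pair_counts.values()):
        return "EPO"

    # Count in how many distinct entity pairs each entity appears.
    entity_counts = Counter(e for pair in pair_counts for e in pair)
    if any(n > 1 for n in entity_counts.values()):
        return "SEO"

    return "Normal"


# Made-up example triples, one list per sentence.
normal = [("Alice", "works_for", "Acme")]
seo = [("Alice", "works_for", "Acme"), ("Alice", "lives_in", "Berlin")]
epo = [("Paris", "capital_of", "France"), ("Paris", "located_in", "France")]
```

A sequence labeler that assigns a single tag per token can only place each entity in one triple, which is why sentences of the SEO and EPO kind are the hard cases for joint extraction.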