Excerpt from the paper
Chinese named entity recognition (CNER) aims to identify entity names, such as person names and organization names, in raw Chinese text, and can thus quickly extract the entity information that people care about from large-scale texts. Recent studies attempt to improve performance by integrating lexicon words into character-based CNER models. These existing studies, however, usually focus on leveraging the context-free words in the lexicon without considering the contextual information of words and subwords in the sentences. To address this issue, in addition to utilizing the lexicon words, we further propose to construct a hierarchical tree-structure representation composed of characters, subwords, and context-aware predicted words from a segmentor to represent each sentence for CNER. Based on the tree-structure representation, we propose a hierarchical long short-term memory (HiLSTM) framework, which consists of a hierarchical encoding layer, a fusion layer, and a CRF layer, to capture linguistic knowledge at different levels. On the one hand, the interactions within each level help to obtain the contextual information. On the other hand, the propagation from the lower levels to the upper levels can provide additional semantic knowledge for CNER. Experimental results on three widely used CNER datasets show that our proposed HiLSTM model achieves significant improvement over several strong benchmark methods.
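
To make the three-level representation concrete, the following is a minimal sketch of how a segmented sentence might be organized into characters, subwords, and predicted words. It is not the authors' implementation: the `Node` class, the `build_tree` function, and the rule used for subwords (lexicon entries of at least two characters that lie strictly inside a predicted word) are assumptions made for illustration, since the abstract does not specify how subwords are obtained. The LSTM encoding, fusion, and CRF layers are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """A span [start, end) over the sentence's characters at one level."""
    level: str  # "char", "subword", or "word"
    start: int
    end: int
    text: str
    children: List["Node"] = field(default_factory=list)


def build_tree(words: List[str], lexicon: Set[str]) -> List[Node]:
    """Return one word-level node per segmentor-predicted word.

    Each word node holds its characters and, as an illustrative assumption,
    every lexicon entry of length >= 2 that is a proper substring span of
    the word (our stand-in for a "subword").
    """
    tree: List[Node] = []
    offset = 0
    for word in words:
        node = Node("word", offset, offset + len(word), word)
        # Character level: one leaf per character of the word.
        for i, ch in enumerate(word):
            node.children.append(Node("char", offset + i, offset + i + 1, ch))
        # Subword level: lexicon matches that stay inside this word.
        for i in range(len(word)):
            for j in range(i + 2, len(word) + 1):
                piece = word[i:j]
                if piece != word and piece in lexicon:
                    node.children.append(
                        Node("subword", offset + i, offset + j, piece)
                    )
        tree.append(node)
        offset += len(word)
    return tree


# Hypothetical segmentor output and lexicon for the sentence 南京市长江大桥.
words = ["南京市", "长江大桥"]
lexicon = {"南京", "长江", "大桥", "市长"}
tree = build_tree(words, lexicon)

for word_node in tree:
    subwords = [c.text for c in word_node.children if c.level == "subword"]
    print(word_node.text, (word_node.start, word_node.end), subwords)
```

In this sketch the lexicon entry 市长 is never attached, because it crosses the boundary between the two predicted words. That is one way the segmentor's context-aware output can filter context-free lexicon matches, under the assumptions stated above.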