Key phrase extraction plays an important role in applications such as automatic summarization, text classification, clustering, and information retrieval. The main pipeline consists of three steps: candidate phrase identification, feature engineering, and either building a machine learning model or extracting phrases by weighting. Although existing research has proposed many improvements, empirical results show that the precision and recall of extraction algorithms remain low. By reviewing representative techniques and studies, this paper analyzes data on the part-of-speech patterns of key phrases, discusses problems in the extraction pipeline, and offers guidance for the automatic extraction of high-quality key phrases.
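The three-step pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the surveyed work. It assumes the text is already tokenized and POS-tagged with Universal Dependencies-style tags. For candidate identification it uses the hypothetical pattern ADJ* NOUN+. The feature step uses two simple features: frequency and the relative position of the first occurrence. A hand-written weighting stands in for a trained model. All function names and the scoring formula are hypothetical.

```python
from collections import Counter

# Hypothetical POS pattern (assumption): zero or more adjectives followed by one or more nouns.
ADJ, NOUN = "ADJ", "NOUN"


def candidate_phrases(tagged):
    """Step 1: return (phrase, start_index) for every maximal ADJ* NOUN+ run."""
    candidates = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][1] == ADJ:  # consume leading adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] == NOUN:  # consume the noun head
            k += 1
        if k > j:  # at least one noun closes the run
            words = [w.lower() for w, _ in tagged[i:k]]
            candidates.append((" ".join(words), i))
            i = k
        else:  # adjectives with no noun, or a non-matching token: skip ahead
            i = max(j, i + 1)
    return candidates


def features(candidates, n_tokens):
    """Step 2: frequency and relative position of first occurrence per phrase."""
    tf = Counter(p for p, _ in candidates)
    first = {}
    for p, start in candidates:
        first.setdefault(p, start / n_tokens)  # keep only the earliest occurrence
    return {p: {"tf": tf[p], "first_pos": first[p]} for p in tf}


def score(feat):
    """Step 3 (hypothetical weighting): frequent phrases that appear early rank higher."""
    return feat["tf"] * (1.0 - feat["first_pos"])


def extract(tagged, k=3):
    """Run the three steps and return the top-k phrases (ties broken alphabetically)."""
    cands = candidate_phrases(tagged)
    feats = features(cands, len(tagged))
    ranked = sorted(feats, key=lambda p: (-score(feats[p]), p))
    return ranked[:k]
```

For the supervised variant of step 3, a classifier trained on the same feature dictionaries could replace `score`; the candidate and feature steps would stay unchanged.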