On the Semi-unsupervised Construction of Auto-Keyphrases Corpus from Large-scale Chinese Automobile

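Source: The 18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China | Citations: 0 | Uploaded by: eric7272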
  The long-standing automobile e-commerce websites in China have accumulated huge amounts of auto reviews, and extracting keyphrases from these reviews can help researchers and practitioners obtain online users' typical opinions and understand their underlying motivations. However, no relevant text corpus has existed so far. In this paper, the authors propose a semi-unsupervised scheme to construct a comprehensive auto-keyphrases corpus from reviews collected online from Chinese automobile e-commerce websites using PositionRank, which performs very well at keyphrase extraction when labeled data are scarce. The iterative annotation process consists of three rounds of labeling and two rounds of correction. In each of the three rounds of unsupervised labeling, the model extracts the seven most important words as the keyphrases of the whole paragraph. Between labeling rounds there are manual check, correction, re-check, and arbitration stages, in which previous labeling errors are corrected and new vocabulary and rules are summarized to further improve the unsupervised model. For comparison, the paper runs experiments with two other unsupervised approaches, TF-IDF and TextRank; the experimental results show that PositionRank is a more efficient and effective method for keyphrase extraction. At the time of writing, the auto-keyphrases corpus contained 110,023 entries, and there is still much room for improvement in corpus volume and labeling quality.
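To make the extraction step concrete, the following is a minimal sketch of a PositionRank-style ranker, not the authors' implementation. It assumes the review has already been segmented into a list of word tokens (Chinese word segmentation, part-of-speech filtering, and phrase formation are omitted), and the function name `position_rank` and the parameters `window`, `damping`, `iterations`, and `top_k` are illustrative choices. The core idea it shows is a PageRank over a word co-occurrence graph whose restart distribution favors words that appear early and often; the default `top_k=7` mirrors the seven words per paragraph mentioned in the abstract.

```python
from collections import defaultdict


def position_rank(tokens, window=3, damping=0.85, iterations=50, top_k=7):
    """Rank words with a position-biased PageRank (PositionRank-style sketch).

    `tokens` is an already segmented list of words; all parameter names and
    defaults here are assumptions made for illustration.
    """
    if not tokens:
        return []

    # Undirected co-occurrence graph: the edge weight counts how often two
    # distinct words fall inside the same sliding window of `window` tokens.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0

    # Position bias: sum the inverse 1-based position of every occurrence of
    # a word, then normalise so the biases form a probability distribution.
    bias = defaultdict(float)
    for position, word in enumerate(tokens, start=1):
        bias[word] += 1.0 / position
    total = sum(bias.values())
    bias = {word: value / total for word, value in bias.items()}

    # Total edge weight leaving each word, used to normalise transitions.
    out_weight = {word: sum(weights[word].values()) for word in bias}

    # Power iteration: with probability `damping` follow a weighted edge,
    # otherwise restart at a word drawn from the position bias.
    scores = {word: 1.0 / len(bias) for word in bias}
    for _ in range(iterations):
        new_scores = {}
        for word in bias:
            walk = sum(
                scores[neighbour] * weight / out_weight[neighbour]
                for neighbour, weight in weights[word].items()
            )
            new_scores[word] = (1 - damping) * bias[word] + damping * walk
        scores = new_scores

    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

Calling `position_rank(tokens)` on one segmented review returns at most seven distinct words, ordered from highest to lowest score. Replacing the position bias with a uniform distribution would reduce this sketch to a TextRank-style ranker, which is one way to see how the two methods compared in the paper differ.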