Long-standing automobile e-commerce websites in China have accumulated huge numbers of auto reviews, and extracting keyphrases from these reviews can help researchers and practitioners obtain online users' typical opinions and understand their underlying motivations. However, no relevant text corpus has existed so far. In this paper, the authors propose a semi-unsupervised scheme to construct a comprehensive auto-keyphrases corpus from reviews collected online from Chinese automobile e-commerce websites using PositionRank, which performs very well at extracting keyphrases from text when labeled data are scarce. The iterative annotation process consists of three rounds of labeling and two rounds of correction. In each of the three rounds of unsupervised labeling, the computing model extracts the seven most important words as the keyphrases of the whole paragraph. Between labeling phases there are manual check, correction, re-check, and arbitration stages, in which previous labeling errors are corrected and new vocabulary and rules are summarized to further improve the unsupervised model. For comparison, the paper runs experiments with two other unsupervised approaches, TF-IDF and TextRank; the experimental results show that PositionRank is a more efficient and effective method for keyphrase extraction. At the time of writing, the auto-keyphrases corpus contained 110,023 entries, and there is still much room for improvement in corpus volume and labeling quality.
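To make the labeling step concrete, the following is a minimal sketch of a position-biased PageRank over a word co-occurrence graph, which is the general idea behind PositionRank. It is not the authors' implementation. The function name `position_rank`, the parameters `window` and `alpha` and their default values are assumptions chosen for illustration. The sketch also assumes the review has already been segmented into a list of tokens, and it omits the part-of-speech filtering, custom vocabulary, and correction rules that the abstract mentions.

```python
from collections import defaultdict


def position_rank(tokens, window=3, alpha=0.85, iterations=100, tol=1e-8, top_k=7):
    """Return the top_k words ranked by a position-biased PageRank.

    Hypothetical helper for illustration; `tokens` is an already segmented review.
    """
    if not tokens:
        return []

    # Undirected co-occurrence graph: the edge weight counts how often two
    # distinct words appear within `window` tokens of each other.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0

    # Position bias: an occurrence at 1-based position p contributes 1/p,
    # so words that appear early and often get more teleport probability.
    bias = defaultdict(float)
    for pos, word in enumerate(tokens, start=1):
        bias[word] += 1.0 / pos
    total = sum(bias.values())
    bias = {w: b / total for w, b in bias.items()}

    vocab = list(bias)
    out_weight = {w: sum(weights[w].values()) for w in vocab}
    scores = {w: 1.0 / len(vocab) for w in vocab}

    # Power iteration: score = (1 - alpha) * bias + alpha * weighted inflow.
    for _ in range(iterations):
        new_scores = {}
        for w in vocab:
            incoming = sum(
                weights[v][w] / out_weight[v] * scores[v] for v in weights[w]
            )
            new_scores[w] = (1 - alpha) * bias[w] + alpha * incoming
        delta = sum(abs(new_scores[w] - scores[w]) for w in vocab)
        scores = new_scores
        if delta < tol:
            break

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    review = ["engine", "power", "strong", "fuel", "consumption", "low",
              "engine", "noise", "low", "space", "large", "seat",
              "comfortable", "engine", "power"]
    print(position_rank(review))
```

The default `top_k=7` mirrors the seven keyphrase words per paragraph described above. For Chinese reviews, a word segmenter would have to produce `tokens` first; that step is outside this sketch.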