This paper divides tags, at the conceptual level, into three categories: factual, subjective, and personal. It builds a vocabulary from the relevant metadata, derives recognition rules from the syntactic structure in which tags appear in user-generated content, and combines the two to classify tags. The approach is evaluated on tag data from 675,351 users of Douban, China's largest film tagging system, and achieves a recall of 95.01%, a precision of 96.19%, and an F1-measure of 95.32%. The results indicate that the method handles automatic tag classification well.
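To make the combination of a metadata vocabulary with recognition rules more concrete, the following is a minimal sketch and not the authors' implementation. The vocabulary entries, the regular-expression patterns, the order in which the checks run, and the `classify_tag` function are all hypothetical. In particular, simple keyword patterns stand in here for the syntactic rules, which the abstract does not spell out.

```python
import re

# Hypothetical vocabulary, standing in for a word list built from film
# metadata (genres, years, countries, people). Entries are illustrative only.
FACTUAL_VOCAB = {"comedy", "2010", "france", "science fiction"}

# Hypothetical keyword patterns, standing in for the paper's syntactic
# recognition rules, which the abstract does not specify.
PERSONAL_PATTERNS = [
    re.compile(r"\b(my|mine|me)\b"),
    re.compile(r"\b(to watch|watched|seen|wishlist)\b"),
]
SUBJECTIVE_PATTERNS = [
    re.compile(r"\b(good|great|boring|touching|funny|overrated)\b"),
]


def classify_tag(tag: str) -> str:
    """Return 'factual', 'personal', 'subjective', or 'unknown' for a tag."""
    text = tag.strip().lower()
    # Step 1: vocabulary lookup against the metadata-derived word list.
    if text in FACTUAL_VOCAB:
        return "factual"
    # Step 2: rule matching for tags the vocabulary does not cover.
    if any(p.search(text) for p in PERSONAL_PATTERNS):
        return "personal"
    if any(p.search(text) for p in SUBJECTIVE_PATTERNS):
        return "subjective"
    # Tags matched by neither source are left unclassified in this sketch.
    return "unknown"
```

In this sketch the vocabulary lookup runs first and the rules are a fallback, which is only one possible way to combine the two sources. A real system would presumably derive both from the platform's metadata and from the text of user-generated content.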