Multilingual Multi-document Summarization with Enhanced hLDA Features

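Source: The 15th China National Conference on Computational Linguistics (CCL2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD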
This paper presents state-of-the-art research progress on multilingual multi-document summarization. Our method first models the documents with the hLDA (hierarchical Latent Dirichlet Allocation) algorithm. A new feature is derived from the hLDA modeling results and reflects semantic information to some extent. The method then combines this new feature with other features to score sentences. Based on the sentence scores, it extracts candidate summary sentences from the documents to generate a summary. We also examine the effectiveness and robustness of the new feature through experiments. Compared with other summarization methods, our method shows better performance in some respects.
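The score-then-extract step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the hLDA feature is computed or how features are combined, so the feature names (`hlda`, `position`), the weighted-sum combination, the word budget, and the helpers `score` and `summarize` are all assumptions made for illustration. Feature values are taken as already computed and scaled to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    # Hypothetical feature name -> precomputed value in [0, 1].
    features: dict


def score(sentence, weights):
    """Weighted sum of feature values (an assumed combination rule)."""
    return sum(w * sentence.features.get(name, 0.0) for name, w in weights.items())


def summarize(sentences, weights, max_words):
    """Greedily pick the highest-scoring sentences that fit the word budget."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i], weights),
        reverse=True,
    )
    chosen, used = [], 0
    for i in ranked:
        # Whitespace word count; an assumption that does not hold for
        # unsegmented languages such as Chinese.
        length = len(sentences[i].text.split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Return the selected sentences in their original document order.
    return [sentences[i].text for i in sorted(chosen)]


if __name__ == "__main__":
    docs = [
        Sentence("Topic models group related words.", {"hlda": 0.9, "position": 1.0}),
        Sentence("The weather was nice.", {"hlda": 0.1, "position": 0.5}),
        Sentence("Summaries keep the most central sentences.", {"hlda": 0.8, "position": 0.2}),
    ]
    for line in summarize(docs, {"hlda": 0.7, "position": 0.3}, max_words=11):
        print(line)
```

In a real system the `hlda` value would come from the fitted topic hierarchy and the weights would be tuned on held-out data. The sketch only shows how a new feature can be slotted in next to existing ones without changing the extraction step.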