Learning Multilingual Sentence Embeddings from Monolingual Corpus

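Source: The 18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China (CCL 2019)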
Abstract
Learning multilingual sentence embeddings usually requires large-scale parallel corpora, which are difficult to obtain. We propose a novel self-learning approach that is capable of learning multilingual sentence embeddings from monolingual corpora. Our assumption is that, regardless of language, sentences appearing in similar contexts are similar. We therefore first train monolingual sentence embeddings for the different languages with shared parameters and use them as the initialization. We then iteratively extract similar sentence pairs and exchange their positions, regardless of language, and predict the similarity of each pair from the relations of its sentences to their new contexts. Our experiments show that the proposed approach outperforms existing unsupervised approaches and is competitive with supervised approaches.
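The extract-and-exchange step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how similar pairs are selected, so the mutual-nearest-neighbour criterion under cosine similarity, the `threshold` parameter, and the function names `mutual_nearest_pairs` and `swap_into_contexts` are all hypothetical. Embeddings are plain Python lists standing in for the output of the shared-parameter encoder.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mutual_nearest_pairs(
    src: List[Vector], tgt: List[Vector], threshold: float
) -> List[Tuple[int, int, float]]:
    """Hypothetical pair extraction: keep (i, j, sim) when src[i] and tgt[j]
    are each other's nearest neighbour and their similarity is >= threshold."""
    if not src or not tgt:
        return []
    sims = [[cosine(s, t) for t in tgt] for s in src]
    # Nearest target for every source sentence, and vice versa.
    best_for_src = [max(range(len(tgt)), key=lambda j: row[j]) for row in sims]
    best_for_tgt = [
        max(range(len(src)), key=lambda i: sims[i][j]) for j in range(len(tgt))
    ]
    pairs = []
    for i, j in enumerate(best_for_src):
        if best_for_tgt[j] == i and sims[i][j] >= threshold:
            pairs.append((i, j, sims[i][j]))
    return pairs


def swap_into_contexts(
    src_doc: List[str], tgt_doc: List[str], pairs: List[Tuple[int, int, float]]
) -> Tuple[List[str], List[str]]:
    """Exchange the positions of each extracted pair, so every sentence now
    sits in the context of its (possibly other-language) partner."""
    new_src, new_tgt = list(src_doc), list(tgt_doc)
    for i, j, _ in pairs:
        new_src[i] = tgt_doc[j]
        new_tgt[j] = src_doc[i]
    return new_src, new_tgt


# Toy example: two 2-d "embeddings" per language.
src_emb = [[1.0, 0.0], [0.0, 1.0]]
tgt_emb = [[0.0, 1.0], [1.0, 0.1]]
pairs = mutual_nearest_pairs(src_emb, tgt_emb, threshold=0.5)
print([(i, j) for i, j, _ in pairs])  # [(0, 1), (1, 0)]
print(swap_into_contexts(["a0", "a1"], ["b0", "b1"], pairs))
```

In a full self-learning loop, the swapped documents would presumably be used to retrain the encoder on the new contexts and the embeddings re-estimated before the next round of extraction; that loop is omitted here.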