Learning Multilingual Sentence Embeddings from Monolingual Corpus

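Source: The 18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China | Citations: 0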

Authors: Shuai Wang, Lei Hou, Juanzi Li, Meihan Tong, Jiabo Jiang

Affiliations: DCST, Tsinghua University, Beijing 100084, China; KIRC, Institute for Artificial Intelligence, Tsinghua Un

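Published in: The 18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China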

Publication date: 2019, Issue 8

Keywords: Sentence Representation; Multilingual; Unsupervised Learning

Excerpt from the paper

Learning multilingual sentence embeddings usually requires large-scale parallel sentences, which are difficult to obtain. We propose a novel self-learning approach that is capable of learning multilingual sentence embeddings from monolingual corpora. Our assumption is that, regardless of language, sentences appearing in similar contexts are similar. Thus, we first train monolingual sentence embeddings of different languages with shared parameters as initialization. Then we iteratively extract similar sentence pairs and exchange their positions regardless of language. Through their relations to their new contexts, we predict the similarity of each extracted sentence pair. Our experiments show that the proposed approach outperforms existing unsupervised approaches and is competitive with supervised approaches.
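The extract-and-exchange step described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the excerpt does not say how similar pairs are selected, so the mutual-nearest-neighbour criterion, the cosine similarity measure, the `threshold` value, and the function names `mutual_nearest_pairs` and `swap_positions` are all hypothetical. The context-based similarity prediction is not covered here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mutual_nearest_pairs(emb_a, emb_b, threshold=0.5):
    """Return (i, j, similarity) for sentences that are each other's nearest
    neighbour across the two corpora (assumed selection criterion)."""
    sim = [[cosine(u, v) for v in emb_b] for u in emb_a]
    # Best match in corpus B for every sentence in corpus A, and vice versa.
    best_b = [max(range(len(emb_b)), key=row.__getitem__) for row in sim]
    best_a = [max(range(len(emb_a)), key=lambda i: sim[i][j])
              for j in range(len(emb_b))]
    return [(i, j, sim[i][j]) for i, j in enumerate(best_b)
            if best_a[j] == i and sim[i][j] >= threshold]


def swap_positions(corpus_a, corpus_b, pairs):
    """Exchange the paired sentences so each one sits in the other's context."""
    new_a, new_b = list(corpus_a), list(corpus_b)
    for i, j, _ in pairs:
        new_a[i], new_b[j] = corpus_b[j], corpus_a[i]
    return new_a, new_b


if __name__ == "__main__":
    # Toy 2-dimensional embeddings standing in for learned sentence vectors.
    emb_en = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    emb_zh = [[0.9, 0.1], [0.1, 0.9]]
    pairs = mutual_nearest_pairs(emb_en, emb_zh)
    print(swap_positions(["en0", "en1", "en2"], ["zh0", "zh1"], pairs))
```

In an iterative setup, the embeddings would presumably be retrained on the swapped corpora and the extraction repeated; how many rounds are needed is not stated in the excerpt.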
