Enhancing Chinese Word Embeddings from Relevant Derivative Meanings of Main-Components in Characters

Source: The 18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China
Word embeddings have a significant impact on natural language processing. In morphemic writing systems, most Chinese word embedding models take the word as the basic unit or directly use the internal structure of words. However, these models still neglect the rich relevant derivative meanings in the internal structure of Chinese characters. Based on our observations, the relevant derivative meanings of the main-components in Chinese characters are very helpful for learning better Chinese word embeddings. In this paper, we focus on employing the relevant derivative meanings of the main-components in Chinese characters to train and enhance Chinese word embeddings. To this end, we propose two main-component enhanced word embedding models, MCWE-SA and MCWE-HA, which incorporate the relevant derivative meanings of the main-components during training through an attention mechanism. Our models enhance the precision of word embeddings at a fine-grained level without generating additional vectors. Experiments on word similarity and syntactic analogy tasks validate the feasibility of our models. The results show that our models improve on most baselines in the similarity task and achieve an improvement of nearly 3% on the Chinese analogical reasoning dataset compared with the state-of-the-art model.
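The abstract does not give the attention formulation, so the following is only a minimal sketch of one generic way an attention mechanism could weight meaning vectors against a word vector. It is not the MCWE-SA or MCWE-HA model. The function names (`attend`, `enhance`), the dot-product scoring, and the averaging step are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(word_vec, meaning_vecs):
    """Hypothetical helper: score each meaning vector against the word
    vector (dot product, an assumption), normalise the scores with
    softmax, and return (weights, weighted sum of meaning vectors)."""
    weights = softmax([dot(word_vec, m) for m in meaning_vecs])
    dim = len(word_vec)
    context = [
        sum(w * m[i] for w, m in zip(weights, meaning_vecs))
        for i in range(dim)
    ]
    return weights, context


def enhance(word_vec, meaning_vecs):
    """Hypothetical helper: average the word vector with the attended
    context. The result has the same dimensionality as the input, so no
    additional vector per word is produced (an illustrative choice)."""
    _, context = attend(word_vec, meaning_vecs)
    return [(a + b) / 2 for a, b in zip(word_vec, context)]


# Toy data: one word vector and two candidate meaning vectors.
word = [1.0, 0.0]
meanings = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend(word, meanings)
enhanced = enhance(word, meanings)
```

In this toy example the first meaning vector receives the larger weight because it is closer to the word vector, which is the intended effect of attention here: more relevant meanings contribute more to the enhanced embedding.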