A Comprehensive Verification of Transformer in Text Classification

Source: The 18th China National Conference on Computational Linguistics (CCL 2019), also the 2019 annual academic conference of the Chinese Information Processing Society | Citations: 0 | Uploaded by: lzj668
Excerpt from the paper
Recently, a self-attention based model named Transformer was proposed in the Neural Machine Translation (NMT) domain. It outperforms RNN-based seq2seq models in most cases and has therefore become the state-of-the-art model for the NMT task. However, some studies find that an RNN-based model integrated with Transformer structures can achieve almost the same experimental results as the Transformer on the NMT task. In this paper, following this line of research, we further verify the performance of Transformer structures on the text classification task. Starting from an RNN-based model, we gradually add each part of the Transformer block and evaluate its influence on text classification. We carry out experiments on the NLPCC2014 and dmsc_v2 datasets, and the results show that the multi-head attention mechanism and multiple attention layers can improve model performance on the text classification task. Furthermore, visualization of the attention weights also illustrates that multi-head attention outperforms the traditional attention mechanism.
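The paper's code is not reproduced on this page, so the following is a minimal sketch of the kind of hybrid architecture the abstract describes: an RNN encoder whose hidden states pass through a configurable stack of multi-head self-attention layers before classification, so that each Transformer component can be added and ablated in isolation. PyTorch, all hyperparameters, the pooling strategy, and the class name RNNWithMultiHeadAttention are assumptions for illustration, not taken from the paper.

# Illustrative sketch (not the authors' code) of an RNN classifier that is
# gradually augmented with Transformer components: a stack of multi-head
# self-attention layers over the RNN hidden states.
import torch
import torch.nn as nn

class RNNWithMultiHeadAttention(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 num_heads=4, num_attn_layers=2, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM encoder; its output width (2 * hidden_dim)
        # matches the attention dimension below.
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        # The Transformer parts being "gradually added" in the experiments:
        # num_attn_layers multi-head self-attention layers with residual
        # connections and layer normalization.
        self.attn_layers = nn.ModuleList([
            nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
            for _ in range(num_attn_layers)
        ])
        self.norms = nn.ModuleList([
            nn.LayerNorm(2 * hidden_dim) for _ in range(num_attn_layers)
        ])
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)          # (batch, seq, embed_dim)
        h, _ = self.rnn(x)                     # (batch, seq, 2 * hidden_dim)
        for attn, norm in zip(self.attn_layers, self.norms):
            a, _ = attn(h, h, h)               # self-attention over RNN states
            h = norm(h + a)                    # residual + layer norm
        pooled = h.mean(dim=1)                 # average pooling over tokens
        return self.classifier(pooled)

# Usage: a toy binary sentiment batch (8 sequences of 32 token ids).
model = RNNWithMultiHeadAttention(vocab_size=10000)
logits = model(torch.randint(0, 10000, (8, 32)))
print(logits.shape)  # torch.Size([8, 2])

Setting num_attn_layers to 0 recovers the plain RNN baseline, and varying num_heads and num_attn_layers mirrors the kind of ablation the abstract reports: isolating the contribution of multi-head attention and of stacking multiple attention layers.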
Other papers
Answer selection (AS) is an important subtask of question answering (QA) that aims to choose the most suitable answer from a list of candidate answers. Existing AS models usually explored the single-scale…
In recent years, machine reading comprehension is becoming a more and more popular research topic. Promising results were obtained when the machine reading comprehension task had only two inputs, context…
Most of the current man-machine dialogues are at the two end-points of a spectrum of dialogues, i.e. goal-driven dialogues and non-goal-driven chitchats. Document-driven dialogues provide a bridge between…
Natural language inference (NLI) is a challenging task to determine the relationship between a pair of sentences. Existing Neural Network-based (NN-based) models have achieved prominent success. However,…
In this paper, we present a neural model to map a structured table into document-scale descriptive texts. Most existing neural network based approaches encode a table record by record and generate long…
Word embeddings have a significant impact on natural language processing. In morpheme writing systems, most Chinese word embeddings take a word as the basic unit, or directly use the internal structure…
Dropped pronoun recovery, which aims to detect the type of pronoun dropped before each token, plays a vital role in many applications such as Machine Translation and Information Extraction. Recently, deep…
Distant supervision for relation extraction has been widely used to construct training sets by aligning the triples of a knowledge base, which is an efficient method to reduce human effort. However,…
In relation extraction, directly adopting a model trained in the source domain to the target domain suffers a great performance decrease. Existing studies extract the shared features between domains…
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers)…