A Comprehensive Verification of Transformer in Text Classification

Source: The 18th China National Conference on Computational Linguistics (CCL 2019), also the 2019 annual academic conference of the Chinese Information Processing Society | Citations: 0 | Uploaded by: lzj668
Excerpt from the paper
Recently, a self-attention based model named Transformer was proposed in the Neural Machine Translation (NMT) domain. It outperforms RNN-based seq2seq models in most cases and has therefore become the state-of-the-art model for the NMT task. However, some studies find that an RNN-based model integrated with Transformer structures can achieve almost the same experimental results as the Transformer on the NMT task. In this paper, following this line of research, we further verify the performance of Transformer structures on the text classification task. Starting from an RNN-based model, we gradually add each part of the Transformer block and evaluate its influence on text classification. We carry out experiments on the NLPCC2014 and dmsc_v2 datasets, and the results show that the multi-head attention mechanism and multiple attention layers can improve model performance on the text classification task. Furthermore, visualization of the attention weights also illustrates that multi-head attention outperforms the traditional attention mechanism.
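The paper's code is not reproduced on this page, so the following is a minimal sketch of the kind of hybrid architecture the abstract describes: an RNN encoder whose hidden states pass through a configurable stack of multi-head self-attention layers before classification, so that each Transformer component can be added and ablated in isolation. PyTorch, all hyperparameters, the pooling strategy, and the class name RNNWithMultiHeadAttention are assumptions for illustration, not taken from the paper.

# Illustrative sketch (not the authors' code) of an RNN classifier that is
# gradually augmented with Transformer components: a stack of multi-head
# self-attention layers over the RNN hidden states.
import torch
import torch.nn as nn

class RNNWithMultiHeadAttention(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 num_heads=4, num_attn_layers=2, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM encoder; its output width (2 * hidden_dim)
        # matches the attention dimension below.
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        # The Transformer parts being "gradually added" in the experiments:
        # num_attn_layers multi-head self-attention layers with residual
        # connections and layer normalization.
        self.attn_layers = nn.ModuleList([
            nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
            for _ in range(num_attn_layers)
        ])
        self.norms = nn.ModuleList([
            nn.LayerNorm(2 * hidden_dim) for _ in range(num_attn_layers)
        ])
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)          # (batch, seq, embed_dim)
        h, _ = self.rnn(x)                     # (batch, seq, 2 * hidden_dim)
        for attn, norm in zip(self.attn_layers, self.norms):
            a, _ = attn(h, h, h)               # self-attention over RNN states
            h = norm(h + a)                    # residual + layer norm
        pooled = h.mean(dim=1)                 # average pooling over tokens
        return self.classifier(pooled)

# Usage: a toy binary sentiment batch (8 sequences of 32 token ids).
model = RNNWithMultiHeadAttention(vocab_size=10000)
logits = model(torch.randint(0, 10000, (8, 32)))
print(logits.shape)  # torch.Size([8, 2])

Setting num_attn_layers to 0 recovers the plain RNN baseline, and varying num_heads and num_attn_layers mirrors the kind of ablation the abstract reports: isolating the contribution of multi-head attention and of stacking multiple attention layers.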
Other papers
Answer selection (AS) is an important subtask of question answering (QA) that aims to choose the most suitable answer from a list of candidate answers. Existing AS models usually explored the single-scale…
In recent years, machine reading comprehension is becoming a more and more popular research topic. Promising results were obtained when the machine reading comprehension task had only two inputs, context…
Most of the current man-machine dialogues are at the two end-points of a spectrum of dialogues, i.e. goal-driven dialogues and non-goal-driven chitchats. Document-driven dialogues provide a bridge between…
Natural language inference (NLI) is a challenging task to determine the relationship between a pair of sentences. Existing Neural Network-based (NN-based) models have achieved prominent success. However,…
In this paper, we present a neural model to map a structured table into document-scale descriptive texts. Most existing neural network based approaches encode a table record by record and generate long…
Word embeddings have a significant impact on natural language processing. In morpheme writing systems, most Chinese word embeddings take a word as the basic unit, or directly use the internal structure…
Dropped pronoun recovery, which aims to detect the type of pronoun dropped before each token, plays a vital role in many applications such as Machine Translation and Information Extraction. Recently, deep…
Distant supervision for relation extraction has been widely used to construct training sets by aligning the triples of a knowledge base, which is an efficient method to reduce human effort. However,…
In relation extraction, directly adopting a model trained in the source domain to the target domain suffers a great performance decrease. Existing studies extract the shared features between domains…
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers)…