[Objective] To automatically classify the sentences of academic abstracts using machine learning algorithms trained on a small sample, and thereby to automatically identify the structure of academic abstracts. [Methods] Several text-representation features for academic abstracts are designed and extracted automatically with natural language processing techniques. These features are used to train naive Bayes and support vector machine (SVM) models, and the trained models are then used to identify abstract structure automatically. [Results] Experiments show that, compared with similar methods, the proposed method achieves good recognition accuracy with a smaller training corpus. [Limitations] Because sentences in the "Method" category lack fixed category-specific feature words and core verbs, the algorithm's recognition accuracy for this category is relatively low. [Conclusions] The proposed method is an effective approach to automatically identifying the structure of academic abstracts when only a small sample is available.
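The abstract does not list the features or training details, so the following is only a minimal sketch of the general idea: classifying abstract sentences into structural categories with a naive Bayes model. Everything specific here is an assumption made for illustration: the toy English sentences, the label names, the bag-of-words tokens, the coarse sentence-position feature, and the helper names `featurize`, `train`, and `predict`. The SVM variant mentioned in the abstract is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (sentence, relative position in abstract, label).
# These examples are invented for illustration and are not from the paper.
TRAIN = [
    ("this paper aims to classify abstract sentences automatically", 0.0, "objective"),
    ("we aim to identify the structure of academic abstracts", 0.0, "objective"),
    ("we design text features and train a classifier", 0.3, "method"),
    ("features are extracted and used to train the model", 0.4, "method"),
    ("experiments show the approach achieves higher accuracy", 0.6, "result"),
    ("results show good accuracy with a small training corpus", 0.7, "result"),
    ("the proposed approach is effective for small samples", 1.0, "conclusion"),
    ("we conclude that the approach is effective", 1.0, "conclusion"),
]


def featurize(sentence, position):
    """Return bag-of-words tokens plus one coarse position token (assumed features)."""
    tokens = sentence.lower().split()
    bucket = min(int(position * 4), 3)  # map relative position to a quarter, 0..3
    return tokens + [f"__pos{bucket}__"]


def train(examples):
    """Count classes and per-class features for a multinomial naive Bayes model."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for sentence, position, label in examples:
        class_counts[label] += 1
        for feat in featurize(sentence, position):
            feature_counts[label][feat] += 1
            vocab.add(feat)
    return class_counts, feature_counts, vocab


def predict(model, sentence, position):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(feature_counts[label].values()) + len(vocab)
        for feat in featurize(sentence, position):
            if feat not in vocab:
                continue  # ignore features never seen during training
            score += math.log((feature_counts[label][feat] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "experiments show higher accuracy", 0.6))
```

With so little training data, the position token and a few category-specific words dominate the decision. This is consistent with the limitation noted in the abstract: a category whose sentences lack stable cue words gives the classifier little to work with.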