Scientific and technical papers typically contain information on research objectives, methods, results, and conclusions. Indexing papers with this information, so that researchers can quickly find content relevant to their needs within a very large body of literature, is therefore particularly important. Based on an analysis of the writing characteristics of scientific literature, this paper proposes a strategy for extracting core feature words from three sources: words, English terms (proper nouns and abbreviations), and numbers. It then recasts the literature indexing problem as a sentence classification problem and, using the proposed core feature words, applies a support vector machine classifier to perform sentence-level semantic indexing. Experiments on 1168 medical papers on diabetes show that the proposed method can effectively learn and index sentences in scientific literature, and can thus automatically index the key information points of a paper.
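The three feature sources named in the abstract (words, English terms, and numbers) can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expressions, the `extract_features` function name, the feature-key prefixes, and the use of character bigrams as a stand-in for Chinese word segmentation are all assumptions made for illustration, since the abstract does not give the exact extraction rules. The resulting counts are the kind of sparse sentence representation that could be passed to a support vector machine classifier.

```python
import re
from collections import Counter

# Hypothetical patterns: the abstract does not specify the exact rules.
ENGLISH_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9\-]*")  # abbreviations, proper nouns
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")                # integers, decimals, percentages
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")              # runs of Chinese characters


def extract_features(sentence: str) -> Counter:
    """Count word-like, English-term, and number features in one sentence."""
    features: Counter = Counter()

    # English terms are kept verbatim (e.g. "HbA1c", "BMI").
    for token in ENGLISH_TOKEN.findall(sentence):
        features["en:" + token] += 1

    # Remove English terms first so digits inside them are not counted as numbers.
    remainder = ENGLISH_TOKEN.sub(" ", sentence)

    # Numbers are generalized to a type rather than kept as literal values.
    for number in NUMBER.findall(remainder):
        kind = "percent" if number.endswith("%") else "number"
        features["num:" + kind] += 1

    # Assumption: character bigrams approximate segmented Chinese words.
    for run in CJK_RUN.findall(sentence):
        for i in range(len(run) - 1):
            features["zh:" + run[i:i + 2]] += 1

    return features


if __name__ == "__main__":
    # Example sentence: "The patient's HbA1c decreased by 1.2%."
    print(extract_features("患者HbA1c下降了1.2%"))
```

Generalizing numbers to a type such as `num:percent` is a design choice of this sketch: a results sentence tends to contain numeric values, but the specific values rarely repeat across papers, so the type may be a more useful signal than the literal number.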