[Purpose/Significance] Existing methods for analyzing technical topics in patents suffer from low topic distinguishability and ambiguous topic terms, and they cannot identify the "problems" and corresponding "solutions" described in technical information. This paper aims to address these shortcomings. [Method/Process] We extract subject-action-object (SAO) structures from patent text and identify "problem and solution" (P&S) patterns among them. Based on a "bag of P&S" assumption, we build an LDA topic model over SAO structures to identify and analyze the topic structure of patent documents. [Result/Conclusion] A case study shows that the method identifies topic distributions effectively and has clear advantages over the traditional LDA model in topic distinguishability and semantic disambiguation.
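As a rough illustration of the "bag of P&S" idea, the sketch below turns already-extracted SAO triples into a bag of problem/solution tokens, the kind of count representation an LDA implementation could consume in place of a bag of words. It is not the authors' method: the verb lexicons `PROBLEM_VERBS` and `SOLUTION_VERBS`, the helper names, and the rule of labeling a triple by its action verb alone are all assumptions made for illustration, since the abstract does not state how P&S patterns are recognized.

```python
from collections import Counter

# Hypothetical verb lexicons (assumption): the abstract does not give the
# actual rules used to recognise "problem" and "solution" SAO structures.
PROBLEM_VERBS = {"prevent", "reduce", "avoid", "eliminate"}
SOLUTION_VERBS = {"use", "comprise", "include", "apply"}


def sao_to_token(sao):
    """Join a (subject, action, object) triple into a single token.

    Keeping the triple as one unit preserves who does what to what,
    which isolated words in a bag-of-words model would lose.
    """
    return "_".join(part.lower().replace(" ", "-") for part in sao)


def label_sao(sao):
    """Return "P", "S", or None based only on the action verb (assumption)."""
    action = sao[1].lower()
    if action in PROBLEM_VERBS:
        return "P"
    if action in SOLUTION_VERBS:
        return "S"
    return None


def bag_of_ps(saos):
    """Count labelled SAO tokens for one patent; unlabelled triples are dropped."""
    bag = Counter()
    for sao in saos:
        label = label_sao(sao)
        if label is not None:
            bag[f"{label}:{sao_to_token(sao)}"] += 1
    return bag


# Toy SAO triples for a single patent (made-up data, for illustration only).
patent = [
    ("coating", "prevent", "corrosion"),
    ("coating", "comprise", "zinc oxide"),
    ("figure", "show", "device"),
    ("coating", "prevent", "corrosion"),
]
bag = bag_of_ps(patent)
```

Each patent then becomes a count vector over P&S tokens rather than over single words, so a topic is a distribution over whole problem or solution statements. That is one plausible reading of why such a model could separate topics more sharply and reduce word-level ambiguity.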