This article uses text mining to extract technical topics and normalize them, providing groundwork for technical topic analysis. It draws on two observations: how technical topics are distributed in patent titles, and the statistical length of topic words in technical topic analysis. From these it proposes a method for computing a topic degree and takes the words with a high topic degree as topic words. It then obtains synonym pairs of topic words by computing their similarity and uses statistical features to give the topic words a normalized representation. Experimental results show that the proposed topic-word extraction method is effective, with a precision of 95.5% and a recall of 95.5%. The proposed topic normalization method is also of considerable significance.
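The abstract does not give the topic-degree formula or the similarity measure, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes titles are already word-segmented. Its hypothetical topic-degree score multiplies a candidate term's document frequency across titles by a Gaussian prior on the term's length in characters, where `MEAN_LEN` and `STD_LEN` are illustrative values. For normalization it substitutes character-level cosine similarity with an assumed threshold `SIM_THRESHOLD` and keeps the most frequent member of each synonym group as the normalized form. All function names and constants are hypothetical.

```python
import math
from collections import Counter

# Illustrative constants (assumptions, not values from the paper).
MEAN_LEN = 4.0       # assumed typical topic-word length, in characters
STD_LEN = 1.5        # assumed spread of that length
SIM_THRESHOLD = 0.8  # assumed cutoff for treating two topic words as synonyms


def candidate_terms(title, max_n=4):
    """Return every contiguous 1..max_n-token n-gram of a segmented title, joined."""
    return {"".join(title[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(title) - n + 1)}


def topic_degree(titles, max_n=4):
    """Hypothetical score: share of titles containing the term x length prior."""
    df = Counter()
    for title in titles:
        df.update(candidate_terms(title, max_n))  # each title counts a term once
    scores = {}
    for term, count in df.items():
        length_weight = math.exp(-((len(term) - MEAN_LEN) ** 2) / (2 * STD_LEN ** 2))
        scores[term] = count / len(titles) * length_weight
    return scores


def extract_topic_words(titles, top_k=5):
    """Keep the top_k terms by topic degree (ties broken alphabetically)."""
    scores = topic_degree(titles)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def char_cosine(a, b):
    """Cosine similarity between the character-count vectors of two words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def normalize(words, freq, threshold=SIM_THRESHOLD):
    """Map each word to a canonical form: words linked by similarity >= threshold
    form one group (union-find), represented by its most frequent member."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if char_cosine(a, b) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    canonical = {}
    for members in groups.values():
        rep = max(members, key=lambda w: (freq.get(w, 0), w))
        for w in members:
            canonical[w] = rep
    return canonical


if __name__ == "__main__":
    titles = [["电动", "汽车", "充电", "装置"], ["电动", "汽车", "电池"],
              ["一种", "电池", "管理", "方法"]]
    print(extract_topic_words(titles, top_k=3))
    print(normalize(["电动汽车", "电动车", "锂电池"],
                    {"电动汽车": 5, "电动车": 3, "锂电池": 4}))
```

With these toy titles, "电动汽车" ranks first because it occurs in two titles and matches the assumed typical length. In normalization, "电动车" maps to "电动汽车" because their character cosine is about 0.87 and "电动汽车" is more frequent. A real system would also need the paper's actual title-distribution feature and a stop list for generic title words such as "一种" or "方法".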