Excerpt from the paper
This paper presents a method for identifying new technical terms in Chinese patent text. The ICTCLAS word segmentation system and a stop-word list are used to extract tokens from each document. An improved TF-IDF model then computes a weight for each token, and the high-weight tokens are selected as hot tokens. Hot tokens are combined in their original order according to the measured distance between words. The resulting combinations are weighted and filtered by a threshold to produce a candidate term set, from which domain experts manually identify the valid new technical terms. Finally, an application example is presented and analyzed to verify the effectiveness of the method.
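The pipeline summarized here (token extraction, weighting, hot-token selection, distance-based combination, threshold filtering) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the input is taken to be already segmented (ICTCLAS is not called), a standard smoothed TF-IDF stands in for the paper's improved model, whose details the abstract does not give, and a combined term is scored as the sum of its parts' weights. The names `tfidf_weights`, `extract_candidates`, `hot_threshold`, `term_threshold`, and `max_gap` are hypothetical, as are the toy documents and threshold values.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Sum a standard smoothed TF-IDF score per token over all documents.

    Assumption: this is a stand-in for the paper's improved TF-IDF model.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token once per document
    weights = Counter()
    for doc in docs:
        for tok, count in Counter(doc).items():
            idf = math.log((n_docs + 1) / (doc_freq[tok] + 1)) + 1.0
            weights[tok] += (count / len(doc)) * idf
    return dict(weights)


def extract_candidates(docs, stop_words, hot_threshold, term_threshold, max_gap=1):
    """Return {candidate term: score} for combinations of nearby hot tokens."""
    # Step 1: drop stop words before weighting (skip documents left empty).
    filtered = [[t for t in doc if t not in stop_words] for doc in docs]
    filtered = [doc for doc in filtered if doc]
    # Step 2: weight tokens and keep the "hot" ones.
    weights = tfidf_weights(filtered)
    hot = {t for t, w in weights.items() if w >= hot_threshold}
    # Step 3: combine consecutive hot tokens, in document order, when their
    # positions in the original token sequence are at most max_gap apart.
    scores = Counter()
    for doc in docs:
        positions = [(i, t) for i, t in enumerate(doc) if t in hot]
        for (i, a), (j, b) in zip(positions, positions[1:]):
            if j - i <= max_gap:
                # Assumption: a combination's score is the sum of its parts.
                scores[a + b] += weights[a] + weights[b]
    # Step 4: threshold filtering yields the candidate set for expert review.
    return {term: s for term, s in scores.items() if s >= term_threshold}


# Toy, pre-segmented documents (illustrative only, not data from the paper).
docs = [
    ["一种", "石墨烯", "电池", "的", "制备", "方法"],
    ["石墨烯", "电池", "具有", "高", "容量"],
    ["一种", "制备", "方法"],
]
stop_words = {"一种", "的", "具有"}

candidates = extract_candidates(docs, stop_words, hot_threshold=0.5, term_threshold=2.0)
for term in sorted(candidates, key=candidates.get, reverse=True):
    print(term, round(candidates[term], 3))
```

On this toy input the surviving candidates are 石墨烯电池 (graphene battery) and 制备方法 (preparation method). The second is a generic phrase rather than a new technical term, which is the kind of case the final expert judgment step is meant to filter out.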