[Objective] Open web pages carrying science and technology (S&T) information differ little from one another in content, so traditional rule-based and statistical learning methods cannot meet the practical requirements of classifying such pages. [Methods] Based on an in-depth analysis of the content and structure of S&T topic web pages, domain features are learned from open ontologies and other resources, and a semi-supervised classification model for web S&T information is constructed. [Results] In the classification experiments, the proposed method achieved a precision of 0.9016, a recall of 0.8756, and an F1 score of 0.8884, a clear advantage over the Bayesian method. [Limitations] Applying the method to other categories of web S&T information still requires domain experts to supply the core seed features of the relevant domain. [Conclusions] The method meets the needs of deep processing of web S&T information and classifies web S&T pages effectively.
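The three reported figures can be cross-checked against each other, since F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition (the abstract does not state which formula was used), and the function name `f1_score` is illustrative rather than taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are 0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract
reported_precision = 0.9016
reported_recall = 0.8756

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"{computed_f1:.4f}")  # prints 0.8884
```

Under this assumption the computed value agrees with the reported F1 of 0.8884 to four decimal places, which also supports reading the first figure as precision rather than accuracy.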