As social life moves increasingly online, problems of Chinese text processing arise in many research and commercial fields. Ever deeper text classification research needs to analyze the information in a text from every aspect. Semantic analysis is a key technology of text mining, and context recognition can be applied in many text mining techniques, such as sentiment analysis and public opinion analysis. Building on a syntactic decision tree, an N-gram-based feature extraction method, and an SVM classifier, a context classification model is proposed to resolve the polysemy of characters and words across different contexts. The model generalizes well, works well as a general-purpose method in batch processing, and handles the context recognition problem in text mining fairly effectively.
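The abstract describes the pipeline only at a high level. The sketch below illustrates two of its ingredients, N-gram feature extraction and an SVM classifier, on a hypothetical word-sense example (苹果 as a fruit versus the company). Everything concrete here is an assumption rather than the paper's method: character unigrams and bigrams as features, a constant bias feature, a linear SVM fitted by dual coordinate descent on the hinge loss, one-vs-rest handling of several contexts, the toy sentences, and all function names. The syntactic decision tree is not reproduced.

```python
from collections import Counter

BIAS = "<bias>"  # hypothetical constant feature standing in for the intercept


def ngram_features(text, n_values=(1, 2)):
    """Count character n-grams (assumed feature set); Chinese has no spaces,
    so character n-grams avoid needing a word segmenter."""
    feats = Counter({BIAS: 1})
    for n in n_values:
        for i in range(len(text) - n + 1):
            feats[text[i:i + n]] += 1
    return feats


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(xs, ys, C=10.0, passes=200):
    """Hinge-loss linear SVM trained by dual coordinate descent; ys are +1/-1.
    Keeps w = sum_i alpha_i * y_i * x_i with every alpha_i in [0, C]."""
    alpha = [0.0] * len(xs)
    w = {}
    q_diag = [sum(v * v for v in x.values()) for x in xs]  # x_i . x_i
    for _ in range(passes):
        for i, (x, y) in enumerate(zip(xs, ys)):
            grad = y * dot(w, x) - 1.0  # dual gradient for coordinate i
            # Exact minimization along alpha_i, projected back onto [0, C].
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                for f, v in x.items():  # update w incrementally
                    w[f] = w.get(f, 0.0) + delta * y * v
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    """One binary SVM per context label, each separating that label from the rest."""
    xs = [ngram_features(t) for t in texts]
    return {
        label: train_linear_svm(xs, [1 if lab == label else -1 for lab in labels], **kwargs)
        for label in sorted(set(labels))
    }


def predict(models, text):
    """Return the context whose SVM gives the largest decision value."""
    x = ngram_features(text)
    return max(models, key=lambda label: dot(models[label], x))


# Hypothetical toy data: the polysemous word 苹果 in two contexts.
texts = [
    "苹果很甜很好吃", "超市的苹果新鲜又便宜", "削一个苹果当早餐",   # fruit sense
    "苹果发布了新款手机", "苹果公司的股价上涨", "苹果手机系统更新",  # company sense
]
labels = ["fruit"] * 3 + ["tech"] * 3
models = train_one_vs_rest(texts, labels)
print(predict(models, "这个苹果很甜"))
```

With only two contexts the one-vs-rest models are mirror images of each other; the scheme matters once a word has three or more senses. Six sentences are an illustration of the mechanics, not evidence about accuracy.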