Chinese organization name recognition is a difficult and important task in natural language processing. To reduce the amount of tagged corpus required and to exploit untagged corpus, we propose combining Co-training with support vector machines (SVM) and conditional random fields (CRF) to improve recognition results. Following the principles that classifiers should be uncorrelated and compatible, we construct classifiers from different views in three ways: within SVM alone, within CRF alone, and by combining the two models. We also modify a heuristic algorithm for selecting untagged samples in order to reduce its time complexity. Experimental results show that, with the same amount of tagged data, Co-training achieves an F-measure 10% higher than SVM or CRF alone; to reach the same F-measure, Co-training requires up to 70% less tagged data.
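The general Co-training loop that this approach builds on can be illustrated with a minimal sketch. It is not the implementation evaluated here: the `CentroidClassifier` is a toy stand-in for SVM or CRF, the names `co_train`, `rounds`, and `per_round` are hypothetical, and the margin-based top-k selection is an assumed placeholder for the modified heuristic selection algorithm, whose details are not given in this abstract. Each sample is represented by two feature views, and each classifier labels the untagged samples it is most confident about for the shared training set.

```python
from collections import defaultdict


class CentroidClassifier:
    """Toy nearest-centroid classifier; a stand-in for SVM or CRF (assumption)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] += 1
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict_with_confidence(self, x):
        """Return (label, margin); margin is the gap between the two nearest centroids."""
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, c)), label)
            for label, c in self.centroids.items()
        )
        best_dist, best_label = dists[0]
        second_dist = dists[1][0] if len(dists) > 1 else best_dist + 1.0
        return best_label, second_dist - best_dist


def co_train(labeled, unlabeled, rounds=5, per_round=2):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2)."""
    labeled, pool = list(labeled), list(unlabeled)
    clf1, clf2 = CentroidClassifier(), CentroidClassifier()

    def refit():
        labels = [y for _, y in labeled]
        clf1.fit([views[0] for views, _ in labeled], labels)
        clf2.fit([views[1] for views, _ in labeled], labels)

    for _ in range(rounds):
        refit()
        if not pool:
            break
        newly_labeled = {}
        for clf, view in ((clf1, 0), (clf2, 1)):
            # Rank untagged samples by this classifier's confidence on its own view.
            scored = sorted(
                ((clf.predict_with_confidence(s[view]), i) for i, s in enumerate(pool)),
                key=lambda item: -item[0][1],
            )
            for (label, _margin), i in scored[:per_round]:
                newly_labeled.setdefault(i, label)  # first classifier to pick a sample wins
        labeled += [(pool[i], label) for i, label in newly_labeled.items()]
        pool = [s for i, s in enumerate(pool) if i not in newly_labeled]

    refit()  # make sure both classifiers see the final tagged set
    return clf1, clf2
```

In this sketch the two views play the role of the uncorrelated, compatible feature sets: each classifier sees only its own view, yet both benefit from the labels the other contributes. Swapping in real SVM and CRF models would only require objects that expose the same `fit` and `predict_with_confidence` methods.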