A hybrid approach to English Part-of-Speech (PoS) tagging is presented, with English-Chinese machine translation in the business domain as its target application. It demonstrates how an existing tagger can be adapted to learn from a small amount of data and to handle unknown words for the purpose of machine translation. A small (998 k) annotated English corpus in the business domain is built semi-automatically based on a new tagset; the maximum entropy model is adopted, and a rule-based approach is used in post-processing. The tagger is further applied to Noun Phrase (NP) chunking. Experiments show that our tagger achieves an accuracy of 98.14%, which is a satisfactory result. In the application to NP chunking, the tagger gives rise to a 2.21% increase in F-score compared with the results obtained using the Stanford tagger.
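The hybrid design described above (a statistical tagger followed by rule-based post-processing) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the lexicon lookup in `base_tagger` is only a stand-in for a trained maximum entropy model, the tags are Penn Treebank style rather than the paper's new tagset, and the two rules are hypothetical examples, not the rules used by the authors.

```python
import re
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]

# Hypothetical toy lexicon; a real system would use a trained
# maximum entropy model here instead of a lookup table.
LEXICON: Dict[str, str] = {
    "the": "DT", "company": "NN", "will": "MD",
    "ship": "NN", "goods": "NNS",
}


def base_tagger(tokens: List[str]) -> Tagged:
    """Stand-in for the statistical tagger: unknown words get 'UNK'."""
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


def rule_unknown_words(tagged: Tagged) -> Tagged:
    """Guess a tag for unknown words from simple surface cues (assumed rule)."""
    out: Tagged = []
    for word, tag in tagged:
        if tag == "UNK":
            if re.fullmatch(r"\d[\d.,]*", word):
                tag = "CD"    # looks like a number
            elif word.endswith("ly"):
                tag = "RB"    # adverb-like suffix
            elif word.endswith("ing"):
                tag = "VBG"   # gerund-like suffix
            elif word[0].isupper():
                tag = "NNP"   # capitalized, treat as proper noun
            else:
                tag = "NN"    # default to common noun
        out.append((word, tag))
    return out


def rule_verb_after_modal(tagged: Tagged) -> Tagged:
    """Retag a noun that directly follows a modal as a base-form verb (assumed rule)."""
    out = list(tagged)
    for i in range(1, len(out)):
        if out[i - 1][1] == "MD" and out[i][1] == "NN":
            out[i] = (out[i][0], "VB")
    return out


# Rules are applied in order after the statistical pass.
RULES: List[Callable[[Tagged], Tagged]] = [rule_unknown_words, rule_verb_after_modal]


def hybrid_tag(tokens: List[str]) -> Tagged:
    """Run the base tagger, then apply each post-processing rule in turn."""
    tagged = base_tagger(tokens)
    for rule in RULES:
        tagged = rule(tagged)
    return tagged


if __name__ == "__main__":
    print(hybrid_tag("The company will ship goods promptly".split()))
```

In this sketch the rules only touch tags that the first pass left unknown or that conflict with a local context pattern, so the statistical output is preserved wherever no rule fires.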