A Substitution-Translation-Restoration Framework for Handling Unknown Words in Statistical Machine T

Source: Journal of Computer Science & Technology
Unknown words are one of the key factors that greatly affect translation quality. Traditionally, nearly all related research has focused on obtaining translations of the unknown words. However, these approaches have two disadvantages. On the one hand, they usually rely on many additional resources, such as bilingual web data; on the other hand, they cannot guarantee good reordering and lexical selection of the surrounding words.
This paper offers a new perspective on handling unknown words in statistical machine translation (SMT). Instead of making great efforts to find the translation of unknown words, we focus on determining the semantic function of the unknown word in the test sentence and keeping that semantic function unchanged during translation. In this way, unknown words can help the phrase reordering and lexical selection of their surrounding words even though they remain untranslated. In order to determine the semantic function of an unknown word, we employ a distributional semantic model and a bidirectional language model. Extensive experiments on both phrase-based and linguistically syntax-based SMT models in Chinese-to-English translation show that our method can substantially improve translation quality.
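The substitution-translation-restoration idea can be illustrated with a toy sketch. This is not the authors' implementation: direction-aware context counts stand in for the distributional semantic model and bidirectional language model, a greedy longest-match lookup stands in for an SMT decoder, and every name, the romanized tokens, the corpus, and the phrase table are hypothetical.

```python
import math
from collections import Counter


def context_features(tokens, index, window=2):
    """Count neighbours of tokens[index], tagged as left ("L") or right ("R")."""
    feats = Counter()
    for j in range(max(0, index - window), min(len(tokens), index + window + 1)):
        if j != index:
            feats[("L" if j < index else "R", tokens[j])] += 1
    return feats


def build_profiles(corpus, window=2):
    """Sum the context features of every word over a monolingual corpus."""
    profiles = {}
    for sent in corpus:
        for i, word in enumerate(sent):
            profiles.setdefault(word, Counter()).update(context_features(sent, i, window))
    return profiles


def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_substitute(tokens, index, profiles, known):
    """Substitution: the known word whose profile best matches this context."""
    ctx = context_features(tokens, index)
    return max(sorted(known), key=lambda w: cosine(ctx, profiles.get(w, Counter())))


def translate(tokens, phrase_table, max_len=3):
    """Toy decoder: greedy longest match, returns (target_token, source_index) pairs."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in phrase_table:
                out.extend((tgt, i + off) for tgt, off in phrase_table[phrase])
                i += n
                break
        else:
            out.append((tokens[i], i))  # no entry: pass the token through
            i += 1
    return out


def translate_with_restoration(tokens, phrase_table, profiles):
    known = {w for phrase in phrase_table for w in phrase}
    unknown_at = {i: w for i, w in enumerate(tokens) if w not in known}
    substituted = list(tokens)
    for i in unknown_at:
        substituted[i] = pick_substitute(tokens, i, profiles, known)
    aligned = translate(substituted, phrase_table)
    # Restoration: put the unknown word back where its substitute was translated.
    return [unknown_at.get(src, tgt) for tgt, src in aligned]


# Hypothetical toy data; "liulian" is the unknown word.
corpus = [
    ["wo", "xihuan", "chi", "pingguo"],
    ["ta", "xihuan", "chi", "pingguo"],
    ["wo", "xihuan", "kan", "shu"],
]
phrase_table = {
    ("wo",): [("I", 0)],
    ("xihuan",): [("like", 0)],
    ("chi",): [("eat", 0)],
    ("pingguo",): [("apples", 0)],
    ("chi", "pingguo"): [("eating", 0), ("apples", 1)],
}
sentence = ["wo", "xihuan", "chi", "liulian"]
profiles = build_profiles(corpus)

baseline = [tgt for tgt, _ in translate(sentence, phrase_table)]
restored = translate_with_restoration(sentence, phrase_table, profiles)
print(" ".join(baseline))  # I like eat liulian
print(" ".join(restored))  # I like eating liulian
```

In this toy setting the unknown word stays untranslated in both outputs, but substituting a known word with a similar context lets the longer phrase entry fire, which changes the lexical choice for the neighbouring word.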