Punjabi DeConverter for generating Punjabi from Universal Networking Language

Source: Journal of Zhejiang University-Science C(Computers & Electro
DeConverter is core software in a Universal Networking Language (UNL) system. A UNL system has EnConverter and DeConverter as its two major components. EnConverter converts a natural language sentence into an equivalent UNL expression, and DeConverter generates a natural language sentence from an input UNL expression. This paper presents the design and development of a Punjabi DeConverter. It describes the five phases of the proposed Punjabi DeConverter, i.e., UNL parser, lexeme selection, morphology generation, function word insertion, and syntactic linearization. The paper illustrates all these phases, with a special focus on syntactic linearization issues. Syntactic linearization is the process of defining the arrangement of words in the generated output. Algorithms and pseudocode are discussed for syntactic linearization of a simple UNL graph, a UNL graph with scope nodes, and a node having un-traversed parents or multiple parents in a UNL graph. Special cases of syntactic linearization with respect to the Punjabi language for UNL relations like 'and', 'or', 'fmt', 'cnt', and 'seq' are also presented. The paper also provides implementation results of the proposed Punjabi DeConverter. The DeConverter has been tested on 1000 UNL expressions by considering a Spanish UNL language server and agricultural domain threads developed by the Indian Institute of Technology (IIT), Bombay, India, as gold standards. The proposed system generates 89.0% grammatically correct sentences and 92.0% sentences faithful to the original sentences, and has a fluency score of 3.61 and an adequacy score of 3.70 on a 4-point scale. The system also achieves a bilingual evaluation understudy (BLEU) score of 0.72.
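To make the idea of syntactic linearization concrete, the following is a minimal sketch for a simple UNL graph. It is not the algorithm from the paper: the function names `parse_unl` and `linearize`, the `RELATION_PRIORITY` table, and the simplified universal words (without constraint lists) are all assumptions made for illustration. The sketch only relies on the fact that Punjabi is a verb-final (SOV) language, so each node is emitted after its dependents, and it emits a node with multiple parents only once.

```python
import re
from collections import defaultdict

# Hypothetical relation priorities: lower values are emitted earlier.
# These are illustrative assumptions, not the rule base from the paper.
RELATION_PRIORITY = {"agt": 0, "plc": 1, "obj": 2}

# Matches binary relations such as "agt(eat.@entry, boy)".
RELATION_RE = re.compile(r"(\w+)\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)")


def parse_unl(expression):
    """Return (children, entry) for a simplified UNL expression.

    children maps a parent word to a list of (relation, child word) pairs;
    entry is the word carrying the @entry attribute, or None.
    """
    children = defaultdict(list)
    entry = None
    for relation, parent, child in RELATION_RE.findall(expression):
        parent_word, _, parent_attrs = parent.partition(".")
        child_word, _, child_attrs = child.partition(".")
        children[parent_word].append((relation, child_word))
        if "@entry" in parent_attrs.split("."):
            entry = parent_word
        elif "@entry" in child_attrs.split("."):
            entry = child_word
    return children, entry


def linearize(children, node, visited=None):
    """Emit dependents first and the head last (head-final order)."""
    visited = set() if visited is None else visited
    if node in visited:
        # A node reachable through several parents is emitted only once.
        return []
    visited.add(node)
    ordered = sorted(
        children.get(node, []),
        key=lambda pair: RELATION_PRIORITY.get(pair[0], 99),
    )
    words = []
    for _relation, child in ordered:
        words.extend(linearize(children, child, visited))
    words.append(node)
    return words
```

For the input `obj(eat.@entry, apple) agt(eat.@entry, boy) mod(apple, red)`, calling `linearize(children, entry)` on the parsed graph returns `["boy", "red", "apple", "eat"]`, i.e., agent, then object with its modifier, then the verb. A real DeConverter would apply lexeme selection, morphology generation, and function word insertion to these nodes, and would handle scope nodes and relations such as 'and' or 'seq' with dedicated rules.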