Research on a Sentence Alignment Method for Web Texts Based on the Vector Space Model

Source: The 11th National Conference on Man-Machine Speech Communication

【Authors】: Zhang Guanhong (张贯虹), Wudabala (乌达巴拉), Gong Zheng (巩政)

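【Affiliations】: Key Laboratory of Network and Intelligent Information Processing, Department of Computer Science and Technology, Hefei University, Hefei, Anhui 230601; Chinese Academy of Sciences, Hefei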

【Published in】: The 11th National Conference on Man-Machine Speech Communication

【Publication date】: 2011, Issue 5

【Keywords】: speech processing; bilingual translation dictionary; CHI statistic; mathematical linguistics

【Abstract】

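Besides the content that corresponds across the two languages, parallel web texts also contain unrelated noise, so aligning sentences in parallel web pages by exploiting the similarity of their page structure is of limited effectiveness. Introducing a bilingual (mutual-translation) dictionary or a thesaurus can improve alignment quality, but such dictionaries are finite in size and cannot cover every pair of corresponding words.
This paper aligns the sentences of parallel web texts using the similarity measure provided by the vector space model (VSM). In the VSM, each sentence of a web text is represented as a vector in the feature space: content words are selected as feature terms, the CHI (chi-square) statistic is used to compute the association between words, TF-IDF is used to weight the feature terms, and the cosine measure is used to compute the similarity between sentence vectors. Experiments were designed on Mongolian-Chinese parallel web pages, and the results confirm the effectiveness of the proposed method.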
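The abstract does not spell out how the CHI statistic is applied. The following is a minimal sketch, assuming it is the standard 2×2 chi-square test of association between a Mongolian word s and a Chinese word t, counted over text units that are already paired across the two pages (for example, paragraph pairs). With A units containing both words, B containing only s, C only t, and D neither, and N = A+B+C+D, the score is χ²(s,t) = N(AD - BC)² / ((A+B)(C+D)(A+C)(B+D)). The helper names `chi_square` and `best_translation`, the unit-pair input format, and the placeholder tokens are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# One aligned text unit (assumed input format): (source tokens, target tokens).
UnitPair = Tuple[Set[str], Set[str]]


def chi_square(src_word: str, tgt_word: str, pairs: List[UnitPair]) -> float:
    """Chi-square association between a source word and a target word over aligned units."""
    a = b = c = d = 0
    for src, tgt in pairs:
        has_s, has_t = src_word in src, tgt_word in tgt
        if has_s and has_t:
            a += 1  # both words occur in this unit pair
        elif has_s:
            b += 1  # only the source word occurs
        elif has_t:
            c += 1  # only the target word occurs
        else:
            d += 1  # neither word occurs
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin makes the statistic undefined
    return n * (a * d - b * c) ** 2 / denom


def best_translation(src_word: str, pairs: List[UnitPair]) -> Optional[str]:
    """Hypothetical helper: the co-occurring target word with the highest chi-square score."""
    # Only target words that co-occur with src_word at least once are candidates.
    candidates = {t for src, tgt in pairs if src_word in src for t in tgt}
    return max(candidates, key=lambda t: chi_square(src_word, t, pairs), default=None)


if __name__ == "__main__":
    # Toy unit pairs with placeholder tokens (m* = Mongolian side, c* = Chinese side).
    units = [({"m1", "m2"}, {"c1", "c2"}), ({"m1"}, {"c1"}),
             ({"m3"}, {"c3"}), ({"m2", "m3"}, {"c2", "c3"})]
    print(best_translation("m1", units))  # c1
```

Because χ² is also large for a strong negative association, the sketch restricts candidates to words that actually co-occur with the source word. Treating the top-scoring word as a translation candidate is one plausible way to supplement a finite dictionary, not necessarily the paper's procedure.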

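The TF-IDF weighting and cosine similarity steps can be sketched as follows. This assumes the Mongolian sentences have already been tokenized into content words and mapped onto Chinese terms (by dictionary lookup or by word associations such as the CHI statistic mentioned in the abstract), so that both sides share one feature space. IDF is computed over the combined sentence set with a smoothed formula, idf(t) = ln((1+n)/(1+df(t))) + 1; this variant is an assumption chosen to keep weights positive, since the abstract gives neither the exact weighting nor the alignment procedure. The functions `tfidf_vectors`, `cosine` and `align` are hypothetical, and `align` simply picks the most similar target sentence for each source sentence.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tfidf_vectors(sentences: List[List[str]]) -> List[Vector]:
    """Weight each sentence's terms by TF-IDF, treating each sentence as a document."""
    n = len(sentences)
    df = Counter()
    for sent in sentences:
        df.update(set(sent))  # document frequency: count each term once per sentence
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({
            # normalized term frequency times smoothed IDF (assumed variant)
            term: (count / len(sent)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(src_mapped: List[List[str]], tgt: List[List[str]]) -> List[Tuple[int, int, float]]:
    """For each mapped source sentence, return (source index, best target index, score)."""
    vectors = tfidf_vectors(src_mapped + tgt)
    src_vecs, tgt_vecs = vectors[:len(src_mapped)], vectors[len(src_mapped):]
    if not tgt_vecs:
        return []
    result = []
    for i, sv in enumerate(src_vecs):
        scores = [cosine(sv, tv) for tv in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        result.append((i, j, scores[j]))
    return result


if __name__ == "__main__":
    # Placeholder tokens standing in for mapped Mongolian and original Chinese content words.
    src = [["economy", "growth"], ["school", "student", "teacher"]]
    tgt = [["student", "school"], ["economy", "growth", "policy"]]
    # Pairs source 0 with target 1 and source 1 with target 0.
    print(align(src, tgt))
```

Choosing the single best match per source sentence ignores sentence order and many-to-one cases; a real aligner would usually add ordering or length constraints on top of the cosine scores.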