Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis

Source: Journal of Computer Science and Technology (English edition)
Document subjectivity analysis has become an important aspect of web text content mining. The problem is similar to traditional text categorization, so many related classification techniques can be adapted to it. There is one significant difference, however: more language or semantic information is required to better estimate the subjectivity of a document. This paper therefore focuses on two aspects. One is how to extract useful and meaningful language features; the other is how to construct appropriate language models efficiently for this special task. For the first issue, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features in a series of n-grams with different orders and within various distance windows. For the second issue, we adopt Maximum Entropy (MaxEnt) modeling methods to construct our language model framework. Besides the classical MaxEnt models, we also construct two kinds of improved models, with Gaussian and exponential priors respectively. Detailed experiments given in this paper show that, with well-selected and well-weighted language features, MaxEnt models with exponential priors are significantly more suitable for the text subjectivity analysis task.
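As a rough illustration of the ingredients named in the abstract, the following Python sketch counts n-gram features of several orders, evaluates a conditional MaxEnt probability, and computes the penalty terms that a Gaussian or an exponential prior would add to the training objective. It is a minimal sketch under stated assumptions, not the paper's implementation: all function names and parameters (`ngram_features`, `maxent_prob`, `sigma2`, `alpha`) are hypothetical, the Global-Filtering and Local-Weighting steps and the distance-window features are omitted, and the exponential penalty assumes non-negative weights.

```python
import math
from collections import Counter


def ngram_features(tokens, max_order=2):
    """Count every contiguous n-gram of order 1..max_order in a token list."""
    feats = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats


def maxent_prob(feats, weights, labels):
    """Conditional MaxEnt: p(y | x) is proportional to exp(sum_f w[y, f] * count_f).

    `weights` maps (label, feature) pairs to floats; missing pairs count as 0.
    """
    scores = {
        y: sum(weights.get((y, f), 0.0) * c for f, c in feats.items())
        for y in labels
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def gaussian_penalty(weights, sigma2=1.0):
    """Negative log of a zero-mean Gaussian prior, up to a constant."""
    return sum(w * w for w in weights.values()) / (2.0 * sigma2)


def exponential_penalty(weights, alpha=1.0):
    """Negative log of an exponential prior, up to a constant.

    Assumption: every weight is non-negative, where the prior is defined.
    """
    return alpha * sum(weights.values())


# Hypothetical toy weights for two labels.
labels = ["subjective", "objective"]
weights = {("subjective", ("great",)): 1.0, ("objective", ("plot",)): 0.5}
feats = ngram_features(["great", "plot", "great"], max_order=2)
probs = maxent_prob(feats, weights, labels)
```

Training would then minimize the negative log-likelihood of the labeled reviews plus one of the two penalties. The Gaussian penalty grows quadratically with each weight, while the exponential penalty grows linearly.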