Web pages are an important carrier of information, and taking them as the object of study is an inevitable trend in research on information retrieval and information association. Because the sentence is both the basic unit for conveying information and a linguistic unit that expresses a complete meaning, this paper studies the web page identification problem from the standpoint of sentences. Different transformed forms of a sentence can express the same meaning, which makes web page identification difficult. To address this problem, we first define four relations between a sentence and a web page: the belongs-to relation, the synonym replacement relation, the simple word-order transformation relation, and the complex word-order transformation relation. We then discuss the recognition problem for each relation and prove that: (1) recognizing the belongs-to relation between a sentence and a web page is decidable and is in P; (2) recognizing the synonym replacement relation is undecidable; (3) recognizing the simple word-order transformation relation is undecidable; (4) recognizing the complex word-order transformation relation is unrecognizable. These results outline a spectrum of difficulty for the web page identification problem.
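As an informal illustration of why the first relation is the easy end of this spectrum, the sketch below checks whether a sentence occurs verbatim in a page. It rests on an assumption that is not taken from the paper: that "belongs to" means the sentence appears as a contiguous substring of the page text once whitespace is normalized. The function name `sentence_belongs_to_page` is hypothetical, and the paper's formal definition may differ.

```python
def sentence_belongs_to_page(sentence: str, page_text: str) -> bool:
    """Return True if `sentence` occurs verbatim in `page_text`.

    Assumption (not from the paper): the belongs-to relation is modeled
    as plain substring containment after whitespace normalization.
    """
    # Collapse runs of whitespace so that line breaks in the page
    # do not hide an otherwise exact match.
    normalized_sentence = " ".join(sentence.split())
    normalized_page = " ".join(page_text.split())

    # An empty sentence is treated as not belonging to any page.
    if not normalized_sentence:
        return False

    # Substring search runs in time polynomial in the input lengths,
    # which is consistent with the claim that this relation is in P.
    return normalized_sentence in normalized_page


page = "Web pages carry information.\nSentences are the basic\nunit of meaning."
print(sentence_belongs_to_page("Sentences are the basic unit of meaning.", page))
print(sentence_belongs_to_page("Sentences are the fundamental unit of meaning.", page))
```

Under this assumption the second query fails because a single word was swapped for a synonym, which is the kind of variation the remaining three relations are meant to capture and that exact matching cannot handle.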