A growing share of web data is represented and stored in XML, so efficient query processing algorithms that extract data satisfying given conditions from hierarchically structured XML documents are essential. Information is usually extracted from XML documents with Twig queries, the core component of existing XML query languages, which express the query semantics. An inherent property of a Twig query is that its query nodes stand in a specific preorder relationship to one another. This property makes Twig queries hard to use in many cases, so their constraints have to be relaxed to express more flexible semantics. This paper addresses the processing of queries with incomplete structural constraints (PSTP queries). It proposes an extended XPath syntax that, by introducing a Samepath axis, expresses flexible query semantics simply and effectively. It proposes pTwigStack, a query processing algorithm based on the extended XPath syntax, which processes PSTP queries efficiently and avoids the performance degradation caused by separately processing each Twig query that corresponds to a PSTP query. It also proposes two optimization methods based on DTD schemas to improve the performance of pTwigStack. Experimental results on different data sets show that, when processing PSTP queries, the overall performance of pTwigStack is clearly better than that of existing methods.
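The cost of handling each corresponding Twig query separately can be illustrated with a small sketch. It is not the pTwigStack algorithm, and the abstract does not define the exact semantics of the Samepath axis. The sketch assumes, purely for illustration, that a constraint of this kind requires a set of query nodes to lie on one root-to-leaf path without fixing their order. Under that assumption, a naive processor would have to enumerate one ordinary descendant-axis path query per ordering. The function name `expand_same_path` and the tag names are hypothetical.

```python
from itertools import permutations


def expand_same_path(tags):
    """Enumerate ordinary descendant-axis path queries, one per ordering of `tags`.

    Assumption (not taken from the paper): the tags must appear on a single
    root-to-leaf path, and their relative order is left unspecified.
    """
    return ["//" + "//".join(order) for order in permutations(tags)]


if __name__ == "__main__":
    # Hypothetical tag names; three unordered nodes give 3! = 6 path queries.
    queries = expand_same_path(["book", "section", "figure"])
    print(len(queries))
    for query in queries:
        print(query)
```

Under this assumption the number of ordinary queries grows factorially with the number of unordered nodes, which is the kind of blow-up a single-pass algorithm over the relaxed query is meant to avoid.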