In this talk I will present joint work with Michael Benedikt on the specification and composition of XML subtree queries. A frequent task in XML processing is to filter an input document to produce a subdocument; that is, a document whose root-to-leaf paths are a subset of the root-to-leaf paths of the original document and which inherits its tree structure. Queries that perform such filtering are what we call subtree queries. They can be used to define views, in either data integration or access control, and these views may also be layered one on top of another. Representing subtree queries in a general-purpose query language leads not only to cumbersome query specifications but also to performance issues, since the query engine cannot exploit the subtree nature of the query. The problem is exacerbated when applications require the sequential composition of multiple subtree queries, since it is even less likely that the composition will be recognized as a subtree query and its evaluation optimized accordingly.
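To make the notion of a subdocument concrete, here is a minimal Python sketch. It is an illustration only, not the language presented in the talk: the helper names `leaf_paths` and `prune`, the sample document, and the choice to identify a path by its sequence of tags are all assumptions made for this example. A filter keeps the accepted leaves together with their ancestors, so every root-to-leaf path of the result is also a root-to-leaf path of the input.

```python
import xml.etree.ElementTree as ET

def leaf_paths(elem, prefix=()):
    """Return all root-to-leaf paths below elem, each as a tuple of tags."""
    path = prefix + (elem.tag,)
    children = list(elem)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(leaf_paths(child, path))
    return paths

def prune(elem, keep_leaf, prefix=()):
    """Copy elem, keeping only the leaves accepted by keep_leaf plus their
    ancestors. Returns None when no leaf below elem survives."""
    path = prefix + (elem.tag,)
    children = list(elem)
    kept = [k for k in (prune(c, keep_leaf, path) for c in children)
            if k is not None]
    # Drop a leaf that is rejected, and an inner node that lost all children.
    if (children and not kept) or (not children and not keep_leaf(path)):
        return None
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text = elem.text
    copy.extend(kept)
    return copy

# Hypothetical sample document.
doc = ET.fromstring(
    "<profile><name>Ann</name>"
    "<contact><email>a@x.org</email><phone>555</phone></contact></profile>"
)

# Filter out every leaf named "phone".
sub = prune(doc, lambda path: path[-1] != "phone")

# The defining property: the result's paths are a subset of the input's.
assert set(leaf_paths(sub)) <= set(leaf_paths(doc))
```

Inner nodes that lose all of their children are dropped rather than kept as new leaves; otherwise the result would contain a root-to-leaf path that the original document does not have.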
To address these problems, I will present the XML Subtree Query Language, a simple language for specifying subtree queries, and show that the language is closed under composition. This closure property allows a sequence of XML subtree queries to be rewritten as a single subtree query.
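The following Python sketch illustrates what closure under composition buys, under a deliberately simplified model that is an assumption of this example and not the language or the composition algorithm from the talk: a subtree query is modeled as a predicate on root-to-leaf tag paths, and the names `prune`, `compose`, `q1`, and `q2` are hypothetical. In this toy model, running two queries in sequence gives the same result as running one combined query once.

```python
import xml.etree.ElementTree as ET

def prune(elem, keep_leaf, prefix=()):
    """Copy elem, keeping only the leaves accepted by keep_leaf plus their
    ancestors. Returns None when no leaf below elem survives."""
    path = prefix + (elem.tag,)
    children = list(elem)
    kept = [k for k in (prune(c, keep_leaf, path) for c in children)
            if k is not None]
    if (children and not kept) or (not children and not keep_leaf(path)):
        return None
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text = elem.text
    copy.extend(kept)
    return copy

def compose(q1, q2):
    """Toy composition: a leaf survives q1 followed by q2 exactly when both
    predicates accept its path, because pruning never changes a path."""
    return lambda path: q1(path) and q2(path)

# Hypothetical sample document and queries.
doc = ET.fromstring(
    "<profile><name>Ann</name>"
    "<contact><email>a@x.org</email><phone>555</phone></contact></profile>"
)
q1 = lambda path: path[-1] != "phone"   # e.g. an access control view
q2 = lambda path: "contact" in path     # e.g. a user query

# Two passes over the data (assumes the first pass leaves a non-empty result).
sequential = prune(prune(doc, q1), q2)
# One pass with the rewritten, single query.
single = prune(doc, compose(q1, q2))

assert ET.tostring(sequential) == ET.tostring(single)
```

The single-pass form avoids materializing the intermediate document, which is the kind of saving that rewriting a sequence of subtree queries into one query makes possible.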
The XML Subtree Query Language and the associated composition algorithms have been used in the GUPster and Incognito projects developed at Bell Labs. GUPster provides a single point of controlled access to user profile data. Incognito is a dedicated platform for XML access control. It uses the Vortex rules engine to resolve user context information, then applies the composition algorithms to combine user queries with access control views (all of which are subtree queries) into the authorized user query that is evaluated against XML documents.