Web Data Management, Knowledge Integration and Adaptation
Scientific lead: Prof. Vassilis Christophides
ISL research and development activities in this area target advanced Semantic Web middleware technology (e.g., XML, RDF/S) enabling large-scale (semantic, structural and syntactic) information interoperation within or across Web communities on an as-needed basis, as well as efficient processing of information requests (i.e., queries).
The expansion of the Internet, and in particular of the Web, enables us to access a huge number of autonomous information sources managed by quite heterogeneous systems. Although a lot of useful information is made available, users are not properly supported in effectively gathering the data relevant to their aims. As a consequence, data and knowledge integration has emerged as a vital need in many organizations (e.g., scientific, educational) striving to access useful information available both inside and outside their borders.
The new context of network-centric information systems raises a number of challenging issues for data and knowledge integration that are far more difficult than those encountered in traditional multidatabase and federated systems. Depending on the application needs, we have to face both materialized (i.e., Warehouses, Portals) and virtual (i.e., on-demand Query Mediation) integration scenarios, as well as an increasing degree of autonomy (i.e., Storage, Execution, Lifetime, Connection) of information sources. In this context:
(i) The interpretation of data on a common meaningful basis is difficult because of differences in the modeling assumptions (context) made when defining the available information sources. To preserve these assumptions when data are exchanged across institutional boundaries, adequate descriptive information (metadata) about the sources is required (e.g., more and more scientific and business communities have begun to develop their own taxonomies or ontologies).
(ii) Resolving conflicts among multiple information sources, and thereby constructing an integrated view a priori, is a hard problem. To tackle this issue we instead need to consider several pair-wise mappings between the knowledge schemas of information sources and construct integrated views on demand (e.g., depending on the user query).
(iii) The space of information sources is very dynamic, so adding or dropping a data source should have minimal impact on the integrated view(s).
(iv) The information sources may have different structuring and computing capabilities, ranging from full-featured structured database management systems to simple files with unstructured or semi-structured information.
(v) The number of generated query plans can be fairly large, so heuristic plan enumeration algorithms as well as appropriate information quality criteria (e.g., coverage) need to be considered in order to trade completeness of query results for efficiency.
(a) The RDFSuite of tools for parsing, loading, querying and viewing RDF/S resource descriptions and schemata (in order to build a community metadata warehouse). In particular, our technical contributions are:
- The Validating RDF Parser (VRP): the first parser supporting semantic validation of both RDF/S resource descriptions and schemata.
- The RDF Schema Specific DataBase (RSSDB): the first store exploiting a variety of Object-Relational (SQL3) representations to manage RDF/S resource descriptions and schemata.
- The RDF Query Language (RQL): the first declarative language for uniformly querying RDF/S resource descriptions and schemata.
- The RDF View Language (RVL): the first declarative language for creating virtual RDF/S resource descriptions and schemata.
- The RQL Graphical Query Generator (GRQL): the first GUI generating minimal declarative queries by taking into account the browsing actions performed on an RDF/S schema during a user navigation session.
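To make the idea of semantic validation more concrete, here is a minimal Python sketch. It is not VRP's implementation or API; it only illustrates one kind of check such a validator could perform, namely that the subject and object of each statement belong to the domain and range declared for the property in the schema. The toy schema, the class and property names (`Artist`, `Painter`, `creates`, `paints`) and the functions `is_subclass` and `validate` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy schema: class hierarchy and property (domain, range) pairs.
SUBCLASS_OF: Dict[str, str] = {"Painter": "Artist", "Painting": "Artifact"}
PROPERTIES: Dict[str, Tuple[str, str]] = {
    "creates": ("Artist", "Artifact"),
    "paints": ("Painter", "Painting"),
}


def is_subclass(cls: Optional[str], ancestor: str) -> bool:
    """Return True if `cls` equals `ancestor` or is one of its descendants."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # climb one level; None at the root
    return False


def validate(triples: List[Tuple[str, str, str]],
             types: Dict[str, str]) -> List[str]:
    """Check each (subject, property, object) triple against the schema."""
    errors: List[str] = []
    for subject, prop, obj in triples:
        if prop not in PROPERTIES:
            errors.append(f"unknown property: {prop}")
            continue
        domain, range_ = PROPERTIES[prop]
        # An untyped resource (types.get(...) is None) fails both checks.
        if not is_subclass(types.get(subject), domain):
            errors.append(f"{subject} is not in the domain of {prop}")
        if not is_subclass(types.get(obj), range_):
            errors.append(f"{obj} is not in the range of {prop}")
    return errors


if __name__ == "__main__":
    types = {"picasso": "Painter", "guernica": "Painting", "rodin": "Artist"}
    print(validate([("picasso", "creates", "guernica")], types))
    print(validate([("rodin", "paints", "guernica")], types))
```

In this sketch a `Painter` may use `creates` because the check climbs the class hierarchy, whereas a plain `Artist` using `paints` is reported as a domain violation.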
(b) The Semantic Web Integration Middleware (SWIM) for integrating relational and XML sources using RDF/S schemata (in order to develop a community virtual metadata integrator) as well as reformulating RDF/S queries to SQL and XQuery. In particular, the novel services supported by SWIM comprise:
- Declarative specification of XML-to-RDF and RDB-to-RDF mappings
- Semantic Query Optimization techniques (i.e., containment and minimization for RQL fragments of increasing expressiveness)
- Reformulation of RQL queries (i.e., composition of RQL queries with the mappings to produce XML or RDB queries)
- Support of further abstractions of RDF data/schemata (i.e., composition of RQL queries with RVL views)
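The reformulation step listed above can be illustrated with a deliberately small Python sketch. It assumes the simplest possible case, in which one RDF property is mapped to two columns of one relational table, and it composes a request for that property with the mapping to obtain a parameterized SQL query. This is not SWIM's mapping language or algorithm; `PropertyMapping`, `MAPPINGS`, `reformulate`, and the table and column names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PropertyMapping:
    """Hypothetical RDB-to-RDF mapping: one property <-> two columns of a table."""
    table: str
    subject_column: str
    object_column: str


# Hypothetical mapping catalog, keyed by RDF property name.
MAPPINGS: Dict[str, PropertyMapping] = {
    "creates": PropertyMapping("artwork", "artist_id", "artwork_id"),
}


def reformulate(prop: str,
                mappings: Dict[str, PropertyMapping],
                subject: Optional[Any] = None) -> Tuple[str, List[Any]]:
    """Compose a query over `prop` with its mapping, yielding SQL and parameters.

    If `subject` is given, the result is restricted to that subject value,
    passed as a bound parameter rather than spliced into the SQL text.
    """
    m = mappings[prop]  # raises KeyError if the property is unmapped
    sql = f"SELECT {m.subject_column}, {m.object_column} FROM {m.table}"
    params: List[Any] = []
    if subject is not None:
        sql += f" WHERE {m.subject_column} = ?"
        params.append(subject)
    return sql, params


if __name__ == "__main__":
    print(reformulate("creates", MAPPINGS))
    print(reformulate("creates", MAPPINGS, subject=7))
```

A real mediator would also have to handle joins across several mapped properties, class hierarchies, and XML sources (producing XQuery instead of SQL); the sketch covers only the single-property case.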
(c) The Semantic Query P2P Router and Planner (SQPeer) providing a fully-fledged framework for efficiently processing queries over remote peer RDF/S bases (materialized or virtual) without a central administration overhead (in order to support large-scale communities of interest). SQPeer's novelty lies in:
- The construction of a distributed catalog of peer base advertisements using purely the information from the RDF/S schemas, declared by the peers as RVL views
- The interleaved routing and planning algorithms for quickly obtaining the most relevant results from the peers that can entirely answer the largest query fragment.
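As a rough illustration of routing by advertisements, the Python sketch below models a query as a set of schema property names and each peer advertisement as the set of properties that peer can answer. It then greedily assigns the largest still-unanswered query fragment to the peer that fully covers it. This is a simplified stand-in chosen for illustration, not SQPeer's actual routing and planning algorithms; the function `route`, the peer names and the property names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def route(query: Set[str],
          advertisements: Dict[str, Set[str]]
          ) -> Tuple[List[Tuple[str, Set[str]]], Set[str]]:
    """Greedily assign query fragments to peers.

    Returns (plan, unanswered): `plan` lists (peer, fragment) pairs in the
    order chosen, and `unanswered` holds properties no peer advertises.
    """
    remaining = set(query)
    plan: List[Tuple[str, Set[str]]] = []
    while remaining and advertisements:
        # Pick the peer covering the largest remaining fragment; sorting the
        # peer names first makes tie-breaking deterministic.
        peer = max(sorted(advertisements),
                   key=lambda p: len(advertisements[p] & remaining))
        fragment = advertisements[peer] & remaining
        if not fragment:
            break  # no peer advertises any of the remaining properties
        plan.append((peer, fragment))
        remaining -= fragment
    return plan, remaining


if __name__ == "__main__":
    ads = {
        "P1": {"creates", "exhibited"},
        "P2": {"creates"},
        "P3": {"located"},
    }
    plan, unanswered = route({"creates", "exhibited", "located"}, ads)
    for peer, fragment in plan:
        print(peer, sorted(fragment))
    print("unanswered:", sorted(unanswered))
```

Here `P1` is contacted first because it alone answers the two-property fragment, and `P3` picks up the rest; `P2` is never contacted since its advertisement adds nothing new.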