Introduction to SPIMBENCH

Instance matching (IM) refers to the problem of identifying instances that describe the same real-world object; in the database context, the problem is also known as entity resolution, duplicate detection, record linkage, and object identification. It has been studied extensively for relational and XML data. With the increasing adoption of Semantic Web technologies and the publication of large, interrelated RDF datasets and ontologies that form the Linked Data Cloud, it is crucial to develop IM techniques adapted to this setting, which is characterized by an unprecedented number of sources across which to detect matches, a high degree of heterogeneity at both the schema and instance level, and rich semantics attached to schemas defined in expressive languages such as OWL, OWL 2, and RDFS. For such data, novel IM techniques have recently been proposed.
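
To make the problem concrete, consider a minimal illustration (the URIs and property names below are hypothetical and not taken from SPIMBENCH): two RDF descriptions that refer to the same person but differ in identifiers, vocabulary, and value formatting.

# Two hypothetical RDF descriptions, given as (subject, predicate, object)
# triples, that refer to the same real-world person.
source_instance = {
    ("ex1:person-42", "foaf:name", "John A. Smith"),
    ("ex1:person-42", "ex1:birthDate", "1970-03-05"),
}
target_instance = {
    ("ex2:JSmith", "rdfs:label", "Smith, John"),
    ("ex2:JSmith", "ex2:dateOfBirth", "05/03/1970"),
}
# An IM system must decide that ex1:person-42 and ex2:JSmith match,
# despite different identifiers, property names, and value formats.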

Clearly, the large variety of IM techniques calls for their comparative evaluation to determine which technique is best suited for a given application. Performing such an assessment generally requires well-defined and widely accepted benchmarks that reveal the weak and strong points of the methods or systems under test. Furthermore, such benchmarks typically motivate the development of better-performing systems that overcome the identified weak points; hence, well-suited benchmarks help push the limits of existing systems, advancing both research and technology. A number of benchmarks have already been proposed, both for relational and XML data [12] and, more recently, for RDF data, the type of data prevalent in the Web of Data.

Here we present the Semantic Publishing Instance Matching Benchmark, in short SPIMBENCH, a novel IM benchmark for the assessment of IM techniques for RDF data with an associated schema. Essentially, SPIMBENCH proposes and implements: (i) a set of test cases based on transformations that distinguish different types of matching entities, (ii) a scalable data generator, (iii) a gold standard documenting the matches that IM systems should find, and (iv) evaluation metrics. As will become clear from the discussion below, SPIMBENCH extends state-of-the-art IM benchmarks for RDF data in three main aspects: it allows for systematic scalability testing, supports a wider range of test cases, and provides an enriched gold standard.

Transformation-based test cases. Like existing IM benchmarks, SPIMBENCH defines test cases that provide a systematic way of evaluating the performance of IM systems in different settings. SPIMBENCH supports two types of test cases already widely supported by existing IM benchmarks: value-based test cases, obtained by applying value transformations (e.g., blank character addition and deletion, change of date format, abbreviations, synonyms) to triples relating to a given input entity, and structure-based test cases, obtained by applying structural transformations (e.g., different nesting levels for properties, property splitting, aggregation). However, SPIMBENCH is the first benchmark to support semantics-aware test cases that go beyond the standard RDFS constructs. More precisely, it is the first benchmark to support the OWL constructs for instance (in)equality, class and property equivalence and disjointness, property constraints, as well as complex class definitions. SPIMBENCH also supports simple test cases (implemented by applying the aforementioned transformations to different triples pertaining to the same instance), as well as complex test cases (implemented by combining individual transformations on the same triple).
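
As a rough illustration of what such transformations look like, the following Python sketch shows a value transformation (date reformatting) and a structure-based transformation (property splitting). It is illustrative only: the function names, property names, and URIs are ours and do not correspond to SPIMBENCH's implementation or API.

from datetime import datetime

# Value-based transformation: change the date format of a literal.
def transform_date_format(value, src_fmt="%Y-%m-%d", dst_fmt="%d/%m/%Y"):
    return datetime.strptime(value, src_fmt).strftime(dst_fmt)

# Structure-based transformation: split one property into two properties.
def split_name_property(triples, prop="ex:fullName"):
    result = []
    for s, p, o in triples:
        if p == prop and " " in o:
            first, last = o.split(" ", 1)
            result.append((s, "ex:firstName", first))
            result.append((s, "ex:lastName", last))
        else:
            result.append((s, p, o))
    return result

triples = [("ex:person-42", "ex:fullName", "John Smith"),
           ("ex:person-42", "ex:birthDate", "1970-03-05")]
# Apply both transformations to obtain a matching, yet different, description.
target = split_name_property(triples)
target = [(s, p, transform_date_format(o)) if p == "ex:birthDate" else (s, p, o)
          for s, p, o in target]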

Scalable data generator. To generate test datasets for IM, we first generate a synthetic source dataset, ensuring that this dataset does not itself contain any matches. The generation of the source dataset extends the SPB data generator to handle the more complex schema constructs expressed in OWL. Next, we generate matches and non-matches to entities of the source dataset so as to cover the test cases described above. As a result, we obtain a synthetic target dataset that contains the matches that IM methods should identify. Our data generation process allows the generation of arbitrarily large datasets, thus supporting the evaluation of both the scalability and the matching quality of an IM system.
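
The two-step process can be summarized by the following self-contained Python sketch. It is a deliberate simplification under our own assumptions (toy entities, a single hard-coded transformation); the actual SPIMBENCH generator builds on the SPB generator and is considerably richer.

import random

def make_source_entity(i):
    # Duplicate-free source entities: each URI and name is unique.
    return {"uri": f"src:entity-{i}", "name": f"Entity {i}", "date": "1970-03-05"}

def apply_transformation(entity):
    # Produce a matching target entity via simple value transformations.
    return {"uri": f"tgt:entity-{entity['uri'].split('-')[-1]}",
            "name": entity["name"].upper(),                         # case change
            "date": "/".join(reversed(entity["date"].split("-")))}  # date format change

def generate(num_entities, match_ratio=0.5, seed=0):
    random.seed(seed)
    # Step 1: generate a synthetic source dataset without internal matches.
    source = [make_source_entity(i) for i in range(num_entities)]
    # Step 2: derive the target dataset with matches and non-matches,
    # recording every generated match in the gold standard.
    target, gold = [], []
    for e in source:
        if random.random() < match_ratio:
            t = apply_transformation(e)
            target.append(t)
            gold.append((e["uri"], t["uri"]))
        else:
            target.append({"uri": f"tgt:unrelated-{e['uri']}", "name": "Other"})
    return source, target, gold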

Weighted gold standard. To judge the matching quality of an IM solution, the ground truth with respect to the matches in the considered dataset has to be known. To this end, SPIMBENCH records each generated match as a pair consisting of an entity of the source dataset and an entity of the target dataset. Each pair is further described by annotations specific to the test case: the type of test case it represents, the property on which a transformation was applied (for value-based and structure-based test cases), and a weight that reflects the difficulty of finding the particular match. This detailed information, which is not provided by previous benchmarks, allows users of our benchmark (e.g., developers of IM systems) to more easily identify the reasons underlying the performance results obtained with SPIMBENCH, and thereby supports the debugging and extension of IM systems.
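
For illustration, an annotated gold standard entry could be represented roughly as follows; the field names and values are ours, chosen for readability, and SPIMBENCH's actual serialization may differ.

# One hypothetical gold standard entry: a match between a source and a target
# entity, annotated with the test case type, the transformed property, and a
# weight reflecting how hard the match is to find.
gold_entry = {
    "source": "src:entity-42",
    "target": "tgt:entity-42",
    "test_case": "value-based",
    "transformed_property": "ex:birthDate",
    "weight": 0.8,
}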

Evaluation Metrics. Like the majority of IM benchmarks, SPIMBENCH uses recall, precision, and F-measure to assess the completeness, soundness, and overall matching quality of an IM system. In addition, SPIMBENCH allows testing the scalability of IM solutions, as it also considers runtime as an evaluation metric.
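
For reference, these measures follow their standard definitions (not specific to SPIMBENCH), stated here over the set of matches $M$ returned by a system and the set of matches $G$ in the gold standard:

\[
\mathrm{precision} = \frac{|M \cap G|}{|M|}, \qquad
\mathrm{recall} = \frac{|M \cap G|}{|G|}, \qquad
\mathrm{F\text{-}measure} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.
\]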