Formalisation and Experiences of R2RML-based SPARQL to SQL query translation using Morph

This page bundles the research objects (queries and their corresponding query plans) used in the evaluation section of a paper (link to the paper) published in the proceedings of the 23rd International World Wide Web Conference (DOI: http://dx.doi.org/10.1145/2566486.2567981). You can also download all the materials here.

Abstract

R2RML is used to specify transformations of data available in relational databases into materialised or virtual RDF datasets. SPARQL queries evaluated against virtual datasets are translated into SQL queries according to the R2RML mappings, so that they can be evaluated over the underlying relational database engines. In this paper we describe an extension of a well-known algorithm for SPARQL to SQL translation, originally formalised for RDBMS-backed triple stores, that takes into account R2RML mappings. We present the result of our implementation using queries from a synthetic benchmark and from three real use cases, and show that SPARQL queries can be in general evaluated as fast as the SQL queries that would have been generated by SQL experts if no R2RML mappings had been used.

Inputs

This section provides the queries used in our evaluation. The evaluation uses one synthetic benchmark (BSBM) and three Spanish/European projects (Integrate, BizkaiSense, and Repener). We compare the following queries: native SQL queries (N), the naive SQL translation (C), the translation with self-join elimination (SJE), the translation with subquery elimination (SQE), and the translation with both optimisations (SJE+SQE).

BSBM: N, C, SJE, SQE, SJE+SQE translations of 10 SPARQL queries (accessible here)
BizkaiSense: Morph (SJE+SQE) and D2R translations of 7 SPARQL queries (accessible here)
Integrate: Morph (SJE+SQE) and D2R translations of 6 SPARQL queries (accessible here)
Repener: Morph (SJE+SQE) and D2R translations of 2 SPARQL queries (accessible here)
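
To give a rough idea of how the C, SQE, and SJE+SQE variants listed above differ, the following minimal sketch runs three hand-written SQL queries against an in-memory SQLite database. Everything in it is an assumption made for illustration: the product table, its columns, the SPARQL pattern in the comment, and the SQL strings are hypothetical and are not taken from the BSBM queries or from Morph's actual output. The sketch only shows that the three shapes return the same rows while the optimised ones avoid subqueries and the self-join.

```python
import sqlite3

# Hypothetical data: a single table that an R2RML mapping might expose
# with "label" and "price" as predicates of the same subject.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    price REAL NOT NULL
);
INSERT INTO product VALUES (1, 'Widget', 9.5), (2, 'Gadget', 20.0);
""")

# Illustrative SPARQL pattern (assumed, not from the paper):
#   SELECT ?s ?l ?p WHERE { ?s ex:label ?l . ?s ex:price ?p }

# Naive translation (C): one subquery per triple pattern, joined on ?s.
NAIVE = """
SELECT t1.s AS s, t1.l AS l, t2.p AS p
FROM (SELECT id AS s, label AS l FROM product WHERE label IS NOT NULL) t1
JOIN (SELECT id AS s, price AS p FROM product WHERE price IS NOT NULL) t2
  ON t1.s = t2.s
ORDER BY s
"""

# Subquery elimination (SQE): subqueries flattened, self-join kept.
SQE = """
SELECT p1.id AS s, p1.label AS l, p2.price AS p
FROM product p1
JOIN product p2 ON p1.id = p2.id
WHERE p1.label IS NOT NULL AND p2.price IS NOT NULL
ORDER BY s
"""

# Both optimisations (SJE+SQE): the self-join on the key is removed too.
SJE_SQE = """
SELECT id AS s, label AS l, price AS p
FROM product
WHERE label IS NOT NULL AND price IS NOT NULL
ORDER BY s
"""


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


naive_rows = run(NAIVE)
sqe_rows = run(SQE)
both_rows = run(SJE_SQE)

# All three query shapes are expected to return identical results.
print(naive_rows == sqe_rows == both_rows)
```

The query plans in the next section are the place to look for how a real database server treats each variant; this sketch says nothing about performance.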

Results of the evaluation

This section provides the query plans generated by the database server for the queries above.

BSBM query plans (accessible here)
BizkaiSense query plans (accessible here)
Integrate query plans (accessible here)
Repener query plans (accessible here)

About the authors

Freddy Priyatna		Freddy Priyatna is a Ph.D. student in the Artificial Intelligence Department of Universidad Politécnica de Madrid, where he is also a member of the Ontology Engineering Group. He received an undergraduate degree in Computer Science from the University of Indonesia in 2003 and a joint master's degree in Computational Logic from Universidade Nova de Lisboa and Universidad Politécnica de Madrid in 2009. His master's thesis, in the area of data integration, was supervised by Prof. Oscar Corcho. His research areas are data integration and the Semantic Web.
Oscar Corcho		Oscar Corcho is an Associate Professor at Departamento de Inteligencia Artificial (Facultad de Informática, Universidad Politécnica de Madrid), and he belongs to the Ontology Engineering Group. His research activities are focused on Semantic e-Science and Real World Internet, although he also works in the more general areas of Semantic Web and Ontological Engineering. In these areas, he has participated in a number of EU projects (Wf4Ever, PlanetData, SemsorGrid4Env, ADMIRE, OntoGrid, Esperonto, Knowledge Web and OntoWeb) and Spanish Research and Development projects (CENITS mIO!, España Virtual and Buscamedia, myBigData, GeoBuddies). He has also participated in privately funded projects such as ICPS (International Classification of Patient Safety), funded by the World Health Organisation, and HALO, funded by Vulcan Inc.
Juan Sequeda		Juan Sequeda is a Ph.D. candidate and an NSF Graduate Research Fellow at the Department of Computer Sciences at the University of Texas at Austin. He is a member of the Research in Bioinformatics and Semantic Web (RiBS) Lab. His research interests lie at the intersection of Semantic Web/Linked Data and Relational Databases and Data Integration.

Acknowledgements

This research was supported by the Spanish project myBigData and by PlanetData (FP7-257641). Juan Sequeda was supported by the NSF Graduate Research Fellowship. We also thank Boris, Jean-Paul, and Jose-Mora for the valuable discussions, and the people responsible for the Integrate (Raul Alonso, David Perez del Rey), BizkaiSense (Jon Lázaro, Oscar Peña, and Mikel Emaldi) and Repener (Alvaro Sicilia) projects for providing us with the RDB2RDF mappings and queries used in these projects.

This page is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic License.