Efficient RDF Interchange (ERI) Format for RDF Data Streams

This page bundles the contents of the article submitted to ISWC'14, which is currently under review. Its purpose is to make accessible, link and describe the inputs and outputs of the analysis, which will be stored as a Research Object (pack).

Abstract

RDF streams are sequences of timestamped RDF statements or graphs, which can be generated by several types of data sources (sensors, social networks, etc.). They may provide data at high volumes and rates, and be consumed by applications that require real-time responses. Hence it is important to publish and interchange them efficiently. In this paper, we exploit a key feature of RDF data streams, which is the regularity of their structure and data values, proposing a compressed, efficient RDF interchange (ERI) format, which can reduce the amount of data transmitted when processing RDF streams. Our experimental evaluation shows that our format achieves significant space savings w.r.t. standard data streaming compression, remaining efficient in performance.

Inputs of the evaluation

The input of the analysis consists of 16 datasets, selected on the basis of number of triples, topic coverage, availability and, where possible, previous use in benchmarking. We define three categories of datasets: streaming (10), statistics (3) and general (3).

Streaming datasets are our main application focus. They consist of:

Mix: a random mix of RDF streams (accessible here, original source)
Identica: RDF messages from the public timeline of the microblogging site (accessible here, original source)
Wikipedia: monitoring of Wikipedia edits (accessible here, original source)
AEMET-1: information from weather stations in Spain (accessible here, original source)
AEMET-2: information from weather stations in Spain (accessible here, original source)
Petrol: credit card transactions in petrol stations (accessible here, original source)
Flickr_Event_Media: media events in Flickr (accessible here, original source)
LOD_Nevada: weather measurements of the Nevada blizzard (accessible here, original source)
LOD_Charley: weather measurements of the Charley hurricane (accessible here, original source)
LOD_Katrina: weather measurements of the Katrina hurricane (accessible here, original source)

Statistical datasets, using the RDF Data Cube Vocabulary, are the prototypical example of other (non-streaming) data presenting clear regularities that ERI can take advantage of:

Eurostat_migr_reschange, population statistics from Eurostat-Linked Data (accessible here, original source)
Eurostat_tour_cap_nuts3, tourism statistics from Eurostat-Linked Data (accessible here, original source)
Eurostat_avia_paexac, transport statistics from Eurostat-Linked Data (accessible here, original source)
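
The kind of structural regularity mentioned for these datasets can be illustrated with a small sketch. It is not the ERI algorithm; it only groups N-Triples lines by subject and counts how many subjects share the same set of predicates, which is one simple way to see whether a dataset repeats its structure. The function name predicate_structures and the example.org IRIs are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def predicate_structures(ntriples_lines):
    """Count how many subjects share each distinct set of predicates."""
    predicates_by_subject = defaultdict(set)
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # In N-Triples, subject and predicate contain no whitespace,
        # so the first two whitespace-separated tokens identify them.
        subject, predicate, _rest = line.split(None, 2)
        predicates_by_subject[subject].add(predicate)
    return Counter(frozenset(p) for p in predicates_by_subject.values())


# Hypothetical sample: two observations with the same shape, one outlier.
sample = [
    '<http://example.org/obs1> <http://example.org/geo> <http://example.org/ES> .',
    '<http://example.org/obs1> <http://example.org/value> "10" .',
    '<http://example.org/obs2> <http://example.org/geo> <http://example.org/FR> .',
    '<http://example.org/obs2> <http://example.org/value> "12" .',
    '<http://example.org/obs3> <http://example.org/label> "other" .',
]

structures = predicate_structures(sample)
most_common_size = structures.most_common(1)[0][1]
print(len(structures), "distinct structures; the most common covers",
      most_common_size, "subjects")
```

A dataset in which a few predicate sets cover most subjects, as one would expect from RDF Data Cube observations, is the regular case; a long tail of distinct sets suggests less regular data.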

Finally, we experiment with general static datasets, without prior assumptions on data regularities:

LinkedMDB, films (accessible here, original source)
Faceted DBLP, bibliography (accessible here, original source)
Dbpedia 3-8, well-known knowledge base (accessible here, original source)

MODIFICATIONS

We convert each dataset to N-Triples by means of the Any23 0.9.0 tool. LOD_Nevada, LOD_Charley and LOD_Katrina result from appending their related Turtle files, ordered by sampling date.
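
As a rough illustration of the appending step, the following sketch concatenates a directory of files in name order. It assumes, hypothetically, that each file is named after its sampling date in a sortable form (e.g. 2005-08-28.ttl), so that sorting by name is the same as sorting by date; the actual file naming of the original sources may differ, and the function name append_in_date_order is ours, not part of any tool.

```python
from pathlib import Path
import tempfile


def append_in_date_order(input_dir, output_path, pattern="*.ttl"):
    """Append every matching file into output_path, sorted by file name."""
    # Assumption: file names start with a sortable date (YYYY-MM-DD).
    files = sorted(Path(input_dir).glob(pattern), key=lambda p: p.name)
    with open(output_path, "w", encoding="utf-8") as out:
        for path in files:
            text = path.read_text(encoding="utf-8")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep statements from running together
    return [p.name for p in files]


# Small self-contained demo with two hypothetical daily files.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "2005-08-29.ttl").write_text(
        "<http://example.org/b> <http://example.org/p> 2 .", encoding="utf-8")
    (tmp_path / "2005-08-28.ttl").write_text(
        "<http://example.org/a> <http://example.org/p> 1 .\n", encoding="utf-8")
    merged = tmp_path / "merged.out"
    order = append_in_date_order(tmp_path, merged)
    merged_text = merged.read_text(encoding="utf-8")

print(order)
```

The sketch does not run Any23 and does not check that the concatenated output parses; it only shows the ordering step.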

LICENSE

All datasets are freely provided by the aforementioned data sources. In general, datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License.

Eurostat data is available under the original Eurostat license.

Source Code

The source code of the current prototype is accessible here.

About the authors

Javier D. Fernández	Ontology Engineering Group (OEG), Univ. Politécnica de Madrid (Spain); jdfernandez@fi.upm.es
Alejandro Llaves	Ontology Engineering Group (OEG), Univ. Politécnica de Madrid (Spain); allaves@fi.upm.es
Óscar Corcho	Ontology Engineering Group (OEG), Univ. Politécnica de Madrid (Spain); ocorcho@fi.upm.es

Acknowledgements

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 257641, PlanetData network of excellence. We are thankful for discussions with the authors of the RDSZ approach, especially with Norberto Fernández (Universidad Carlos III de Madrid).

This page is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic License.