I have spent time on and off over the past week looking at the performance of the Tracker RDF database. Tracker, I believe, started out as a desktop search tool for GNOME. I never used it in that incarnation, and it has only come to my attention since version 0.7, when the developers implemented a general-purpose RDF storage engine at its core. I wanted to know how this newly implemented RDF database compared to a widely used RDF database in terms of query performance. In essence, I was interested in whether the Tracker project had spawned something that could compete with Virtuoso and 4Store.

You can find my complete results and analysis in the Tracker mailing list archive, but the headline statement is that Tracker had roughly 9 times the query performance of Virtuoso. The graph here shows the breakdown by query.

This is a drastic difference in performance that greatly favours the home-grown database used by Tracker. However, this stellar performance comes at the cost of flexibility. It's obvious that the database has been tailored very much to the needs of Tracker itself. Unlike Virtuoso, it is not ‘schema-free’: a description of the data (in the form of something called an RDF ontology) is required before anything can be stored. In addition to this, the data formats are more restrictive, and some common elements of RDF are missing.
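To make the ‘schema-free’ distinction concrete, here is a toy sketch in Python. It is only an illustration of the idea, not how Tracker or Virtuoso actually work: the class names, the `ex:` predicates, and the use of Python types as a stand-in for an ontology are all my own assumptions.

```python
# Toy illustration only. Neither class reflects the real internals of
# Tracker or Virtuoso; all names here are hypothetical.

class SchemaFreeStore:
    """Accepts any (subject, predicate, object) triple."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))


class OntologyStore(SchemaFreeStore):
    """Rejects triples whose predicate was not declared up front."""

    def __init__(self, ontology):
        super().__init__()
        # Maps each known predicate to the Python type its objects must have.
        # A real RDF ontology is far richer; this is a stand-in.
        self.ontology = dict(ontology)

    def add(self, subject, predicate, obj):
        if predicate not in self.ontology:
            raise ValueError(f"unknown predicate: {predicate}")
        expected = self.ontology[predicate]
        if not isinstance(obj, expected):
            raise TypeError(f"{predicate} expects {expected.__name__}")
        super().add(subject, predicate, obj)


free = SchemaFreeStore()
free.add("ex:doc1", "ex:anything", 42)  # accepted without any declaration

strict = OntologyStore({"ex:title": str})
strict.add("ex:doc1", "ex:title", "Report")  # accepted: predicate is declared
```

With the second store, adding `("ex:doc1", "ex:anything", 42)` raises a `ValueError`, which is roughly the situation you are in when loading a pre-existing RDF data set into a store that has no ontology for it.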

My general impression was that Tracker has great query performance, especially considering its tiny memory footprint. Unfortunately, it is not suited to storing pre-existing RDF data sets, such as those generated for semantic-web applications. This could well change in the future. Tracker and its RDF database are in heavy development. They already have speed and, seemingly, stability in the code-base. It might soon be time to add the new features that would make it more generally applicable.

I should add that when I started this work I was deeply sceptical. Codethink have been highly involved in RDF, but I have not joined in. I have learned a lot in the past few weeks, and this has made me more positive. I still believe that RDF might be too flexible for its own good, and I’ve found that the ontologies are onerous, complicated, and not very well specified. I did, however, come across a great post which explains some of the advantages of RDF over other data models; SPARQL is far more intuitive than its SQL cousin. If used to its potential, with highly interlinked data, I think it may be possible for the benefits of RDF to outweigh the tough learning curve.