
RDF triple stores — an overview

<< 2012-09-17 19:56 >>

There's a huge range of triple stores out there, and it's not trivial to find the one most suited for your exact needs. I reviewed all those I could find earlier this year for a project, and here is the result. I've evaluated the stores against the requirements that mattered for that particular project. I haven't summarized the scores, as everyone's weights for these requirements will be different.

By a triple store I mean a tool that has some form of persistent storage of RDF data and lets you run SPARQL queries against that data. The SPARQL support can either be built in as part of the main tool or an add-on installed separately. This is why Sesame is in the table (it has SPARQL and a native backend), but Jena is not. The column for Fuseki with the TDB backend (which is basically Fuseki on top of Jena) is the closest you get to a Jena column.
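
To make that baseline concrete, here is a minimal example of the kind of query every product in the table will accept; the dc:title property and the underlying data are of course just placeholders.

# List ten resources and their titles from whatever data the store holds
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?resource ?title
WHERE {
  ?resource dc:title ?title .
}
LIMIT 10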

I've deliberately left out rows for whether these tools support things like R2RML, query federation, data binding, SDshare, and so on, even though many of them do. The rationale is that if you pick a triple store that doesn't support these things you can get support anyway through separate components.

I've also deliberately left out cloud-only offerings, as I feel these are a different type of product from the databases you can install and maintain locally.

If something in the table is not clear, try mousing over it to get an explanation. If you have more information, or think any part of this is wrong, please leave a comment (or send me email), and we can clear it up.
Requirement | Virtuoso | Oracle | OWLIM | Allegro | Bigdata | Mulgara | 4Store | Sesame | Stardog | B* | DB2 | Fuseki
Open source | Yes/no | No | No | No | Yes | Yes | Yes | Yes | No | No | No | Yes
Free edition | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
10 billion statements | Yes | Yes | Yes | Yes | Maybe | No | Maybe | No | Yes | Coming | ? | No
Clustering | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Cloud | Yes | No
SPARQL 1.0 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
SPARQL 1.1 | Partial | Yes | Yes | Yes | Yes | Partial | Partial | Partial | Yes | Yes | Partial | Yes
SPARQL Update | Non-std | Yes | Yes | Yes | Yes | TQL Upd | Yes | Yes | Yes | Yes | No | Yes
Support | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes
Events | Yes | Yes | Yes | No | No | No | No | Yes | No | No | Yes | Yes/no
Reasoning | Rules | Materialized | Rules | Rules | Datalog | Rules | Add-on | Little | OWL + rules | No | ? | Rules
Constraints | No | Yes | No | No | No | No | No | No | Yes | No | No | No
Triple-level security | Coming | Yes | No | Some | No | No | No | No | No | No | No | No
Endpoint built in | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes
Live backup | Yes | Yes | Yes | Yes | ? | Yes | Yes | Kind of | Kind of | Yes | Yes | Yes
Embeddable | Yes | No | Yes | ? | ? | Yes | Yes | Yes | Yes | Yes | No | Yes

Since this blog post was originally published, people have written to me about yet more triple stores that should be included. As the table above is already getting too wide, I'm adding a new table here with room for them. This should not be taken to mean that these triple stores are not as good; it just means they were added later.

Note that Tinkerpop supports multiple backends; here we assume the Neo4J backend.

Requirement | LMF | Tinkerpop | Urika
Open source | Yes | Yes (w/ commercial ed) | No
Free edition | Yes | Yes | No
10 billion statements | Probably | ? | Yes
Clustering | No | Yes | Yes
SPARQL 1.0 | Yes | Yes | Yes
SPARQL 1.1 | Yes | Partial | Partial
SPARQL Update | Yes | Yes | Yes
Support | Yes | ? | Yes
Events | Yes | Yes | No
Constraints | No | No | No
Reasoning | Rules | Some | Rules
Triple-level security | No | ? | No
Endpoint built in | Yes | Yes | Yes
Live backup | Kind of | Commercial | No
Embeddable | Yes | Yes | No

There is also Meronymy, but since it's not out of closed alpha yet, I haven't included it.

I guess the main thing to take away from this table is that there are lots of triple stores out there, giving users a wide range of products to choose from.

Kendall Clark's summary of the RDF database market from 2010 is still a good read if you want more on the subject.

Update: To keep track of all the updates: you are now looking at $Id$.

Update: Corrected Sesame support, as per comment from Thomas Francart. Corrected some BrightstarDB details from Graham Moore. Also added "Free edition" row suggested by Graham Moore.

Update: Fixed links for live backups, as pointed out by Maxime.

Update: Fixed 4Store and Sesame SPARQL 1.1 status, based on comment from Kjetil Kjernsmo. Also added "Embedded" row, as suggested by Graham Moore.

Update: Fixed OWLIM scaling based on comment from Jerven Bolleman. Fixed 4Store backup and embeddable status, based on comment from Steve Harris. Changed "Embedded" row to "Embeddable", based on comment from Kendall Clark.

Update: Virtuoso fixes: reasoning changed to rules, SPARQL Update support changed to partial. Also extended the clustering row. All based on comments from Kingsley Idehen.

Update: Added a column for Fuseki with the TDB backend, contributed by Andy Seaborne via email.

Update: Added a note that cloud-only products are not considered, based on a comment below from Martynas.

Update: Changed value for Virtuoso access control, based on information from vendor.

Update: Updated Oracle column based on comment from Bill Beauregard.

Update: Added a definition of triple store to clear up the Jena questions and simultaneously clarify that all products provide SPARQL support.

Update: OWLIM is embeddable, as per comment from Borislav Popov.

Update: Sesame is embeddable, as per comment from Laszlo Török.

Update: Virtuoso is embeddable, as per comment from Kingsley Idehen.

Update: Added LMF, based on email from Sergio Fernandez. Added Tinkerpop with Neo4J backend, as suggested by turnguard.

Update: Virtuoso SPARUL is non-standard, not partial.

Update: Stardog 1.1 release adds SPARQL 1.1 support and reasoning with rules. Also added link to Kendall Clark's piece.

Update: Noted that OWLIM Enterprise now supports live backup as of version 5.3. Also replaced "Yes" on reasoning with details. Both thanks to comment from Atanas Kiryakov below.

Update: Added Urika, based on comment from Jerven Bolleman below.

Update: Updated for Stardog 2.0 release, which adds SPARQL Update support.

Update: Updated for Stardog 2.1 release, which improves scalability dramatically.

Update: Updated for Stardog 3.0 release.

Comments

Thomas Francart - 2012-09-18 06:32:53

Support exists for Sesame, notably from Aduna, the company that originally developed it, or from independent consultants.

Lars Marius - 2012-09-18 06:54:18

@Thomas: Thank you. Updated now. :)

Ghislain - 2012-09-18 07:48:58

Hello Lars, nice overview. What about Fuseki: http://jena.apache.org/documentation/serving_data/index.html? Could it be added to your table as a triple store worth investigating?

Regards

Lars Marius - 2012-09-18 08:01:28

@Ghislain: Good question. Fuseki can use different backends, but I guess we could evaluate the Fuseki+TDB combination. I'll have a look and see if I can work it in. Thank you.

Maxime - 2012-09-18 08:41:24

Please note that regarding "Live backup" functionality, links for Allegro and OWLIM are reversed.

Lars Marius - 2012-09-18 08:45:42

@Maxime: Oops. Fixed now. Thank you!

Inge Henriksen - 2012-09-18 10:22:24

Thanks for the mention, Lars Marius.

There is also a nice overview of triplestores here:

http://en.wikipedia.org/wiki/Triplestore#Technical_overview

Kjetil Kjernsmo - 2012-09-19 04:35:42

Very nice overview, Lars Marius. Actually, I didn't think 4store had 100% SPARQL 1.1. It uses Rasqal for parsing queries, and unless that's been updated recently, it doesn't do things like property paths. I may not be up to date, but you may want to check. Here's the current implementation status: http://www.w3.org/2009/sparql/implementations/

Lars Marius - 2012-09-19 04:53:22

@Inge: Thank you. :)

@Kjetil: Thank you! Updated both 4Store and Sesame.

Martynas - 2012-09-19 08:27:28

What about cloud services like Dydra? http://dydra.com

Kingsley Idehen - 2012-09-19 08:37:57

Some very important corrections re. Virtuoso.

SPARQL 1.1 Update syntax is supported, as demonstrated by my recent G+ note [1]. The problem with SPARQL Update is that a live demonstration requires a public endpoint to support at least one of the following authentication & authorization methods: digest, OAuth, WebID. Virtuoso supports all of these too.

As for reasoning, we support forward- and backward-chained reasoning, and this at massive scale, covering: sameAs, inverseFunctionalProperty (IFP), owl:SymmetricalProperty, owl:inverseOf, owl:TransitiveProperty, owl:equivalentProperty, owl:equivalentClass, rdfs:subClassOf, rdfs:subPropertyOf. I also have some G+ notes demonstrating the utility of OWL reasoning as delivered by Virtuoso [2].
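
A tiny illustration of what such entailment buys you (the ex: terms are invented, and this is only a sketch, not any vendor-specific syntax): with rdfs:subClassOf reasoning enabled, whether materialized up front or computed backward-chained at query time, a query for instances of a class also returns instances of its subclasses.

# Data (illustrative): ex:Employee rdfs:subClassOf ex:Person .
#                      ex:alice a ex:Employee .
# With subclass reasoning enabled this query also returns ex:alice,
# even though no explicit "ex:alice a ex:Person" triple exists.
PREFIX ex: <http://example.org/>

SELECT ?p
WHERE { ?p a ex:Person }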

Fine-grained access control: We support the concept of Views to which WebID ACLs are applied. This is how you constrain access to triples in a practical and scalable manner. Make a SPARQL View and then apply an ACL. I've recently posted some WebID ACLs examples showcasing semantic relationships based ACLs [3].

I am unsure about what you mean by "Constraints". Ditto "Embedded".

Links:

1. http://bit.ly/Uo5hP6 -- note juxtaposing our SPASQL and SPARQL 1.1 Update Syntaxes

2. http://bit.ly/OEBP7N -- Virtuoso reasoning related notes and live demos

3. http://bit.ly/NmGbMZ -- WebID based fine-grained ACLs examples showing the power of social relationship semantics applied to resource access.

Thanks!

Kingsley

jerven bolleman - 2012-09-19 09:34:25

I would say that OWLIM can do more than 10 billion triples. That's one of the reasons why we selected it for the beta.sparql.uniprot.org site, where there are more than 5 billion triples in production today. Ontotext's linkedlifedata.com demo has 8 billion triples. As we embed OWLIM into our own server code it is embeddable; the same is true for BigData.

I don't think there is an Oracle free edition that supports their semantic web extensions. I thought that was bound to their Enterprise Edition with Partitioning extensions (i.e. not free or even cheap ;)

Steve Harris - 2012-09-19 09:44:05

Great summary, good work.

Minor correction: 4store has online ("live") backup, and can be embedded (via a C API, there are Perl bindings for it too).

Kingsley Idehen - 2012-09-19 21:50:36

One additional thing: we support SPARQL 1.1 (Query and Update Language) syntax. What is it that you feel we don't support? We also support SPARQL Endpoint Descriptions, i.e., our /SPARQL endpoint resolves to a description graph etc.

We also support the SPARQL Graph Store Update Protocol.

For Authentication we support Digest, OAuth, and WebID.

Lars Marius - 2012-09-20 05:13:53

@Kingsley: You are right that Virtuoso's SPARQL Update support has improved dramatically since I looked at it last. You do now implement the latest draft, although not all of it (the first three examples in the spec don't work, for example). I've changed that cell to "Partial".

Reasoning: you're right, my bad. Changed to "Rules".

Fine-grained access control: I need to read through that more carefully to understand it. Will do.

Constraints: Can you declare that every person must have a date of birth, and have it checked? Obviously you can query with SPARQL, but that's not the same.
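
To make the distinction concrete, here is a minimal sketch of the "query for violations" approach (ex:dateOfBirth is an invented property): running it reports persons without a date of birth, but nothing in the store declares or enforces the rule.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex:   <http://example.org/>

SELECT ?person
WHERE {
  ?person a foaf:Person .
  FILTER NOT EXISTS { ?person ex:dateOfBirth ?dob }
}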

Embedded, now changed to embeddable. Basically, can you run the database in the same process as the application? I'll be the first to admit that it's not something you want to do in every application, but sometimes you do want it.

SPARQL 1.1: Virtuoso supports some of it, but not all. BIND, for example. VALUES. Property paths. I'm sure there's more that's not supported.
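
For readers wondering what those features look like, here is a small sketch that uses all three; the ex: terms are invented for the example.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>

SELECT ?class ?label
WHERE {
  VALUES ?root { ex:Agent }                  # VALUES: inline data
  ?class rdfs:subClassOf+ ?root .            # property path: one or more steps
  OPTIONAL { ?class rdfs:label ?l }
  BIND(COALESCE(?l, STR(?class)) AS ?label)  # BIND: computed variable
}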

Graph store protocol: I didn't add a row for that because I haven't really seen any use for it in our scenarios.

Authentication: That's a very good point. A row "Secure SPARQL" to indicate whether some combination of HTTPS and authentication is supported would be very useful, since many applications need that. I'll add that row when I have time.

Thanks a lot for clearing a number of things up.

Steffen Schlönvoigt - 2012-09-20 15:15:38

Great list, thanks!

Ian Dickinson - 2012-09-23 09:22:23

The support link for Fuseki/TDB should be http://www.epimorphics.com/web/support

Ian

Lars Marius - 2012-09-23 14:50:05

@Ian: Thank you! Added to the table.

Bill Beauregard - 2012-09-25 13:18:03

Hi Lars,

Thanks for taking the time to compile this. I have some corrections for the Oracle column:

Free edition - No.
10 billion statements - Yes. Expected to scale with Oracle Database and appropriate hardware to support petabyte data sets.
Constraints - Yes, integrated with Oracle Database.
SPARQL built-in - Yes, through Jena and Sesame, and integrated with SQL.

Thanks and regards, Bill

Lars Marius - 2012-09-25 17:51:48

@Bill: Thank you very much for your corrections. The table has been updated accordingly, except for the last part: having to install Jena/Sesame in addition to Oracle is exactly what I meant when I put "No" in the Oracle cell. The SPARQL Endpoint is an add-on that must be installed separately and administrated separately.

Bill Beauregard - 2012-09-27 16:29:09

Thanks, Lars. I see your point about the extra installation step to install/update the Joseki/Jena endpoint. However, there is no administration after installation, and once the endpoint is installed it is as transparent to the developer as any other product's SPARQL endpoint. The concern is that the reader will miss the fact that Oracle Database does have built-in SPARQL processing. Is it possible to reflect this somehow in the table, perhaps by having 2 rows, one for SPARQL endpoint install (open src/manual) & built-in SPARQL processing (Yes), or by combining the attributes in the row header as "SPARQL endpoint install/processing built in" (open src-manual/Yes)? Thanks and regards, Bill

Lars Marius - 2012-09-28 03:47:36

@Bill: I think that's a reasonable concern. I solved it by adding a paragraph (no 2 from top) setting out my definition of a triple store, which is a persistent store that supports SPARQL. So it should now be clear to everyone that all products in the table support SPARQL. The mouse-over for the "SPARQL built-in" row also makes it clear that the alternatives are: Yes = built-in, No = SPARQL add-on must be installed separately.

And, obviously, there is administration of the separate SPARQL component after installation for the simple reason that every component must receive some degree of administration over its lifetime. Upgrades, restarts, etc etc. But I'll grant you that there's nothing in the Oracle SPARQL add-on that requires any active day-to-day administration and tuning beyond simply keeping a Tomcat (or whatever) instance running.

Bill Beauregard - 2012-09-28 13:27:44

Thanks, Lars. At a glance, "SPARQL built in - No" in the table can easily be misinterpreted. Have you considered "Endpoint built in" or "SPARQL Endpoint built in"? It seems more to your point and unambiguous. Oracle's interest is to make clear that it has BUILT IN SPARQL query evaluation & processing. The phrase in the introductory text "run SPARQL queries against that data" may not be interpreted as such. Thanks, Bill

Lars Marius - 2012-09-28 14:10:55

@Bill: I'm happy to change the name of the row if you feel that's better. Changed now.

borislav popov - 2012-10-03 12:08:54

Great that you made the effort to make this public. Two comments which come from another perspective:

1. OWLIM is embeddable so you can update the table.

2. It is very interesting when you start applying triple stores in enterprise projects. It is actually completely different from the initial drive of the Sem Web enthusiasts. What corporations need is quite simple reasoning, augmented with the ability to scale up and distribute for failover and load balancing; not so much loading huge data sets as providing efficient querying of the data.

And here, when we say "data", it is more than RDF: most likely a combination of ontologies and instance bases with a set of textual documents and annotations linking them, in a pretty much hybrid index where you should be able to run both structured and full-text queries, along with geospatial and co-occurrence-based searches. Another especially interesting aspect is the ranking of the results, where benefit can be drawn both from RDF Rank (similar to the PageRank ideas), co-occurrence frequencies, and also TF.IDF-type scores against the labels in the graph.
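
A rough sketch of what such a hybrid query might look like; the fts:match predicate and the ex: terms are hypothetical, since each store exposes full-text search through its own extension (a vendor-specific predicate or a Lucene-style connector).

PREFIX ex:  <http://example.org/>
PREFIX fts: <http://example.org/fulltext#>

SELECT ?doc ?person
WHERE {
  ?doc fts:match "acquisition" .   # full-text part (hypothetical syntax)
  ?doc ex:mentions ?person .       # annotation linking document to entity
  ?person a ex:Person .            # structured part
}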

Very often when you exploit a triple store in such an environment it comes together with semantic annotation (be it manual or automatic) of documents and then it becomes also very important to keep in synchrony the data in the triple store with the data models of the text analysis components, like gazetteers or machine learning based taggers. In this regard, what is really helpful is a sound notification mechanism provided by the triple store, so all changes can be timely replicated in the text analysis models as well.

Great job and great you shared it - thank you.

Borislav

Lars Marius - 2012-10-04 15:37:40

@Borislav: Thank you for the OWLIM information.

The scenario you describe for enterprise usage is pretty much how we've used triple stores in enterprise projects over the past two years. I don't know that we couldn't use reasoning, though; it's just that so far we haven't.

We actually do combine RDF with the sort of TFIDF approach you describe, by indexing both documents and RDF data with search engines. We still do SPARQL queries for a number of things, but the user-facing interface tends to be a search engine. For information retrieval type applications I think that often makes the most sense.

I've written a paper about the first (and biggest) of these projects, which I hope to publish in some journal soon. Anyway, I just wanted to note that what you write matches our experience very well.

turnguard - 2012-10-22 10:46:13

hi,

since you have b* in your list, you might want to include neo4j also, see http://neo4j.org/

László Török - 2012-10-22 11:51:41

Hi,

thanks, great analysis!

Just a comment on Sesame: it can be embedded in your JVM application; we're using it for an internal tool here at the Universität der Bundeswehr in Munich.

Regards,

Laszlo

Philip Fennell - 2012-10-22 11:52:17

From a distributed access point-of-view it would be good if you could also include rows for Triple Stores that support the SPARQL 1.1 HTTP Graph Store and SPARQL 1.1 protocols for talking to these stores over HTTP.

Lars Marius - 2012-10-22 12:40:45

@turnguard: No, I'm not including neo4j, because it doesn't support RDF and SPARQL. If there were some tool that provided RDF and SPARQL support on top of neo4j I might include that.

turnguard - 2012-10-22 13:40:15

@lars, sorry, i just realized that the openrdf-sail component is no longer available (it was, and i tested it a couple of months ago)

apparently there are some projects that provide rdf/sparql support for neo4j like this one [1].

wkr turnguard

[1] http://datablend.be/?p=554

Kingsley Idehen - 2012-10-22 14:27:52

Lars,

Please note that we've always had a compact edition of Virtuoso that can be embedded in systems with limited resources (e.g. memory):

See: LiteMode = 0/1 (default 0) in INI. The KDE Desktop uses this variant of Virtuoso, for instance.

Admin guide: http://docs.openlinksw.com/virtuoso/databaseadmsrv.html -- just search on pattern: LiteMode

Lars Marius - 2012-10-23 05:22:52

@Philip: I'm not going to add the graph store protocol, because I can't see any use for it. It seems pretty marginal to me. The SPARQL 1.1 protocol is implicit in the SPARQL 1.1 row.

Kingsley Idehen - 2012-10-24 11:44:01

Lars,

I can concede SPARQL 1.1 as being partial due to property path and BIND syntax sugar not being currently supported (we do have our native alternative for now which we called SPARQL-BI many years ago).

Re. SPARQL Update, what is it that you feel we don't support? Here is an example using SPARQL 1.1 and our SPARUL syntaxes:

## SPARQL 1.1 Syntax

INSERT {GRAPH <http://vocab.deri.ie/pdo.ttl> 
      {  ?s rdfs:isDefinedBy <http://vocab.deri.ie/pdo> . 
        <http://vocab.deri.ie/pdo> <http://open.vocab.org/terms/defines> ?s.
        <http://vocab.deri.ie/pdo> a owl:Ontology .
        ?s <http://www.w3.org/2007/05/powder-s#describedby> <http://vocab.deri.ie/pdo.ttl> .
        <http://vocab.deri.ie/pdo.ttl> foaf:primaryTopic ?s .
      }
    }
WHERE {GRAPH <http://vocab.deri.ie/pdo.ttl> { 
      {?s rdfs:subClassOf ?o} 
      UNION 
      {?s rdfs:subPropertyOf ?o} 
      UNION {?s owl:equivalentClass ?o} 
      UNION {?s owl:equivalentProperty ?o} 
      UNION {?s a ?o}
    } 
  }

## Our SPARUL Syntax

INSERT INTO <http://vocab.deri.ie/pdo.ttl> 
      {   ?s rdfs:isDefinedBy <http://vocab.deri.ie/pdo> . 
        <http://vocab.deri.ie/pdo> <http://open.vocab.org/terms/defines> ?s.
        <http://vocab.deri.ie/pdo> a owl:Ontology .
        ?s <http://www.w3.org/2007/05/powder-s#describedby> <http://vocab.deri.ie/pdo.ttl> .
        <http://vocab.deri.ie/pdo.ttl> foaf:primaryTopic ?s.
      }
FROM <http://vocab.deri.ie/pdo.ttl>
WHERE {   {?s rdfs:subClassOf ?o} 
      UNION 
      {?s rdfs:subPropertyOf ?o} 
      UNION 
      {?s owl:equivalentClass ?o} 
      UNION 
      {?s owl:equivalentProperty ?o} 
      UNION 
      {?s a ?o}
    }

Lars Marius - 2012-10-24 14:39:31

@Kingsley: You put it very well yourself when you say "our SPARUL syntax". The issue is that it's not the standard syntax. I could make that clearer, so I'm changing the cell to "Non-std" now.

Atanas Kiryakov - 2012-12-01 04:52:02

As of release 5.3 OWLIM supports Live Backup, http://www.ontotext.com/news/owlim-5-3-press-release

Reasoning support cannot be handled as simply as yes/no. A full, proper "Yes" should mean "the engine takes care to interpret the semantics of the data and to answer queries according to it as part of its standard mode of operation, transparently, without extra care from the application or substantial performance penalties".

This is the case only in OWLIM. Any substantial reasoning performed at run time on top of a big amount of data is impractically slow. Solutions based on materialization, where everything needs to be re-inferred from scratch upon deletion of even a single statement, are obviously not an instance of "transparent reasoning support".

Lars Marius - 2012-12-01 08:45:09

@Atanas: Thank you! I've updated the OWLIM backup entry now. I agree that answering "Yes" to reasoning is too broad, so I've been more detailed with OWLIM and Tinkerpop now.

Sergei Sheinin - 2013-01-28 10:16:20

I am a developer of a new programming language built atop an EAV database model. It does not contain assumptions about stored data, supports storage of documents in various formats, and may store RDF triples. It offers an ample run-time library and enables adding programming logic to documents. Built with EAV at its core, it contains functions designed to interact with EAV objects. Data nodes or EAV table columns may be of any datatype supported by the DBMS. Its low-level syntax is implemented with key-value pairs and supports multiple implementations of higher-level, syntax-independent coding styles. Its database schema remains unchanged as documents and tables are added.

Jerven Bolleman - 2013-03-25 12:15:43

Noticed that you didn't have yarcdata/uRiKA yet

Requirement uRiKA
Open source No
Free edition No
10 billion statements Yes
Clustering No Yes
SPARQL 1.0 Yes
SPARQL 1.1 Partial
SPARQL Update Yes
Support Yes
Events No
Constraints No
Reasoning Rules
Triple-level security No
Endpoint built in Yes
Live backup No
Embeddable No

Lars Marius - 2013-04-11 02:51:46

@Jerven: Yes, I left out Urika because I couldn't find any real information on it. Thank you for contributing this. I'm adding it to the blog post now, and basically taking your word for the correctness of the information.

Nasreddine - 2013-05-06 05:23:49

Good work. Thanks.

I wish you would add more explanation of the requirements, or provide an external source that explains them.
