Maniphest T206560

[Epic] Evaluate alternatives to Blazegraph
Open, High, Public

Assigned To

None

Authored By

	Smalyshev
	Oct 9 2018, 6:27 PM

Description

Since the Blazegraph project no longer seems to be active (last commit 2 years ago at https://github.com/blazegraph/database), we need to evaluate whether we want to switch to a graph DB project that is more actively supported and developed.

The requirements should be:

Full SPARQL 1.1 support, including SPARQL Update
Open source
Can load and run queries on full Wikidata database

Related Objects

Status	Assigned	Task
Open	None	T206560 [Epic] Evaluate alternatives to Blazegraph
Resolved	Gehel	T206561 Evaluate Virtuoso as alternative to Blazegraph
Resolved	AWesterinen	T275398 Create an updated survey of graph backends for WDQS
Resolved	AWesterinen	T291207 Create list of criteria for graph backend candidates for WDQS
Declined	None	T289561 Evaluate Apache Rya as alternative to Blazegraph
Declined	None	T289621 Evaluate Halyard as alternative to Blazegraph
Declined	None	T289760 Evaluate Oxigraph as alternative to Blazegraph
Declined	None	T290082 Evaluate Apache HBase and RDF4J as alternative to Blazegraph
Open	None	T290240 Evaluate whether RDF Delta is a good idea to have in the backend
Open	None	T290839 Evaluate a double backend strategy for WDQS
Resolved	Gehel	T291903 Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster
Resolved	Gehel	T297473 Evaluate RDF4j API
Resolved	Gehel	T299460 Evaluate the Apache Jena Framework
Duplicate	AWesterinen	T303263 Stable snapshot and test queries to ensure the correctness of query results
Resolved	None	T306724 [EPIC] Create testing strategy for Blazegraph alternatives
Duplicate	None	T301227 Create RDF dataset for testing alternatives to Blazegraph
Open	None	T306725 Decide which Blazegraph-specific SERVICEs will be migrated and how
Open	None	T306726 Identify necessary hardware for testing graph backend candidates
Resolved	AWesterinen	T306727 Split the test strategy into sub-components (scale, functional...)
Resolved	AWesterinen	T306728 Define the data we wish to test on
Resolved	RKemper	T405395 DPE SRE work to enable testing of Blazegraph alternatives
Resolved	Andrew	T406240 Request creation of query-service (blazegraph alternatives) VPS project

Event Timeline

There are a very large number of changes, so older changes are hidden.

So9q added a subtask: T290839: Evaluate a double backend strategy for WDQS. Sep 13 2021, 5:19 AM

Hannah_Bast mentioned this in T290839: Evaluate a double backend strategy for WDQS. Sep 14 2021, 12:40 PM

For whoever is interested, I wrote more about the QLever SPARQL engine on this thread: https://phabricator.wikimedia.org/T290839 .

Lucas_Werkmeister_WMDE mentioned this in T183629: rdf:type of statement in WDQS seems to be missing. Sep 15 2021, 8:49 AM

Kjauslin subscribed. Sep 17 2021, 1:28 PM

• toan subscribed. Sep 27 2021, 6:39 AM

Michael subscribed. Sep 27 2021, 10:23 AM

DD063520 subscribed. Oct 5 2021, 8:33 AM

I imported the wikidata-DB into neo4j and it works quite well.

In T206560#7430950, @AndreasKuczera wrote:

I imported the wikidata-DB into neo4j and it works quite well.

Can you be more specific? When we tested Wikidata on Neo4j several years ago, it worked in principle, but the performance was unacceptable. In particular, Neo4j does not efficiently support all kinds of JOIN operations that occur in typical SPARQL queries. Could you time a few SPARQL queries on your Neo4j instance and report the results here? That would be very helpful. For starters, you can simply pick some example queries from https://query.wikidata.org
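A minimal sketch of how such timings could be collected, assuming a SPARQL endpoint that accepts the query as a `query` parameter over HTTP GET and returns JSON results (as the public endpoint at https://query.wikidata.org/sparql does). The function names `run_sparql`, `time_query`, and `summarize` and the User-Agent string are illustrative, not part of any existing tool; a Neo4j instance would need its own runner function in place of `run_sparql`.

```python
import json
import statistics
import time
import urllib.parse
import urllib.request


def run_sparql(endpoint, query, timeout=60):
    """Send one SPARQL query via HTTP GET and return the parsed JSON result."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; replace with your own contact details.
            "User-Agent": "wdqs-backend-timing-sketch/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def time_query(runner, query, repeats=3):
    """Run `query` `repeats` times through `runner`; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        runner(query)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Report the first (possibly cold) run separately from the median."""
    return {"first_s": timings[0], "median_s": statistics.median(timings)}
```

To time a query against the public endpoint, one could pass `lambda q: run_sparql("https://query.wikidata.org/sparql", q)` as the `runner`. Reporting the first run separately matters because caches can make later runs much faster than a cold start.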

Create a query set (extracted from the WDQS log) and a Wikidata subset of data against which graph databases can be benchmarked (in the style of the TPC benchmarks).
Ask graph database vendors to test their products and publish the results to the community.

See
http://tpc.org/
https://github.com/socialsensor/graphdb-benchmarks

Consider relational databases with particular schema as graph backends

Daniel Hernández, Aidan Hogan, Cristian Riveros, Carlos Rojas, Enzo Zerega. "Querying Wikidata: Comparing SPARQL, Relational and Graph Databases". In the Proceedings of the 15th International Semantic Web Conference (ISWC), Kobe, Japan, October 17–21, 2016

http://aidanhogan.com/docs/wikidata-sparql-relational-graph.pdf

Consider property graph back ends such as Neo4J and TigerGraph

Kovács, T., Simon, G., & Mezei, G. (2019). Benchmarking Graph Database Backends—What Works Well with Wikidata?. Acta Cybernetica, 24(1), 43-60. https://doi.org/10.14232/actacyb.24.1.2019.5

@So9q @AndreasKuczera @Versant.2612 why are you polluting the thread by suggesting projects/products that clearly do not meet the requirements? This includes Ontop, JanusGraph, TigerGraph, Neo4J etc.

“Pollution” is a strong word that comes off as needlessly hostile. It seems prudent and rational to get a broad sense of the landscape (and where it is moving). The Wikidata data model is not trivially 1:1 with RDF/SPARQL and there may be scope for hybrid solutions.

I would agree @DanBri.

@DanBri I would agree if this issue was not specifically about "alternatives to BlazeGraph" (an RDF triplestore), with explicit requirements. Finding such an alternative will already be difficult if not impossible, mostly due to the open-source requirement.

If you want non-RDF solutions to be evaluated as well, then I think a separate issue should be created. But I doubt it has a chance of being completed within any reasonable timeframe.

Iamamz3 subscribed. Nov 3 2021, 11:48 AM

There is this ticket, T291207: Create list of criteria for graph backend candidates for WDQS, to help reason about the future choice.

YULdigitalpreservation subscribed. Nov 11 2021, 2:15 PM

Hey all, apologies if this has already been covered elsewhere, but I'm curious why Apache Jena Fuseki is not on the list of Blazegraph alternatives? It seems to meet the requirements. We've used Jena from time to time and really like it (it has a lot of features out of the box), but if there's been a previous analysis and it was not worth considering for WDQS's needs I'd love to learn from that.

AndySeaborne subscribed. Nov 19 2021, 10:37 PM

In T206560#7517212, @BenAtOlive wrote:

Hey all, apologies if this has already been covered elsewhere, but I'm curious why Apache Jena Fuseki is not on the list of Blazegraph alternatives? It seems to meet the requirements. We've used Jena from time to time and really like it (it has a lot of features out of the box), but if there's been a previous analysis and it was not worth considering for WDQS's needs I'd love to learn from that.

I think only because so far no-one has brought it up. Please add a ticket for it with additional information.

nguyenm9 subscribed. Nov 20 2021, 2:54 PM

I am taking the liberty to pollute the thread with a reference to "MillenniumDB: A Persistent, Open-Source, Graph Database" https://arxiv.org/pdf/2111.01540.pdf from November 2021. MillenniumDB may have some serious limitations in terms of requirements that can be set up, but interestingly they write "However, MillenniumDB was designed with the complete version of Wikidata – including qualifiers, references, etc. – in mind." and their benchmarks seem strong. They compare against Blazegraph, Jena, Virtuoso and Neo4J.

accounting_data_logger subscribed. Dec 10 2021, 7:15 PM

In T206560#7562538, @Fnielsen wrote:

I am taking the liberty to pollute the thread with a reference to "MillenniumDB: A Persistent, Open-Source, Graph Database" https://arxiv.org/pdf/2111.01540.pdf from November 2021. MillenniumDB may have some serious limitations in terms of requirements that can be set up, but interestingly they write "However, MillenniumDB was designed with the complete version of Wikidata – including qualifiers, references, etc. – in mind." and their benchmarks seem strong. They compare against Blazegraph, Jena, Virtuoso and Neo4J.

Thanks for the pointer! Here are my first impressions from reading the paper:

1. The engine is based on similar ideas as QLever. However, QLever has been around for 5 years already, which the authors fail to acknowledge. I am sure they didn't do it on purpose though. I wrote to them.

2. Like QLever, their engine currently is read-only and does not support SPARQL Update operations. Given the design of their engine, this is not something that will be easy to add.

3. Their engine is currently very far away from SPARQL 1.1 support. In the current version, even basic features like GROUP BY and mathematical expressions are missing. I am not sure whether they actually strive for SPARQL 1.1 support, since the motivation expressed in the paper goes more in the direction of a more general data model that is independent of a particular query language. Anyway, adding full SPARQL 1.1 support would be a lot of work, as we know from experience.

4. I find the evaluation misleading. Right at the beginning of their evaluation section, in Section 5.1, they claim that their engine is 30 times faster than Virtuoso for very simple queries (consisting of a single triple). We know Virtuoso very well and have compared it with QLever extensively. Virtuoso is a very mature and efficient engine and hard to beat, even on more complex queries. On simple queries, there are natural barriers to what can be achieved, and Virtuoso often (though not always) does the optimal thing. I think the authors either did not configure Virtuoso optimally or they stumbled on an artefact without being aware of it. Namely, Virtuoso is rather slow when it has to produce a very large output. That is not a weakness of Virtuoso's query processing engine, but of the way it translates its internal IDs to output IRIs and literals.

@KingsleyIdehen maybe you can provide some feedback concerning @4, in particular, the last two sentences.

We can be objective about feature support.

The working group tests for SPARQL 1.1 (updated for RDF 1.1) are maintained by the community: https://w3c.github.io/rdf-tests/.

They have reasonable coverage of features.

In addition, engines can and do support more of "XPath and XQuery Functions and Operators 3.1" than the minimal required by the SPARQL REC.

https://www.w3.org/TR/xpath-functions-3/

JohannesKalmbach subscribed. Dec 13 2021, 1:35 PM

Dr.uesenfieber subscribed. Dec 20 2021, 1:06 PM

Bovlb subscribed. Dec 21 2021, 3:46 AM

nguyenm9 added a comment. Dec 24 2021, 11:40 AM

This comment was removed by nguyenm9.

also, any thoughts on https://cambridgesemantics.com/anzograph/ ?

"Horizontally Scalable Graph Database Built for Online Analytics and Data Harmonization"

it looks like anzograph could handle 1 trillion triples back in 2016.

taavi unsubscribed. Dec 24 2021, 11:20 PM

Are there any timescale/triple scale goals currently being stated?

With a baseline minimum of 1B triples/3 months, and assuming a 5-10 year goal for any choice, that gets to 36B-56B triples minimum and it could easily exceed that.
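The estimate above can be reproduced as a short worked calculation. This is a sketch that assumes a current size of about 16B triples; that starting figure is not stated in the comment but is what the quoted 36B-56B range implies (20B and 40B of growth over 5 and 10 years at 4B triples per year). The function name is illustrative.

```python
def projected_triples(current, growth_per_quarter, years):
    """Linear projection: current size plus four quarters of growth per year."""
    return current + growth_per_quarter * 4 * years


BILLION = 10**9
CURRENT = 16 * BILLION  # assumption inferred from the 36B-56B range, not a measured value
GROWTH = 1 * BILLION    # "1B triples/3 months" from the comment

for years in (5, 10):
    total = projected_triples(CURRENT, GROWTH, years)
    print(f"{years} years: {total // BILLION}B triples")
# prints "5 years: 36B triples" and "10 years: 56B triples"
```

The projection is linear, so it is a lower bound if the growth rate itself increases over time, which matches the comment's remark that the real figure could easily exceed the range.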

So9q closed subtask T290082: Evaluate Apache HBase and RDF4J as alternative to Blazegraph as Declined. Jan 28 2022, 5:30 PM

driib subscribed. Jan 29 2022, 2:39 PM

• MPhamWMF moved this task from Epics to Current work on the Wikidata-Query-Service board. Jan 31 2022, 4:38 PM

• MPhamWMF added a project: Discovery-Search (Current work).

• MPhamWMF moved this task from Incoming to Epics on the Discovery-Search (Current work) board. Jan 31 2022, 4:44 PM

Lectrician1 subscribed. Jan 31 2022, 10:07 PM

So9q added a subtask: T301227: Create RDF dataset for testing alternatives to Blazegraph. Feb 8 2022, 9:50 AM

Eposthumus subscribed. Feb 10 2022, 1:26 PM

SirkoS subscribed. Feb 28 2022, 2:38 PM

Query performance is an important point to consider - I found a query that will run one million times slower in one database engine than in another one.

CtrlZvi subscribed. Mar 27 2022, 4:39 PM

AWesterinen closed subtask T275398: Create an updated survey of graph backends for WDQS as Resolved. Apr 1 2022, 4:23 PM

AWesterinen subscribed. Apr 11 2022, 3:07 PM

• MPhamWMF closed subtask T289561: Evaluate Apache Rya as alternative to Blazegraph as Declined. Apr 22 2022, 6:49 PM

• MPhamWMF closed subtask T289621: Evaluate Halyard as alternative to Blazegraph as Declined.

• MPhamWMF closed subtask T289760: Evaluate Oxigraph as alternative to Blazegraph as Declined. Apr 22 2022, 6:51 PM

• MPhamWMF added a subtask: T306724: [EPIC] Create testing strategy for Blazegraph alternatives.

In T206560#7775800, @Bugreporter wrote:

Query performance is an important point to consider - I found a query that will run one million times slower in one database engine than in another one.

Claims without evidence, such as that quoted above, are generally not helpful for evaluations such as this.

It would be helpful to all if you would post the query you describe, as well as the details of your testing — such as which engine(s) you tested (including name and version), on which OS (including version), on what hardware (including processor, bitness, and RAM), whether the engine & data were in a "hot/warm" state or just past cold start, etc.

Testing your query against current public endpoints and posting details of those results would also be helpful.
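One way to capture those details alongside each measurement is to record them in a structured report. This is a sketch using only the Python standard library; the field names and the `engine_name`, `engine_version`, `cache_state`, and `ram_gb` parameters are illustrative assumptions, and RAM size is left for the tester to supply because the standard library has no portable call for it.

```python
import json
import platform


def benchmark_report(engine_name, engine_version, query, timings_s, cache_state, ram_gb=None):
    """Bundle a query, its timings, and the test environment into one dict."""
    if cache_state not in ("cold", "warm", "hot"):
        raise ValueError("cache_state must be 'cold', 'warm', or 'hot'")
    return {
        "engine": {"name": engine_name, "version": engine_version},
        "os": {"system": platform.system(), "release": platform.release()},
        "hardware": {
            "machine": platform.machine(),          # CPU architecture string
            "bitness": platform.architecture()[0],  # bitness of the Python interpreter
            "ram_gb": ram_gb,                       # supplied by the tester
        },
        "cache_state": cache_state,
        "query": query,
        "timings_s": list(timings_s),
    }


# Placeholder values for illustration only; not real measurements.
report = benchmark_report("ExampleEngine", "0.0", "ASK {}", [0.12, 0.03], "cold")
print(json.dumps(report, indent=2))
```

Posting a report like this together with the query text lets others rerun the same query on their own setup and compare like with like.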

YOUR1 subscribed. Jun 1 2022, 11:37 AM

YOUR1 unsubscribed.

YOUR1 subscribed.

AWesterinen mentioned this in T225205: Support for named graphs in SPARQL query federation. Jun 13 2022, 5:00 PM

• MPhamWMF closed subtask T306724: [EPIC] Create testing strategy for Blazegraph alternatives as Resolved. Aug 8 2022, 2:20 PM

Gehel closed subtask T206561: Evaluate Virtuoso as alternative to Blazegraph as Resolved. Aug 15 2022, 1:02 PM

Gehel closed subtask T297473: Evaluate RDF4j API as Resolved.

Gehel closed subtask T299460: Evaluate the Apache Jena Framework as Resolved.

pmokeefe subscribed. Nov 19 2022, 9:31 PM

bking subscribed. Jan 12 2023, 5:16 PM

Jelabra subscribed. Jan 19 2023, 4:03 PM

cicalese removed a project: MediaWiki-Stakeholders-Group. Feb 24 2023, 2:26 AM

Sj mentioned this in T330525: Migrate Wikidata off of Blazegraph. Feb 24 2023, 8:06 PM

Kristbaum subscribed. Mar 27 2023, 8:55 AM

• MPhamWMF moved this task from Current work to Epics on the Wikidata-Query-Service board. Apr 11 2023, 6:44 PM

• MPhamWMF removed a project: Discovery-Search (Current work).

• roti_WMDE subscribed. May 11 2023, 3:45 PM

Frostly subscribed. Dec 8 2023, 1:15 AM

AndrewTavis_WMDE subscribed. Dec 14 2023, 4:37 PM

TuukkaH subscribed. May 13 2024, 10:42 AM

diegodlh subscribed. Jul 7 2024, 8:55 PM

VIGNERON subscribed. Aug 22 2024, 7:55 AM

Count_Count subscribed. Sep 6 2024, 9:47 AM

Zache mentioned this in T376979: Figure out the future of Wikimedia Commons Query Service (WCQS). Jan 6 2025, 9:38 AM

mxn subscribed. Mar 23 2025, 5:18 PM

Lens0021 subscribed. Oct 13 2025, 3:26 PM

Pfps subscribed. Nov 4 2025, 2:13 AM

bking added a subtask: T405395: DPE SRE work to enable testing of Blazegraph alternatives. Nov 17 2025, 6:06 PM

Daniel_Mietchen mentioned this in T413097: Raise quota on wikiqlever so that an instance with 256 GB RAM and 3 x 4 TB SSD can be launched. Dec 20 2025, 1:54 AM

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update mentions this task so maybe posting this request here will be effective.

There are several topics in the discussion page https://www.wikidata.org/wiki/Wikidata_talk:SPARQL_query_service/WDQS_backend_update that have been present for some time but that have not received any response.

This is especially concerning as the page states: "This page is the central hub for updates, background information, and community discussions related to the migration."

https://www.wikidata.org/wiki/Wikidata:Wikidata_Platform_team/Newsletter_November_2025 is marked as inactive, with rather strong warnings about not being relevant. Is this really the case?

Pfps unsubscribed. Jan 12 2026, 1:10 PM

Pfps subscribed.

Hi @Pfps, the topics on the backend update page are being responded to now. Apologies for missing them - there was a large amount of ownership handoff of documentation when our new team started, and subscribing to this page to listen for comments from the community got lost in the shuffle. We are listening to the page now to ensure it can reliably serve as the central hub for updates that we described it as.

The newsletter page you shared is likely marked as inactive following the publication of a newer version (see here). We slightly revamped this process in the last month, which may have also caused some confusion. Please see the page linked in this comment for our current monthly report and details on how to subscribe to upcoming newsletters.

OK, there is a newer newsletter. But that's not a newer version of the information in the November newsletter, as far as I can tell. The wording in the inactive banner contains: "Either the page is no longer relevant or consensus on its purpose has become unclear." I don't think that either of these is the case, and those who see the wording are likely to be misled.

You are totally right, @Pfps, and thanks for flagging that. I have removed the inactive banner.

Radim.kubacki subscribed. Thu, Jan 29, 10:03 PM
