
Add support for SPARQL to retrieve data from Wikidata
Open, Low, Public, Feature

Description

It would be really nice to add support in the new Charts extension for SPARQL requests to retrieve data.
I used this feature with the previous extension. An example can be seen here: https://fr.wikipedia.org/wiki/Mod%C3%A8le:D%C3%A9sint%C3%A9gration_du_neutron_libre/Dur%C3%A9e_de_vie_moyenne_du_neutron

Another example from T396269: population graphs on euwiki that ran SPARQL queries through the old Graph extension:

https://eu.wikipedia.org/wiki/Txantiloi:Biztanleria_grafiko_automatikoa

as in

```
{{biztanleria grafiko automatikoa|barrak=14}}
```

on https://eu.wikipedia.org/wiki/Altsasu

The best way to implement this is most likely to expose WDQS queries to Lua via Extension:ExternalData or similar; see also the REST needs in T393500.
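One possible shape for that glue, sketched in Python purely for illustration. The function names, the field-type mapping and the choice of a Commons-Data-like output are assumptions, not the API of Extension:ExternalData or Charts. It only shows how a request URL for the public WDQS endpoint is formed and how results in the W3C SPARQL JSON format could be reshaped into the tabular form that Charts already reads from Data:*.tab pages.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint; format=json requests SPARQL 1.1 JSON results.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_url(query: str) -> str:
    """Return a GET URL for a SPARQL query against WDQS (no request is sent here)."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def sparql_to_tab(result: dict, field_types: dict[str, str]) -> dict:
    """Hypothetical helper: reshape SPARQL JSON results into a dict shaped
    like a Commons Data:*.tab page (license / schema / data).

    field_types maps each SPARQL variable to a .tab-style type ("number" or
    "string"); unlisted variables default to "string". Unbound variables
    become None (null in the .tab JSON).
    """
    variables = result["head"]["vars"]
    fields = [{"name": v, "type": field_types.get(v, "string")} for v in variables]
    rows = []
    for binding in result["results"]["bindings"]:
        row = []
        for v in variables:
            cell = binding.get(v)
            if cell is None:
                row.append(None)  # variable not bound in this solution
            elif field_types.get(v) == "number":
                row.append(float(cell["value"]))  # SPARQL literals arrive as strings
            else:
                row.append(cell["value"])
        rows.append(row)
    return {"license": "CC0-1.0", "schema": {"fields": fields}, "data": rows}
```

Note that the output records nothing about which Wikidata items the query touched, which is the cache-invalidation gap raised in the comments.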

Event Timeline

Nemoralis changed the subtype of this task from "Task" to "Feature Request". Nov 30 2024, 1:00 PM

Definitely essential, as it is used on many Wikipedia articles (and has the potential to illustrate even more of them with useful data!)

Actually, neither of these two examples (French and Basque) require SPARQL – the existing Wikibase Scribunto interface is powerful enough to get the data they need, so T396532: Add support for inline data in charts is enough for them to be rewritable.

For use cases that do need SPARQL, we have a problem: the parser cache. The existing Wikibase Scribunto interface takes care of purging the parser cache (i.e. re-rendering the page when Wikidata data changes), and so does the mechanism Charts uses to load data from Commons Data:*.tab pages; but as powerful as SPARQL is, it’s hard to keep track of which data a query uses, and that is what purging the cache at the right time would require. The only solution that comes to my mind is reducing the cache expiry time (i.e. the time after which content is dropped from the cache even if no change is detected): instead of the usual thirty days, pages using SPARQL data would be dropped from the cache after one day. This means a higher load on the servers, as they need to parse these pages up to thirty times as often (realistically less, as people don’t read all pages every day, but still an increase), and outdated data can still appear on pages for up to 24 hours.
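A toy model of the trade-off described above, assuming a cache that drops entries purely by age and a page that is read once a day. The class and function names are hypothetical and this is not MediaWiki's actual parser cache; it only makes the "up to thirty times as often" arithmetic concrete, while the expiry time itself bounds how stale the data can get.

```python
from collections.abc import Callable
from dataclasses import dataclass

DAY = 86_400  # seconds


@dataclass
class CachedRender:
    html: str
    rendered_at: float  # time (seconds) of the parse that produced html


class TtlParserCache:
    """Toy parser cache that drops entries only by age; it never detects data changes."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.entries: dict[str, CachedRender] = {}
        self.renders = 0  # number of (expensive) parses performed

    def get(self, page: str, now: float, render: Callable[[str], str]) -> str:
        entry = self.entries.get(page)
        if entry is None or now - entry.rendered_at >= self.ttl:
            # Miss or expired: re-parse (for a SPARQL-backed page this would re-run the query).
            self.renders += 1
            entry = CachedRender(render(page), now)
            self.entries[page] = entry
        return entry.html


def parses_per_month(ttl_days: float) -> int:
    """Parses of a single page that is read once a day for 30 days."""
    cache = TtlParserCache(ttl_days * DAY)
    for day in range(30):
        cache.get("Example", day * DAY, lambda page: f"<chart for {page}>")
    return cache.renders


# parses_per_month(30) -> 1, parses_per_month(1) -> 30
```

Pages read less often than daily see a smaller increase, which matches the "realistically less" caveat.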

I don't know how caching was handled in Graph; I understand that data was cached in the same way you mention. We should have *at least* the same data ingestion methods as we had before. So, if the data should be, for technical reasons, cached for some days, let it be cached.

> Actually, neither of these two examples (French and Basque) require SPARQL – the existing Wikibase Scribunto interface is powerful enough to get the data they need, so T396532: Add support for inline data in charts is enough for them to be rewritable.

T388616 will work for those: it allows adding data points via arguments and lets Lua talk to Extension:Chart. It also has a user-specified cache expiry. It is just waiting for final sign-off and a config change. How to code in Lua with that feature is already known.

> I don't know how caching was handled in Graph; I understand that data was cached in the same way you mention. We should have *at least* the same data ingestion methods as we had before. So, if the data should be, for technical reasons, cached for some days, let it be cached.

I haven’t looked into the source code, but since Graph rendered graphs in the browser, I guess it just didn’t cache anything, which is bad for performance but easy to do when one writes code that runs in the browser. Since Charts pre-renders a version of the graph for browsers with JS disabled (which is a very good thing, one that I really missed in Graph!), it needs to get the data on the server, where a complete lack of caching is not an option (having enough servers to serve content without caching is simply impossible). By the way, I very much agree with you that all functionality of Graph should be restored (except maybe for features that were entirely unused, if there were any); I just wanted to highlight the challenges the new architecture brings.

> T388616 […] also has a user-specified cache expiry.

I don’t want a user-specified cache expiry, I want automatic cache invalidation. If data doesn’t change for an entire month (which is the case in the vast majority of months in the vast majority of articles), I don’t want to put extra load on the servers, but if I edit Wikidata, I want to see the result of my edit immediately (within minutes). The Wikibase Scribunto interface (in cases where it’s powerful enough) is perfectly capable of handling this, we just need a way to feed its result into Charts.
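The automatic invalidation asked for here can be modelled as dependency tracking. This is a conceptual sketch with hypothetical names and item IDs, not Wikibase's real usage-tracking code: each render records the entity IDs it read, and an edit to an entity purges only the pages that depend on it, so unchanged pages stay cached for the full month. With SPARQL, the set passed as `used_entities` is exactly what is hard to know.

```python
from collections import defaultdict


class UsageTrackingCache:
    """Toy cache that records which entities each page read and purges
    exactly the dependent pages when one of those entities is edited."""

    def __init__(self) -> None:
        self.html: dict[str, str] = {}
        self.entities_of: dict[str, set[str]] = {}  # page -> entities it used
        self.pages_using: defaultdict = defaultdict(set)  # entity -> pages using it

    def store(self, page: str, html: str, used_entities: set[str]) -> None:
        self.purge(page)  # forget the usage records of any previous render
        self.html[page] = html
        self.entities_of[page] = set(used_entities)
        for entity in used_entities:
            self.pages_using[entity].add(page)

    def purge(self, page: str) -> None:
        self.html.pop(page, None)
        for entity in self.entities_of.pop(page, set()):
            self.pages_using[entity].discard(page)

    def on_entity_edit(self, entity: str) -> set[str]:
        """Purge every cached page that used `entity`; return those pages."""
        dependents = set(self.pages_using.get(entity, set()))
        for page in dependents:
            self.purge(page)
        return dependents
```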