
Show usage of Wikimedia Commons provided data across different wiki
Closed, Resolved | Public | Feature

Description

Feature summary (what you would like to be able to do and where):

It would be nice if there were some way to see which pages across different wikis are using a data page.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

A data page such as https://commons.wikimedia.org/wiki/Data:Wikipedia_statistics/data.tab has a nearly empty list of links in WhatLinksHere (https://commons.wikimedia.org/wiki/Special:WhatLinksHere/Data:Wikipedia_statistics/data.tab), even though it is used on several Wikipedias.

Benefits (why should this be implemented?):

Commons admins could see where the data is used and avoid deleting it by mistake, and other users could learn what they can do with the data on their own wiki, just as the existing File and Template usage lists allow.

Also proposed as a community wish: https://meta.wikimedia.org/wiki/Community_Wishlist/Wishes/Usage_of_Wikimedia_Commons_provided_data_across_different_wiki

I first learnt about this issue from @Tacsipacsi here: https://www.wikidata.org/wiki/MediaWiki_talk:Linkscount.js#Global_use_of_Wikimedia_Commons_data

Quoting their message,

It looks like mw:Extension:JsonConfig uses standard API requests to get data from Commons, so Commons has no information about the current request coming from JsonConfig, let alone the originating wiki or page. I don’t see traces of per-wiki storing of the pages accessing a particular data file, either, so probably the only solution is an all-wiki full-text search, which of course will miss a lot of usages due to string concatenation and similar issues. 🙁 —Tacsipacsi
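To illustrate the limitation described in the quote, here is a minimal Python sketch of why a literal full-text search can miss usages. The page names, the `#invoke` call, and the Lua-style concatenation are hypothetical examples, not taken from any real wiki.

```python
def find_literal_usages(pages, data_title):
    """Return titles of pages whose source contains data_title verbatim."""
    return [title for title, text in pages.items() if data_title in text]


# Hypothetical page sources, for illustration only.
pages = {
    # The data page name appears as one literal string, so search finds it.
    "Page A": "{{#invoke:Stats|show|Wikipedia_statistics/data.tab}}",
    # The name is assembled at run time, so the literal never appears.
    "Page B": 'local name = "Wikipedia_statistics/" .. "data" .. ".tab"',
}

found = find_literal_usages(pages, "Wikipedia_statistics/data.tab")
print(found)
```

Both pages would load the same data page at render time, but only the first is discoverable by text search, which is why server-side usage tracking is needed.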

See also: T370378: Explore usage tracking for chart pages and tabular data pages

Event Timeline

Ebrahim updated the task description.
Ebrahim added a subscriber: Tacsipacsi.

We are currently investigating how we can support this in T383446 and can report back here once we confirm this will be feasible.

CCiufo-WMF moved this task from Backlog to Sprint 15 on the Charts board.
CCiufo-WMF edited projects, added Charts (Sprint 15); removed Charts.
bvibber added a project: Schema-change.

Tagging for schema-change, as there is a single column addition on globaljsonlinks, which in production lives in x1.commonswiki. In a pinch we can safely deploy the code before the schema change by disabling the $wgTrackGlobalJsonLinksNamespaces feature flag; in that case it will use canonical namespaces, or numeric fallbacks for custom namespaces not present on Commons.
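The fallback behaviour described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the extension's actual code: the function name, the `stored_ns_text` parameter, the small namespace table, and the exact format of the numeric fallback are all hypothetical.

```python
# A few namespace IDs and their canonical English names (illustrative subset).
CANONICAL_NAMESPACES = {0: "", 1: "Talk", 2: "User", 10: "Template", 828: "Module"}


def display_title(ns_id, title, stored_ns_text=None, track_namespaces=True):
    """Build a display title for a usage row.

    stored_ns_text stands in for the value of the new column (assumption);
    it is only consulted when the feature flag is enabled.
    """
    if track_namespaces and stored_ns_text is not None:
        prefix = stored_ns_text          # localized name recorded per usage
    elif ns_id in CANONICAL_NAMESPACES:
        prefix = CANONICAL_NAMESPACES[ns_id]  # canonical fallback
    else:
        prefix = str(ns_id)              # numeric fallback for unknown namespaces
    return f"{prefix}:{title}" if prefix else title


with_flag = display_title(10, "Infobox", "Vorlage")
without_flag = display_title(10, "Infobox", "Vorlage", track_namespaces=False)
custom_ns = display_title(3000, "Example", track_namespaces=False)
```

With the flag off, the stored text is ignored, so the code path does not depend on the new column existing yet.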

Change #1111690 had a related patch set uploaded (by Bvibber; author: Bvibber):

[mediawiki/extensions/JsonConfig@master] Support for Data: page global usage display

https://gerrit.wikimedia.org/r/1111690

CCiufo-WMF edited projects, added Charts (Sprint 17); removed Charts (Sprint 15).
CCiufo-WMF moved this task from Incoming to Code Review on the Charts (Sprint 17) board.

Change #1111690 merged by jenkins-bot:

[mediawiki/extensions/JsonConfig@master] Support for Data: page global usage display

https://gerrit.wikimedia.org/r/1111690

Jdlrobson-WMF added subscribers: bvibber, Jdlrobson-WMF.

Can you please sign off, Chris?

I was happy that this finally works, but I’m not that happy with the performance. I opened https://commons.wikimedia.org/wiki/Data:I18n/Documentation.tab, which I was sure would have some links, to see what it looks like, and I saw more than 100k links! Usually, query pages show 50 links by default, with up to 500 available via links/preferences and 5000 being the hard limit that can be reached by tweaking the URL. This page returns over 20 times more than the usual hard limit! I don’t think this is okay from either a frontend performance or a backend/DB performance viewpoint. Please make it so that, similarly to GlobalUsage, only a handful of links appear directly on the page, with everything else available from a paginated special page.
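The limits mentioned in this comment amount to clamping the requested page size. A minimal Python sketch, where the constant and function names are hypothetical and only the numbers 50 and 5000 come from the comment:

```python
DEFAULT_LIMIT = 50    # links shown when no limit is requested
HARD_LIMIT = 5000     # upper bound even when the URL asks for more


def clamp_limit(requested=None):
    """Return the number of links to show for a requested page size."""
    if requested is None or requested <= 0:
        return DEFAULT_LIMIT
    return min(requested, HARD_LIMIT)


shown = clamp_limit(100_000)
ratio = 100_000 // HARD_LIMIT
```

Here a request for 100,000 links is cut down to 5000, and 100,000 is 20 times that hard limit, which matches the "over 20 times" figure in the comment.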

Yep, that was due to a little corner-cutting porting logic over from GlobalUsage; we hadn't quite realized so many common usages would come up so quickly and skipped the pager. :) Pager code has now been ported over and seems to be working on T371300; once we pass code review we'll deploy, probably early next week.

On very widely used data pages it will still need to hit a lot of rows in the database due to the way the indexes don't quite match what we want to display, but won't have to format them all and return them all as HTML so should be much more efficient.
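As a rough illustration of why a pager helps, here is a minimal Python sketch of offset-based paging over a sorted list of usage rows. It is not the ported GlobalUsage pager; the function name, the row shape `(wiki, title)`, and the in-memory list standing in for the database are assumptions.

```python
from bisect import bisect_right


def fetch_page(rows, limit, after=None):
    """Return (page, next_offset) from a sorted list of (wiki, title) rows.

    Only `limit` rows are returned for formatting; one extra row is read
    to learn whether a further page exists.
    """
    start = 0 if after is None else bisect_right(rows, after)
    chunk = rows[start:start + limit + 1]
    page = chunk[:limit]
    next_offset = page[-1] if len(chunk) > limit else None
    return page, next_offset


# Hypothetical usage rows, for illustration only.
rows = sorted((f"wiki{i % 3}", f"Page {i:03d}") for i in range(7))

pages = []
offset = None
while True:
    page, offset = fetch_page(rows, limit=3, after=offset)
    pages.append(page)
    if offset is None:
        break
```

Each request formats at most `limit` rows, so the HTML output stays small even when the underlying query still has to touch many rows.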