Investigate and evaluate hCaptcha to replace Wikimedia's Fancy Captcha
Open, In Progress, High, Public

Description

This is not complete and, as such, should be considered a WIP. Comments and questions below are welcome.

Following on from T249854: Add support for hCaptcha, and as a potential solution to T241921: Fix Wikimedia captchas (and the various older incantations).

hCaptcha is an alternative to reCaptcha, without the usual privacy concerns that come with it.

Cloudflare did move to hCaptcha, but has since launched its own alternative, Turnstile.

It may still require a change to Wikimedia's Privacy Policy, as it requires loading JS from an external website and submitting data back to them, but hCaptcha's Privacy Policy is seemingly more in line with what we'd want (IANAL, and it would obviously need WMF-Legal review). They're more interested in aggregate data than individual data, and try to discard other data as soon as they can.

hCaptcha is offering to donate websites' "earnings" from captchas being solved to the Wikimedia Foundation rather than keeping them for itself. While I imagine this won't solve all of Wikimedia's funding problems, it's nice that we're considered a good solution to the problem. Obviously, there's the potential for captcha solves on Wikimedia sites to also help generate income.

[Attached image: EVR9uTuXsAATveo.jpg, 21 KB]

The implementation is similar to reCaptcha, selecting images of a certain type etc.

Localisation is done for ~150 languages, and they're planning on open-sourcing UI translations on GitHub, so there's a chance to expand that further and help support more languages (which is one goal of the captcha replacement project, T7309: Localize captcha images, though removing the text strings to be identified and typed out does make that task somewhat redundant).

There's also a labelling service we could potentially use with MachineVision instead of the Google services. It could potentially be possible to use our own captchas to help label our own images from Commons, somewhat a mix of T87598: Create a CAPTCHA that is also a useful micro edit and T34695: Implement, Review and Deploy Wikicaptcha.

Questions:

  • Does this image matching captcha solution help our Accessibility issues?

Known caveats/issues:


Useful links:

Details

Other Assignee
EMill-WMF

Related Objects

Event Timeline

Neither are the Google services we use for MachineVision and for Google Translate in Content Translation, along with services from other companies for translation. I think there are probably more too, without digging too deeply. I obviously understand that interacting with those is more optional, whereas a captcha as part of the login flow (and other flows) is not. Also, in those cases, users aren't directly interacting with Google services; they do so via a "proxy" app/API. But in those cases, information like the IP address serves no benefit: a translation between two languages is the same wherever you are in the world.

But we do not rely on them to edit, and we have plenty of alternatives. Also, the requests go through proxies.

As it stands, no one has come forward with an appropriate Free/Open Source solution to the captcha problem (unless I and everyone else involved have missed someone posting something enlightening that solves it). And as is clear from the general lack of progress on our own captcha in over a decade, even with the best will in the world, the Foundation and my colleagues don't have all of the large quantities of knowledge and experience required to improve our captcha while keeping the benefits of l10n/i18n (which is generally something we do quite well) and accessibility. Even more importantly, we don't have the time and resources to make significant headway on such projects *at the same time as all the other work we have to do*.

I once suggested that Wikimedia develop one - T174874: Create a standalone Wikimedia CAPTCHA service

So in the same way, we don't use coreboot on our servers, and we use proprietary software, switches and routers (I could continue), because of the lack of appropriate alternatives. And of course, we don't use FOSS hardware; again, for the same reasons. It's just not practical.

But we do have control over our servers. We do not have control over hCaptcha's. At the very least, it should be something that can be installed on Wikimedia servers; even if it may contact hCaptcha servers, the captcha should work without them.

And do bear in mind that many community members don't feel as strongly as you do (or, in many cases, even care). How many use Windows? And therefore IE or Edge? Mac? Safari? iPhone? Non-free drivers and binaries on Linux systems? In some cases they're forced to (work machines, etc.), but many choose to. Granted, it's consuming resources using non-FOSS, but it's a similar vein of argument.

Again, nobody requires users to use Windows. But Wikimedia may be about to require (at least new) users to use a non-free third-party service.

In the whole of Wikimedia there are very few places where external scripts are loaded - the content from MachineVision and Google Translate is already filtered so that it cannot do anything bad. Here, hCaptcha could theoretically inject arbitrary scripts into Wikimedia pages.

We have a current effort to replace any external resources, for privacy concerns. See also T135963: Add support for Content-Security-Policy (CSP) headers in MediaWiki

Yes, I'm aware of this. But CSP has a whitelisting system for this particular kind of issue. CSP is meant to stop unwanted and not specifically allowed things from being loaded, not to stop the wanted things that make things work.
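For illustration, a CSP allowlist entry for an external captcha widget might look something like the following (the hCaptcha hostnames are the ones their public integration docs mention; treat the exact directive list as an assumption to be verified against the deployed widget):

```
Content-Security-Policy:
  script-src  'self' https://hcaptcha.com https://*.hcaptcha.com;
  frame-src           https://hcaptcha.com https://*.hcaptcha.com;
  style-src   'self' https://hcaptcha.com https://*.hcaptcha.com;
  connect-src 'self' https://hcaptcha.com https://*.hcaptcha.com
```

Everything not matching an allowed source would still be blocked, which is the point being made above: the allowlist admits the wanted third-party script without opening the door to anything else.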

In the whole of Wikimedia there are very few places where external scripts are loaded - the content from MachineVision and Google Translate is already filtered so that it cannot do anything bad. Here, hCaptcha could theoretically inject arbitrary scripts into Wikimedia pages.

And their functionality and data requirements are different.

Yes, hCaptcha could inject arbitrary scripts (hell, we've seen it happen on wikis enough times too - sure, it doesn't always last long, but it happens), either purposefully or accidentally due to some breach. But that's what contracts are for: if they are breached, there are legal ramifications.

Neither are the Google services we use for MachineVision and for Google Translate in Content Translation, along with services from other companies for translation. I think there are probably more too, without digging too deeply. I obviously understand that interacting with those is more optional, whereas a captcha as part of the login flow (and other flows) is not. Also, in those cases, users aren't directly interacting with Google services; they do so via a "proxy" app/API. But in those cases, information like the IP address serves no benefit: a translation between two languages is the same wherever you are in the world.

But we do not rely on them to edit, and we have plenty of alternatives. Also, the requests go through proxies.

Again, what they do and how they work are different. Solving the captcha (i.e. the action/work) is only part of the process. Removing information the backend works with, such as the IP, makes the service mostly useless. Please read my original responses.

Also, in most cases, most users will not see a captcha. Certainly, I imagine long-registered users won't have seen one on Wikimedia in a long time (unless creating an additional account, for example).

As it stands, no one has come forward with an appropriate Free/Open Source solution to the captcha problem (unless I and everyone else involved have missed someone posting something enlightening that solves it). And as is clear from the general lack of progress on our own captcha in over a decade, even with the best will in the world, the Foundation and my colleagues don't have all of the large quantities of knowledge and experience required to improve our captcha while keeping the benefits of l10n/i18n (which is generally something we do quite well) and accessibility. Even more importantly, we don't have the time and resources to make significant headway on such projects *at the same time as all the other work we have to do*.

I once suggested that Wikimedia develop one - T174874: Create a standalone Wikimedia CAPTCHA service

Great. But I've already answered this question. We only have limited time and resources. Your task was also explicitly declined, the same as many other ideas where people suggest we should branch out and do X.

As it stands, no one has come forward with an appropriate Free/Open Source solution to the captcha problem (unless I and everyone else involved have missed someone posting something enlightening that solves it). And as is clear from the general lack of progress on our own captcha in over a decade, even with the best will in the world, the Foundation and my colleagues don't have all of the large quantities of knowledge and experience required to improve our captcha while keeping the benefits of l10n/i18n (which is generally something we do quite well) and accessibility. Even more importantly, we don't have the time and resources to make significant headway on such projects *at the same time as all the other work we have to do*.

But we do have control over our servers. We do not have control over hCaptcha's. At the very least, it should be something that can be installed on Wikimedia servers; even if it may contact hCaptcha servers, the captcha should work without them.

Again, read my answer about how the captcha works. Passing things through our servers removes that useful information, so we might as well not bother.

How much control do we necessarily have over proprietary firmware etc. on them? How many Intel Management Engine-type exploits are out there? Sure, we can limit that by controlling egress, but that doesn't necessarily remove it completely.

Again, nobody requires users to use Windows. But Wikimedia may be about to require (at least new) users to use a non-free third-party service.

And in the same way that you think not using a FOSS solution is a big problem, other people do not. I suspect a decent number of people who use Wikipedia don't know what this means, nor do they care. They'll happily use it on other sites, which are doing whatever with their data. That doesn't mean you're wrong, but it certainly doesn't mean you're right either.

Again, nobody requires users to use Windows. But Wikimedia may be about to require (at least new) users to use a non-free third-party service.

This is explicitly not true; a huge number of businesses (I would argue "almost all", though obviously I don't have any hard statistics to back that up) force their employees to use Windows, for a variety of reasons (it's what the tech support on-hand is familiar with; apps the company relies on were written for Windows and it'd be expensive to update or replace them; the company values paid technical support; etc). You can argue that any or all of these should be non-concerns for any business, but you're screaming into an empty amphitheater in that case. Even ignoring this, pretty much any public computer is going to be Windows just because it has the broadest software support and the general public is by far most likely to already be familiar with it.

Many companies have a volume license of Windows, but it is not the case of WMF.

@Bugreporter: It is entirely irrelevant what WMF folks use on their machines. Please move off-topic Windows license discussions somewhere else. Thanks!

I don't think they would need the IP address. If all they want are statistics on the number of requests/solves from an IP address, they could be given an HMAC of the IP address with a secret salt. Plus probably the AS and country of the IP, since I'm sure that's also part of their risk analysis. They couldn't combine requests from WMF users with those from third parties - Wikimedia sites would be on their own island - but that's the goal. We have a big enough user base that I doubt combining it would really be needed. That, plus proxying the actual image loads (and not letting them insert arbitrary JavaScript, but using a known-good copy), I think would work wrt privacy. Still not ideal from a FOSS philosophical POV, though.
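A minimal sketch of the IP-blinding idea above, in Python. The salt value and function name are hypothetical; HMAC-SHA-256 is one reasonable choice of keyed hash:

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice this would live in
# private configuration, never be shared with the vendor, and be
# rotated periodically.
SECRET_SALT = b"replace-with-a-long-random-secret"

def blind_ip(ip: str) -> str:
    """Keyed hash of an IP address: stable enough for per-IP
    request/solve statistics, but not reversible without the salt."""
    return hmac.new(SECRET_SALT, ip.encode("utf-8"), hashlib.sha256).hexdigest()

# The same IP always maps to the same token, so per-"IP" counting
# still works on the receiving side:
assert blind_ip("198.51.100.7") == blind_ip("198.51.100.7")
# Different IPs map to different tokens:
assert blind_ip("198.51.100.7") != blind_ip("198.51.100.8")
```

Without the salt, the vendor cannot feasibly reverse the hash or even confirm a guessed IP, which is what distinguishes this from a plain unsalted hash of the address.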

From an operational perspective, a concern I have is the dependency created by using a single vendor for a service like this. If in 5-10 years' time, after several mergers and acquisitions, the captcha provider decided to stop providing the service under acceptable terms for us (e.g. they change their terms and are no longer willing to respect user privacy at all, in order to monetize users), what would we do? It's not like we could stop requiring captchas without an impact. At the very least, the current implementation would have to be kept at an appropriate level, so we could easily fall back to it in such a case (or, simply, if the vendor had an outage).

I don't think they would need the IP address. If all they want are statistics on the number of requests/solves from an IP address, they could be given an HMAC of the IP address with a secret salt.

hCaptcha does indeed support such a paradigm by allowing clients to pass blinded end-user IPs to their backend, where they are isolated from the rest of the statistical reputation-scoring hCaptcha performs within the context of their large pool of client data. I cannot find any public-facing documentation for this feature, but I can confirm that it exists and would be a requirement for any proposed Wikimedia implementation.

They couldn't combine requests from WMF users with those from third parties - Wikimedia sites would be on their own island - but that's the goal. We have a big enough user base that I doubt combining it would really be needed.

There would be a potential downgrade of the performance of hCaptcha's reputation-scoring relative to their standard implementation, but this would still be a vast improvement over FancyCaptcha, which essentially has none.

That, plus proxying the actual image loads (and not letting them insert arbitrary JavaScript, but using a known-good copy), I think would work wrt privacy. Still not ideal from a FOSS philosophical POV, though.

hCaptcha provides both first-party hosting and full-proxy options for their primary JavaScript widget and related resources, the latter of which should alleviate all user privacy issues within the context of Wikimedia's current privacy policy. In discussions with hCaptcha, they are also extremely comfortable with Wikimedia/WMF having as much access to relevant source code as possible for audit purposes. As you mentioned, this isn't fully in alignment with certain FOSS philosophies, but is likely the best outcome possible for such a vendor relationship. By contrast, Google currently does not and would likely be unwilling to satisfy any of these requirements with reCaptcha.

From an operational perspective, a concern I have is the dependency created by using a single vendor for a service like this. If in 5-10 years' time, after several mergers and acquisitions, the captcha provider decided to stop providing the service under acceptable terms for us (e.g. they change their terms and are no longer willing to respect user privacy at all, in order to monetize users), what would we do? It's not like we could stop requiring captchas without an impact. At the very least, the current implementation would have to be kept at an appropriate level, so we could easily fall back to it in such a case (or, simply, if the vendor had an outage).

This is indeed a concern, and one that the Security-Team addressed within a recent WMF-internal risk assessment. FancyCaptcha (or similar) would need to be maintained to some extent as either a fallback captcha system (in the case of service outages) or as a temporary replacement if hCaptcha's terms and/or ethos ever departed significantly from current expectations. This would all likely be codified via contractual agreements between the WMF and hCaptcha, if this option were to move forward.

sbassett moved this task from Back Orders to Watching on the Security-Team board.

Could hCaptcha allow us to create a custom version of the service that could be hosted on WMF servers? This would significantly reduce the risk of outage and suspension. A non-revocable legal agreement on running the service may also be needed. Note that even with this, it may still be much more controversial than T272111.

If hCaptcha were to be implemented within Wikimedia production, part of that process would involve creating a custom service managing the proxied transmission of fully-anonymized data to hCaptcha for evaluation. Ideally, said service would give us more flexibility in migrating to separate or fallback captcha systems, such as FancyCaptcha, if the need arose. I do not believe there would be a way to avoid sending any data to hCaptcha, as that is not possible with their current architecture. But as previously discussed, there are a number of ways (technical, legal, etc.) to make such transactions as secure and private as possible and fully compliant with the current Wikimedia privacy policy.
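As a sketch of what such a proxy service might send: hCaptcha's public API verifies tokens via a `siteverify` endpoint taking `secret`, `response`, and an optional `remoteip` field. The blinding scheme, salt, and function names below are hypothetical illustrations of combining that with the anonymization discussed above, not the actual Wikimedia design:

```python
import hashlib
import hmac

# Hypothetical site-local secrets; real values would come from
# private configuration.
HCAPTCHA_SECRET = "0x0000000000000000000000000000000000000000"
BLINDING_SALT = b"site-local-secret-salt"

def build_siteverify_payload(token: str, client_ip: str) -> dict:
    """Build the form body for hCaptcha's /siteverify call, replacing
    the raw client IP with a keyed hash so the address itself never
    leaves local infrastructure (relies on hCaptcha's blinded-IP
    support mentioned earlier in this thread)."""
    blinded_ip = hmac.new(BLINDING_SALT, client_ip.encode("utf-8"),
                          hashlib.sha256).hexdigest()
    return {
        "secret": HCAPTCHA_SECRET,
        "response": token,
        "remoteip": blinded_ip,  # blinded value, not the real address
    }

# The proxy service would then POST this body to
# https://api.hcaptcha.com/siteverify and act on the JSON "success"
# field in the response.
```

Keeping the verification call server-side also means the hCaptcha account secret is never exposed to browsers, and gives a single choke point where a fallback captcha system could be swapped in.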

Update: this is a fairly interesting blog post from Cloudflare discussing their migration from reCaptcha to hCaptcha. They had many similar concerns over user privacy.

Noting also that the description is rather out of date.

kostajh changed the task status from Open to In Progress.Nov 28 2024, 8:36 AM

As some of you may have seen on the annual plan Meta page, we have been integrating our infrastructure with hCaptcha; it's currently live on test2wiki, and we will post more announcements on wikis as well as on Diff in the coming days.

For more details, particularly the privacy safeguards and risks, see the project page: mw:hCaptcha.

I'd also like to invite you to subscribe to the Product Safety and Integrity team newsletter, where we'll keep you updated on the highlights of our projects as well as cross-project strategic thinking.

kostajh updated Other Assignee, added: EMill-WMF.