
Wrong error - Your translation contains a total of 98% of unmodified text
Closed, Resolved · Public · Bug Report

Description

But my translation actually has 0% of the same text, except for the names.

Screenshot 2025-01-12 at 13-34-44 翻譯頁 - 維基百科,自由嘅百科全書.png (Cantonese: "Translation page - Wikipedia, the free encyclopedia")

You can check the raw text of my draft on the server.


A minimal test case (quick link for translation) and a video showing the issue are below:

Event Timeline

The problem is critical because it prevents me from publishing.

But is it the same as provided by automatic translation or you modified the automatic translated text?


photo_2025-01-15_11-34-40.jpg

Supposedly I did not use machine translation; I copied the original text instead. Yet another yellow warning says my text is 71% identical to the machine translation.

Also, I just tried the machine translation currently available (only Google Translate is offered). The Google-translated text is in zh-hant, not yue, so there is not much overlap between my text and Google's machine translation.

Nikerabbit triaged this task as Medium priority.Jan 16 2025, 8:54 AM
Nikerabbit moved this task from Needs Triage to Bugs on the ContentTranslation board.

By the way, this is not a new problem. It has been around for at least several years, ever since I started using CX.

I could usually bypass the warnings by making some changes, but this time the warning persists no matter what I do, and it prevents publishing.

From my impression, these faulty warnings only occur when I translate into a kanji (CJK) script, not when I translate into a Latin script. My guess is that your detection mechanism fails to properly recognise or count CJK characters (or perhaps any non-Latin characters).
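The reporter's hypothesis can be illustrated with a small sketch. It does not reproduce Content Translation's actual check, which is not shown in this thread; the two tokenizers and the `overlap_ratio` metric below are hypothetical. The point is that a comparison built for space-separated scripts misbehaves on unspaced CJK text in both directions: splitting on whitespace turns a whole sentence into one "word", while falling back to single characters makes common characters match almost any other sentence.

```python
def tokens_whitespace(text: str) -> list[str]:
    # Naive word tokenizer: split on whitespace only.
    # Unspaced CJK text collapses into one giant "word".
    return text.split()


def tokens_per_char(text: str) -> list[str]:
    # Character tokenizer: every non-space character is a token.
    # Very common CJK characters then match almost any other text.
    return [ch for ch in text if not ch.isspace()]


def overlap_ratio(mt: str, edited: str, tokenize) -> float:
    """Hypothetical metric: share of the edited text's tokens that also
    occur anywhere in the machine translation (order is ignored)."""
    mt_vocab = set(tokenize(mt))
    tokens = tokenize(edited)
    if not tokens:
        return 0.0
    return sum(t in mt_vocab for t in tokens) / len(tokens)


# Two differently worded Cantonese sentences:
# "He is a very famous singer" / "This singer is very famous".
mt = "佢係一個好出名嘅歌手"
edited = "呢個歌手係好出名嘅"

print(overlap_ratio(mt, edited, tokens_whitespace))  # 0.0
print(overlap_ratio(mt, edited, tokens_per_char))    # about 0.89 (8 of 9 characters)
```

Neither tokenizer is adequate on its own here; a robust check would presumably need proper word segmentation for Chinese-script languages, which this sketch does not attempt.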


Thanks for flagging this! This seems to be a bug in the configuration. Google supports Cantonese now, but the configuration was not updated accordingly and still redirects to zh-hant instead. I created a ticket to capture the issue in more detail: T383863: Adjust Google Configuration to expose Cantonese MT instead of Chinese


I created a test case page and a video to illustrate the issue. Added them to the description for visibility. I think this can help engineers to identify the origin of the bug.

Pginer-WMF raised the priority of this task from Medium to High.Jan 16 2025, 10:28 AM
Pginer-WMF updated the task description.
Pginer-WMF added a project: LPL Essential.

I made a simple CodePen where we can test and troubleshoot, in isolation, the algorithm that checks how much the automatic translation has been modified. It can help explain why the scenario illustrated in @Pginer-WMF's video is still flagged as having too much unmodified text no matter how many "test " strings are added. However, I'm not sure how to retrieve the automatic translation and the modified text from the original reporter's draft.
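One possible reason why appending "test " never clears the warning: if a check were normalised by the length of the machine translation rather than by the length of the edited text, added words could never lower it. The sketch below is hypothetical; neither function is taken from the CodePen or from Content Translation, and this thread does not say which denominator CX actually uses.

```python
def preserved_share_of_mt(mt: str, edited: str) -> float:
    """Hypothetical metric normalised by the MT length: the share of
    machine-translated words that still appear in the user's text."""
    mt_tokens = mt.split()
    if not mt_tokens:
        return 0.0
    edited_vocab = set(edited.split())
    return sum(t in edited_vocab for t in mt_tokens) / len(mt_tokens)


def unmodified_share_of_edit(mt: str, edited: str) -> float:
    """Hypothetical metric normalised by the edited length: the share of
    the user's words that were taken unchanged from the machine translation."""
    edited_tokens = edited.split()
    if not edited_tokens:
        return 0.0
    mt_vocab = set(mt.split())
    return sum(t in mt_vocab for t in edited_tokens) / len(edited_tokens)


mt = "the singer was born in Hong Kong"
for n in (0, 1, 10):
    edited = mt + " test" * n  # append n copies of "test", as in the video
    # First value stays 1.0 for every n; second value falls as n grows.
    print(n, preserved_share_of_mt(mt, edited), unmodified_share_of_edit(mt, edited))
```

Under the first definition, only changing or deleting the machine-translated words moves the score, which would match the behaviour seen in the video; under the second, padding the text dilutes it.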


The translation debugger tool may be helpful. There is more info and a short video tutorial on how to use it in this documentation page.

SBisson lowered the priority of this task from High to Medium.Feb 7 2025, 3:24 PM


Hi @RoyZuo, the configuration has been updated. Now the machine translation in Content Translation should match the new support from Google Translate for Yue. Feel free to share any thoughts in T383863: Adjust Google Configuration to expose Cantonese MT instead of Chinese
Thanks!

@Pginer-WMF thanks a lot for the notice!
However, in the meantime, over all the years since the Phabricator tickets asking to re-enable machine translation for Cantonese were first filed, I've stopped editing the esoteric and hostile yuewp, and I have also moved to a style of Cantonese writing with minimal Sinitic and Sinocentric influences.
So the translation services currently available to the public, which all train their models on a strongly Sinocentric system, are not of much use to me anymore.

Nikerabbit claimed this task.

Could not reproduce the test case in the video.