Fix printing of emoji on Windows when stdout is redirected#3374
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
JelleZijlstra
left a comment
There was a problem hiding this comment.
Thanks for your contribution!
Monkeypatching sys.stdout feels quite risky though. A few questions:
- What will happen on Windows terminals that use an encoding incompatible with UTF-8? I'm worried we'll get mojibake in that case.
- Could this be fixed in
clickinstead? We useclick.echoto output all emoji, andclickis supposed to take care of output encoding for us. - The
pyclnissue reported gettingUnicodeEncodeError, while the OP of the Black issue saw literal\u1234output instead. Do you know why that is?
This isn't possible on ~"modern" Windows terminals (which I means Windows 10+, even the built in CMD.exe understands utf-8) Also from python 3.7+ the output is utf8 when stdout is connected to a TTY, so this disparity only occurs when not in a TTY. So in short I think the risk of mojibake is tiny. And whatever risk there is already applies without this change and you'd get an encoding error on trying to convert the extended unicode emoji character into cp1252. One option I could do here is to limit this to Windows 10+. They should remove the risk entirely.
It can, but for some reason it felt like a more disruptive change to make there rather then just to a single project (black) that we know we already expect to be able to print utf-8/high unicode.
I suspect the Unicode escape is something to do with how the output is captured by pre-commit. I can dig into that more if you'd like confirmation |
|
This is funny to even mention but since Black works on Python 3.7+, it will run on Windows 7, Python 3.9 and newer don't support Windows 7 but still support So, while decreasingly likely, it is certainly possible for a user of an older Windows to run Black. I would add the version check to avoid regressions. Skipping an emoji and showing edit: turns out Python's requirements are Visual Studio 2017 requirements which means no Windows Vista and Windows 8 for you. Windows 7 and 8.1 is fine. |
|
So adding a >>> platform.uname()
uname_result(system='Windows', node='sinope', release='10', version='10.0.22621', machine='AMD64')
>>> platform.release()
'10' |
|
Ah I've done a bit of digging and technically it's not even all versions of windows 10, but anyone since the Creators update, build 1809, so I'll make the check even more detailed. (And I've checked that this will still work on Github Actions too. On Server 2022 it returns |
| if ( | ||
| "pytest" not in sys.modules | ||
| and platform.system() == "Windows" | ||
| and tuple(map(int, platform.version().split("."))) >= (10, 0, 1809) |
There was a problem hiding this comment.
Can't decide if I love or hate that last line. Advantages:
- it's correct (as it uses
intcomparisons); - it performs well (as the
andshort circuits for non-Windows users); - fits in a single line.
The disadvantage is that it reads pretty clumsily ("make a tuple out of an int map of the string that platform.version() returns split by the period... and then compare it to (10, 0, 1809)"). Especially that int map bit.
But I guess the advantages outweigh my personal preference here.
There was a problem hiding this comment.
Pretty much sums up my feelings when writing it!
This is most apparent when black is run via pre-commit which does `subprocess.Popen(stdout=PIPE)`. One option around this would be to instruct users to set `PYTHONUTF8=1`, but that is not a very good approach, as basically everything supports UTF-8 these days. This fix was inspired by pycln and hadialqattan/pycln#54
On older versions of Windows the cmd.exe can't display UTF-8, so this would likely result in mojibake. By limiting it to Windows10 we _know_ that displaying UTF-8 will work
It _might_ work on earlier versions, but this is where `CreatePseudoConsole` and the ConHost overhaul was released which definitely has universal utf-8 support. See https://learn.microsoft.com/en-us/windows/console/createpseudoconsole
Co-authored-by: Łukasz Langa <[email protected]>
|
Any other comments or changes requested @ambv @JelleZijlstra? |
ichard26
left a comment
There was a problem hiding this comment.
Thank you! I appreciate all of your in-depth investigation, I don't understand what's wrong here myself, but this seems safe have read your comments.
I'll reach out to the click folks tomorrow, but I agree it's very possible click won't fix this for us as it (seems to) requires monkeypatching the standard streams.
Sorry for taking so long to get back to you!
|
|
||
|
|
||
| def patched_main() -> None: | ||
| #: Fixes errors with emoji in Windows terminals when output is redirected |
There was a problem hiding this comment.
| #: Fixes errors with emoji in Windows terminals when output is redirected | |
| #: Fixes errors with emoji in Windows terminals when output is redirected |
|
Coming back to this, I'm not enthusiastic about including this in Black:
|
|
If we don't accept this (and I entirely understand why) is there perhaps a place in the docs that I could document the work around to users (setting |
|
Mentioning that workaround in the docs seems fine. |
|
One comment on the "We're monkeypatching the standard library, always a risky thing to do" -- if that is the only concern then I could replace the |
|
Looks like this won't be accepted so I'll close the pr |
This is most apparent when black is run via pre-commit which does
subprocess.Popen(stdout=PIPE).One option around this would be to instruct users to set
PYTHONUTF8=1,but that is not a very scalable approach, as basically everything supports
UTF-8 these days.
This fix was inspired by pycln and hadialqattan/pycln#54
Fixes #3156
Checklist - did you ...
CHANGES.mdif necessary?