
Commit 6ea4186

bpo-28180: Implementation for PEP 538 (#659)

- new PYTHONCOERCECLOCALE config setting
- coerces legacy C locale to C.UTF-8, C.utf8 or UTF-8 by default
- always uses C.UTF-8 on Android
- uses `surrogateescape` on stdin and stdout in the coercion target locales
- configure option to disable locale coercion at build time
- configure option to disable C locale warning at build time
1 parent 0afbabe commit 6ea4186

14 files changed, +699 -55 lines changed

Doc/using/cmdline.rst

Lines changed: 36 additions & 0 deletions
@@ -713,6 +713,42 @@ conflict.

    .. versionadded:: 3.6

+
+.. envvar:: PYTHONCOERCECLOCALE
+
+   If set to the value ``0``, causes the main Python command line application
+   to skip coercing the legacy ASCII-based C locale to a more capable UTF-8
+   based alternative. Note that this setting is checked even when the
+   :option:`-E` or :option:`-I` options are used, as it is handled prior to
+   the processing of command line options.
+
+   If this variable is *not* set, or is set to a value other than ``0``, and
+   the current locale reported for the ``LC_CTYPE`` category is the default
+   ``C`` locale, then the Python CLI will attempt to configure the following
+   locales for the ``LC_CTYPE`` category in the order listed before loading the
+   interpreter runtime:
+
+   * ``C.UTF-8``
+   * ``C.utf8``
+   * ``UTF-8``
+
+   If setting one of these locale categories succeeds, then the ``LC_CTYPE``
+   environment variable will also be set accordingly in the current process
+   environment before the Python runtime is initialized. This ensures the
+   updated setting is seen in subprocesses, as well as in operations that
+   query the environment rather than the current C locale (such as Python's
+   own :func:`locale.getdefaultlocale`).
+
+   Configuring one of these locales (either explicitly or via the above
+   implicit locale coercion) will automatically set the error handler for
+   :data:`sys.stdin` and :data:`sys.stdout` to ``surrogateescape``. This
+   behavior can be overridden using :envvar:`PYTHONIOENCODING` as usual.
+
+   Availability: \*nix
+
+   .. versionadded:: 3.7
+      See :pep:`538` for more details.
+
 Debug-mode variables
 ~~~~~~~~~~~~~~~~~~~~

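The coercion rules in the `PYTHONCOERCECLOCALE` entry above can be approximated in pure Python. This is only an illustrative sketch, not the commit's C implementation. The names `TARGET_LOCALES`, `select_coercion_target`, `try_setlocale` and `coerce_legacy_c_locale` are hypothetical, the Android special case is not modelled, and because the sketch runs inside an already-initialized interpreter, it cannot change decisions made at startup, such as the standard stream encodings.

```python
import locale
import os

# Candidate locales, tried in the order listed in the PYTHONCOERCECLOCALE docs.
TARGET_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def select_coercion_target(current_ctype, coerce_setting, is_available):
    """Return the locale to coerce to, or None if no coercion should happen.

    current_ctype  -- the current LC_CTYPE locale name, e.g. "C"
    coerce_setting -- value of PYTHONCOERCECLOCALE (None if unset)
    is_available   -- predicate that tries a locale name and reports success
    """
    if coerce_setting == "0":
        return None  # coercion explicitly disabled
    if current_ctype != "C":
        return None  # only the legacy C locale is coerced
    for name in TARGET_LOCALES:
        if is_available(name):
            return name
    return None  # none of the UTF-8 based targets is installed


def try_setlocale(name):
    """Attempt to switch LC_CTYPE to `name` and report whether it worked.

    Note: on success this really changes the process-wide LC_CTYPE setting.
    """
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    return True


def coerce_legacy_c_locale(environ=os.environ):
    """Hypothetical Python-level approximation of PEP 538 coercion."""
    current = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    target = select_coercion_target(
        current, environ.get("PYTHONCOERCECLOCALE"), try_setlocale)
    if target is not None:
        # Export the choice so subprocesses and environment-based queries see it.
        environ["LC_CTYPE"] = target
    return target
```

Keeping the ordering rules in a pure function, separate from the `setlocale` side effect, makes them easy to check in isolation. For example, `select_coercion_target("C", None, lambda name: name == "C.utf8")` returns `"C.utf8"`.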
Doc/whatsnew/3.7.rst

Lines changed: 45 additions & 0 deletions
@@ -70,6 +70,51 @@ Summary -- Release highlights
 New Features
 ============

+.. _whatsnew37-pep538:
+
+PEP 538: Legacy C Locale Coercion
+---------------------------------
+
+An ongoing challenge within the Python 3 series has been determining a sensible
+default strategy for handling the "7-bit ASCII" text encoding assumption
+currently implied by the use of the default C locale on non-Windows platforms.
+
+:pep:`538` updates the default interpreter command line interface to
+automatically coerce that locale to an available UTF-8 based locale as
+described in the documentation of the new :envvar:`PYTHONCOERCECLOCALE`
+environment variable. Automatically setting ``LC_CTYPE`` this way means that
+both the core interpreter and locale-aware C extensions (such as
+:mod:`readline`) will assume the use of UTF-8 as the default text encoding,
+rather than ASCII.
+
+The platform support definition in :pep:`11` has also been updated to limit
+full text handling support to suitably configured non-ASCII based locales.
+
+As part of this change, the default error handler for ``stdin`` and ``stdout``
+is now ``surrogateescape`` (rather than ``strict``) when using any of the
+defined coercion target locales (currently ``C.UTF-8``, ``C.utf8``, and
+``UTF-8``). The default error handler for ``stderr`` continues to be
+``backslashreplace``, regardless of locale.
+
+.. note::
+
+   In the current implementation, a warning message is printed directly to
+   ``stderr`` even for successful implicit locale coercion. This gives
+   redistributors and system integrators the opportunity to determine if they
+   should be making an environmental change to avoid the need for implicit
+   coercion at the Python interpreter level.
+
+   However, it's not clear that this is going to be the best approach for
+   the final 3.7.0 release, and we may end up deciding to disable the warning
+   by default and provide some way of opting into it at runtime or build time.
+
+   Concrete examples of use cases where it would be preferable to disable the
+   warning by default can be noted on :issue:`30565`.
+
+.. seealso::
+
+   :pep:`538` -- Coercing the legacy C locale to a UTF-8 based locale
+      PEP written and implemented by Nick Coghlan.


 Other Language Changes
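The error-handler changes described in this What's New entry can be observed from Python. The sketch below first shows why `surrogateescape` suits stdin and stdout. Bytes that are not valid UTF-8 survive a decode/encode round trip instead of raising an error, while `backslashreplace`, which stderr continues to use, renders them as visible escapes. The sketch then starts a child interpreter with `PYTHONIOENCODING` set, the documented override, and reads back `sys.stdin.errors` and `sys.stdout.errors`. The helper name `child_stream_errors` is hypothetical. Without an override, the reported handlers depend on the host locale, so only the explicit override is checked here.

```python
import os
import subprocess
import sys

# b"\xe9" is "é" in Latin-1 but is not valid UTF-8 on its own.
raw = b"caf\xe9"

# surrogateescape maps each undecodable byte to a lone surrogate (U+DC80..U+DCFF)...
text = raw.decode("utf-8", "surrogateescape")
# ...and maps it back on encode, so the original bytes are recovered exactly.
restored = text.encode("utf-8", "surrogateescape")

# backslashreplace (stderr's handler) writes the surrogate as an escape instead.
shown_on_stderr = text.encode("utf-8", "backslashreplace")

# Strict decoding of the same bytes fails outright.
try:
    raw.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True


def child_stream_errors(pythonioencoding):
    """Return (stdin.errors, stdout.errors) as reported by a child interpreter."""
    env = dict(os.environ, PYTHONIOENCODING=pythonioencoding)
    code = "import sys; print(sys.stdin.errors, sys.stdout.errors)"
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdin=subprocess.DEVNULL,  # guarantee the child has a usable sys.stdin
        stdout=subprocess.PIPE,
        check=True,
    )
    stdin_errors, stdout_errors = proc.stdout.decode("ascii").split()
    return stdin_errors, stdout_errors


print(text == "caf\udce9", restored == raw, shown_on_stderr, strict_failed)
print(child_stream_errors("utf-8:strict"))
```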

Lib/test/support/script_helper.py

Lines changed: 30 additions & 26 deletions
@@ -48,8 +48,35 @@ def interpreter_requires_environment():
     return __cached_interp_requires_environment


-_PythonRunResult = collections.namedtuple("_PythonRunResult",
-                                          ("rc", "out", "err"))
+class _PythonRunResult(collections.namedtuple("_PythonRunResult",
+                                              ("rc", "out", "err"))):
+    """Helper for reporting Python subprocess run results"""
+    def fail(self, cmd_line):
+        """Provide helpful details about failed subcommand runs"""
+        # Limit to 80 lines to ASCII characters
+        maxlen = 80 * 100
+        out, err = self.out, self.err
+        if len(out) > maxlen:
+            out = b'(... truncated stdout ...)' + out[-maxlen:]
+        if len(err) > maxlen:
+            err = b'(... truncated stderr ...)' + err[-maxlen:]
+        out = out.decode('ascii', 'replace').rstrip()
+        err = err.decode('ascii', 'replace').rstrip()
+        raise AssertionError("Process return code is %d\n"
+                             "command line: %r\n"
+                             "\n"
+                             "stdout:\n"
+                             "---\n"
+                             "%s\n"
+                             "---\n"
+                             "\n"
+                             "stderr:\n"
+                             "---\n"
+                             "%s\n"
+                             "---"
+                             % (self.rc, cmd_line,
+                                out,
+                                err))


 # Executing the interpreter in a subprocess
@@ -107,30 +134,7 @@ def run_python_until_end(*args, **env_vars):
 def _assert_python(expected_success, *args, **env_vars):
     res, cmd_line = run_python_until_end(*args, **env_vars)
     if (res.rc and expected_success) or (not res.rc and not expected_success):
-        # Limit to 80 lines to ASCII characters
-        maxlen = 80 * 100
-        out, err = res.out, res.err
-        if len(out) > maxlen:
-            out = b'(... truncated stdout ...)' + out[-maxlen:]
-        if len(err) > maxlen:
-            err = b'(... truncated stderr ...)' + err[-maxlen:]
-        out = out.decode('ascii', 'replace').rstrip()
-        err = err.decode('ascii', 'replace').rstrip()
-        raise AssertionError("Process return code is %d\n"
-                             "command line: %r\n"
-                             "\n"
-                             "stdout:\n"
-                             "---\n"
-                             "%s\n"
-                             "---\n"
-                             "\n"
-                             "stderr:\n"
-                             "---\n"
-                             "%s\n"
-                             "---"
-                             % (res.rc, cmd_line,
-                                out,
-                                err))
+        res.fail(cmd_line)
     return res

 def assert_python_ok(*args, **env_vars):
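This refactoring turns the result namedtuple into a subclass with a `fail()` method. The diagnostic formatting that used to live inside `_assert_python` can then presumably be reused by any code holding a `_PythonRunResult`. The self-contained sketch below reproduces that pattern outside CPython's test suite. `RunResult`, `run_python` and `check_python_ok` are hypothetical names, and a plain `subprocess.run` call stands in for `run_python_until_end`, whose environment handling is not shown in this diff.

```python
import collections
import subprocess
import sys


class RunResult(collections.namedtuple("RunResult", ("rc", "out", "err"))):
    """Subprocess result that can describe itself when a check fails."""

    __slots__ = ()  # no per-instance __dict__, just like a plain namedtuple

    def fail(self, cmd_line, maxlen=80 * 100):
        """Raise AssertionError with the return code and (tail of) the output."""
        out, err = self.out, self.err
        # Keep only the last `maxlen` bytes of each stream.
        if len(out) > maxlen:
            out = b"(... truncated stdout ...)" + out[-maxlen:]
        if len(err) > maxlen:
            err = b"(... truncated stderr ...)" + err[-maxlen:]
        out = out.decode("ascii", "replace").rstrip()
        err = err.decode("ascii", "replace").rstrip()
        raise AssertionError(
            "Process return code is %d\n"
            "command line: %r\n\n"
            "stdout:\n---\n%s\n---\n\n"
            "stderr:\n---\n%s\n---" % (self.rc, cmd_line, out, err))


def run_python(*args):
    """Run the current interpreter with `args`; return (RunResult, cmd_line)."""
    cmd_line = [sys.executable, *args]
    proc = subprocess.run(cmd_line, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return RunResult(proc.returncode, proc.stdout, proc.stderr), cmd_line


def check_python_ok(*args):
    """Mirror of the refactored call site: delegate reporting to res.fail()."""
    res, cmd_line = run_python(*args)
    if res.rc:
        res.fail(cmd_line)
    return res
```

Because the subclass is still a tuple, existing code that unpacks `rc, out, err = res` keeps working. Declaring `__slots__ = ()` is the usual idiom when subclassing a namedtuple. The commit's own class omits it, which is harmless for a test helper.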
