log.py: improve non-utf-8 input and output #48005

Merged
alalazo merged 1 commit into develop from hs/fix/log-encoding-issues on Dec 11, 2024

@haampie haampie commented Dec 10, 2024

Replaces #48001

The build process is assumed to output UTF-8 most of the time. Its output consists of (1) Python's own output, which I believe is almost always ASCII due to our package name restrictions (though under -d we may accidentally print non-UTF-8); and (2) possibly output from build subprocesses, which is usually UTF-8 unless a compiler or linker decides to dump a binary to stdout, which does happen.

For the unlikely case that the build process does not output UTF-8, @alalazo and I agreed that it would be sensible to output valid UTF-8 to the main process's stdout as well as to the log file, meaning we replace invalid UTF-8 with ? characters. That way users can cat and grep the log file safely and easily. Information does get lost here, but if you really need the exact output from a build process to troubleshoot a build, you can still capture it with strace -f -s1000, so technically nothing is unrecoverable.
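The replacement step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `sanitize_line` is a hypothetical helper name, and it relies on Python's built-in `errors="replace"` handler, which substitutes U+FFFD (the Unicode replacement character) for each undecodable byte. Whether log.py uses that handler or maps to a literal `?` is not stated in this description.

```python
def sanitize_line(raw: bytes) -> str:
    """Decode build output as UTF-8, substituting anything that is not valid.

    Hypothetical helper for illustration; not Spack's actual function.
    With errors="replace", Python inserts U+FFFD for each invalid byte
    instead of raising UnicodeDecodeError.
    """
    return raw.decode("utf-8", errors="replace")


# 0xff and 0xfe can never appear in valid UTF-8, so both are replaced.
line = sanitize_line(b"ld: \xff\xfe binary junk\n")

# The result is an ordinary str, so writing it to a UTF-8 log file
# always succeeds and tools like grep see well-formed text.
encoded_for_log = line.encode("utf-8")
```

The trade-off matches the one described above: the log is always valid UTF-8, at the cost of no longer being a byte-exact copy of the build output.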

A related issue is that even when we have valid UTF-8 lines from the build process, sys.stdout may not support UTF-8. In principle that can happen on any platform with any version of Python. To deal with this edge case, we re-encode the UTF-8 text to sys.stdout.encoding with errors replaced (so more ?s). That's inefficient, but it doesn't raise an error, and it is an unlikely code path (most likely to be hit only on Python 3.6 without a locale set, or with the default C locale).
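The re-encoding described above might look something like the sketch below. The names `to_stream_encoding` and `write_safely` are hypothetical and chosen for illustration; the actual implementation in log.py may be structured differently. The idea is to round-trip the text through the target encoding with `errors="replace"`, so that any character the stream cannot represent becomes `?` before it is written.

```python
import io
import sys


def to_stream_encoding(text: str, encoding: str) -> str:
    """Return text with characters unsupported by `encoding` replaced by '?'.

    Hypothetical helper: encode with errors="replace" (which emits '?'
    for unencodable characters), then decode back to str.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


def write_safely(text: str, stream=sys.stdout) -> None:
    """Write text to a stream without raising UnicodeEncodeError."""
    # Assumption: fall back to UTF-8 if the stream reports no encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    stream.write(to_stream_encoding(text, encoding))


# With an ASCII-only stream, non-ASCII characters degrade to '?'.
ascii_view = to_stream_encoding("na\u00efve \u2713", "ascii")  # "na?ve ?"
```

The extra encode/decode pass is the inefficiency the description mentions; when the stream is already UTF-8, the round trip leaves ordinary text unchanged.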

@spackbot-app spackbot-app bot added the core and utilities labels Dec 10, 2024
@haampie haampie added the v0.23.1 label Dec 10, 2024
@haampie haampie force-pushed the hs/fix/log-encoding-issues branch from 7f0e896 to 85f22ee December 10, 2024 11:03
@spackbot-app spackbot-app bot added the tests label Dec 10, 2024
@haampie haampie changed the title from "log.py: improve utf-8 handling, and non-utf-8 output" to "log.py: improve utf-8 input, and non-utf-8 output" Dec 10, 2024
@haampie haampie changed the title from "log.py: improve utf-8 input, and non-utf-8 output" to "log.py: improve non-utf-8 input and output" Dec 10, 2024
@alalazo alalazo merged commit e9d2732 into develop Dec 11, 2024
@alalazo alalazo deleted the hs/fix/log-encoding-issues branch December 11, 2024 09:54
fryeguy52 pushed a commit to fryeguy52/spack that referenced this pull request Dec 17, 2024
tdrwenski pushed a commit to tdrwenski/spack that referenced this pull request Dec 26, 2024
kshea21 pushed a commit to kshea21/spack that referenced this pull request Dec 26, 2024
@haampie haampie mentioned this pull request Feb 3, 2025
teaguesterling pushed a commit to teaguesterling/spack that referenced this pull request Feb 5, 2025
Labels

core: PR affects Spack core functionality
tests: General test capability(ies)
utilities
v0.23.1: PRs to backport for v0.23.1

2 participants