Fix text encoding in python CLI #611

Merged
maciejpirog merged 1 commit into main from mpir/non-utf-chars-in-python
Mar 11, 2026
@maciejpirog maciejpirog commented Mar 10, 2026

Previously, an invalid UTF-8 byte sequence raised an exception. We now use the "replace" error policy everywhere, which substitutes a replacement character for each invalid sequence instead of failing.
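
For illustration, here is a minimal sketch of the difference between the default "strict" policy and "replace" when decoding bytes. The helper names and the sample input are hypothetical and are not taken from the PR's diff.

```python
# Sample input (an assumption for illustration): 0xE9 is "é" in Latin-1,
# but in UTF-8 it starts a multi-byte sequence that is never completed here.
RAW = b"caf\xe9 = 1"


def decode_strict(data: bytes) -> str:
    """Decode with the default policy; raises UnicodeDecodeError on bad input."""
    return data.decode("utf-8")


def decode_lenient(data: bytes) -> str:
    """Decode with errors="replace"; bad sequences become U+FFFD."""
    return data.decode("utf-8", errors="replace")
```

With this input, `decode_strict(RAW)` raises `UnicodeDecodeError`, while `decode_lenient(RAW)` returns the text with U+FFFD in place of the offending byte, so the rest of the content is preserved.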

Closes #560

See the comment section of the issue for the reproduction pack (tested on macOS arm and on Debian in Docker), which can also be used to test the fix.

@maciejpirog maciejpirog changed the title [WIP] Fix text encoding in python CLI Fix text encoding in python CLI Mar 11, 2026
@dimitris-m dimitris-m left a comment

Checked that the fix works:

  1. The original #560 bug (non-UTF-8 source bytes crashing the RPC decode path) -- fixed by errors="replace" on subprocess and decode calls in rpc.py and git.py.

  2. An additional issue I tested: in containers without a UTF-8 locale (e.g. node:20-slim with LANG=C), Python defaults stdout to ASCII encoding, and the box-drawing characters in scan output cause UnicodeEncodeError. The sys.stdout.reconfigure(encoding="utf-8", errors="replace") in wrapper.py fixes this. Tested baseline v1.16.3 (crashes) vs this PR (works) in that environment.
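
The two points above can be sketched roughly as follows. This is not the code from rpc.py, git.py, or wrapper.py: the function names are hypothetical, and only the `errors="replace"` arguments and the `sys.stdout.reconfigure(...)` call come from the description.

```python
import subprocess
import sys
from typing import List


def run_and_capture(cmd: List[str]) -> str:
    """Run a command and return its stdout as text.

    errors="replace" keeps non-UTF-8 output from raising UnicodeDecodeError.
    """
    result = subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def make_stdout_safe() -> None:
    """Force UTF-8 output with a lenient error policy, if stdout supports it."""
    # reconfigure() exists on io.TextIOWrapper; a replaced stdout object
    # (for example under a test runner) may not have it, hence the guard.
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8", errors="replace")


def encode_for_ascii(text: str) -> bytes:
    """Show the encode side: unencodable characters become "?"."""
    return text.encode("ascii", errors="replace")
```

Calling `make_stdout_safe()` once at startup, before anything is printed, is the assumed usage. `encode_for_ascii` is only there to show why an ASCII stdout fails on box-drawing characters under the strict policy and what "replace" does instead.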

Note: --experimental is unaffected in both cases since OCaml handles its own encoding.

@maciejpirog maciejpirog merged commit 3eb558d into main Mar 11, 2026
43 checks passed
@maciejpirog maciejpirog deleted the mpir/non-utf-chars-in-python branch March 11, 2026 11:09
@maciejpirog maciejpirog mentioned this pull request Mar 11, 2026
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Mar 13, 2026
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [opengrep/opengrep](https://github.com/opengrep/opengrep) | patch | `v1.16.3` → `v1.16.4` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>opengrep/opengrep (opengrep/opengrep)</summary>

### [`v1.16.4`](https://github.com/opengrep/opengrep/releases/tag/v1.16.4): Opengrep 1.16.4

[Compare Source](opengrep/opengrep@v1.16.3...v1.16.4)

#### Improvements

- Improvements in the install script by [@dimitris-m](https://github.com/dimitris-m) in [#610](opengrep/opengrep#610)
- Fix text encoding in python CLI to prevent a bug when reading a non-utf8 character by [@maciejpirog](https://github.com/maciejpirog) in [#611](opengrep/opengrep#611)

**Full Changelog**: <opengrep/opengrep@v1.16.3...v1.16.4>

</details>

Development

Successfully merging this pull request may close these issues.

UnicodeDecodeError in RPC layer when autofix encounters non-UTF-8 source files