UnicodeEncodeError in miner.py / convo_miner.py / split_mega_files.py on Windows cp1251/cp1252 (incomplete fix of #47) #535

@krugdenis

Summary

Issue #47 (closed) fixed the UnicodeEncodeError crash caused by box-drawing characters in searcher.py on Windows. The same class of crash is still present in several other modules that print ✓ (U+2713) or other non-ASCII glyphs, and there is no sys.stdout.reconfigure(encoding="utf-8", ...) anywhere in the package.

Reproduction

Windows 11, Python 3.13, default console encoding cp1251, mempalace 3.0.14.

# Crash 1: mempalace init
mempalace init --yes D:/path/to/project

Crashes while printing the detected-entities confirmation block (Cyrillic content from project docs):

File "C:\Program Files\Python313\Lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 30-34: character maps to <undefined>

# Crash 2: mempalace mine (project mode)
mempalace mine D:/path/to/project --wing vibeOps
  print(f"  ✓ [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers}")
  File "cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2713' in position 2: character maps to <undefined>

# Crash 3: mempalace mine --mode convos
mempalace mine C:/Users/.../.claude/projects/... --mode convos --extract general

Same \u2713 crash from convo_miner.py:359.
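The failure mode can be reproduced without mempalace or a Windows machine. The following is a minimal sketch, not code from the package: try_print is a hypothetical helper that prints into an in-memory text stream using a chosen codec, which stands in for a console whose encoding is cp1251 or cp1252.

```python
import io


def try_print(text, encoding, errors="strict"):
    """Print `text` to an in-memory stream that encodes like a console would.

    Hypothetical helper for illustration. Returns the encoded bytes, or
    raises UnicodeEncodeError when the codec cannot represent a character
    and errors="strict" (the default for sys.stdout).
    """
    buf = io.BytesIO()
    # newline="\n" disables platform newline translation so the output
    # is identical on every OS.
    stream = io.TextIOWrapper(buf, encoding=encoding, errors=errors, newline="\n")
    print(text, file=stream)
    stream.flush()
    return buf.getvalue()
```

Calling try_print("\u2713", "cp1251") or try_print("\u2713", "cp1252") raises UnicodeEncodeError, because neither code page contains U+2713. With errors="replace" the checkmark degrades to "?", and with encoding="utf-8" it is written unchanged.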

Affected print sites (U+2713 checkmark)

  • mempalace/miner.py:596: print(f" ✓ [{i:4}/{len(files)}] ...")
  • mempalace/convo_miner.py:359: print(f" ✓ [{i:4}/{len(files)}] ...")
  • mempalace/split_mega_files.py:227: print(f" ✓ {name} ...")

Plus anywhere in entity_detector.py that prints user project text (Cyrillic / non-ASCII file contents) during init's confirmation step.
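To check whether the list above is complete, a rough scan for print calls that contain non-ASCII characters may help. This is a sketch under assumptions: find_non_ascii_prints is a hypothetical helper, it only does a line-based text match (no parsing), and it will not catch sites that print non-ASCII user data through plain ASCII format strings, such as the entity_detector.py case.

```python
from pathlib import Path


def find_non_ascii_prints(root):
    """Return (path, line number, source line) for print( lines with non-ASCII text.

    Hypothetical helper: a plain substring check, so it can miss multi-line
    print calls and can flag lines where the non-ASCII text is in a comment.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "print(" in line and not line.isascii():
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Pointing it at the installed mempalace package directory should list at least the three sites above if the line numbers in this report are accurate for the installed version.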

grep reconfigure\(encoding .../mempalace/ returns zero hits — the #47 fix is not applied anywhere in the installed package.

Workaround

set PYTHONIOENCODING=utf-8
set PYTHONUTF8=1

With these variables set, both init and mine succeed. But this is bad UX: the tool crashes out of the box on default Windows consoles for any user whose project directory contains non-ASCII text, and on any mining run that prints a progress checkmark (i.e. all of them).
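The effect of the workaround can be checked in isolation by running a child interpreter with the variable set and capturing what it writes. This is a sketch only: run_child is a hypothetical helper, and it prints a bare checkmark rather than invoking mempalace.

```python
import os
import subprocess
import sys


def run_child(env_overrides):
    """Run a child Python that prints U+2713 and return (returncode, stdout bytes).

    Hypothetical helper. Any inherited PYTHONUTF8 / PYTHONIOENCODING values
    are dropped first so that only `env_overrides` decides the encoding.
    """
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)
    env.pop("PYTHONIOENCODING", None)
    env.update(env_overrides)
    proc = subprocess.run(
        [sys.executable, "-c", "print('\\u2713')"],
        env=env,
        capture_output=True,
    )
    return proc.returncode, proc.stdout
```

With {"PYTHONIOENCODING": "utf-8"} or {"PYTHONUTF8": "1"} the child exits with code 0 and emits the UTF-8 bytes for the checkmark. With no overrides the result depends on the machine's locale, which is the situation this issue describes.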

Proposed fix

Call this once at the top of cli.py::main() (so every subcommand benefits):

import sys
if sys.platform == "win32" and hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")
    sys.stderr.reconfigure(encoding="utf-8", errors="replace")

This is exactly the suggested fix from #47 but applied at the CLI entry point instead of inside one module.
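A slightly more defensive variant is sketched below, in case it is useful for the PR. It is an assumption about how this could be packaged, not existing mempalace code: force_utf8_stdio is a hypothetical name, and the streams and platform parameters exist only so the helper can be unit-tested off Windows. It checks each stream separately, since sys.stdout or sys.stderr can be None or replaced by an object without reconfigure().

```python
import sys


def force_utf8_stdio(streams=None, platform=None):
    """Switch text streams to UTF-8 on Windows; return how many were changed.

    Hypothetical helper. `streams` defaults to (sys.stdout, sys.stderr) and
    `platform` to sys.platform; both are injectable for testing only.
    """
    platform = sys.platform if platform is None else platform
    if platform != "win32":
        return 0
    if streams is None:
        streams = (sys.stdout, sys.stderr)
    changed = 0
    for stream in streams:
        # Skip None and wrapped streams that lack reconfigure().
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")
            changed += 1
    return changed
```

It would be called once as the first statement of main(), before any subcommand prints anything.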

Environment

  • Windows 11, Python 3.13.x
  • mempalace 3.0.14
  • Console: default Windows Terminal (cp1251 on Russian locale, cp1252 on en-US)
