Skip to content

fix: replace Unicode checkmark with ASCII for Windows encoding (#535)#681

Merged
bensig merged 2 commits intoMemPalace:developfrom
jphein:fix/unicode-checkmark
Apr 19, 2026
Merged

fix: replace Unicode checkmark with ASCII for Windows encoding (#535)#681
bensig merged 2 commits intoMemPalace:developfrom
jphein:fix/unicode-checkmark

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 12, 2026

Summary

  • Replace Unicode ✓ (U+2713) with ASCII + in convo_miner.py and split_mega_files.py
  • Windows terminals using cp1251/cp1252 crash on the Unicode character during mining progress output

Files changed

  • mempalace/convo_miner.py — line 359
  • mempalace/split_mega_files.py — line 227

Test plan

  • Existing tests pass
  • Verify mining progress output displays correctly on Windows

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 12, 2026 07:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent Windows console encoding crashes during progress output by replacing the Unicode checkmark (✓) with an ASCII alternative in mining/splitting scripts.

Changes:

  • Replace with + in convo_miner.py progress output.
  • Replace with + in split_mega_files.py per-chunk write output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
mempalace/convo_miner.py Swaps the per-file progress marker from Unicode to ASCII.
mempalace/split_mega_files.py Swaps the per-chunk success marker from Unicode to ASCII.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/convo_miner.py

total_drawers += drawers_added
print(f" [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}")
print(f" + [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}")
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change removes the checkmark here, but the same Unicode ✓ progress marker still exists in mempalace/miner.py (used by the main mempalace mine flow). If the goal is to avoid Windows cp1251/cp1252 UnicodeEncodeErrors during progress output, that remaining print will still crash—please update it (or centralize the status marker) so all mining commands are consistent.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added miner.py checkmark and all box-drawing/arrow chars (─, →) across both files. 5044464

Comment on lines 224 to 228
print(f" [{i + 1}/{len(boundaries) - 1}] {name} ({len(chunk)} lines)")
else:
out_path.write_text("".join(chunk), encoding="utf-8")
print(f" {name} ({len(chunk)} lines)")
print(f" + {name} ({len(chunk)} lines)")

Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing ✓ with '+' helps, but this script still prints other non-ASCII characters (e.g., the box-drawing '─' separators and the '→' arrow). On Windows consoles/codepages like cp1251/cp1252 those can also raise UnicodeEncodeError, so the encoding crash may persist even after this change. Consider switching those to ASCII equivalents (e.g., '-' and '->') or explicitly ensuring stdout is UTF-8 / using a safe write strategy.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced ─ with - and → with -> in split_mega_files.py. All non-ASCII progress markers are now ASCII. 5044464

jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026
…ged, add MemPalace#681, bump test count to 715

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jphein jphein force-pushed the fix/unicode-checkmark branch from 5044464 to d092918 Compare April 13, 2026 14:22
@igorls igorls added area/i18n Multilingual, Unicode, non-English embeddings area/mining File and conversation mining area/windows Windows-specific bugs and compatibility bug Something isn't working labels Apr 14, 2026
jphein and others added 2 commits April 16, 2026 08:59
…Palace#535)

Windows terminals using cp1251/cp1252 crash on the Unicode ✓ (U+2713)
in progress output. Replace with ASCII + in convo_miner.py and
split_mega_files.py.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Also fix miner.py checkmark and box-drawing/arrow chars (─, →) in
both miner.py and split_mega_files.py that would crash on cp1251/cp1252.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jphein jphein force-pushed the fix/unicode-checkmark branch from 586d302 to 15ea385 Compare April 16, 2026 16:01
jphein added a commit to jphein/mempalace that referenced this pull request Apr 16, 2026
fork-direction.md and TODO-fork-improvements.md were scattered strategic
thinking that belongs in the front-door README: competitive landscape,
roadmap (P0-P6), and open problems. Merges them in and deletes the
standalone files.

PR-status tables (README + CLAUDE.md) were stale after today's rebases
of MemPalace#659, MemPalace#660, MemPalace#661, MemPalace#673, MemPalace#681 — updated to reflect current mergeable
state. MemPalace#673 specifically noted as cleanly rebased against MemPalace#863.

test_readme_claims.py skips MCP-tool-table and dialect-reference checks
when absent — our slimmed fork README doesn't reproduce upstream's tool
table structure.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 18, 2026

Friendly ping on review — this is a one-character change swapping the Unicode ✓ for ASCII to fix Windows GBK / cp1252 encoding errors (#535). Clean merge against current develop. Happy to rebase if needed.

@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 19, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

@bensig bensig merged commit 62439e1 into MemPalace:develop Apr 19, 2026
6 checks passed
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
…emPalace#1036

Overnight/morning:
- MemPalace#681, MemPalace#1000, MemPalace#1023 merged — moved from "Still ahead" to "Merged upstream (post-v3.3.1)"
- bensig reviewed MemPalace#659 (wing_ prefix + agent filter) and MemPalace#1021 (silent_guard
  default) — both addressed on their PR branches
- MemPalace#673 needed re-rebase after overnight develop merges; done
- MemPalace#1036 filed: paginate miner.status(), closes upstream MemPalace#802 and MemPalace#1015

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
igorls added a commit that referenced this pull request Apr 19, 2026
Version bumps across pyproject.toml, mempalace/version.py, README badge,
uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*).

CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added
covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop
fix + tandem sweeper (#998), None-metadata guards (#999, #1013),
chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681),
HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path
cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
@igorls igorls mentioned this pull request Apr 19, 2026
8 tasks
igorls added a commit that referenced this pull request Apr 19, 2026
Non-ASCII glyphs (regression of the #681 class of Windows UnicodeEncodeError):
- mempalace/cli.py: "✗" → "ERROR:", "⚠" → "WARNING:", em dash → "-"
- mempalace/sweeper.py: "⚠" → "WARNING:"

Backend arg validation:
- mempalace/backends/chroma.py: `_normalize_get_collection_args` now
  raises TypeError on unexpected trailing positional args instead of
  silently dropping them — surfaces call-site bugs early.

Docs site:
- website/.vitepress/config.mts: gate Google Analytics scripts behind
  MEMPALACE_DOCS_GA_ID env var (default off). Self-hosters no longer
  get GA injected unconditionally.

Landing page SPA hygiene:
- website/.vitepress/theme/landing/useLandingEffects.js: collect all
  IntersectionObserver disconnects and removeEventListener thunks in a
  shared `cleanups` registry; drain it in `onBeforeUnmount` so observers
  and form/replay listeners don't leak across SPA navigations.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 21, 2026
Resolutions:
- `.claude-plugin/.mcp.json`, `plugin.json` — adopt upstream's `mempalace-mcp`
  console-script command (added via upstream MemPalace#340 for pipx/uv). Run
  `pip install -e .` in plugin venv after merge to install the entry point.
- `.claude-plugin/hooks/*.sh`, `hooks/*.sh` — adopt upstream's console-command
  resolution order (`mempalace` script → `python3 -m mempalace` → `python`).
  `MEMPAL_PYTHON` override still works inside `hooks_cli.py`.
- `mempalace/hooks_cli.py`, `tests/test_hooks_cli.py` — keep fork's
  `_mempalace_python()` helper (fork-ahead #4); upstream only had
  `sys.executable`, which loses MEMPAL_PYTHON override.
- `mempalace/miner.py` — keep fork's concurrent mining path (fork-ahead #1),
  apply upstream's unicode-`✓` → ASCII-`+` fix (MemPalace#681) to both paths.
- `mempalace/backends/chroma.py` — take upstream's refined `quarantine_stale_hnsw`
  docstring (it's the version merged via our own MemPalace#1000).

Brought in: 33 upstream commits including Belarusian/Chinese/German/Spanish/French
entity detection, console-script entry points, hook plugin-root space quoting,
and v3.3.2 tag (which contains our MemPalace#681/MemPalace#1000/MemPalace#1023).

Tests: 1096 passed, 106 deselected (benchmarks). Ruff clean.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 21, 2026
…#1023 merged, MemPalace#673 rebased

- Bump fork-ahead section header to "after v3.3.2"
- Strike #11 (quarantine_stale_hnsw) and MemPalace#18 (PID file guard) as
  merged-into-upstream-via-v3.3.2, keep entries for traceability
- Add v3.3.2-shipped items to "Merged into upstream (post-v3.3.1)"
- Rebuild PR table: 10 merged / 7 open / 7 closed; add MemPalace#1024 row,
  reclassify MemPalace#681/MemPalace#1000/MemPalace#1023 as merged, note MemPalace#673 rebased 2026-04-21
- Annotate MemPalace#661 status with the GitHub review-state machine caveat
  (CHANGES_REQUESTED persists until reviewer dismisses, not owed)
- Bump test count 1063 → 1096 post-merge
jphein added a commit to jphein/mempalace that referenced this pull request Apr 21, 2026
…unts

- Upstream released v3.3.2 on 2026-04-21 (our MemPalace#681/MemPalace#1000/MemPalace#1023)
- Drawer count 152,682 → 165,632; wings 23 → 28; tests 1063 → 1096
- MemPalace#673 now MERGEABLE after fresh rebase + squash to 1 commit
- MemPalace#661 status clarified: CHANGES_REQUESTED persists until reviewer
  dismisses, not an outstanding owe
- Merged-upstream section split out v3.3.2 release group
arnoldwender added a commit to arnoldwender/mempalace that referenced this pull request Apr 23, 2026
…ace#1034)

GBK consoles (Windows PowerShell/CMD default) cannot encode U+2713 (✓),
U+2717 (✗), and U+2014 (—). The same class of UnicodeEncodeError fixed
in miner.py via MemPalace#681 affects closet_llm.py and cli.py.

Replace with ASCII equivalents: [OK], [FAIL], [!], and hyphen.
arnoldwender added a commit to arnoldwender/mempalace that referenced this pull request Apr 24, 2026
…ace#1034)

GBK consoles (Windows PowerShell/CMD default) cannot encode U+2713 (✓),
U+2717 (✗), and U+2014 (—). The same class of UnicodeEncodeError fixed
in miner.py via MemPalace#681 affects closet_llm.py and cli.py.

Replace with ASCII equivalents: [OK], [FAIL], [!], and hyphen.
@jphein jphein deleted the fix/unicode-checkmark branch April 25, 2026 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/i18n Multilingual, Unicode, non-English embeddings area/mining File and conversation mining area/windows Windows-specific bugs and compatibility bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants