
tests: fix Windows encoding assumptions and offline Chroma embedding dependency #776

@111r1ck


What happened?

Running the full test suite on Windows exposed two test reliability issues:

  1. Several onboarding tests failed with UnicodeDecodeError because generated UTF-8 files were read using the platform default encoding, which is GBK in this environment.
  2. tests/test_convo_miner.py::test_convo_mining could trigger Chroma's default ONNX embedding model download, causing the test to fail in offline or restricted-network environments.

Observed onboarding failures included:

  • tests/test_onboarding.py::test_generate_aaak_bootstrap_entities_content
  • tests/test_onboarding.py::test_generate_aaak_bootstrap_facts_content
  • tests/test_onboarding.py::test_generate_aaak_bootstrap_collision
  • tests/test_onboarding.py::test_generate_aaak_bootstrap_no_relationship

Typical encoding error:

UnicodeDecodeError: 'gbk' codec can't decode byte ...

Typical convo miner/network error:

httpx.ConnectError: [SSL: UNEXPECTED_EOF_WHILE_READING] ...

The onboarding implementation writes files explicitly as UTF-8, but the tests read them using Path.read_text() without an explicit encoding, so Python falls back to the locale encoding (GBK here). The convo miner test used a real Chroma collection without an explicit embedding function, so Chroma could fall back to its default ONNX embedding function and attempt a model download.
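A minimal sketch of the encoding fix, assuming the tests only need to read back what onboarding wrote: pass encoding="utf-8" to Path.read_text() so the result no longer depends on the locale. The helper names `read_utf8` and `roundtrip_demo` and the file name `entities.md` are hypothetical, not taken from the MemPal test suite; tempfile stands in for pytest's tmp_path so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def read_utf8(path: Path) -> str:
    """Read a generated bootstrap file using the encoding it was written with.

    Without encoding=, Path.read_text() uses the locale encoding, which is
    GBK (cp936) on a Chinese-locale Windows install, so UTF-8 bytes can fail
    to decode there.
    """
    return path.read_text(encoding="utf-8")


def roundtrip_demo(content: str) -> str:
    # Write the file the way the onboarding code does (explicit UTF-8),
    # then read it back through the explicit-encoding helper.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "entities.md"  # hypothetical file name
        target.write_text(content, encoding="utf-8")
        return read_utf8(target)


if __name__ == "__main__":
    sample = "Café → naïve résumé\n"
    assert roundtrip_demo(sample) == sample
    print("UTF-8 round trip OK")
```

In the real tests the change would likely be just adding encoding="utf-8" to each existing read_text() call; the helper only makes the intent explicit.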

What did you expect?

The full test suite should pass consistently on Windows and should not depend on platform default encoding or network access.

In particular:

  • Onboarding tests should read the UTF-8 bootstrap files correctly on Windows.
  • test_convo_mining should run fully offline, without trying to download embedding models.
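One way to keep test_convo_mining offline is to give the test's Chroma collection an explicit, deterministic embedding function so Chroma never falls back to its ONNX default. This is a sketch under assumptions, not necessarily the approach taken in #775: the class name `HashEmbeddingFunction` and the 32-dimension default are hypothetical, and the vectors carry no semantic meaning, only stable identity (the same text always maps to the same vector). It follows the convention of recent chromadb versions, a callable whose parameter is named `input` and that returns one vector per document; depending on the chromadb version, you may need to subclass `chromadb.EmbeddingFunction` or return NumPy arrays instead of plain lists.

```python
import hashlib
from typing import List, Sequence


class HashEmbeddingFunction:
    """Deterministic, network-free embedding function for tests (hypothetical).

    Each document is hashed with SHA-256 and the digest bytes are scaled into
    [0, 1] floats. Identical texts get identical vectors; no model is loaded
    and nothing is downloaded.
    """

    def __init__(self, dim: int = 32) -> None:
        if dim <= 0:
            raise ValueError("dim must be positive")
        self.dim = dim

    # Recent chromadb versions expect this parameter to be named `input`.
    def __call__(self, input: Sequence[str]) -> List[List[float]]:
        vectors: List[List[float]] = []
        for text in input:
            digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
            # Cycle through the digest bytes if dim is larger than 32.
            vectors.append([digest[i % len(digest)] / 255.0 for i in range(self.dim)])
        return vectors
```

In the test, an instance would be passed when the collection is created, for example `client.get_or_create_collection("convos_test", embedding_function=HashEmbeddingFunction())` on an in-memory `chromadb.EphemeralClient()`. The collection name is made up, and how MemPal's convo miner receives its collection is not shown in this issue, so that wiring is an assumption.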

How to reproduce:

  1. Use a Windows environment with a GBK (or other non-UTF-8) default locale.
  2. Run: python -m pytest tests -v
  3. Observe the onboarding failures with UnicodeDecodeError. In offline or restricted-network environments, also observe test_convo_mining failing because Chroma tries to download its ONNX embedding model.
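To tell in advance whether an environment will hit the encoding failure, you can check which encoding Python falls back to when encoding= is omitted. This is a small standard-library sketch; the function names are hypothetical. On Python 3.11, locale.getpreferredencoding(False) returns "utf-8" when UTF-8 Mode is enabled and the locale encoding otherwise, and codecs.lookup() normalizes aliases such as "cp936" to "gbk".

```python
import codecs
import locale


def default_text_encoding() -> str:
    """Encoding Path.read_text()/open() fall back to when encoding= is omitted.

    locale.getpreferredencoding(False) already returns "utf-8" when Python's
    UTF-8 Mode is enabled; codecs.lookup() normalizes aliases (for example
    "cp936" becomes "gbk" and "UTF8" becomes "utf-8").
    """
    return codecs.lookup(locale.getpreferredencoding(False)).name


def likely_to_reproduce() -> bool:
    # Tests that read UTF-8 files without encoding= can only break when the
    # fallback encoding is something other than UTF-8.
    return default_text_encoding() != "utf-8"


if __name__ == "__main__":
    enc = default_text_encoding()
    print(f"default text encoding: {enc}; likely to reproduce: {likely_to_reproduce()}")
```

On Python 3.10+, running `python -X warn_default_encoding -m pytest tests -v` also emits an EncodingWarning at each open() or read_text() call that omits encoding=, which surfaces the same class of bug on Linux or macOS where the default is usually already UTF-8. Conversely, running with `-X utf8` (or PYTHONUTF8=1) makes the default UTF-8 and will hide the onboarding failures on Windows.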
Environment:

OS: Windows
Python version: 3.11.9
MemPal version: current develop-targeted work prior to merge
Addressed in #775 for branch validation and full-suite test reliability.

Labels: bug