What happened?
Running the full test suite on Windows exposed two test reliability issues:
- Several onboarding tests failed with UnicodeDecodeError because generated UTF-8 files were read using the platform default encoding, which is GBK in this environment.
- tests/test_convo_miner.py::test_convo_mining could trigger Chroma's default ONNX embedding model download, causing the test to fail in offline or restricted-network environments.
Observed onboarding failures included:
- tests/test_onboarding.py::test_generate_aaak_bootstrap_entities_content
- tests/test_onboarding.py::test_generate_aaak_bootstrap_facts_content
- tests/test_onboarding.py::test_generate_aaak_bootstrap_collision
- tests/test_onboarding.py::test_generate_aaak_bootstrap_no_relationship
Typical encoding error:
UnicodeDecodeError: 'gbk' codec can't decode byte ...
Typical convo miner/network error:
httpx.ConnectError: [SSL: UNEXPECTED_EOF_WHILE_READING] ...
The onboarding implementation writes files explicitly as UTF-8, but the tests read them using Path.read_text() without an explicit encoding. The convo miner test used a real Chroma collection, which could fall back to Chroma's default ONNX embedding function and attempt a model download.
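The encoding half of this can be sketched as follows. This is a minimal illustration, not the actual test code: the file name, the sample text, and the helper write_and_read_utf8 are hypothetical. The point is only that passing encoding="utf-8" on the read side makes the result independent of the platform default.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for a generated bootstrap file; content is illustrative only.
SAMPLE_TEXT = "name: Zoë\nnote: café ☕\n"


def write_and_read_utf8(directory: Path) -> str:
    """Write a file as UTF-8 and read it back with an explicit encoding."""
    path = directory / "bootstrap_sample.txt"  # hypothetical file name
    # Writer side: explicit UTF-8, as the onboarding code is described as doing.
    path.write_text(SAMPLE_TEXT, encoding="utf-8")
    # Reader side: explicit UTF-8, so the locale default (e.g. GBK) is never consulted.
    return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_and_read_utf8(Path(tmp)) == SAMPLE_TEXT)
```

Calling path.read_text() with no arguments in the same helper would decode with the locale's preferred encoding, which is where the 'gbk' codec error comes from.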
What did you expect?
The full test suite should pass consistently on Windows and should not depend on platform default encoding or network access.
In particular:
- Onboarding tests should read UTF-8 bootstrap files correctly on Windows.
- test_convo_mining should run fully offline without trying to download embedding models.
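One possible way to keep a Chroma-backed test offline is to supply a small deterministic embedding function instead of relying on the default. The sketch below is an assumption, not necessarily what #775 does: HashEmbeddingFunction is a hypothetical name, and the exact interface a given Chroma version expects from a custom embedding function has not been verified here.

```python
import hashlib
from typing import List


class HashEmbeddingFunction:
    """Hypothetical offline stub: deterministic vectors, no model download."""

    def __init__(self, dim: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, so at most 32 dimensions are available.
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def __call__(self, input: List[str]) -> List[List[float]]:
        vectors = []
        for text in input:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dim` digest bytes into floats in [0, 1].
            vectors.append([byte / 255.0 for byte in digest[: self.dim]])
        return vectors


if __name__ == "__main__":
    embed = HashEmbeddingFunction(dim=4)
    print(embed(["hello", "world"]))
```

In a test, an instance like this could be passed as the embedding_function argument when the collection is created, so no network access is attempted. The vectors carry no semantic meaning, so this only suits tests that check storage and retrieval plumbing rather than search quality.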
How to reproduce:
Use a Windows environment whose default locale encoding is GBK or another non-UTF-8 encoding.
Run:
python -m pytest tests -v
Observe onboarding failures with UnicodeDecodeError, and in offline/restricted environments observe test_convo_mining fail due to Chroma ONNX model download/network access.
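To confirm that an environment matches the first step, the default text encoding can be checked before running the suite. This is a small sketch using only the standard library; default_text_encoding is a hypothetical helper name. On Python 3.11, text-mode reads without an explicit encoding use the locale's preferred encoding unless UTF-8 mode is enabled.

```python
import locale
import sys


def default_text_encoding() -> str:
    """Return the locale's preferred encoding, without changing the locale."""
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    # On the affected machine this is expected to report a GBK-family code page.
    print("default encoding:", default_text_encoding())
    # 1 means UTF-8 mode is on (e.g. via -X utf8), which would hide the problem.
    print("UTF-8 mode:", sys.flags.utf8_mode)
```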
Environment:
OS: Windows
Python version: 3.11.9
MemPal version: current develop-targeted work prior to merge
Addressed in #775 for branch validation and full-suite test reliability.