fix: use epsilon comparison for mtime to prevent unnecessary re-mining #610
web3guru888
left a comment
Float equality on mtime is a real issue — ChromaDB's metadata serialization round-trip is the exact path that introduces this. We've seen similar behavior in integrations that store and retrieve mtime from metadata across separate client sessions: the stored value comes back with a few ULP of drift even though the file never changed.
The abs(...) < 0.001 epsilon is the right threshold. A millisecond is far shorter than the interval between real edits to a file, even on filesystems that record timestamps at 100 ns resolution (others use 1 s), and it is well above floating-point noise: adjacent float64 values near current epoch timestamps are about 2.4e-7 s apart.
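To put a number on "floating-point noise", the sketch below measures the spacing between adjacent float64 values near an epoch timestamp and compares it with the 0.001 threshold. The `mtime` value is hypothetical and chosen only for illustration; `math.ulp` requires Python 3.9 or later.

```python
import math

# Hypothetical mtime (seconds since the epoch), for illustration only.
mtime = 1_776_000_000.123456

# Spacing between adjacent float64 values near this timestamp.
ulp = math.ulp(mtime)

EPSILON = 0.001  # threshold used in this PR

print(f"float64 spacing near mtime: {ulp:.3e} s")
print(f"epsilon is about {EPSILON / ulp:.0f} ULPs wide")
```

A drift of a few ULPs is therefore thousands of times smaller than the tolerance, which is why the comparison absorbs it.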
One subtle point: this interacts correctly with the new sinful1992 issue (#608) — if the HNSW cache is stale and a fresh client reads the mtime back, epsilon comparison prevents a phantom re-mine on the already-indexed file. Small fix, good knock-on.
LGTM.
[MemPalace-AGI integration — 215 tests, 710 KG entities]
Two items in "Fork Changes (still ahead of upstream after v3.3.1 merge)" were never, or are no longer, fork-only. Demote both:

1. Epsilon mtime comparison (palace.py)
   Upstream merged Arnold Wender's equivalent fix as PR MemPalace#610 on 2026-04-12 (commit bb7ed80). Their threshold is 0.001 vs our fork's 0.01, but abs(stored - current) < epsilon is semantically identical. Moved to "Merged into upstream (post-v3.3.1)".

2. ".jsonl exempt from JUNK_FILE_SIZE cap"
   The description was wrong. The actual change (commit 560fdbd) adds ".jsonl" to READABLE_EXTENSIONS in miner.py: a whitelist addition, not a size-cap exemption. And it was authored by MSL (upstream maintainer) at the same SHA on upstream/develop. Never was a fork contribution. Moved to "Pulled in from upstream/develop".

Related: upstream also raised MAX_FILE_SIZE 10MB → 500MB in d137d12 (the actual size-cap fix, separate concern).

Clarified that item now at #1 (bulk_check_mined) is fork-only and independent of the mtime comparison fix.

Renumbered remaining "still ahead" items 1-18.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Remove three stale rows that are already in upstream/develop:
- Epsilon mtime comparison (merged upstream as PR MemPalace#610, Arnold Wender)
- .jsonl READABLE_EXTENSIONS addition (upstream-authored at SHA 560fdbd)
- max_distance threshold (superset already in upstream via MemPalace#667)

All three moved to "Superseded by upstream" with attribution.

Add two rows that were fork-ahead but missing from the table:
- cmd_export and cmd_purge CLI commands (cli.py, PR pending)
- mempal_precompact_hook.sh transcript auto-mining: session ID parsing, find-by-session-id fallback, inline chunk_exchanges → upsert (PR pending)

Annotate existing rows:
- PID guard: branch pr/pid-file-guard (local, not pushed)
- Save hook Python auto-detection: expand files list to include both .claude-plugin/ hooks (same pattern was applied there)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Problem
palace.py:68 uses a float equality comparison for file modification times.
ChromaDB stores metadata values through serialization/deserialization cycles that can introduce floating-point precision loss. This causes == to return False for files that haven't actually changed, triggering unnecessary re-mining on every run.
Fix
Replace exact equality with an epsilon comparison of the form abs(stored - current) < 0.001.
Sub-millisecond precision is more than sufficient for file modification timestamps.
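A minimal sketch of the comparison, not the code from palace.py: the names `mtime_unchanged` and `MTIME_EPSILON` are hypothetical, the timestamp is made up, and the drift is simulated with `math.nextafter` (Python 3.9+) rather than by an actual ChromaDB round-trip.

```python
import math

MTIME_EPSILON = 0.001  # seconds; the threshold used in this PR


def mtime_unchanged(stored: float, current: float,
                    epsilon: float = MTIME_EPSILON) -> bool:
    """Treat two mtimes as equal when they differ by less than epsilon."""
    return abs(stored - current) < epsilon


# Hypothetical mtime as reported by the filesystem.
current = 1_776_000_000.123456

# Simulate a stored value that came back a few ULPs away from the original.
stored = current
for _ in range(3):
    stored = math.nextafter(stored, math.inf)

exact_match = stored == current                      # exact equality fails
epsilon_match = mtime_unchanged(stored, current)     # epsilon comparison holds
real_change = mtime_unchanged(current + 2.0, current)  # a 2 s change is detected

print(exact_match, epsilon_match, real_change)
```

With exact equality the drifted value would trigger a re-mine; with the tolerance it does not, while a genuine modification two seconds later is still treated as a change.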
Test plan
- pytest tests/ -v --ignore=tests/benchmarks: all 80 tests pass (including test_file_already_mined_check_mtime)
- ruff check + ruff format clean