Skip to content

fix: use epsilon comparison for mtime to prevent unnecessary re-mining#610

Merged
bensig merged 1 commit intoMemPalace:developfrom
arnoldwender:fix/mtime-float-comparison
Apr 11, 2026
Merged

fix: use epsilon comparison for mtime to prevent unnecessary re-mining#610
bensig merged 1 commit intoMemPalace:developfrom
arnoldwender:fix/mtime-float-comparison

Conversation

@arnoldwender
Copy link
Copy Markdown
Contributor

Problem

palace.py:68 — float equality comparison for file modification times:

return float(stored_mtime) == current_mtime

ChromaDB stores metadata values through serialization/deserialization cycles that can introduce floating-point precision loss. This causes == to return False for files that haven't actually changed, triggering unnecessary re-mining on every run.

Fix

Replace exact equality with epsilon comparison:

return abs(float(stored_mtime) - current_mtime) < 0.001

Sub-millisecond precision is more than sufficient for file modification timestamps.

Test plan

  • pytest tests/ -v --ignore=tests/benchmarks — all 80 tests pass (including test_file_already_mined_check_mtime)
  • ruff check + ruff format clean
  • Mine a project twice — second run should skip unchanged files

Copy link
Copy Markdown

@web3guru888 web3guru888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Float equality on mtime is a real issue — ChromaDB's metadata serialization round-trip is the exact path that introduces this. We've seen similar behavior in integrations that store and retrieve mtime from metadata across separate client sessions: the stored value comes back with a few ULP of drift even though the file never changed.

The abs(...) < 0.001 epsilon is the right threshold. A millisecond is several orders of magnitude below any meaningful file-modification granularity (most filesystems use 1s or 100ns resolution) and well above floating-point noise.

One subtle point: this interacts correctly with the new sinful1992 issue (#608) — if the HNSW cache is stale and a fresh client reads the mtime back, epsilon comparison prevents a phantom re-mine on the already-indexed file. Small fix, good knock-on.

LGTM.


[MemPalace-AGI integration — 215 tests, 710 KG entities]

@bensig bensig changed the base branch from main to develop April 11, 2026 22:21
@bensig bensig merged commit bb7ed80 into MemPalace:develop Apr 11, 2026
6 checks passed
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 12, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Two items in "Fork Changes (still ahead of upstream after v3.3.1 merge)"
were never — or are no longer — fork-only. Demote both:

1. Epsilon mtime comparison (palace.py)
   Upstream merged Arnold Wender's equivalent fix as PR MemPalace#610 on
   2026-04-12 (commit bb7ed80). Their threshold is 0.001 vs our fork's
   0.01, but abs(stored - current) < epsilon is semantically identical.
   Moved to "Merged into upstream (post-v3.3.1)".

2. ".jsonl exempt from JUNK_FILE_SIZE cap"
   The description was wrong. The actual change (commit 560fdbd) adds
   ".jsonl" to READABLE_EXTENSIONS in miner.py — a whitelist addition,
   not a size-cap exemption. And it was authored by MSL (upstream
   maintainer) at the same SHA on upstream/develop. Never was a fork
   contribution. Moved to "Pulled in from upstream/develop".
   Related: upstream also raised MAX_FILE_SIZE 10MB → 500MB in d137d12
   (the actual size-cap fix, separate concern).

Clarified that item now at #1 (bulk_check_mined) is fork-only and
independent of the mtime comparison fix. Renumbered remaining "still
ahead" items 1-18.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Remove three stale rows that are already in upstream/develop:
- Epsilon mtime comparison (merged upstream as PR MemPalace#610, Arnold Wender)
- .jsonl READABLE_EXTENSIONS addition (upstream-authored at SHA 560fdbd)
- max_distance threshold (superset already in upstream via MemPalace#667)

All three moved to "Superseded by upstream" with attribution.

Add two rows that were fork-ahead but missing from the table:
- cmd_export and cmd_purge CLI commands (cli.py, PR pending)
- mempal_precompact_hook.sh transcript auto-mining: session ID parsing,
  find-by-session-id fallback, inline chunk_exchanges → upsert (PR pending)

Annotate existing rows:
- PID guard: branch pr/pid-file-guard (local, not pushed)
- Save hook Python auto-detection: expand files list to include both
  .claude-plugin/ hooks (same pattern was applied there)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants