BUG: Float mtime comparison breaks file deduplication — every file re-mined on each run #475

@jphein

Summary

file_already_mined() in palace.py:68 uses strict float equality to compare stored vs current mtime:

return float(stored_mtime) == current_mtime

os.path.getmtime() returns a float. ChromaDB stores metadata via JSON serialization, which can introduce floating-point precision loss (e.g., 1712345678.123456 may round-trip as 1712345678.1234560012817383). When the stored value does not come back bit-for-bit identical, the strict == comparison fails even for unchanged files, so every file is re-mined on every run.
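The failure mode can be illustrated in isolation. The sketch below does not use ChromaDB; `lossy_round_trip` is a hypothetical stand-in that rounds the value to six decimal places, chosen only to simulate a storage layer that does not return the exact float it was given. Whether ChromaDB loses precision in this particular way is an assumption, not something confirmed here.

```python
def lossy_round_trip(value: float) -> float:
    """Hypothetical stand-in for a storage layer that loses float precision.

    Formats the value with six decimal places and parses it back. This is
    NOT claimed to be what ChromaDB does; it only simulates a round-trip
    that returns a slightly different float.
    """
    return float(f"{value:.6f}")


# An mtime with sub-microsecond digits, as a nanosecond-resolution
# filesystem can produce.
current_mtime = 1712345678.1234567
stored_mtime = lossy_round_trip(current_mtime)

# The comparison used in file_already_mined(), per the issue text.
strict_match = float(stored_mtime) == current_mtime

# The drift is far below any meaningful file-modification interval.
drift = abs(float(stored_mtime) - current_mtime)

print(f"strict match: {strict_match}, drift: {drift:.9f} s")
```

The file is unchanged, yet the strict comparison reports a mismatch because the two floats differ by well under a microsecond.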

This defeats the entire dedup/skip mechanism and silently bloats the palace with duplicate drawers.

Reproduction

  1. mempalace init <dir> && mempalace mine <dir> — initial mine
  2. mempalace mine <dir> — re-mine without changing any files
  3. Observe that all files are processed again (not skipped)

Suggested Fix

Use epsilon comparison:

return abs(float(stored_mtime) - current_mtime) < 0.01

Or truncate to integer seconds:

return int(float(stored_mtime)) == int(current_mtime)
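A minimal sketch of the epsilon variant as a standalone helper, so the two suggested fixes can be compared. The names `mtime_matches` and `MTIME_TOLERANCE_SECONDS` are hypothetical and do not come from palace.py; the 0.01 value is the one proposed above. Note that `math.isclose` treats a difference exactly equal to the tolerance as a match, whereas the one-liner above uses a strict `<`.

```python
import math

# Tolerance in seconds, taken from the epsilon fix suggested in this issue.
MTIME_TOLERANCE_SECONDS = 0.01


def mtime_matches(stored_mtime, current_mtime, tolerance=MTIME_TOLERANCE_SECONDS):
    """Return True when the stored and current mtimes agree within tolerance.

    stored_mtime may come back from metadata as a str, int, or float,
    so it is coerced to float before comparing.
    """
    return math.isclose(
        float(stored_mtime), current_mtime, rel_tol=0.0, abs_tol=tolerance
    )


def mtime_matches_truncated(stored_mtime, current_mtime):
    """The integer-seconds variant from the issue, for comparison."""
    return int(float(stored_mtime)) == int(current_mtime)


# Tiny drift in the middle of a second: both variants agree.
print(mtime_matches("1712345678.123457", 1712345678.1234567))
print(mtime_matches_truncated("1712345678.123457", 1712345678.1234567))

# Tiny drift straddling a second boundary: truncation reports a mismatch.
print(mtime_matches(1712345678.999999, 1712345679.000001))
print(mtime_matches_truncated(1712345678.999999, 1712345679.000001))
```

The two fixes trade off differently. Truncation can still misfire when a small drift straddles a whole-second boundary, and it treats any edit within the same second as unchanged. The epsilon check tolerates drift anywhere, but ignores edits that land within 10 ms of the stored mtime.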

Environment

  • mempalace 3.1.0
  • ChromaDB 0.6.3
  • Linux x86_64, Python 3.12

Labels: area/mining, bug
