Skip to content

feat: prevent duplicate history entries on import (anti-join)#52

Merged
eitsupi merged 8 commits intomainfrom
feat/anti-join-history-import
Feb 4, 2026
Merged

feat: prevent duplicate history entries on import (anti-join)#52
eitsupi merged 8 commits intomainfrom
feat/anti-join-history-import

Conversation

@eitsupi
Copy link
Copy Markdown
Owner

@eitsupi eitsupi commented Feb 4, 2026

Summary

  • Add duplicate detection (anti-join) to arf history import, skipping entries that already exist in the target database by default
  • Entries with timestamps are matched on (command_line, timestamp) pair; entries without timestamps are matched on command_line alone
  • Add --import-duplicates flag to restore the old unconditional import behavior
  • Dry-run mode also performs dedup checks when target databases exist (independently per database)

Test plan

  • 9 new unit tests covering dedup with/without timestamps, per-database isolation, --import-duplicates flag, dry-run with dedup, and partial database existence
  • All 39 import tests pass
  • cargo clippy clean
  • cargo fmt --check clean
  • Snapshots updated (CLI help, shell completions)

🤖 Generated with Claude Code

eitsupi and others added 3 commits February 4, 2026 16:00
Add duplicate detection to history import using an anti-join approach:
- Entries with timestamps: match on (command_line, timestamp) pair
- Entries without timestamps: match on command_line alone

Duplicates are skipped by default. The new --import-duplicates flag
restores the old behavior of importing everything unconditionally.

When duplicates are detected, the count is shown with a hint about
--import-duplicates. Dry-run mode also performs dedup checks when
the target databases exist.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add early-return in `is_duplicate` to avoid String allocation when the
  command doesn't exist at all (most common case)
- Fix dry-run dedup to build sets independently per database, so it
  works when only one of r.db/shell.db exists
- Change `import_entries_dry_run` signature to accept independent
  `Option<&DedupSet>` per database instead of a tuple
- Add doc comment explaining why `commands` intentionally stores all
  command_lines (including those with timestamps)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add INVARIANT comment in `from_history` documenting that `commands`
  must be a superset of command_lines in `command_timestamps`, since
  `is_duplicate` relies on this for its fast-path filter
- Add test for dry-run with partial dedup (only R dedup set, no shell)
  to guard against regressions in independent per-database dedup

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add --import-duplicates flag to the Options table in README
- Update Notes section to reflect default dedup behavior
- Add [Unreleased] CHANGELOG entry for the behavior change

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds duplicate-detection (“anti-join”) to arf history import so imports skip entries already present in the target DB by default, with an opt-out flag.

Changes:

  • Add --import-duplicates flag and wire it through CLI/help/completions.
  • Implement dedup sets (DedupSet) and integrate duplicate skipping into both real imports and dry-run previews.
  • Expand unit tests and update CLI snapshot outputs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
crates/arf-console/src/main.rs Threads the new flag into import handling and adds dry-run dedup checks by opening existing DBs.
crates/arf-console/src/history/import.rs Implements DedupSet, integrates dedup into import + dry-run paths, and adds extensive dedup-focused tests.
crates/arf-console/src/cli.rs Adds --import-duplicates option to history import.
crates/arf-console/src/snapshots/arf__cli__tests__help_history_import.snap Updates CLI help snapshot to include the new flag.
crates/arf-console/src/snapshots/arf__cli__tests__completions_bash.snap Updates bash completion snapshot for the new flag.
crates/arf-console/src/snapshots/arf__cli__tests__completions_zsh.snap Updates zsh completion snapshot for the new flag.
crates/arf-console/src/snapshots/arf__cli__tests__completions_fish.snap Updates fish completion snapshot for the new flag.
crates/arf-console/src/snapshots/arf__cli__tests__completions_powershell.snap Updates PowerShell completion snapshot for the new flag.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@eitsupi eitsupi marked this pull request as draft February 4, 2026 16:17
eitsupi and others added 3 commits February 4, 2026 16:20
- Use timestamp_millis() instead of timestamp() for dedup matching,
  consistent with reedline's SQLite storage format
- Add DedupSet::from_db() that opens the database in read-only mode,
  avoiding WAL/shm side-effect files in --dry-run mode
- Keep DedupSet::from_history() for the non-dry-run path where the
  database is already opened for writing

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add comment documenting that reedline stores start_timestamp as
  Unix milliseconds in SQLite
- Improve error message for from_db when history table is missing
- Add test_from_db_matches_from_history verifying that from_db and
  from_history produce identical dedup sets for the same database

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Document from_history() timestamp fallback behavior and how from_db
  avoids it
- Document that intra-batch duplicates are not detected (by design)
- Add warning when dry-run dedup is skipped due to missing history_dir

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@eitsupi eitsupi marked this pull request as ready for review February 4, 2026 23:46
Add test_import_skips_notimestamp_when_timestamped_exists to verify that
importing a command without a timestamp is skipped when the same command
already exists in the database with a timestamp.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@eitsupi eitsupi merged commit f135b2a into main Feb 4, 2026
10 checks passed
@eitsupi eitsupi deleted the feat/anti-join-history-import branch February 4, 2026 23:53
eitsupi added a commit that referenced this pull request Feb 5, 2026
Re-importing is now safe since anti-join deduplication was added in #52.
Changed from CAUTION to NOTE with updated messaging.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@eitsupi eitsupi mentioned this pull request Feb 5, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants