feat: prevent duplicate history entries on import (anti-join)#52
Merged
feat: prevent duplicate history entries on import (anti-join)#52
Conversation
Add duplicate detection to history import using an anti-join approach: - Entries with timestamps: match on (command_line, timestamp) pair - Entries without timestamps: match on command_line alone Duplicates are skipped by default. The new --import-duplicates flag restores the old behavior of importing everything unconditionally. When duplicates are detected, the count is shown with a hint about --import-duplicates. Dry-run mode also performs dedup checks when the target databases exist. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add early-return in `is_duplicate` to avoid String allocation when the command doesn't exist at all (most common case) - Fix dry-run dedup to build sets independently per database, so it works when only one of r.db/shell.db exists - Change `import_entries_dry_run` signature to accept independent `Option<&DedupSet>` per database instead of a tuple - Add doc comment explaining why `commands` intentionally stores all command_lines (including those with timestamps) Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add INVARIANT comment in `from_history` documenting that `commands` must be a superset of command_lines in `command_timestamps`, since `is_duplicate` relies on this for its fast-path filter - Add test for dry-run with partial dedup (only R dedup set, no shell) to guard against regressions in independent per-database dedup Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add --import-duplicates flag to the Options table in README - Update Notes section to reflect default dedup behavior - Add [Unreleased] CHANGELOG entry for the behavior change Co-Authored-By: Claude Opus 4.5 <[email protected]>
There was a problem hiding this comment.
Pull request overview
Adds duplicate-detection (“anti-join”) to arf history import so imports skip entries already present in the target DB by default, with an opt-out flag.
Changes:
- Add
--import-duplicatesflag and wire it through CLI/help/completions. - Implement dedup sets (
DedupSet) and integrate duplicate skipping into both real imports and dry-run previews. - Expand unit tests and update CLI snapshot outputs.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/arf-console/src/main.rs | Threads the new flag into import handling and adds dry-run dedup checks by opening existing DBs. |
| crates/arf-console/src/history/import.rs | Implements DedupSet, integrates dedup into import + dry-run paths, and adds extensive dedup-focused tests. |
| crates/arf-console/src/cli.rs | Adds --import-duplicates option to history import. |
| crates/arf-console/src/snapshots/arf__cli__tests__help_history_import.snap | Updates CLI help snapshot to include the new flag. |
| crates/arf-console/src/snapshots/arf__cli__tests__completions_bash.snap | Updates bash completion snapshot for the new flag. |
| crates/arf-console/src/snapshots/arf__cli__tests__completions_zsh.snap | Updates zsh completion snapshot for the new flag. |
| crates/arf-console/src/snapshots/arf__cli__tests__completions_fish.snap | Updates fish completion snapshot for the new flag. |
| crates/arf-console/src/snapshots/arf__cli__tests__completions_powershell.snap | Updates PowerShell completion snapshot for the new flag. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Use timestamp_millis() instead of timestamp() for dedup matching, consistent with reedline's SQLite storage format - Add DedupSet::from_db() that opens the database in read-only mode, avoiding WAL/shm side-effect files in --dry-run mode - Keep DedupSet::from_history() for the non-dry-run path where the database is already opened for writing Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add comment documenting that reedline stores start_timestamp as Unix milliseconds in SQLite - Improve error message for from_db when history table is missing - Add test_from_db_matches_from_history verifying that from_db and from_history produce identical dedup sets for the same database Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Document from_history() timestamp fallback behavior and how from_db avoids it - Document that intra-batch duplicates are not detected (by design) - Add warning when dry-run dedup is skipped due to missing history_dir Co-Authored-By: Claude Opus 4.5 <[email protected]>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add test_import_skips_notimestamp_when_timestamped_exists to verify that importing a command without a timestamp is skipped when the same command already exists in the database with a timestamp. Co-Authored-By: Claude Opus 4.5 <[email protected]>
eitsupi
added a commit
that referenced
this pull request
Feb 5, 2026
Re-importing is now safe since anti-join deduplication was added in #52. Changed from CAUTION to NOTE with updated messaging. Co-Authored-By: Claude Opus 4.5 <[email protected]>
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
arf history import, skipping entries that already exist in the target database by default(command_line, timestamp)pair; entries without timestamps are matched oncommand_linealone--import-duplicatesflag to restore the old unconditional import behaviorTest plan
--import-duplicatesflag, dry-run with dedup, and partial database existencecargo clippycleancargo fmt --checkclean🤖 Generated with Claude Code