Fix parallel issue creation race condition#8
Closed
v4rgas wants to merge 7 commits into steveyegge:main from
Conversation
Add TestMultiProcessIDGeneration to reproduce the bug where multiple bd create processes fail with UNIQUE constraint errors when run simultaneously. Each goroutine opens a separate database connection to simulate independent processes. Test currently fails with 17/20 processes getting UNIQUE constraint errors, confirming the race condition in the in-memory ID counter.
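A standalone sketch of that failure mode (not bd's actual test — the `issues` schema and the mattn/go-sqlite3 driver here are assumptions): each goroutine opens its own connection to the same database file, reads the current max ID, and inserts max+1, so concurrent writers collide on the same ID.

```go
// race_test.go — minimal reproduction sketch of the read-max-then-insert race.
package main

import (
	"database/sql"
	"fmt"
	"path/filepath"
	"strings"
	"sync"
	"sync/atomic"
	"testing"

	_ "github.com/mattn/go-sqlite3"
)

func TestMultiProcessIDGeneration(t *testing.T) {
	dbPath := filepath.Join(t.TempDir(), "race.db")

	setup, err := sql.Open("sqlite3", dbPath)
	if err != nil {
		t.Fatal(err)
	}
	if _, err := setup.Exec(`CREATE TABLE issues (id TEXT PRIMARY KEY, title TEXT)`); err != nil {
		t.Fatal(err)
	}
	setup.Close()

	var uniqueErrs atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			// Each goroutine opens its own connection, simulating an
			// independent bd create process.
			conn, err := sql.Open("sqlite3", dbPath+"?_busy_timeout=5000")
			if err != nil {
				t.Error(err)
				return
			}
			defer conn.Close()

			// Buggy pattern: read the current max, then insert max+1.
			var max int
			_ = conn.QueryRow(
				`SELECT COALESCE(MAX(CAST(substr(id, 4) AS INTEGER)), 0) FROM issues`,
			).Scan(&max)
			_, err = conn.Exec(
				`INSERT INTO issues (id, title) VALUES (?, ?)`,
				fmt.Sprintf("bd-%d", max+1), fmt.Sprintf("issue from writer %d", n),
			)
			if err != nil && strings.Contains(err.Error(), "UNIQUE") {
				uniqueErrs.Add(1)
			}
		}(i)
	}
	wg.Wait()

	// The race is timing-dependent, so we report rather than assert a count.
	if n := uniqueErrs.Load(); n > 0 {
		t.Logf("%d/20 writers hit UNIQUE constraint errors", n)
	}
}
```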
Replace the in-memory nextID counter with an atomic database-backed counter using the issue_counters table. This fixes race conditions when multiple processes create issues concurrently.

Changes:
- Add issue_counters table with atomic INSERT...ON CONFLICT pattern
- Remove in-memory nextID field and sync.Mutex from SQLiteStorage
- Implement getNextIDForPrefix() for atomic ID generation
- Update CreateIssue() to use database counter instead of memory
- Update RemapCollisions() to use database counter for collision resolution
- Clean up old planning and bug documentation files

Fixes the multi-process ID generation race condition tested in cmd/bd/race_test.go.
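The core of the fix is a single upsert that increments the per-prefix counter inside SQLite itself, so concurrent processes serialize on the row rather than on process memory. A minimal sketch, assuming SQLite 3.35+ for RETURNING; the table and function names come from the commit message, the schema details are illustrative:

```go
// Schema, created at open time:
//   CREATE TABLE IF NOT EXISTS issue_counters (
//       prefix  TEXT PRIMARY KEY,
//       counter INTEGER NOT NULL
//   );

// getNextIDForPrefix atomically bumps and returns the counter for a prefix.
// Concurrent callers serialize inside SQLite, so no two get the same value.
func getNextIDForPrefix(db *sql.DB, prefix string) (int, error) {
	var n int
	err := db.QueryRow(`
		INSERT INTO issue_counters (prefix, counter) VALUES (?, 1)
		ON CONFLICT(prefix) DO UPDATE SET counter = counter + 1
		RETURNING counter`, prefix).Scan(&n)
	if err != nil {
		return 0, err
	}
	return n, nil // caller formats the ID, e.g. fmt.Sprintf("%s-%d", prefix, n)
}
```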
TestImportCounterSyncAfterHighID demonstrates that importing an issue with a high explicit ID (bd-100) doesn't sync the auto-increment counter, causing the next auto-generated ID to be bd-4 instead of bd-101. This test currently fails and documents the expected behavior.
When importing issues with explicit high IDs (e.g., bd-100), the issue_counters table wasn't being updated. This caused the next auto-generated issue to collide with existing IDs (bd-4 instead of bd-101).

Changes:
- Add SyncAllCounters() to scan all issues and update counters atomically
- Add SyncCounterForPrefix() for granular counter synchronization
- Call SyncAllCounters() in import command after creating issues
- Add comprehensive tests for counter sync functionality
- Update TestImportCounterSyncAfterHighID to verify fix

The fix uses a single efficient SQL query to prevent ID collisions with subsequently auto-generated issues.
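The "single efficient SQL query" pattern can be sketched as one INSERT...SELECT upsert that raises each prefix's counter to the highest numeric suffix seen in issues (schema details assumed, as above; IDs are assumed to look like prefix-N):

```go
// SyncAllCounters bumps each prefix's counter to at least the highest
// numeric suffix present in the issues table, in a single statement.
func SyncAllCounters(db *sql.DB) error {
	_, err := db.Exec(`
		INSERT INTO issue_counters (prefix, counter)
		SELECT substr(id, 1, instr(id, '-') - 1),
		       MAX(CAST(substr(id, instr(id, '-') + 1) AS INTEGER))
		FROM issues
		WHERE id LIKE '%-%'
		GROUP BY substr(id, 1, instr(id, '-') - 1)
		ON CONFLICT(prefix) DO UPDATE SET
		    counter = MAX(counter, excluded.counter)`)
	return err
}
```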
Move counter sync from import to CreateIssue to handle parallel issue creation. This ensures the counter is always up-to-date before generating new IDs, preventing collisions when multiple processes create issues concurrently. Remove unused SyncCounterForPrefix method and its test.
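Putting the pieces together, the create path after this change looks roughly as follows. SyncAllCounters and getNextIDForPrefix are from the sketches above; the method body and the issues insert are illustrative, not bd's actual code:

```go
// CreateIssue sketch: sync counters first, then take the next ID atomically,
// so parallel creators and imported high IDs both stay collision-free.
func (s *SQLiteStorage) CreateIssue(title, prefix string) (string, error) {
	// Counter sync now happens on every create rather than only at import
	// (a later follow-up, bd-64, relaxes this to lazy per-prefix seeding).
	if err := SyncAllCounters(s.db); err != nil {
		return "", err
	}
	n, err := getNextIDForPrefix(s.db, prefix)
	if err != nil {
		return "", err
	}
	id := fmt.Sprintf("%s-%d", prefix, n)
	_, err = s.db.Exec(`INSERT INTO issues (id, title) VALUES (?, ?)`, id, title)
	return id, err
}
```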
steveyegge added a commit that referenced this pull request on Oct 14, 2025
- Replace in-memory ID counter with atomic database-backed counter
- Add issue_counters table for atomic, cross-process ID generation
- Add SyncAllCounters() and SyncCounterForPrefix() methods
- Preserve dirty_issues table and dirty tracking from main
- Both features (atomic counter + dirty tracking) now work together
- All tests passing

Closes bd-62
steveyegge added a commit that referenced this pull request on Oct 14, 2025
This commit addresses all critical follow-up issues identified in the code review of PR #8 (atomic counter implementation).

## bd-64: Fix SyncAllCounters performance bottleneck (P0)
- Replace SyncAllCounters() on every CreateIssue with lazy initialization
- Add ensureCounterInitialized() that only scans prefix-specific issues on first use
- Performance improvement: O(n) full table scan → O(1) for subsequent creates
- Add comprehensive tests in lazy_init_test.go

## bd-65: Add migration for issue_counters table (P1)
- Add migrateIssueCountersTable() similar to migrateDirtyIssuesTable()
- Checks if table is empty and syncs from existing issues on first open
- Handles both fresh databases and migrations from old databases
- Add comprehensive tests in migration_test.go (3 scenarios)

## bd-66: Make import counter sync failure fatal (P1)
- Change SyncAllCounters() failure from warning to fatal error in import
- Prevents ID collisions when counter sync fails
- Data integrity > convenience

## bd-67: Update test comments (P2)
- Update TestMultiProcessIDGeneration comments to reflect fix is in place
- Change "With the bug, we expect errors" → "After the fix, all should succeed"

All tests pass. Atomic counter implementation is now production-ready.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
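The bd-64 lazy-initialization fix can be sketched as: remember which prefixes were already seeded in this process, and scan the issues table only once per prefix. ensureCounterInitialized and SQLiteStorage are named in the commits; the struct fields, guard mutex, and SQL details are assumptions:

```go
// SQLiteStorage sketch: tracks which prefixes have had their counter seeded.
type SQLiteStorage struct {
	db          *sql.DB
	mu          sync.Mutex
	initialized map[string]bool
}

// ensureCounterInitialized seeds issue_counters for one prefix on first use,
// replacing the old O(n) SyncAllCounters call on every create.
func (s *SQLiteStorage) ensureCounterInitialized(prefix string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.initialized[prefix] {
		return nil // fast path: already seeded in this process, O(1)
	}
	// One prefix-scoped scan: raise the counter to the highest existing
	// numeric suffix (0 if the prefix has no issues yet).
	_, err := s.db.Exec(`
		INSERT INTO issue_counters (prefix, counter)
		SELECT ?, COALESCE(MAX(CAST(substr(id, length(?) + 2) AS INTEGER)), 0)
		FROM issues WHERE id LIKE ? || '-%'
		ON CONFLICT(prefix) DO UPDATE SET
		    counter = MAX(counter, excluded.counter)`,
		prefix, prefix, prefix)
	if err == nil {
		s.initialized[prefix] = true
	}
	return err
}
```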
Owner
Thanks @v4rgas for this excellent fix! I've merged your changes with some critical follow-up improvements (bd-64, bd-65, bd-66, bd-67). All tests passing. The atomic counter implementation is now production-ready. Your original commits are preserved with your authorship. Thanks again for tackling this tricky race condition!
Owner
Follow-up improvements included:
- bd-64 (P0): Fixed performance bottleneck
- bd-65 (P1): Added migration for issue_counters table
- bd-66 (P1): Made import counter sync failure fatal
- bd-67 (P2): Updated test comments

All changes in commit 5c2cff4. Your commits are in main with full authorship preserved!
steveyegge added a commit that referenced this pull request on Oct 14, 2025
Release notes:
- --deps flag for one-command issue creation (#18)
- External reference tracking for linking to external trackers
- Critical bug fixes (dep tree, auto-import, parallel creation)
- Windows build support and Go extension examples
- Community PRs merged (#8, #10, #12, #14, #15, #17)

See CHANGELOG.md for full details.
rsnodgrass added a commit to rsnodgrass/beads that referenced this pull request on Jan 14, 2026
Security fixes from SECURITY_AUDIT.md:

1. JSONL file locking (#1 & steveyegge#3): Add flock-based shared/exclusive locking for JSONL reads (auto-import) and writes (export/flush) to prevent corruption when sync and daemon flush race.
2. Transaction tracking (steveyegge#2): Add atomic counter to track active transactions, ensuring reconnect waits for in-flight transactions.
3. Daemon crash lock cleanup (steveyegge#5): Use flock semantics instead of PID checking for stale lock detection. Add cleanup in panic handler.
4. Sync lock scope (steveyegge#6): Extend lock to cover entire sync operation including git pull/push, not just DB operations.
5. FlushManager shutdown timeout (steveyegge#7): Increase timeout from 30s to 120s for large databases. Add context-aware flush with progress logging.
6. WAL checkpoint error handling (steveyegge#8): Add 5-retry exponential backoff for checkpoint failures. Show user warning on persistent failure.

Also includes comprehensive race condition test suite covering:
- Concurrent export/import operations
- Daemon lifecycle and lock handling
- Sync/flush race scenarios
- SQLite store concurrency

Co-Authored-By: SageOx <[email protected]>
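Item 1's flock pattern might look like this on Unix: readers take a shared lock, writers an exclusive one, so a daemon flush can never interleave with a sync read. A sketch only; the helper name and error handling are assumptions, and syscall.Flock is Unix-specific:

```go
package main

import (
	"os"
	"syscall"
)

// withFileLock runs fn while holding a flock on path: LOCK_SH for readers
// (auto-import), LOCK_EX for writers (export/flush). Flock blocks until
// the lock is available, so racing processes serialize on the JSONL file.
func withFileLock(path string, exclusive bool, fn func(f *os.File) error) error {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	how := syscall.LOCK_SH
	if exclusive {
		how = syscall.LOCK_EX
	}
	if err := syscall.Flock(int(f.Fd()), how); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

	return fn(f)
}
```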
rsnodgrass added a commit to rsnodgrass/beads that referenced this pull request on Jan 23, 2026
Increase FlushManager shutdown timeout from 30s to 120s to accommodate large databases with 1000+ issues. Add WAL checkpoint retry logic with exponential backoff to prevent data loss when concurrent readers hold locks.

Security fixes from SECURITY_AUDIT.md:
- Issue steveyegge#7: Shutdown timeout causing pending writes to be lost
- Issue steveyegge#8: WAL checkpoint failure silently losing data

Changes:
- Increase shutdownTimeout from 30s to 120s
- Add 5-retry exponential backoff for WAL checkpoint on close
- Warn user if checkpoint fails after retries
- Add flush_manager_race_test.go for concurrency testing
- Add wal_test.go for checkpoint with concurrent readers

Co-Authored-By: SageOx <[email protected]>
rsnodgrass added a commit to rsnodgrass/beads that referenced this pull request on Jan 23, 2026
Add WAL checkpoint retry logic with exponential backoff to prevent data loss when concurrent readers hold locks during database close.

Security fixes from SECURITY_AUDIT.md:
- Issue steveyegge#8: WAL checkpoint failure silently losing data

Changes:
- Add 5-retry exponential backoff for WAL checkpoint in Close()
- Warn user if checkpoint fails after retries (potential data loss)
- Add flush_manager_race_test.go for concurrency testing
- Add wal_test.go for checkpoint with concurrent readers

Co-Authored-By: SageOx <[email protected]>
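A sketch of the retry loop described here, assuming database/sql with a SQLite driver; the function name is illustrative. One subtlety worth showing: `PRAGMA wal_checkpoint` reports contention through its `busy` result column rather than as an error, so the row must be inspected:

```go
// closeWithCheckpoint tries up to 5 times to fully checkpoint and truncate
// the WAL before closing, backing off exponentially between attempts.
func closeWithCheckpoint(db *sql.DB) error {
	backoff := 100 * time.Millisecond
	for attempt := 1; attempt <= 5; attempt++ {
		var busy, logFrames, checkpointed int
		err := db.QueryRow(`PRAGMA wal_checkpoint(TRUNCATE)`).
			Scan(&busy, &logFrames, &checkpointed)
		if err == nil && busy == 0 {
			return db.Close() // WAL fully flushed into the main database file
		}
		// busy != 0 means concurrent readers blocked the checkpoint; retry.
		time.Sleep(backoff)
		backoff *= 2 // 100ms, 200ms, 400ms, 800ms, 1.6s
	}
	log.Println("warning: WAL checkpoint incomplete after 5 attempts; recent writes remain in the -wal file")
	return db.Close()
}
```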
markthebest12 added a commit to markthebest12/beads that referenced this pull request on Jan 23, 2026
Add --prefer-local, --prefer-gitlab, and --prefer-newer flags to bd gitlab sync:
- --prefer-local: Always keep local beads version on conflict
- --prefer-gitlab: Always use GitLab version on conflict
- --prefer-newer: Use most recently updated version (default)

Features:
- Unified resolveGitLabConflicts() function with strategy parameter
- Flag validation (only one conflict strategy allowed)
- Automatic conflict detection and resolution in bidirectional sync
- 7 new tests for conflict resolution

Co-Authored-By: Claude Opus 4.5 <[email protected]>
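The strategy parameter reduces to a small switch. A sketch with an assumed minimal Issue type; only resolveGitLabConflicts and the three strategy names come from the commit message:

```go
// Strategy mirrors the three conflict flags on bd gitlab sync.
type Strategy int

const (
	PreferNewer Strategy = iota // default: most recently updated side wins
	PreferLocal
	PreferGitLab
)

// Issue is a stand-in for the real issue type; only UpdatedAt matters here.
type Issue struct {
	ID        string
	UpdatedAt time.Time
}

// resolveGitLabConflicts picks one side of a conflicting issue pair
// according to the chosen strategy.
func resolveGitLabConflicts(local, remote Issue, s Strategy) Issue {
	switch s {
	case PreferLocal:
		return local
	case PreferGitLab:
		return remote
	default: // PreferNewer
		if remote.UpdatedAt.After(local.UpdatedAt) {
			return remote
		}
		return local
	}
}
```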
markthebest12 added a commit to markthebest12/beads that referenced this pull request on Jan 23, 2026
Add --prefer-local, --prefer-gitlab, and --prefer-newer flags to bd gitlab sync:
- --prefer-local: Always keep local beads version on conflict
- --prefer-gitlab: Always use GitLab version on conflict
- --prefer-newer: Use most recently updated version (default)

Features:
- Unified resolveGitLabConflicts() function with strategy parameter
- Flag validation (only one conflict strategy allowed)
- Automatic conflict detection and resolution in bidirectional sync
- 7 new tests for conflict resolution

Co-Authored-By: Claude Opus 4.5 <[email protected]>
steveyegge pushed a commit that referenced this pull request on Jan 31, 2026
Add --prefer-local, --prefer-gitlab, and --prefer-newer flags to bd gitlab sync:
- --prefer-local: Always keep local beads version on conflict
- --prefer-gitlab: Always use GitLab version on conflict
- --prefer-newer: Use most recently updated version (default)

Features:
- Unified resolveGitLabConflicts() function with strategy parameter
- Flag validation (only one conflict strategy allowed)
- Automatic conflict detection and resolution in bidirectional sync
- 7 new tests for conflict resolution

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Executed-By: beads/crew/emma
Rig: beads
Role: crew
steveyegge pushed a commit that referenced this pull request on Feb 5, 2026
Add WAL checkpoint retry logic with exponential backoff to prevent data loss when concurrent readers hold locks during database close.

Security fixes from SECURITY_AUDIT.md:
- Issue #8: WAL checkpoint failure silently losing data

Changes:
- Add 5-retry exponential backoff for WAL checkpoint in Close()
- Warn user if checkpoint fails after retries (potential data loss)
- Add flush_manager_race_test.go for concurrency testing
- Add wal_test.go for checkpoint with concurrent readers

Co-Authored-By: SageOx <[email protected]>
Fixes race condition where multiple bd create processes fail with UNIQUE constraint errors.

Problem
Multiple processes reading the same max ID from the database try to create issues with the same ID, causing UNIQUE constraint violations.
After importing issues with high IDs (e.g., bd-100), the auto-increment counter wasn't synced, so new issues would try to use bd-4, bd-5, etc., colliding with existing issues.
Solution
- Replace the in-memory nextID counter with an atomic, database-backed counter (issue_counters table, INSERT...ON CONFLICT)
- Sync the counter during CreateIssue so explicitly imported high IDs (e.g., bd-100) can never collide with auto-generated ones
Tests
- TestMultiProcessIDGeneration - 20 concurrent processes creating issues
- TestImportCounterSyncAfterHighID - Import bd-100, verify next auto-generated ID is bd-101

Closes #6