Fix parallel issue creation race condition by v4rgas · Pull Request #8 · steveyegge/beads

v4rgas · 2025-10-13T21:58:08Z

Fixes race condition where multiple bd create processes fail with UNIQUE constraint errors.

Problem

Multiple processes reading the same max ID from the database try to create issues with the same ID, causing UNIQUE constraint violations.

After importing issues with high IDs (e.g., bd-100), the auto-increment counter wasn't synced, so new issues would try to use bd-4, bd-5, etc., colliding with existing issues.

Solution

Use atomic database counter instead of in-memory counter
Sync counter on every CreateIssue call before generating IDs (a migration would probably be cleaner, but this is fast to implement and works retroactively for any existing database)

Tests

TestMultiProcessIDGeneration - 20 concurrent processes creating issues
TestImportCounterSyncAfterHighID - Import bd-100, verify next auto-generated is bd-101

Closes #6

Add TestMultiProcessIDGeneration to reproduce the bug where multiple bd create processes fail with UNIQUE constraint errors when run simultaneously. Each goroutine opens a separate database connection to simulate independent processes. Test currently fails with 17/20 processes getting UNIQUE constraint errors, confirming the race condition in the in-memory ID counter.

Replace the in-memory nextID counter with an atomic database-backed counter using the issue_counters table. This fixes race conditions when multiple processes create issues concurrently. Changes: - Add issue_counters table with atomic INSERT...ON CONFLICT pattern - Remove in-memory nextID field and sync.Mutex from SQLiteStorage - Implement getNextIDForPrefix() for atomic ID generation - Update CreateIssue() to use database counter instead of memory - Update RemapCollisions() to use database counter for collision resolution - Clean up old planning and bug documentation files Fixes the multi-process ID generation race condition tested in cmd/bd/race_test.go.

TestImportCounterSyncAfterHighID demonstrates that importing an issue with a high explicit ID (bd-100) doesn't sync the auto-increment counter, causing the next auto-generated ID to be bd-4 instead of bd-101. This test currently fails and documents the expected behavior.

When importing issues with explicit high IDs (e.g., bd-100), the issue_counters table wasn't being updated. This caused the next auto-generated issue to collide with existing IDs (bd-4 instead of bd-101). Changes: - Add SyncAllCounters() to scan all issues and update counters atomically - Add SyncCounterForPrefix() for granular counter synchronization - Call SyncAllCounters() in import command after creating issues - Add comprehensive tests for counter sync functionality - Update TestImportCounterSyncAfterHighID to verify fix The fix uses a single efficient SQL query to prevent ID collisions with subsequently auto-generated issues.

…se 6)

Move counter sync from import to CreateIssue to handle parallel issue creation. This ensures the counter is always up-to-date before generating new IDs, preventing collisions when multiple processes create issues concurrently. Remove unused SyncCounterForPrefix method and its test.

- Replace in-memory ID counter with atomic database-backed counter - Add issue_counters table for atomic, cross-process ID generation - Add SyncAllCounters() and SyncCounterForPrefix() methods - Preserve dirty_issues table and dirty tracking from main - Both features (atomic counter + dirty tracking) now work together - All tests passing Closes bd-62

This commit addresses all critical follow-up issues identified in the code review of PR #8 (atomic counter implementation). ## bd-64: Fix SyncAllCounters performance bottleneck (P0) - Replace SyncAllCounters() on every CreateIssue with lazy initialization - Add ensureCounterInitialized() that only scans prefix-specific issues on first use - Performance improvement: O(n) full table scan → O(1) for subsequent creates - Add comprehensive tests in lazy_init_test.go ## bd-65: Add migration for issue_counters table (P1) - Add migrateIssueCountersTable() similar to migrateDirtyIssuesTable() - Checks if table is empty and syncs from existing issues on first open - Handles both fresh databases and migrations from old databases - Add comprehensive tests in migration_test.go (3 scenarios) ## bd-66: Make import counter sync failure fatal (P1) - Change SyncAllCounters() failure from warning to fatal error in import - Prevents ID collisions when counter sync fails - Data integrity > convenience ## bd-67: Update test comments (P2) - Update TestMultiProcessIDGeneration comments to reflect fix is in place - Change "With the bug, we expect errors" → "After the fix, all should succeed" All tests pass. Atomic counter implementation is now production-ready. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

steveyegge · 2025-10-14T09:02:44Z

Thanks @v4rgas for this excellent fix! I've merged your changes with some critical follow-up improvements (bd-64, bd-65, bd-66, bd-67). All tests passing. The atomic counter implementation is now production-ready. Your original commits are preserved with your authorship. Thanks again for tackling this tricky race condition!

steveyegge · 2025-10-14T09:02:56Z

Follow-up improvements included:

bd-64 (P0): Fixed performance bottleneck

Replaced SyncAllCounters() on every CreateIssue with lazy initialization
Only scans prefix-specific issues on first use
Performance: O(n) full table scan → O(1) for subsequent creates

bd-65 (P1): Added migration for issue_counters table

Existing databases now auto-sync counters on first open
Prevents ID collisions on migrated databases
Comprehensive migration tests

bd-66 (P1): Made import counter sync failure fatal

Changed from warning to error to prevent data corruption
Data integrity > convenience

bd-67 (P2): Updated test comments

Removed confusing 'with the bug' comments
Now accurately reflects that fix is in place

All changes in commit 5c2cff4. Your commits are in main with full authorship preserved!

Release notes: - --deps flag for one-command issue creation (#18) - External reference tracking for linking to external trackers - Critical bug fixes (dep tree, auto-import, parallel creation) - Windows build support and Go extension examples - Community PRs merged (#8, #10, #12, #14, #15, #17) See CHANGELOG.md for full details.

Security fixes from SECURITY_AUDIT.md: 1. JSONL file locking (#1 & steveyegge#3): Add flock-based shared/exclusive locking for JSONL reads (auto-import) and writes (export/flush) to prevent corruption when sync and daemon flush race. 2. Transaction tracking (steveyegge#2): Add atomic counter to track active transactions, ensuring reconnect waits for in-flight transactions. 3. Daemon crash lock cleanup (steveyegge#5): Use flock semantics instead of PID checking for stale lock detection. Add cleanup in panic handler. 4. Sync lock scope (steveyegge#6): Extend lock to cover entire sync operation including git pull/push, not just DB operations. 5. FlushManager shutdown timeout (steveyegge#7): Increase timeout from 30s to 120s for large databases. Add context-aware flush with progress logging. 6. WAL checkpoint error handling (steveyegge#8): Add 5-retry exponential backoff for checkpoint failures. Show user warning on persistent failure. Also includes comprehensive race condition test suite covering: - Concurrent export/import operations - Daemon lifecycle and lock handling - Sync/flush race scenarios - SQLite store concurrency Co-Authored-By: SageOx <[email protected]>

Increase FlushManager shutdown timeout from 30s to 120s to accommodate large databases with 1000+ issues. Add WAL checkpoint retry logic with exponential backoff to prevent data loss when concurrent readers hold locks. Security fixes from SECURITY_AUDIT.md: - Issue steveyegge#7: Shutdown timeout causing pending writes to be lost - Issue steveyegge#8: WAL checkpoint failure silently losing data Changes: - Increase shutdownTimeout from 30s to 120s - Add 5-retry exponential backoff for WAL checkpoint on close - Warn user if checkpoint fails after retries - Add flush_manager_race_test.go for concurrency testing - Add wal_test.go for checkpoint with concurrent readers Co-Authored-By: SageOx <[email protected]>

Add WAL checkpoint retry logic with exponential backoff to prevent data loss when concurrent readers hold locks during database close. Security fixes from SECURITY_AUDIT.md: - Issue steveyegge#8: WAL checkpoint failure silently losing data Changes: - Add 5-retry exponential backoff for WAL checkpoint in Close() - Warn user if checkpoint fails after retries (potential data loss) - Add flush_manager_race_test.go for concurrency testing - Add wal_test.go for checkpoint with concurrent readers Co-Authored-By: SageOx <[email protected]>

Add --prefer-local, --prefer-gitlab, and --prefer-newer flags to bd gitlab sync: - --prefer-local: Always keep local beads version on conflict - --prefer-gitlab: Always use GitLab version on conflict - --prefer-newer: Use most recently updated version (default) Features: - Unified resolveGitLabConflicts() function with strategy parameter - Flag validation (only one conflict strategy allowed) - Automatic conflict detection and resolution in bidirectional sync - 7 new tests for conflict resolution Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add --prefer-local, --prefer-gitlab, and --prefer-newer flags to bd gitlab sync: - --prefer-local: Always keep local beads version on conflict - --prefer-gitlab: Always use GitLab version on conflict - --prefer-newer: Use most recently updated version (default) Features: - Unified resolveGitLabConflicts() function with strategy parameter - Flag validation (only one conflict strategy allowed) - Automatic conflict detection and resolution in bidirectional sync - 7 new tests for conflict resolution Co-Authored-By: Claude Opus 4.5 <[email protected]> Executed-By: beads/crew/emma Rig: beads Role: crew

Add WAL checkpoint retry logic with exponential backoff to prevent data loss when concurrent readers hold locks during database close. Security fixes from SECURITY_AUDIT.md: - Issue #8: WAL checkpoint failure silently losing data Changes: - Add 5-retry exponential backoff for WAL checkpoint in Close() - Warn user if checkpoint fails after retries (potential data loss) - Add flush_manager_race_test.go for concurrency testing - Add wal_test.go for checkpoint with concurrent readers Co-Authored-By: SageOx <[email protected]>

v4rgas added 5 commits October 13, 2025 18:54

chore: restore .beads folder from main

b2b75f1

v4rgas marked this pull request as ready for review October 13, 2025 23:04

fix: renumber phases in import.go (Phase 4.5 → Phase 5, Phase 5 → Pha…

1e1ae3f

…se 6)

v4rgas marked this pull request as draft October 13, 2025 23:17

v4rgas marked this pull request as ready for review October 13, 2025 23:31

steveyegge closed this Oct 14, 2025

steveyegge mentioned this pull request Oct 14, 2025

Rapid issue creation leads to errors #6

Closed

rsnodgrass mentioned this pull request Jan 23, 2026

fix: add WAL checkpoint robustness with retry logic #1281

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Fix parallel issue creation race condition#8

Fix parallel issue creation race condition#8
v4rgas wants to merge 7 commits intosteveyegge:mainfrom
v4rgas:fix/parallel-issue-creation-race

v4rgas commented Oct 13, 2025 •

edited

Loading

Uh oh!

steveyegge commented Oct 14, 2025

Uh oh!

steveyegge commented Oct 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

v4rgas commented Oct 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Tests

Uh oh!

steveyegge commented Oct 14, 2025

Uh oh!

steveyegge commented Oct 14, 2025

Follow-up improvements included:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

v4rgas commented Oct 13, 2025 •

edited

Loading