Skip to content

Comments

Fix parallel issue creation race condition#8

Closed
v4rgas wants to merge 7 commits intosteveyegge:mainfrom
v4rgas:fix/parallel-issue-creation-race
Closed

Fix parallel issue creation race condition#8
v4rgas wants to merge 7 commits intosteveyegge:mainfrom
v4rgas:fix/parallel-issue-creation-race

Conversation

@v4rgas
Copy link
Contributor

@v4rgas v4rgas commented Oct 13, 2025

Fixes race condition where multiple bd create processes fail with UNIQUE constraint errors.

Problem

Multiple processes reading the same max ID from the database try to create issues with the same ID, causing UNIQUE constraint violations.

After importing issues with high IDs (e.g., bd-100), the auto-increment counter wasn't synced, so new issues would try to use bd-4, bd-5, etc., colliding with existing issues.

Solution

  • Use atomic database counter instead of in-memory counter
  • Sync counter on every CreateIssue call before generating IDs (a migration would probably be cleaner, but this is fast to implement and works retroactively for any existing database)

Tests

  • TestMultiProcessIDGeneration - 20 concurrent processes creating issues
  • TestImportCounterSyncAfterHighID - Import bd-100, verify next auto-generated is bd-101

Closes #6

Add TestMultiProcessIDGeneration to reproduce the bug where multiple
bd create processes fail with UNIQUE constraint errors when run
simultaneously. Each goroutine opens a separate database connection
to simulate independent processes.

Test currently fails with 17/20 processes getting UNIQUE constraint
errors, confirming the race condition in the in-memory ID counter.
Replace the in-memory nextID counter with an atomic database-backed
counter using the issue_counters table. This fixes race conditions
when multiple processes create issues concurrently.

Changes:
- Add issue_counters table with atomic INSERT...ON CONFLICT pattern
- Remove in-memory nextID field and sync.Mutex from SQLiteStorage
- Implement getNextIDForPrefix() for atomic ID generation
- Update CreateIssue() to use database counter instead of memory
- Update RemapCollisions() to use database counter for collision resolution
- Clean up old planning and bug documentation files

Fixes the multi-process ID generation race condition tested in
cmd/bd/race_test.go.
TestImportCounterSyncAfterHighID demonstrates that importing an issue
with a high explicit ID (bd-100) doesn't sync the auto-increment counter,
causing the next auto-generated ID to be bd-4 instead of bd-101.

This test currently fails and documents the expected behavior.
When importing issues with explicit high IDs (e.g., bd-100), the
issue_counters table wasn't being updated. This caused the next
auto-generated issue to collide with existing IDs (bd-4 instead of bd-101).

Changes:
- Add SyncAllCounters() to scan all issues and update counters atomically
- Add SyncCounterForPrefix() for granular counter synchronization
- Call SyncAllCounters() in import command after creating issues
- Add comprehensive tests for counter sync functionality
- Update TestImportCounterSyncAfterHighID to verify fix

The fix uses a single efficient SQL query to prevent ID collisions
with subsequently auto-generated issues.
@v4rgas v4rgas marked this pull request as ready for review October 13, 2025 23:04
@v4rgas v4rgas marked this pull request as draft October 13, 2025 23:17
Move counter sync from import to CreateIssue to handle parallel issue creation.
This ensures the counter is always up-to-date before generating new IDs,
preventing collisions when multiple processes create issues concurrently.

Remove unused SyncCounterForPrefix method and its test.
@v4rgas v4rgas marked this pull request as ready for review October 13, 2025 23:31
steveyegge added a commit that referenced this pull request Oct 14, 2025
- Replace in-memory ID counter with atomic database-backed counter
- Add issue_counters table for atomic, cross-process ID generation
- Add SyncAllCounters() and SyncCounterForPrefix() methods
- Preserve dirty_issues table and dirty tracking from main
- Both features (atomic counter + dirty tracking) now work together
- All tests passing

Closes bd-62
steveyegge added a commit that referenced this pull request Oct 14, 2025
This commit addresses all critical follow-up issues identified in the
code review of PR #8 (atomic counter implementation).

## bd-64: Fix SyncAllCounters performance bottleneck (P0)
- Replace SyncAllCounters() on every CreateIssue with lazy initialization
- Add ensureCounterInitialized() that only scans prefix-specific issues on first use
- Performance improvement: O(n) full table scan → O(1) for subsequent creates
- Add comprehensive tests in lazy_init_test.go

## bd-65: Add migration for issue_counters table (P1)
- Add migrateIssueCountersTable() similar to migrateDirtyIssuesTable()
- Checks if table is empty and syncs from existing issues on first open
- Handles both fresh databases and migrations from old databases
- Add comprehensive tests in migration_test.go (3 scenarios)

## bd-66: Make import counter sync failure fatal (P1)
- Change SyncAllCounters() failure from warning to fatal error in import
- Prevents ID collisions when counter sync fails
- Data integrity > convenience

## bd-67: Update test comments (P2)
- Update TestMultiProcessIDGeneration comments to reflect fix is in place
- Change "With the bug, we expect errors" → "After the fix, all should succeed"

All tests pass. Atomic counter implementation is now production-ready.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@steveyegge
Copy link
Owner

Thanks @v4rgas for this excellent fix! I've merged your changes with some critical follow-up improvements (bd-64, bd-65, bd-66, bd-67). All tests passing. The atomic counter implementation is now production-ready. Your original commits are preserved with your authorship. Thanks again for tackling this tricky race condition!

@steveyegge steveyegge closed this Oct 14, 2025
@steveyegge
Copy link
Owner

Follow-up improvements included:

bd-64 (P0): Fixed performance bottleneck

  • Replaced SyncAllCounters() on every CreateIssue with lazy initialization
  • Only scans prefix-specific issues on first use
  • Performance: O(n) full table scan → O(1) for subsequent creates

bd-65 (P1): Added migration for issue_counters table

  • Existing databases now auto-sync counters on first open
  • Prevents ID collisions on migrated databases
  • Comprehensive migration tests

bd-66 (P1): Made import counter sync failure fatal

  • Changed from warning to error to prevent data corruption
  • Data integrity > convenience

bd-67 (P2): Updated test comments

  • Removed confusing 'with the bug' comments
  • Now accurately reflects that fix is in place

All changes in commit 5c2cff4. Your commits are in main with full authorship preserved!

steveyegge added a commit that referenced this pull request Oct 14, 2025
Release notes:
- --deps flag for one-command issue creation (#18)
- External reference tracking for linking to external trackers
- Critical bug fixes (dep tree, auto-import, parallel creation)
- Windows build support and Go extension examples
- Community PRs merged (#8, #10, #12, #14, #15, #17)

See CHANGELOG.md for full details.
rsnodgrass added a commit to rsnodgrass/beads that referenced this pull request Jan 14, 2026
Security fixes from SECURITY_AUDIT.md:

1. JSONL file locking (#1 & steveyegge#3): Add flock-based shared/exclusive locking
   for JSONL reads (auto-import) and writes (export/flush) to prevent
   corruption when sync and daemon flush race.

2. Transaction tracking (steveyegge#2): Add atomic counter to track active
   transactions, ensuring reconnect waits for in-flight transactions.

3. Daemon crash lock cleanup (steveyegge#5): Use flock semantics instead of PID
   checking for stale lock detection. Add cleanup in panic handler.

4. Sync lock scope (steveyegge#6): Extend lock to cover entire sync operation
   including git pull/push, not just DB operations.

5. FlushManager shutdown timeout (steveyegge#7): Increase timeout from 30s to 120s
   for large databases. Add context-aware flush with progress logging.

6. WAL checkpoint error handling (steveyegge#8): Add 5-retry exponential backoff
   for checkpoint failures. Show user warning on persistent failure.

Also includes comprehensive race condition test suite covering:
- Concurrent export/import operations
- Daemon lifecycle and lock handling
- Sync/flush race scenarios
- SQLite store concurrency

Co-Authored-By: SageOx <[email protected]>
rsnodgrass added a commit to rsnodgrass/beads that referenced this pull request Jan 23, 2026
Increase FlushManager shutdown timeout from 30s to 120s to accommodate
large databases with 1000+ issues. Add WAL checkpoint retry logic with
exponential backoff to prevent data loss when concurrent readers hold locks.

Security fixes from SECURITY_AUDIT.md:
- Issue steveyegge#7: Shutdown timeout causing pending writes to be lost
- Issue steveyegge#8: WAL checkpoint failure silently losing data

Changes:
- Increase shutdownTimeout from 30s to 120s
- Add 5-retry exponential backoff for WAL checkpoint on close
- Warn user if checkpoint fails after retries
- Add flush_manager_race_test.go for concurrency testing
- Add wal_test.go for checkpoint with concurrent readers

Co-Authored-By: SageOx <[email protected]>
rsnodgrass added a commit to rsnodgrass/beads that referenced this pull request Jan 23, 2026
Add WAL checkpoint retry logic with exponential backoff to prevent data
loss when concurrent readers hold locks during database close.

Security fixes from SECURITY_AUDIT.md:
- Issue steveyegge#8: WAL checkpoint failure silently losing data

Changes:
- Add 5-retry exponential backoff for WAL checkpoint in Close()
- Warn user if checkpoint fails after retries (potential data loss)
- Add flush_manager_race_test.go for concurrency testing
- Add wal_test.go for checkpoint with concurrent readers

Co-Authored-By: SageOx <[email protected]>
markthebest12 added a commit to markthebest12/beads that referenced this pull request Jan 23, 2026
Add --prefer-local, --prefer-gitlab, and --prefer-newer flags to bd gitlab sync:
- --prefer-local: Always keep local beads version on conflict
- --prefer-gitlab: Always use GitLab version on conflict
- --prefer-newer: Use most recently updated version (default)

Features:
- Unified resolveGitLabConflicts() function with strategy parameter
- Flag validation (only one conflict strategy allowed)
- Automatic conflict detection and resolution in bidirectional sync
- 7 new tests for conflict resolution

Co-Authored-By: Claude Opus 4.5 <[email protected]>
markthebest12 added a commit to markthebest12/beads that referenced this pull request Jan 23, 2026
Add --prefer-local, --prefer-gitlab, and --prefer-newer flags to bd gitlab sync:
- --prefer-local: Always keep local beads version on conflict
- --prefer-gitlab: Always use GitLab version on conflict
- --prefer-newer: Use most recently updated version (default)

Features:
- Unified resolveGitLabConflicts() function with strategy parameter
- Flag validation (only one conflict strategy allowed)
- Automatic conflict detection and resolution in bidirectional sync
- 7 new tests for conflict resolution

Co-Authored-By: Claude Opus 4.5 <[email protected]>
steveyegge pushed a commit that referenced this pull request Jan 31, 2026
Add --prefer-local, --prefer-gitlab, and --prefer-newer flags to bd gitlab sync:
- --prefer-local: Always keep local beads version on conflict
- --prefer-gitlab: Always use GitLab version on conflict
- --prefer-newer: Use most recently updated version (default)

Features:
- Unified resolveGitLabConflicts() function with strategy parameter
- Flag validation (only one conflict strategy allowed)
- Automatic conflict detection and resolution in bidirectional sync
- 7 new tests for conflict resolution

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Executed-By: beads/crew/emma
Rig: beads
Role: crew
steveyegge pushed a commit that referenced this pull request Feb 5, 2026
Add WAL checkpoint retry logic with exponential backoff to prevent data
loss when concurrent readers hold locks during database close.

Security fixes from SECURITY_AUDIT.md:
- Issue #8: WAL checkpoint failure silently losing data

Changes:
- Add 5-retry exponential backoff for WAL checkpoint in Close()
- Warn user if checkpoint fails after retries (potential data loss)
- Add flush_manager_race_test.go for concurrency testing
- Add wal_test.go for checkpoint with concurrent readers

Co-Authored-By: SageOx <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rapid issue creation leads to errors

2 participants