Fix embedding hang on CPU-only systems by auto-detecting optimal workers #8

Open
sheikhlimon wants to merge 1 commit into rhel-lightspeed:main from sheikhlimon:fix/cpu-workers-default

@sheikhlimon
Contributor

Fixes #6 - Embedding step hangs on CPU-only systems.

  • Auto-detect CPU-only systems and default to 1 worker (avoids PyTorch fork deadlocks)
  • Default to 2 workers on GPU systems for parallel processing
  • Warn users who manually set --workers 2+ on CPU-only systems
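The three behaviours above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/docs2db/embed.py`: the function names `gpu_available` and `resolve_workers` are hypothetical, and the sketch assumes that "GPU system" means PyTorch reports a CUDA device via `torch.cuda.is_available()`, which may differ from the check the PR actually uses.

```python
import warnings
from typing import Optional


def gpu_available() -> bool:
    """Return True if PyTorch reports a usable CUDA device (assumed check)."""
    try:
        import torch  # imported lazily so the sketch runs without PyTorch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def resolve_workers(requested: Optional[int], has_gpu: bool) -> int:
    """Pick a worker count (hypothetical helper, not the PR's function).

    requested: value of --workers, or None if the user did not set it.
    has_gpu:   result of the GPU detection step.
    """
    if requested is None:
        # No explicit choice: 2 workers on GPU systems, 1 on CPU-only systems.
        return 2 if has_gpu else 1
    if requested >= 2 and not has_gpu:
        # Respect the user's choice, but flag the known risk.
        warnings.warn(
            f"--workers {requested} on a CPU-only system may hang during "
            "embedding; consider --workers 1.",
            stacklevel=2,
        )
    return requested


if __name__ == "__main__":
    print(resolve_workers(None, gpu_available()))
```

Keeping detection separate from the decision makes the default logic testable without a GPU or PyTorch installed.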

Changes

  • src/docs2db/embed.py: Added CPU detection and smart worker defaults

Testing

| Command | Result |
| --- | --- |
| `docs2db pipeline tests/fixtures/input --skip-context` | ✅ Works (auto-detects CPU, uses 1 worker) |
| `docs2db embed --workers 2` | ✅ Shows warning, still allows user choice |

- Default to 1 worker on CPU-only systems (avoids PyTorch fork deadlocks)
- Default to 2 workers on GPU systems (parallel processing)
- Warn users who manually set --workers 2+ on CPU-only systems

Co-Authored-By: Claude (glm-5)
@sheikhlimon requested a review from a team as a code owner on April 1, 2026, 11:49

Development

Successfully merging this pull request may close these issues.

Embedding step hangs on CPU-only systems