Embedding step hangs on CPU-only systems #6
Open
Description
What I observed:
When running the docs2db pipeline on my system, the embedding step stayed stuck at 0% for 4+ hours. I'm using a CPU-only setup (Intel GPU + Linux, PyTorch CPU version).
System details:
- OS: Arch Linux
- CPU: x86_64 (Intel)
- GPU: Intel UHD Graphics 620 (not supported by PyTorch on Linux)
- PyTorch: CPU-only version
What I tried:
Running `docs2db embed --workers 1` fixed the issue and completed successfully.
Questions:
- Is this expected behavior for CPU-only systems?
- The `--workers 1` flag was added in v0.4.3 - should this be mentioned more prominently in docs?
- I noticed the default is still 2 workers - would it make sense to auto-detect CPU-only and default to 1 worker, or add a warning message?
- Without the `--workers 1` flag being documented prominently, users might not realize the option exists and could get stuck waiting indefinitely
Possible improvements (just suggestions):
- Change default to 1 worker for CPU-only systems, OR
- Add a warning message like "CPU-only detected, consider using `--workers 1`", OR
- Add a note in README/Getting Started about this flag
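To make the first two suggestions concrete, here is a rough sketch of how a default could be chosen. This is not docs2db's actual code: `accelerator_available` and `resolve_workers` are hypothetical names, the fallback default of 2 is taken from what I observed, and checking only `torch.cuda.is_available()` is a simplifying assumption about what counts as "not CPU-only".

```python
import warnings
from typing import Optional


def accelerator_available() -> bool:
    """Return True if PyTorch is importable and reports a CUDA device.

    Assumption: only CUDA is checked here; other backends are ignored.
    """
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def resolve_workers(
    requested: Optional[int],
    has_accelerator: bool,
    default: int = 2,
) -> int:
    """Pick a worker count (hypothetical helper, not part of docs2db).

    An explicit --workers value always wins. Otherwise fall back to a
    single worker on CPU-only systems and warn the user about it.
    """
    if requested is not None:
        return requested
    if not has_accelerator:
        warnings.warn(
            "CPU-only detected, defaulting to 1 worker "
            "(pass --workers N to override)"
        )
        return 1
    return default


if __name__ == "__main__":
    # requested=None models the user not passing --workers at all.
    print(resolve_workers(None, accelerator_available()))
```

With something like this, an explicit `--workers` value is never overridden, so existing invocations would behave the same way.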
Just wanted to share this observation - maybe it's helpful for improving the user experience!