
Embedding step hangs on CPU-only systems #6

@sheikhlimon

Description


What I observed:

When I ran docs2db pipeline on my system, the embedding step stayed at 0% for more than 4 hours. My setup is CPU-only: an Intel integrated GPU on Linux, with the CPU-only PyTorch build.

System details:

  • OS: Arch Linux
  • CPU: x86_64 (Intel)
  • GPU: Intel UHD Graphics 620 (not supported by PyTorch on Linux)
  • PyTorch: CPU-only version

What I tried:

Running docs2db embed --workers 1 avoided the hang, and the embedding step completed successfully.

Questions:

  1. Is this expected behavior for CPU-only systems?
  2. The --workers flag was added in v0.4.3. Should it be mentioned more prominently in the docs?
  3. The default is still 2 workers. Would it make sense to auto-detect CPU-only systems and default to 1 worker, or to print a warning?
  4. Users who don't know about --workers 1 may not realize the option exists and could end up waiting indefinitely.

Possible improvements (just suggestions):

  • Change the default to 1 worker on CPU-only systems, OR
  • Add a warning message like "CPU-only detected, consider using --workers 1", OR
  • Add a note about this flag to the README or Getting Started guide
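Here is a rough sketch of how the first two suggestions could fit together. It is not based on docs2db's actual code: has_accelerator and default_worker_count are hypothetical helper names. It also doesn't explain why 2 workers hang on CPU. It only implements the proposed mitigation. The detection relies on what PyTorch itself reports. It checks CUDA, and, when the installed PyTorch exposes them, Apple MPS and Intel XPU. If PyTorch can't be imported, the system is treated as CPU-only.

```python
import warnings
from typing import Optional


def has_accelerator() -> bool:
    """Return True if PyTorch reports a usable GPU backend (hypothetical helper).

    Assumption: CUDA is always checked; MPS and XPU are checked only if the
    installed PyTorch version provides them. Missing PyTorch means CPU-only.
    """
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        return True
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return True
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return True
    return False


def default_worker_count(requested: Optional[int], accelerator: bool) -> int:
    """Pick the number of embedding workers (hypothetical helper).

    An explicit --workers value is always honored, but a warning is emitted
    when more than 1 worker is requested on a CPU-only system. Without an
    explicit value, use 1 worker on CPU-only systems and 2 (the current
    default) when a GPU backend is available.
    """
    if requested is not None:
        if requested > 1 and not accelerator:
            warnings.warn(
                "CPU-only detected, consider using --workers 1",
                RuntimeWarning,
                stacklevel=2,
            )
        return requested
    return 2 if accelerator else 1


if __name__ == "__main__":
    workers = default_worker_count(None, has_accelerator())
    print(f"Using {workers} embedding worker(s)")
```

With this approach, existing invocations such as docs2db embed --workers 1 keep working unchanged. Only the default and the warning depend on detection.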

Just wanted to share this observation - maybe it's helpful for improving the user experience!
