[Bug] add-resource sends oversized input to OpenAI embeddings API - missing chunk size validation #634

@Financier-Nuri

Bug Description

When importing a GitHub repository with ov add-resource, semantic processing fails during embedding because chunks exceed the embedding model's token limit. The server should validate and split content before sending to the API.

Steps to Reproduce

  1. Start the OpenViking server
  2. Run: ov add-resource https://github.com/volcengine/OpenViking --wait
  3. Check imported tree: ov ls viking://resources/ -l 256 -n 256
  4. Inspect generated semantic files: ov cat viking://resources/volcengine/.overview.md
  5. Observe empty/default content

Expected Behavior

OpenViking should:

  1. Chunk or truncate content before sending it to the embedding provider
  2. Handle oversized content gracefully
  3. Log which file/resource failed

Actual Behavior

  • Chunks exceed the embedding model's 8192-token limit
  • OpenAI returns: "This model's maximum context length is 8192 tokens, however you requested 13327 tokens"
  • Generated semantic artifacts (.overview.md, .abstract.md) remain empty/default

Root Cause Analysis

Code Location: openviking/models/embedder/openai_embedders.py:99-104

Problem:

  1. The embedder accepts text of any size without validation
  2. No chunking logic is applied before the API call
  3. No graceful degradation when embedding fails
  4. Silent failure leaves semantic artifacts empty

Environment

  • OpenViking: 0.2.6
  • Python: 3.13.7
  • OS: Windows
  • Model Backend: OpenAI

Proposed Solution

Option 1: Add chunking logic
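
A minimal sketch of what such chunking could look like, not taken from the OpenViking codebase. The names estimate_tokens and chunk_text are hypothetical, and the 4-characters-per-token estimate is only a rough assumption; a real tokenizer would give accurate counts.

```python
from typing import Callable, List

MAX_TOKENS = 8192  # limit quoted in the OpenAI error message


def estimate_tokens(text: str) -> int:
    """Hypothetical rough estimate: about 4 characters per token."""
    return (len(text) + 3) // 4


def chunk_text(
    text: str,
    max_tokens: int = MAX_TOKENS,
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> List[str]:
    """Split text into pieces whose token count stays within max_tokens.

    Lines are packed greedily. A single line that is too long on its own
    is halved recursively until every piece fits.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if count_tokens(text) <= max_tokens:
        return [text] if text else []

    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        if count_tokens(current + line) <= max_tokens:
            current += line
            continue
        # The line does not fit: flush what has been collected so far.
        if current:
            chunks.append(current)
            current = ""
        if count_tokens(line) <= max_tokens:
            current = line
        else:
            # Oversized single line: split it in half and recurse.
            mid = len(line) // 2
            chunks.extend(chunk_text(line[:mid], max_tokens, count_tokens))
            chunks.extend(chunk_text(line[mid:], max_tokens, count_tokens))
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be embedded separately. How the per-chunk vectors are combined or stored is left open here.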

Option 2: Add validation with truncation warning
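
One possible shape for this option, as a sketch under assumptions rather than the project's actual code. The helper truncate_for_embedding, its source parameter, and the character-based token estimate are all hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

MAX_TOKENS = 8192  # limit quoted in the OpenAI error message


def estimate_tokens(text: str) -> int:
    """Hypothetical rough estimate: about 4 characters per token."""
    return (len(text) + 3) // 4


def truncate_for_embedding(
    text: str,
    source: str = "<unknown>",
    max_tokens: int = MAX_TOKENS,
) -> str:
    """Return text unchanged if it fits, else a truncated copy plus a warning."""
    tokens = estimate_tokens(text)
    if tokens <= max_tokens:
        return text
    # Name the offending resource so the user knows what was cut.
    logger.warning(
        "Truncating %s: about %d tokens exceeds the limit of %d",
        source,
        tokens,
        max_tokens,
    )
    # 4 characters per token matches the estimate used above.
    return text[: max_tokens * 4]
```

Truncation loses the tail of the content, so it is simpler than chunking but the resulting embedding only reflects the beginning of the file.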

Option 3: Add graceful error handling
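
A hedged sketch of what graceful handling might look like. The wrapper safe_embed and the embed_fn callable are hypothetical stand-ins for the real embedder call; production code would likely catch the provider's specific error type rather than a bare Exception.

```python
import logging
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger(__name__)


def safe_embed(
    embed_fn: Callable[[str], Sequence[float]],
    text: str,
    source: str,
) -> Optional[List[float]]:
    """Call embed_fn and return its vector, or None if the call fails.

    The failure is logged together with the resource name, so an empty
    semantic artifact can be traced back to the file that caused it.
    """
    try:
        return list(embed_fn(text))
    except Exception as exc:  # assumption: narrow this in real code
        logger.error(
            "Embedding failed for %s (%d characters): %s",
            source,
            len(text),
            exc,
        )
        return None
```

The caller can check for None and skip or retry that resource instead of leaving .overview.md and .abstract.md silently empty.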

Additional Context

  • This affects any import that contains a file large enough to exceed the embedding model's token limit
  • Users cannot import repositories with large files without manual preprocessing

Labels

bug

    Status

    Done