Labels
enhancement (New feature or request)
Description
Problem Statement
Retry behavior for model calls is currently inconsistent across OpenViking.
- Some VLM paths go through `VLMConfig` and inherit `vlm.max_retries`, while others call backend instances directly and bypass that default.
- Embedding providers do not share a single retry contract today: some have custom retry logic, some have fixed SDK-level retries, and some have no config-driven retry behavior.
- This makes rate limiting and other transient failures unevenly handled across semantic processing, parsing, and embedding flows.
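To make the inconsistency concrete, here is a minimal hypothetical sketch (the class names are illustrative, not OpenViking's actual types): a config-driven default only protects callers that construct the backend through the config object, while direct construction silently loses it.

```python
from dataclasses import dataclass


@dataclass
class VLMConfig:
    # Config-driven default, analogous to vlm.max_retries.
    max_retries: int = 3


class Backend:
    def __init__(self, max_retries: int = 0):
        self.max_retries = max_retries


def backend_from_config(cfg: VLMConfig) -> Backend:
    # Path 1: the backend inherits the config default.
    return Backend(max_retries=cfg.max_retries)


# Path 2: a call site constructs the backend directly and
# bypasses the config default entirely -- no retries at all.
direct = Backend()
```

This is why a wrapper at one call site cannot fix the problem globally: every direct-construction path needs the same treatment.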
Proposed Solution
Unify retry at the model-call layer for both VLM and embedding.
Proposed shape:
- Keep the config surface minimal.
- Use `vlm.max_retries` and add `embedding.max_retries`.
- Default both to `3`.
- Treat `max_retries = 0` as retry disabled.
- Remove function-level `max_retries` parameters from VLM interfaces and make retry fully config-driven.
- Apply a shared transient retry policy at the backend/provider layer for:
  - VLM text + vision, sync + async
  - embedding `embed()` + `embed_batch()` across providers
- Retry only known transient errors such as `429`, `5xx`, `TooManyRequests`, `RateLimit`, `RequestBurstTooFast`, timeouts, and connection-reset/refused scenarios.
- Do not retry permanent errors such as `400`, `401`, `403`, or account/billing failures.
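A minimal sketch of what the shared policy could look like (all names here are illustrative, not existing OpenViking APIs): a single config-driven retry loop that classifies errors as transient or permanent, with `max_retries = 0` disabling retry.

```python
import random
import time

# Classification per the policy above (illustrative constants).
TRANSIENT_STATUS = {429} | set(range(500, 600))
TRANSIENT_MARKERS = ("toomanyrequests", "ratelimit", "requestbursttoofast",
                     "timeout", "connection reset", "connection refused")
PERMANENT_STATUS = {400, 401, 403}


class ModelCallError(Exception):
    """Hypothetical backend error carrying an optional HTTP status."""

    def __init__(self, message, status=None):
        super().__init__(message)
        self.status = status


def is_transient(exc: Exception) -> bool:
    status = getattr(exc, "status", None)
    if status in PERMANENT_STATUS:
        return False
    if status in TRANSIENT_STATUS:
        return True
    # Fall back to matching known transient markers in the message.
    text = str(exc).lower()
    return any(marker in text for marker in TRANSIENT_MARKERS)


def call_with_retry(fn, *args, max_retries=3, base_delay=0.5, **kwargs):
    """Retry fn on transient errors only; max_retries=0 disables retry."""
    attempt = 0
    while True:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            if attempt >= max_retries or not is_transient(exc):
                raise
            attempt += 1
            # Exponential backoff with jitter before the next attempt.
            time.sleep(base_delay * (2 ** (attempt - 1)) * (1 + random.random()))
```

Because the loop lives at the backend/provider layer, every VLM and embedding call path gets the same behavior without per-call parameters.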
Alternatives Considered
- Add business-layer wrappers such as `_llm_with_retry()` in each call site. This is useful as a local patch, but it does not scale and will keep producing inconsistent behavior across modules.
- Keep retry configurable per-call via function parameters. We decided against this because these are internal call paths, and a config-driven bottom-layer policy is simpler and more consistent.
- Add a larger retry policy abstraction with many config knobs. Rejected for now in favor of a minimal `max_retries`-only design.
Feature Area
Model Integration
Use Case
OpenViking should handle transient model failures consistently regardless of whether a request originates from semantic indexing, structured VLM parsing, or embedding generation. Users should be able to control retry behavior from config without having to rely on module-specific wrappers or implementation details.
Example API (Optional)
```yaml
# config-driven only
vlm:
  max_retries: 3
embedding:
  max_retries: 3
```
Additional Context
This came up while reviewing PR #889. The local fix in that PR is useful, but we want to address the root cause by unifying retry semantics at the model backend layer instead of adding more module-specific wrappers.