feat: unify config-driven retry across VLM and embedding #1049
MaojiaSheng merged 13 commits into volcengine:main
Conversation
Force-pushed from e73415d to b59fd97
There are some conflicts that need to be resolved, thanks.
Implements `openviking/models/retry.py` with `is_transient_error`, `transient_retry`, and `transient_retry_async` — a single config-driven retry layer replacing scattered per-backend implementations. Adds 50 unit tests covering classification, backoff, jitter, exhaustion, and custom predicates.
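A minimal sketch of how such a layer could be structured. The function names come from the PR; the bodies, the status set, and the backoff constants are illustrative assumptions (the real module classifies 28 error types):

```python
import functools
import random
import time

# Hypothetical status set; the merged module classifies 28 error types.
TRANSIENT_STATUS = {429, 500, 502, 503, 504}


def is_transient_error(exc: Exception) -> bool:
    """Conservative default: only retry errors known to be transient."""
    status = getattr(exc, "status_code", None)
    if status in TRANSIENT_STATUS:
        return True
    return isinstance(exc, (ConnectionError, TimeoutError))


def transient_retry(max_retries: int = 3, base_delay: float = 0.5):
    """Count-based retry with exponential backoff and jitter."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on permanent errors.
                    if attempt == max_retries or not is_transient_error(exc):
                        raise
                    delay = base_delay * (2 ** attempt)
                    time.sleep(delay + random.uniform(0, delay))
        return wrapper
    return decorator
```

With `max_retries=0` the loop runs exactly once and never sleeps, which matches the documented escape hatch that `max_retries=0` disables retry.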
- VLMBase: change `max_retries` default 2→3; remove the `max_retries` param from the `get_completion_async` abstract signature
- OpenAI backend: wrap all 4 methods with `transient_retry`/`transient_retry_async`, disable SDK retry (`max_retries=0` in client constructors), remove the manual for-loop retry
- VolcEngine backend: same pattern: `transient_retry` for all methods, remove the manual for-loop retry
- LiteLLM backend: same pattern: `transient_retry` for all methods, remove the manual for-loop retry
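The backend pattern above (SDK retry off, decorator owns retry) can be sketched like this; `FakeOpenAIBackend` and the minimal decorator are hypothetical stand-ins, not the merged code:

```python
import functools


def transient_retry(max_retries: int = 3):
    """Minimal stand-in for the PR's decorator (no backoff, for brevity);
    ConnectionError stands in for the full is_transient_error predicate."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last = None
            for _ in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError as exc:
                    last = exc
            raise last
        return wrapper
    return deco


class FakeOpenAIBackend:
    """Illustrates the pattern: the SDK's own retry is disabled (the PR
    passes max_retries=0 to the client constructor), so the unified
    decorator is the only retry layer."""

    def __init__(self, client):
        self.client = client

    @transient_retry(max_retries=3)
    def get_completion(self, prompt: str) -> str:
        return self.client.complete(prompt)
```

Keeping SDK retry disabled avoids multiplicative retries (SDK retries times decorator retries) when both layers see the same transient error.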
(volcengine#922)
- VLMConfig: default `max_retries` 2→3; remove `max_retries` from the `get_completion_async` signature; switch all wrappers to kwargs
- StructuredVLM (llm.py): remove `max_retries` from `complete_json_async` and `complete_model_async`; switch all internal calls to kwargs
- memory_react.py: remove `max_retries=self.vlm.max_retries` (now handled internally by the backend)
- Update test stubs to match the new signatures (remove `max_retries=0`)
Tests cover the OpenAI backend as representative:
- Completion retries on 429, does NOT retry on 401
- Vision completion now retries (was zero before)
- Config `max_retries` is used (default 3)
- `max_retries` removed from the `get_completion_async` signature (all backends)
- OpenAI SDK retry disabled (`max_retries=0` in client constructors)
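A hedged sketch of what the first two checks could look like; the `APIError` class, the classifier, and the retry driver here are illustrative stand-ins, not the PR's test code:

```python
class APIError(Exception):
    """Stand-in for a provider error carrying an HTTP status code."""
    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_transient_error(exc):
    return getattr(exc, "status_code", None) in {429, 500, 502, 503, 504}


def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries or not is_transient_error(exc):
                raise


def test_retries_on_429():
    attempts = []
    def fn():
        attempts.append(1)
        if len(attempts) < 2:
            raise APIError(429)
        return "ok"
    assert call_with_retry(fn) == "ok"
    assert len(attempts) == 2  # one failure, one success


def test_no_retry_on_401():
    attempts = []
    def fn():
        attempts.append(1)
        raise APIError(401)
    try:
        call_with_retry(fn)
        raise AssertionError("expected APIError")
    except APIError as exc:
        assert exc.status_code == 401
    assert len(attempts) == 1  # permanent error: exactly one attempt
```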
- EmbeddingConfig: new `max_retries` field (default 3) for retry configuration
- `EmbeddingConfig._create_embedder()`: injects `max_retries` into `params["config"]`
- `EmbedderBase.__init__()`: extracts `max_retries` from the config dict
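The config plumbing above might look like this; the field and method names follow the PR, while the dataclass shape and defaults-handling are assumptions:

```python
from dataclasses import dataclass, field


class EmbedderBase:
    def __init__(self, config: dict):
        # Each backend pulls max_retries out of its config dict.
        self.max_retries = config.get("max_retries", 3)


@dataclass
class EmbeddingConfig:
    provider: str = "openai"
    max_retries: int = 3            # new field, default 3
    params: dict = field(default_factory=dict)

    def _create_embedder(self) -> EmbedderBase:
        # Inject max_retries into the provider's config dict so every
        # backend sees the same knob without a new constructor argument.
        cfg = dict(self.params.get("config", {}))
        cfg["max_retries"] = self.max_retries
        return EmbedderBase(config=cfg)
```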
- OpenAI: disable SDK retry (`max_retries=0`), wrap embed/embed_batch
- Volcengine: replace `exponential_backoff_retry` with `transient_retry`, drop `is_429_error`
- VikingDB: add `transient_retry` (previously there was no retry at all)
- Gemini: disable SDK `HttpRetryOptions` (attempts=1), wrap embed/embed_batch
- MiniMax: disable urllib3 `Retry` (total=0), wrap embed/embed_batch
- Jina: disable SDK retry (`max_retries=0`), wrap embed/embed_batch
- Voyage: disable SDK retry (`max_retries=0`), wrap embed/embed_batch
- LiteLLM: wrap the `litellm.embedding()` calls

All providers now use the single `transient_retry` with `is_transient_error` for error classification. The wrapper sits INSIDE each method, around the raw API call, BEFORE the try/except that converts failures into RuntimeError.
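That placement rule matters: retry must see the provider's native exceptions, not the generic RuntimeError the method raises afterwards. A hypothetical sketch (the embedder class, client, and minimal decorator are illustrative assumptions):

```python
import functools


def transient_retry(max_retries=2):
    """Minimal stand-in; retries ConnectionError as the transient case."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == max_retries:
                        raise
        return wrapper
    return deco


class FakeEmbedder:
    def __init__(self, client, max_retries=2):
        self.client = client
        self.max_retries = max_retries

    def embed(self, text):
        # The retry wrapper goes INSIDE the method, directly around the
        # raw API call, so is_transient_error can classify the provider's
        # own exception types...
        @transient_retry(max_retries=self.max_retries)
        def _call():
            return self.client.embed(text)

        # ...and only after retry is exhausted does the outer try/except
        # convert the failure into a provider-agnostic RuntimeError.
        try:
            return _call()
        except Exception as exc:
            raise RuntimeError(f"embedding failed: {exc}") from exc
```

If the decorator sat outside the try/except instead, every error would already be a RuntimeError and the transient/permanent distinction would be lost.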
- test_embedding_retry_integration: OpenAI and VikingDB retry on transient vs. permanent errors
- test_retry_config: VLMConfig and EmbeddingConfig `max_retries` fields and defaults
- test_backward_compat: `exponential_backoff_retry` importable, signature unchanged, time-based
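A signature-stability check like the one in test_backward_compat can be written with `inspect.signature`; the legacy helper's body and parameter list below are assumptions for illustration:

```python
import inspect
import time


def exponential_backoff_retry(fn, max_retries=3, base_delay=1.0):
    """Stand-in for the legacy time-based helper kept for compatibility."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))


def test_signature_unchanged():
    # Freezes the public parameter list so a later refactor that renames
    # or reorders arguments fails loudly instead of silently.
    params = list(inspect.signature(exponential_backoff_retry).parameters)
    assert params == ["fn", "max_retries", "base_delay"]
```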
After upstream refactored sync methods to use run_async(), only transient_retry_async is needed in volcengine_vlm.py.
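The resulting shape can be sketched as follows; `run_async` is shown here as a plain `asyncio.run` wrapper, which is an assumption about the upstream helper, and the class is a hypothetical stand-in:

```python
import asyncio
import functools


def transient_retry_async(max_retries=3):
    """Async counterpart: retries a transient (here: ConnectionError) failure."""
    def deco(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == max_retries:
                        raise
        return wrapper
    return deco


def run_async(coro):
    # Assumed equivalent of the upstream helper that drives a coroutine
    # to completion from synchronous code.
    return asyncio.run(coro)


class FakeVolcEngineVLM:
    def __init__(self):
        self.calls = 0

    @transient_retry_async(max_retries=3)
    async def get_completion_async(self, prompt):
        self.calls += 1
        if self.calls < 2:
            raise ConnectionError("transient")
        return f"reply:{prompt}"

    def get_completion(self, prompt):
        # Sync path reuses the async retry path instead of duplicating it,
        # so only transient_retry_async is needed in the backend.
        return run_async(self.get_completion_async(prompt))
```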
Force-pushed from 6f9439b to 03ee476
Conflicts resolved; rebased onto current main.

Background

Two upstream commits landed after this PR was branched.
The license conflicts were trivial (accept upstream). The interesting part is how retry integrates with the new VolcEngine architecture.
- Rename unused loop vars `i`, `item` → `_i`, `_item` (B007)
- Suppress unused `has_images` assignment (F841): upstream code, kept for clarity
`test_fs_tree` intermittently returns 500 on windows-latest when HAS_SECRETS=false: the resource-processing pipeline retries failed embeddings (401 auth errors) in the background, and the resulting server load affects the fs/tree endpoint.
Summary
Closes #922
Unifies retry behavior across all VLM and embedding paths with a shared transient retry module, replacing scattered per-backend implementations.
Changes
- `openviking/models/retry.py`: `is_transient_error()`, `transient_retry()`, `transient_retry_async()` with error classification (28 error types), exponential backoff with jitter, count-based retry
- Remove the `max_retries` param from `get_completion_async()`. Retry is now config-driven via `vlm.max_retries` (default 3, was 2). Add retry to `get_vision_completion_async()` (was zero retry) and to sync methods
- New `embedding.max_retries` config (default 3). Apply unified retry to all 8 providers (OpenAI, Volcengine, VikingDB, Gemini, MiniMax, Jina, Voyage, LiteLLM)
- Fix the `tool_choice`/`messages` positional bug in `vlm_config.py`
- Backward compat: `exponential_backoff_retry()` unchanged; `max_retries=0` disables retry

Config
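If the config keys follow the dotted names used above, a fragment might look like this; the YAML shape is an assumption, and only the key names and defaults come from the PR:

```yaml
vlm:
  max_retries: 3        # was 2; 0 disables retry
embedding:
  max_retries: 3        # new field, applied to all 8 providers
```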
Breaking Changes
`max_retries` parameter removed from `VLMBase.get_completion_async()`, `StructuredVLM.complete_json_async()`, `StructuredVLM.complete_model_async()`, `VLMConfig.get_completion_async()`

Error Classification
Retryable: HTTP 429/500/502/503/504, `TooManyRequests`, `RateLimit`, `RequestBurstTooFast`, `ConnectionError`, `TimeoutError`, `openai.RateLimitError`

Not retryable: HTTP 400/401/403/404/422, `InvalidRequestError`, `AuthenticationError`, unknown errors (conservative default)

Test Plan
- `exponential_backoff_retry` unchanged