Bug Description
Summary
When configuring OpenViking with provider: jina and model: jina-code-embeddings-1.5b, the server fails at startup with a 422 Validation Error from the Jina API. The root cause is that OpenViking internally passes task: "retrieval.query" to the Jina embeddings endpoint, but jina-code-embeddings-1.5b is a code-specialized model that only accepts code-specific task types.
Environment
- OpenViking version: 0.2.6 (latest on pip at the time of writing)
- OS: macOS (Apple M4, arm64)
- Python: 3.12.11 (pyenv)
- Jina model: jina-code-embeddings-1.5b
Expected Behavior
OpenViking should either:
- Auto-detect the correct task type based on the model name (e.g. when the model is jina-code-embeddings-*, use nl2code.query / nl2code.passage as defaults)
- Expose a config field (e.g. task_type) so users can explicitly set the task prefix in ov.conf
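The two options above could also be combined, with an explicit setting taking precedence over auto-detection. A minimal sketch, assuming a hypothetical `resolve_task` helper and an optional `task_type` value read from ov.conf (neither exists in OpenViking today, as far as I know):

```python
from typing import Optional

# Assumed prefix for the code-embedding model family named in this issue.
CODE_MODEL_PREFIX = "jina-code-embeddings-"


def resolve_task(model: str, role: str, task_type: Optional[str] = None) -> str:
    """Return the Jina task string for a model and a role ("query" or "passage").

    task_type is a hypothetical config value such as "code2code"; when it is
    set, it overrides the default chosen from the model name.
    """
    if role not in ("query", "passage"):
        raise ValueError(f"role must be 'query' or 'passage', got {role!r}")
    if task_type:
        prefix = task_type  # explicit user choice wins
    elif model.startswith(CODE_MODEL_PREFIX):
        prefix = "nl2code"  # default for code-specialized models
    else:
        prefix = "retrieval"  # current behavior for general-purpose models
    return f"{prefix}.{role}"
```

Matching on the name prefix rather than a fixed set means other sizes in the same family would be picked up without a code change, at the cost of guessing wrong if a differently behaving model ever shares the prefix.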
Suggested Fix
In embedding_config.py or jina_embedders.py, add model-aware task mapping:
CODE_EMBEDDING_MODELS = {"jina-code-embeddings-1.5b", "jina-code-embeddings-0.5b"}

def get_task_type(model: str, role: str) -> str:
    if model in CODE_EMBEDDING_MODELS:
        return f"nl2code.{role}"  # role = "query" or "passage"
    return f"retrieval.{role}"
Steps to Reproduce
Config (ov.conf)
{
  "embedding": {
    "dense": {
      "provider": "jina",
      "model": "jina-code-embeddings-1.5b",
      "dimension": 1024
    }
  }
}
Error Message
openviking: agent memories search failed: Error: OpenViking request failed [INTERNAL]:
Jina API error: Error code: 422 - {
'detail': "Validation error: 'body -> jina-code-embeddings-1.5b -> task'
Input should be 'nl2code.query', 'nl2code.passage', 'qa.query', 'qa.passage',
'code2code.query', 'code2code.passage', 'code2nl.query', 'code2nl.passage',
'code2completion.query' or 'code2completion.passage'.",
...
'errors': [{
'field': 'body -> jina-code-embeddings-1.5b -> task',
'message': "Input should be 'nl2code.query' ...",
'type': 'literal_error',
'input': 'retrieval.query'
}]
}
Root Cause
The Jina embedder implementation in jina_embedders.py appears to hardcode a generic task type (retrieval.query / retrieval.passage) for all Jina models. However, jina-code-embeddings-1.5b is a code-specialized model that uses a different task namespace:
| Use Case | Query Task | Passage Task |
|---|---|---|
| Natural language → Code | nl2code.query | nl2code.passage |
| Code → Code | code2code.query | code2code.passage |
| Technical Q&A | qa.query | qa.passage |
| Code → Comment | code2nl.query | code2nl.passage |
| Code completion | code2completion.query | code2completion.passage |
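For reference, the table above can be written as a small lookup that a config validator might use to reject an unsupported task before any request reaches the Jina API. This is only a sketch: `CODE_TASKS` and `tasks_for` are hypothetical names, and the task strings are copied from the 422 response.

```python
from typing import Dict, Tuple

# Task prefix -> (query task, passage task), as listed in the table above.
CODE_TASKS: Dict[str, Tuple[str, str]] = {
    "nl2code": ("nl2code.query", "nl2code.passage"),  # natural language -> code
    "code2code": ("code2code.query", "code2code.passage"),  # code -> code
    "qa": ("qa.query", "qa.passage"),  # technical Q&A
    "code2nl": ("code2nl.query", "code2nl.passage"),  # code -> comment
    "code2completion": (
        "code2completion.query",
        "code2completion.passage",
    ),  # code completion
}


def tasks_for(use_case: str) -> Tuple[str, str]:
    """Return the (query, passage) task pair, or raise a readable error."""
    try:
        return CODE_TASKS[use_case]
    except KeyError:
        allowed = ", ".join(sorted(CODE_TASKS))
        raise ValueError(
            f"unsupported task prefix {use_case!r}; expected one of: {allowed}"
        ) from None
```

Failing at config load time with the list of accepted prefixes would surface the problem earlier than the 422 returned at request time.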
Workaround
Switch to jina-embeddings-v3, which supports the generic retrieval.query task type. However, this sacrifices the significantly better code retrieval quality of jina-code-embeddings-1.5b.
Happy to submit a PR if the fix direction is confirmed. Thanks!
OpenViking Version
openviking 0.2.6
Python Version
Python 3.12.11
Operating System
macOS
Model Backend
Other