[Bug]: jina-code-embeddings-1.5b fails with retrieval.query task — incompatible task type passed to Jina API #912

@SparkleBo

Description

Bug Description

Summary

When configuring OpenViking with provider: jina and model: jina-code-embeddings-1.5b, the server fails at startup with a 422 Validation Error from the Jina API. The root cause is that OpenViking internally passes task: "retrieval.query" to the Jina embeddings endpoint, but jina-code-embeddings-1.5b is a code-specialized model that only accepts code-specific task types.

Environment

  • OpenViking version: 0.2.6 (latest from pip)
  • OS: macOS (Apple M4, arm64)
  • Python: 3.12.11 (pyenv)
  • Jina model: jina-code-embeddings-1.5b

Expected Behavior

OpenViking should either:

  1. Auto-detect the correct task type based on the model name (e.g. when the model is jina-code-embeddings-*, use nl2code.query / nl2code.passage as defaults)
  2. Expose a config field (e.g. task_type) so users can explicitly set the task prefix in ov.conf
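A minimal sketch of option 2, for illustration only: `task_type` is the hypothetical config key named above (it is not an existing ov.conf field), and `task_from_config` is a hypothetical helper that combines the configured prefix with the role to form the task string sent to Jina. When the key is absent it falls back to the current `retrieval` prefix.

```python
import json

# Hypothetical: "task_type" is NOT an existing OpenViking config key; it
# stands in for the explicit task-prefix field proposed in option 2.
VALID_ROLES = ("query", "passage")


def task_from_config(dense_cfg: dict, role: str, default_prefix: str = "retrieval") -> str:
    """Build the Jina task string from an explicit prefix, else a default."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    prefix = dense_cfg.get("task_type", default_prefix)
    return f"{prefix}.{role}"


conf = json.loads("""
{
  "embedding": {
    "dense": {
      "provider": "jina",
      "model": "jina-code-embeddings-1.5b",
      "dimension": 1024,
      "task_type": "nl2code"
    }
  }
}
""")
dense = conf["embedding"]["dense"]
print(task_from_config(dense, "query"))    # nl2code.query
print(task_from_config(dense, "passage"))  # nl2code.passage
```

Keeping `retrieval` as the fallback would leave existing configs unchanged; only users of code models would need to set the field.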

Suggested Fix

In embedding_config.py or jina_embedders.py, add a model-aware task mapping:

```python
# Code-specialized Jina models reject the generic retrieval.* task namespace.
CODE_EMBEDDING_MODELS = {"jina-code-embeddings-1.5b", "jina-code-embeddings-0.5b"}


def get_task_type(model: str, role: str) -> str:
    """Return the Jina task string for `model`; `role` is "query" or "passage"."""
    if model in CODE_EMBEDDING_MODELS:
        return f"nl2code.{role}"
    return f"retrieval.{role}"
```
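Option 1 is phrased as a wildcard (jina-code-embeddings-*), whereas an explicit set has to be updated each time a new size ships. A prefix check is one alternative. This sketch assumes that every model whose name starts with `jina-code-embeddings-` shares the code task namespace, which should be confirmed against Jina's model documentation. It also rejects unexpected roles locally rather than letting them reach the API.

```python
# Assumption: all jina-code-embeddings-* models accept the nl2code.* tasks.
CODE_MODEL_PREFIX = "jina-code-embeddings-"
VALID_ROLES = ("query", "passage")


def get_task_type(model: str, role: str) -> str:
    """Map (model, role) to a Jina task string, e.g. "nl2code.query"."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    namespace = "nl2code" if model.startswith(CODE_MODEL_PREFIX) else "retrieval"
    return f"{namespace}.{role}"


print(get_task_type("jina-code-embeddings-1.5b", "query"))  # nl2code.query
print(get_task_type("jina-embeddings-v3", "passage"))       # retrieval.passage
```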

Steps to Reproduce

Config (ov.conf)

```json
{
  "embedding": {
    "dense": {
      "provider": "jina",
      "model": "jina-code-embeddings-1.5b",
      "dimension": 1024
    }
  }
}
```
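To confirm that the rejection comes from the Jina API itself rather than from OpenViking, the request can be sent directly. This is a minimal standard-library sketch, assuming the public endpoint `https://api.jina.ai/v1/embeddings`, a JSON body with `model`, `task` and `input` fields, and an API key in a `JINA_API_KEY` environment variable (a name chosen for this example). Based on the error message, `retrieval.query` should return 422 while `nl2code.query` should be accepted.

```python
import json
import os
import urllib.error
import urllib.request

# Assumption: public Jina embeddings endpoint; verify against Jina's API docs.
JINA_URL = "https://api.jina.ai/v1/embeddings"


def build_payload(model: str, task: str, texts: list[str]) -> dict:
    """Request body with the model/task pair OpenViking appears to send."""
    return {"model": model, "task": task, "input": texts}


def post_embeddings(payload: dict, api_key: str) -> tuple[int, str]:
    """POST the payload and return (HTTP status, response body)."""
    req = urllib.request.Request(
        JINA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:  # 4xx/5xx responses are raised
        return err.code, err.read().decode("utf-8")


if __name__ == "__main__":
    key = os.environ.get("JINA_API_KEY")
    if key:
        for task in ("retrieval.query", "nl2code.query"):
            payload = build_payload("jina-code-embeddings-1.5b", task, ["def add(a, b): return a + b"])
            status, body = post_embeddings(payload, key)
            print(task, "->", status, body[:200])
```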

Error Message

```
openviking: agent memories search failed: Error: OpenViking request failed [INTERNAL]:
Jina API error: Error code: 422 - {
  'detail': "Validation error: 'body -> jina-code-embeddings-1.5b -> task'
  Input should be 'nl2code.query', 'nl2code.passage', 'qa.query', 'qa.passage',
  'code2code.query', 'code2code.passage', 'code2nl.query', 'code2nl.passage',
  'code2completion.query' or 'code2completion.passage'.",
  ...
  'errors': [{
    'field': 'body -> jina-code-embeddings-1.5b -> task',
    'message': "Input should be 'nl2code.query' ...",
    'type': 'literal_error',
    'input': 'retrieval.query'
  }]
}
```

Root Cause

The Jina embedder implementation in jina_embedders.py appears to hardcode a generic task type (retrieval.query / retrieval.passage) for all Jina models. However, jina-code-embeddings-1.5b is a code-specialized model that uses a different task namespace:

| Use case | Query task | Passage task |
| --- | --- | --- |
| Natural language → Code | nl2code.query | nl2code.passage |
| Code → Code | code2code.query | code2code.passage |
| Technical Q&A | qa.query | qa.passage |
| Code → Comment | code2nl.query | code2nl.passage |
| Code completion | code2completion.query | code2completion.passage |
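The table can also be expressed as data so that an invalid task is rejected locally, with a readable message, before any request is sent. In this minimal sketch, the five prefixes and the ten resulting task strings come from the table and from the allowed values listed in the 422 response. `check_task` is a hypothetical helper, and applying it to every jina-code-embeddings-* model is an assumption.

```python
# Prefixes from the table; each combines with a query or passage role.
CODE_TASK_PREFIXES = {
    "nl2code": "natural language -> code",
    "code2code": "code -> code",
    "qa": "technical Q&A",
    "code2nl": "code -> comment",
    "code2completion": "code completion",
}
ROLES = ("query", "passage")

# The ten values the 422 response lists as valid for jina-code-embeddings-1.5b.
ALLOWED_CODE_TASKS = frozenset(
    f"{prefix}.{role}" for prefix in CODE_TASK_PREFIXES for role in ROLES
)


def check_task(model: str, task: str) -> None:
    """Raise ValueError if a code model is given a task outside its namespace."""
    # Assumption: every jina-code-embeddings-* model shares this task list.
    if model.startswith("jina-code-embeddings-") and task not in ALLOWED_CODE_TASKS:
        raise ValueError(
            f"{model} does not accept task {task!r}; "
            f"expected one of {sorted(ALLOWED_CODE_TASKS)}"
        )


check_task("jina-code-embeddings-1.5b", "nl2code.query")  # passes silently
try:
    check_task("jina-code-embeddings-1.5b", "retrieval.query")
except ValueError as err:
    print(err)
```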


Workaround

Switch to jina-embeddings-v3, which supports the generic retrieval.query task type. However, this sacrifices the significantly better code retrieval quality of jina-code-embeddings-1.5b.


Happy to submit a PR if the fix direction is confirmed. Thanks!

OpenViking Version

openviking 0.2.6

Python Version

Python 3.12.11

Operating System

macOS

Model Backend

Other

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: status In progress
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests