feat(memory): add gemini-embedding-2-preview as supported embedding model #42487

@BillChirico

Summary

Add support for gemini-embedding-2-preview as an option alongside the existing gemini-embedding-001 default. Users should be able to opt in via config.

Background

Google released gemini-embedding-2-preview (docs) with significantly better specs than gemini-embedding-001:

|              | gemini-embedding-001 | gemini-embedding-2-preview |
|--------------|----------------------|----------------------------|
| Modalities   | Text only            | Text, image, video, audio, PDF |
| Dimensions   | 768                  | 3072 (configurable: 768, 1536, 3072) |
| Input tokens | 2048                 | 8192 |
| Task types   | ❌                   | ✅ (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.) |
| Matryoshka   | ❌                   | ✅ (truncatable dimensions) |

Multimodal support is the headline feature: a single embedding space covers text, images, video frames, audio clips, and PDFs, enabling cross-modal retrieval (e.g. search with text and retrieve a matching image, or vice versa). The higher input token limit (8192 vs. 2048) also means fewer chunk splits for long memory entries.

Required Changes

1. src/memory/embedding-model-limits.ts

Add the new model's token limit:

"gemini:gemini-embedding-2-preview": 8192,
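In context, the limits map might look like the following sketch. The map and resolver names are assumptions; the key format and the 8192/2048 values come from this issue:

```typescript
// Sketch only: EMBEDDING_MODEL_TOKEN_LIMITS and resolveTokenLimit are
// assumed names, not necessarily the repo's actual exports.
const EMBEDDING_MODEL_TOKEN_LIMITS: Record<string, number> = {
  "gemini:gemini-embedding-001": 2048,
  "gemini:gemini-embedding-2-preview": 8192, // the new entry
};

// Fall back to a conservative default for unknown models.
function resolveTokenLimit(model: string, fallback = 2048): number {
  return EMBEDDING_MODEL_TOKEN_LIMITS[model] ?? fallback;
}
```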

2. src/memory/embeddings-gemini.ts

  • Support the outputDimensionality parameter (3072 default; configurable)
  • Support the taskType field — pass RETRIEVAL_DOCUMENT for storage, RETRIEVAL_QUERY for search
  • Support multimodal parts in the request body: inlineData (base64 image/audio/video) and fileData (URI for PDFs, videos via File API)
  • Both fields are absent for older models (backward compatible)
  • Both single (embedContent) and batch (batchEmbedContents) endpoints support multimodal parts

Text request shape (backward compat):

{
  "model": "models/gemini-embedding-2-preview",
  "content": { "parts": [{ "text": "..." }] },
  "taskType": "RETRIEVAL_DOCUMENT",
  "outputDimensionality": 3072
}

Multimodal request shape (image example):

{
  "model": "models/gemini-embedding-2-preview",
  "content": {
    "parts": [
      { "inlineData": { "mimeType": "image/png", "data": "<base64>" } },
      { "text": "optional caption or context" }
    ]
  },
  "taskType": "RETRIEVAL_DOCUMENT",
  "outputDimensionality": 3072
}

Supported MIME types: image/png, image/jpeg, image/webp, image/gif, video/mp4, audio/mp3, audio/wav, application/pdf, and others supported by the Gemini API.
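Putting the two request shapes above together, the builder could look like this sketch. Function and type names are assumptions; the field names, defaults, and the backward-compatibility rule come from this issue:

```typescript
// Part and request shapes mirror the JSON examples above.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { mimeType: string; fileUri: string } };

interface EmbedRequest {
  model: string;
  content: { parts: Part[] };
  taskType?: "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";
  outputDimensionality?: number;
}

const NEW_MODEL = "gemini-embedding-2-preview";

function buildEmbedRequest(
  model: string,
  parts: Part[],
  mode: "store" | "search",
  dimensionality = 3072,
): EmbedRequest {
  const req: EmbedRequest = { model: `models/${model}`, content: { parts } };
  // Only the new model gets taskType/outputDimensionality; older models
  // keep the bare request shape for backward compatibility.
  if (model === NEW_MODEL) {
    req.taskType = mode === "store" ? "RETRIEVAL_DOCUMENT" : "RETRIEVAL_QUERY";
    req.outputDimensionality = dimensionality;
  }
  return req;
}
```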

3. src/memory/embeddings-model-normalize.ts

Ensure gemini-embedding-2-preview passes through normalization without being mangled.
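Illustratively, the passthrough requirement amounts to something like the sketch below. The normalizer's actual rules live in the repo; everything here is assumed except that the new model name must survive unchanged:

```typescript
// Sketch: the only requirement from this issue is that the new model name
// survives normalization intact. Prefix-stripping is an assumed behavior.
const KNOWN_GEMINI_EMBEDDING_MODELS = new Set([
  "gemini-embedding-001",
  "gemini-embedding-2-preview",
]);

function normalizeEmbeddingModel(raw: string): string {
  // Strip an optional "gemini:" provider prefix, then pass known models through.
  const bare = raw.startsWith("gemini:") ? raw.slice("gemini:".length) : raw;
  return KNOWN_GEMINI_EMBEDDING_MODELS.has(bare) ? bare : bare.toLowerCase();
}
```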

4. New: src/memory/embeddings-gemini-multimodal.ts (or extend existing)

Helper to build a multimodal content object from a file path or URL, detecting MIME type and encoding inline vs. File API upload based on size. This keeps embeddings-gemini.ts clean.
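A possible shape for that helper, kept pure so file I/O and uploads are injected. The MIME map, the size cutoff, and both callbacks are assumptions; the inline-vs-File-API split and the part shapes come from this issue:

```typescript
// Assumed extension-to-MIME map; the real list should match the Gemini API.
const MIME_BY_EXT: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
  ".mp4": "video/mp4",
  ".mp3": "audio/mp3",
  ".wav": "audio/wav",
  ".pdf": "application/pdf",
};

// Assumed cutoff: larger payloads go through the File API instead of inline base64.
const INLINE_LIMIT_BYTES = 20 * 1024 * 1024;

type MultimodalPart =
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { mimeType: string; fileUri: string } };

function detectMimeType(path: string): string {
  const ext = path.slice(path.lastIndexOf(".")).toLowerCase();
  const mime = MIME_BY_EXT[ext];
  if (!mime) throw new Error(`Unsupported file type: ${path}`);
  return mime;
}

function buildMultimodalPart(
  path: string,
  sizeBytes: number,
  readBase64: () => string, // lazily read and base64-encode the file
  uploadToFileApi: () => string, // uploads and returns a File API URI
): MultimodalPart {
  const mimeType = detectMimeType(path);
  return sizeBytes <= INLINE_LIMIT_BYTES
    ? { inlineData: { mimeType, data: readBase64() } }
    : { fileData: { mimeType, fileUri: uploadToFileApi() } };
}
```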

5. Config / docs

  • Update the memorySearch.model config reference to list gemini-embedding-2-preview as a valid option
  • Document supported input types beyond text
  • Add a note that switching models requires re-indexing existing memory (vector dimensions change: 768 → 3072)
  • DEFAULT_GEMINI_EMBEDDING_MODEL stays gemini-embedding-001 — existing configs unaffected
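An opted-in config might look like the fragment below. The `memorySearch.model` and `memorySearch.dimensionality` field names are taken from this issue's acceptance criteria; the surrounding file layout is an assumption:

```json
{
  "memorySearch": {
    "model": "gemini-embedding-2-preview",
    "dimensionality": 1536
  }
}
```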

Acceptance Criteria

  • model: "gemini-embedding-2-preview" in config produces valid text embeddings
  • Token limit resolved as 8192 for the new model
  • outputDimensionality defaults to 3072; configurable via optional memorySearch.dimensionality field
  • taskType passed appropriately for write vs. search operations
  • Multimodal parts (image, audio, video, PDF) can be embedded when using the new model
  • Existing gemini-embedding-001 behavior unchanged (text-only, no extra fields)
  • Unit tests cover limit resolution, text request shape, and multimodal part construction
  • Docs updated with model option, multimodal support, and re-index warning

Notes

  • gemini-embedding-2-preview is in preview — model name may change on GA. Consider aliasing once stable.
  • Batch endpoint (batchEmbedContents) also accepts outputDimensionality at the top level — ensure both paths pass it.
  • For large files (video, long audio), use the Gemini File API upload path rather than inline base64.
  • Cross-modal retrieval (embed image, search with text) works out of the box since both live in the same vector space.
