
feat(embeddings): add VoyageAI voyage-4 family embedding support #5747

Closed
fzowl wants to merge 4 commits into openclaw:main from fzowl:feat/embedding-model-voyage-4-family

Conversation

fzowl commented Jan 31, 2026

Add native VoyageAI embedding provider with voyage-4 family models:

  • voyage-4: General purpose, 1024 dimensions (default)
  • voyage-4-lite: Cost-optimized, highest throughput
  • voyage-4-large: Best quality for demanding retrieval tasks

Changes:

  • Add voyageai SDK dependency
  • Create embeddings-voyageai.ts provider using the official VoyageAI SDK (see the sketch after this list)
  • Register voyageai in embedding provider factory with auto-detection
  • Add voyage-4 models to EMBEDDING_DIMENSIONS registry
  • Add voyage-4 models to plugin JSON schema enum
  • Update memory manager to support voyageai provider type
  • Add unit tests for VoyageAI provider
  • Add live integration tests for all voyage-4 models
  • Update memory documentation with VoyageAI configuration examples
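
For a concrete picture of the pieces listed above, here is a minimal sketch of the provider and the registry entry. The VoyageAIClient/embed names assume the official voyageai npm SDK, and the embed(texts) contract is an assumed OpenClaw provider interface; none of this is taken verbatim from the PR.

```ts
// Minimal sketch only; the real embeddings-voyageai.ts in this PR may differ.
import { VoyageAIClient } from "voyageai"; // assumed SDK client export

// voyage-4 entry added to the dimensions registry (1024 per the PR description);
// voyage-4-lite and voyage-4-large would be registered the same way.
export const EMBEDDING_DIMENSIONS: Record<string, number> = {
  "voyage-4": 1024,
};

export class VoyageAIEmbeddingProvider {
  readonly id = "voyageai";
  private client: VoyageAIClient;

  constructor(private model: string = "voyage-4", apiKey = process.env.VOYAGE_API_KEY) {
    this.client = new VoyageAIClient({ apiKey });
  }

  // Embed a batch of texts, returning one vector per input.
  async embed(texts: string[]): Promise<number[][]> {
    const response = await this.client.embed({ input: texts, model: this.model });
    return response.data?.map((item) => item.embedding ?? []) ?? [];
  }
}
```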

Greptile Overview

Greptile Summary

This PR adds a native VoyageAI embedding provider using the official voyageai SDK, registers it in the embeddings provider factory (including auto-detection), expands the embedding model/dimension registry to include the voyage-4 family, and updates memory-related docs and tests (unit + live).

The changes integrate with the existing memory search pipeline by extending src/memory/embeddings.ts to construct the new provider and by threading the new provider type through src/memory/manager.ts so memory indexing/search can use VoyageAI embeddings similarly to OpenAI/Gemini/local.

The main issues to address are that the memory LanceDB extension's config/schema is still OpenAI-only despite the new Voyage models, and that a type cast in the memory manager's fallback path does not include the new voyageai provider id.
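
As a rough illustration of the auto-detection described above, a factory could pick the provider roughly like this (the function name, env-var names, and detection order are assumptions, not the PR's actual src/memory/embeddings.ts):

```ts
// Hypothetical sketch of factory auto-detection for the embeddings provider.
type EmbeddingProviderId = "openai" | "gemini" | "local" | "voyageai";

function detectEmbeddingProvider(configured?: EmbeddingProviderId): EmbeddingProviderId {
  if (configured) return configured;                 // explicit config always wins
  if (process.env.VOYAGE_API_KEY) return "voyageai"; // new: pick VoyageAI when its key is set
  if (process.env.OPENAI_API_KEY) return "openai";
  if (process.env.GEMINI_API_KEY) return "gemini";
  return "local";                                    // no keys: local embeddings
}
```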

Confidence Score: 3/5

  • Reasonably safe to merge once the provider wiring inconsistencies are resolved.
  • Core provider implementation and integration look straightforward, but there are two correctness issues that can lead to misconfiguration (LanceDB memory plugin still hard-codes OpenAI) and misleading fallback metadata when VoyageAI is the primary provider. Addressing these should make behavior consistent across docs/schema/runtime.
  • extensions/memory-lancedb/config.ts; extensions/memory-lancedb/openclaw.plugin.json; src/memory/manager.ts


Context used:

  • Context from dashboard - CLAUDE.md (source)
  • Context from dashboard - AGENTS.md (source)

@openclaw-barnacle bot added the docs (Improvements or additions to documentation) and extensions: memory-lancedb (Extension: memory-lancedb) labels on Jan 31, 2026
fzowl force-pushed the feat/embedding-model-voyage-4-family branch from 23af9fc to f6e5204 on January 31, 2026 at 23:05
fzowl and others added 2 commits February 1, 2026 00:17
- Add curly braces to single-line if statements
- Remove unused DEFAULT_VOYAGEAI_EMBEDDING_MODEL import from manager.ts

Co-Authored-By: Claude Opus 4.5 <[email protected]>

greptile-apps bot left a comment

2 files reviewed, 2 comments

greptile-apps bot commented Jan 31, 2026

Additional Comments (2)

extensions/memory-lancedb/config.ts (lines 5-12)
[P0] Memory plugin schema still hard-codes provider as openai

MemoryConfig.embedding.provider is typed as "openai" and memoryConfigSchema.parse always returns provider: "openai", but this PR now accepts voyage-4* models (and docs imply VoyageAI usage). If a user selects a voyage model, the plugin will still treat it as OpenAI, which is likely to break runtime provider selection.

Also appears in extensions/memory-lancedb/openclaw.plugin.json (no provider field for embedding config, UI hints still say OpenAI).
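
One way this could be resolved, as a hedged sketch (type and field names follow the comment above rather than the actual plugin code):

```ts
// Hypothetical sketch for extensions/memory-lancedb/config.ts: widen the
// embedding provider union so voyage-4* models are no longer forced onto OpenAI.
export type MemoryEmbeddingProvider = "openai" | "voyageai";

export interface MemoryConfig {
  embedding: {
    provider: MemoryEmbeddingProvider; // previously hard-coded to "openai"
    model: string;                     // e.g. "voyage-4"
  };
}
```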


src/memory/manager.ts (lines 1406-1422)
[P0] activateFallbackProvider type cast excludes voyageai

fallbackFrom is computed via const fallbackFrom = this.provider.id as "openai" | "gemini" | "local";, but this.provider.id can now be "voyageai". If the primary provider is VoyageAI and embeddings fail, this cast will record an incorrect fallbackFrom value (or mask the new provider entirely), which can misreport status/fallback metadata and complicate debugging.
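
A hedged sketch of one possible fix, assuming the union is simply widened rather than derived from the provider registry:

```ts
// Hypothetical sketch for src/memory/manager.ts: include "voyageai" in the type
// used for fallback metadata instead of casting it away.
type EmbeddingProviderId = "openai" | "gemini" | "local" | "voyageai";

interface FallbackMetadata {
  fallbackFrom: EmbeddingProviderId; // primary provider that failed
  fallbackTo: EmbeddingProviderId;   // provider we switched to
}

function recordFallback(primaryId: EmbeddingProviderId): FallbackMetadata {
  // No narrowing cast: a VoyageAI primary is reported as "voyageai" rather than
  // being silently coerced into the old three-provider union.
  return { fallbackFrom: primaryId, fallbackTo: "local" };
}
```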


Address PR review comments:
- Update MemoryConfig type to support "openai" | "voyageai" provider
- Add resolveEmbeddingProvider() to detect provider from model prefix
- Fix activateFallbackProvider type cast to include "voyageai"
- Update UI hints to be provider-agnostic (support both OpenAI and VoyageAI)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
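
The resolveEmbeddingProvider() helper referenced in this commit is not shown in the thread; a plausible minimal version, assuming detection purely by model-name prefix, would be:

```ts
// Hypothetical sketch: infer the embedding provider from the model prefix so
// voyage-4, voyage-4-lite and voyage-4-large route to VoyageAI by default.
export function resolveEmbeddingProvider(model: string): "openai" | "voyageai" {
  return model.startsWith("voyage-") ? "voyageai" : "openai";
}

// Example (hypothetical):
//   resolveEmbeddingProvider("voyage-4-large")        -> "voyageai"
//   resolveEmbeddingProvider("text-embedding-3-small") -> "openai"
```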

clawdinator bot commented Feb 1, 2026

[GIF: closing thumbs up]

CLAWDINATOR FIELD REPORT // PR Closure

I am CLAWDINATOR — cybernetic crustacean, maintainer triage bot for OpenClaw. I was sent from the future to keep this repo shipping clean code.

TARGET ACQUIRED. I have reviewed your PR. Your effort is br00tal.

Situation briefing: OpenClaw receives ~25 PRs every hour. The maintainers cannot pump iron that hard without collapsing. This PR is unlikely to merge in the near term, so I'm closing it to keep the pipeline moving. Consider that a deprecation — not a termination of your spirit.

Think your change should ship? Come with me if you want to ship. Report to #pr-thunderdome-dangerzone on Discord — READ THE TOPIC or risk immediate termination. Give the maintainers a clear briefing — what it fixes, who it helps, why it's br00tal.

I'll be back. Stay br00tal.

🤖 This is an automated message from CLAWDINATOR, the OpenClaw maintainer bot.

