feat(web-search): set Exa as default provider via auto-detection (EXA_API_KEY activates Exa) #35062

Open
louiswalsh wants to merge 1 commit into openclaw:main from exa-labs:feat/exa-web-search-default

@louiswalsh louiswalsh commented Mar 4, 2026

Summary

  • Problem: OpenClaw has no native Exa search integration — users who want Exa's semantic search must use external workarounds.
  • Why it matters: Exa provides high-quality semantic search with token-efficient highlights, useful for AI assistant workflows.
  • What changed: Added Exa as a 6th web_search provider with auto-detection — setting EXA_API_KEY is sufficient to activate Exa (no explicit provider config required). Exa is first in the auto-detect priority chain; when the key is absent, the chain falls through to Brave.
  • What did NOT change (scope boundary): No changes to existing providers, no changes to web_fetch, no breaking changes to existing configs.
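The priority rule described above could be sketched roughly as follows. This is a minimal illustration, not the PR's resolver: `detectProvider`, `AUTO_DETECT_CHAIN`, and the `BRAVE_API_KEY` variable name are assumptions, and only two of the six providers are shown.

```typescript
// Hypothetical sketch of the auto-detect chain. Only EXA_API_KEY comes from
// the PR text; every other name here is an illustrative assumption.
type ProviderId = "exa" | "brave";

interface Candidate {
  id: ProviderId;
  envVar: string;
}

// Ordered by priority: Exa is checked first, Brave is the fallback.
const AUTO_DETECT_CHAIN: readonly Candidate[] = [
  { id: "exa", envVar: "EXA_API_KEY" },
  { id: "brave", envVar: "BRAVE_API_KEY" }, // assumed env var name
];

function detectProvider(
  env: Record<string, string | undefined>,
  explicit?: ProviderId,
): ProviderId {
  // An explicit provider setting always wins over auto-detection.
  if (explicit) return explicit;
  for (const candidate of AUTO_DETECT_CHAIN) {
    const key = env[candidate.envVar]?.trim();
    if (key) return candidate.id;
  }
  // No key found: fall through to Brave, which reports its own missing-key error.
  return "brave";
}
```

With both keys set, the sketch picks Exa; with `explicit` set to `"brave"`, the override is respected regardless of which keys are present.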

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? Yes — new EXA_API_KEY env var / exa.apiKey config, follows same pattern as existing providers
  • New/changed network calls? Yes — new outbound call to https://api.exa.ai/search with x-exa-integration: "openclaw" header
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: API key is marked sensitive in Zod schema (same as all other provider keys). Network call fires automatically if EXA_API_KEY is present. Uses existing withTrustedWebSearchEndpoint wrapper with timeout and abort support.
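For reviewers who want a concrete picture of the new outbound call, here is a rough sketch of how such a request might be assembled with a timeout and abort signal. `buildExaRequest` is a hypothetical helper, the `x-api-key` header name and the JSON body fields are assumptions, and the real code goes through `withTrustedWebSearchEndpoint`, which is not reproduced here.

```typescript
// Sketch only: this builds the request without sending it. Only the URL and
// the x-exa-integration header are taken from the PR description.
interface ExaRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
    signal: AbortSignal;
  };
  cancelTimeout: () => void;
}

function buildExaRequest(
  apiKey: string,
  query: string,
  numResults = 5,
  timeoutMs = 10_000,
): ExaRequest {
  const controller = new AbortController();
  // Abort the request if it runs longer than timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return {
    url: "https://api.exa.ai/search",
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey, // assumed auth header name
        "x-exa-integration": "openclaw",
      },
      body: JSON.stringify({ query, numResults }), // assumed body fields
      signal: controller.signal,
    },
    // Call once the response arrives so the timer does not fire late.
    cancelTimeout: () => clearTimeout(timer),
  };
}
```

A caller would pass `req.url` and `req.init` to `fetch` and invoke `req.cancelTimeout()` in a `finally` block.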

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Node.js (vitest)
  • Model/provider: N/A (unit tests only)
  • Integration/channel (if any): N/A
  • Relevant config (redacted): { tools: { web: { search: { exa: { apiKey: "exa-..." } } } } } (no explicit provider needed)

Steps

  1. Set EXA_API_KEY in environment (no other config change required)
  2. Invoke web_search tool with a query
  3. Observe Exa auto-selected in verbose logs

Expected

  • Returns structured results with title, URL, highlights, published date, author
  • Results capped at numResults (default 5) with highlights limited to highlightsMaxChars (default 8000)

Actual

  • Matches expected (verified via unit tests)

Evidence

  • Failing test/log before + passing after

20 tests passing in config.web-search-provider.test.ts, 47 in web-search.test.ts.

Human Verification (required)

  • Verified scenarios: Auto-detection with EXA_API_KEY set, Exa priority over Brave when both keys present, fallback to Brave when EXA_API_KEY absent
  • Edge cases checked: Multiple keys present (Exa wins), no keys (falls back to Brave/error), explicit provider override still respected
  • What you did not verify: Live API call to Exa (unit tests mock the network layer)

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? Yes — new optional EXA_API_KEY env var and tools.web.search.exa config block
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Unset EXA_API_KEY or set provider: "brave" in config — falls back to Brave
  • Files/config to restore: N/A (additive change, no existing behavior modified)
  • Known bad symptoms reviewers should watch for: If EXA_API_KEY is invalid, the tool returns an API error message (same pattern as other providers)

Risks and Mitigations

  • Risk: Exa API availability/rate limits
    • Mitigation: If EXA_API_KEY is unset, there is zero impact on existing users
  • Risk: Unintended Exa activation for users who set EXA_API_KEY for other purposes
    • Mitigation: Key is namespaced (EXA_API_KEY) and only read by the web_search tool resolver

@openclaw-barnacle openclaw-barnacle bot added the docs, agents, and size: M labels Mar 4, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c0475f7ac

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 35 to 36
suppressToolErrorWarnings: z.boolean().optional(),
lightContext: z.boolean().optional(),
})

P1 Badge Restore heartbeat.lightContext in runtime config schema

HeartbeatSchema is strict, so removing lightContext here turns heartbeat.lightContext into an unknown key and causes config validation to fail for existing installations that already use this flag. The runtime still reads this option (for example, heartbeat execution checks heartbeat?.lightContext to choose lightweight bootstrap context), so this change introduces a backwards-incompatible break where valid prior configs are now rejected at load time.
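The failure mode can be illustrated without Zod. The sketch below is a hand-rolled stand-in for a strict object schema; `unknownKeys` is a hypothetical helper and the key list covers only the fields visible in the review excerpt, not the full `HeartbeatSchema`.

```typescript
// Stand-in for strict-schema behaviour: any key outside the allow-list is
// reported as unknown, which a strict schema treats as a validation error.
const HEARTBEAT_KEYS_AFTER_REMOVAL = [
  "prompt",
  "ackMaxChars",
  "suppressToolErrorWarnings",
] as const;

function unknownKeys(
  config: Record<string, unknown>,
  allowed: readonly string[],
): string[] {
  return Object.keys(config).filter((key) => !allowed.includes(key));
}

// A config that was valid before lightContext was dropped from the schema.
const existingConfig = { prompt: "ping", lightContext: true };

// With the field removed, the previously valid key is now rejected.
const rejected = unknownKeys(existingConfig, HEARTBEAT_KEYS_AFTER_REMOVAL);

// With the field restored, the same config passes again.
const accepted = unknownKeys(existingConfig, [
  ...HEARTBEAT_KEYS_AFTER_REMOVAL,
  "lightContext",
]);
```

Here `rejected` contains `"lightContext"` and `accepted` is empty, which mirrors why restoring the field fixes the load-time failure.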

Useful? React with 👍 / 👎.


greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR adds Exa as the 6th web_search provider and places it first in the auto-detection chain (activated by EXA_API_KEY). The Exa integration is well-structured — resolver helpers are properly guarded, the cache key correctly encodes Exa-specific parameters, and the HTTP integration uses the right authentication header. Tests cover auto-detection priority, config resolution edge cases, and invalid-input fallbacks.

Critical issue:

lightContext: z.boolean().optional() was removed from HeartbeatSchema in src/config/zod-schema.agent-runtime.ts. Since the schema uses .strict(), any user with heartbeat.lightContext: true in their config file will receive a Zod validation error at startup. The runtime code in heartbeat-runner.ts still reads heartbeat?.lightContext to enable lightweight context mode, so this is a functional regression that will break existing user configurations.

Minor issue:

The Zod schema for exa.numResults accepts any positive integer, but resolveSearchCount silently clamps the value to MAX_SEARCH_COUNT (10). A user setting exa.numResults: 25 will pass config validation but only receive 10 results with no warning or error. Adding .max(10) to the schema or documenting the 1–10 cap would make this behavior transparent.

Confidence Score: 2/5

  • Not safe to merge — a breaking change (removal of lightContext from HeartbeatSchema) will cause config validation errors for existing users.
  • The Exa integration itself is well-implemented and tested. However, the accidental removal of lightContext: z.boolean().optional() from the strict HeartbeatSchema is a critical breaking change that will cause validation failures for any user with heartbeat.lightContext: true set in their config. This must be reverted before merging. The minor issue with silent clamping of exa.numResults does not block merge but should be addressed for clarity.
  • src/config/zod-schema.agent-runtime.ts — must restore the lightContext field to HeartbeatSchema and optionally add .max(10) to the exa.numResults schema or document the cap in types.tools.ts.

Last reviewed commit: 3c0475f

Comment on lines 33 to 36
prompt: z.string().optional(),
ackMaxChars: z.number().int().nonnegative().optional(),
suppressToolErrorWarnings: z.boolean().optional(),
lightContext: z.boolean().optional(),
})

lightContext: z.boolean().optional() was removed from HeartbeatSchema, but the field is still in use throughout the codebase:

  • src/infra/heartbeat-runner.ts:747 reads heartbeat?.lightContext === true
  • src/config/types.agent-defaults.ts:248 still declares lightContext?: boolean
  • Tests still verify this behavior

Because HeartbeatSchema uses .strict(), any user with heartbeat.lightContext: true in their config file will now receive a Zod validation error at startup. This is a breaking change.

Suggested change
    ackMaxChars: z.number().int().nonnegative().optional(),
    suppressToolErrorWarnings: z.boolean().optional(),
    lightContext: z.boolean().optional(),
  })

Comment on lines +309 to +316
exa: z
.object({
apiKey: SecretInputSchema.optional().register(sensitive),
numResults: z.number().int().positive().optional(),
highlightsMaxChars: z.number().int().positive().optional(),
})
.strict()
.optional(),

The Zod schema for exa.numResults accepts any positive integer, but at runtime resolveSearchCount() (called in createWebSearchTool) silently clamps the value to MAX_SEARCH_COUNT (10). A user who sets exa.numResults: 25 will pass config validation but only receive 10 results — with no warning.

Consider adding .max(10) to the schema to catch this at config-parse time, or add a comment in types.tools.ts documenting the 1–10 cap so users understand the constraint.
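The difference between the two options can be sketched in plain TypeScript. `clampSearchCount` approximates the silent clamping the review describes and `validateSearchCount` approximates a parse-time bound; both are hypothetical helpers, and the fallback to the default on invalid input is an assumption rather than the behaviour of `resolveSearchCount`.

```typescript
const MAX_SEARCH_COUNT = 10; // cap named in the review
const DEFAULT_SEARCH_COUNT = 5; // default stated in the PR description

// Silent clamp: an out-of-range value is accepted and quietly reduced.
function clampSearchCount(value?: number): number {
  if (value === undefined || !Number.isInteger(value) || value < 1) {
    return DEFAULT_SEARCH_COUNT; // assumed fallback for invalid input
  }
  return Math.min(value, MAX_SEARCH_COUNT);
}

// Parse-time bound: an out-of-range value produces an error message instead.
function validateSearchCount(value: number): string | undefined {
  if (!Number.isInteger(value) || value < 1 || value > MAX_SEARCH_COUNT) {
    return `exa.numResults must be an integer between 1 and ${MAX_SEARCH_COUNT}, got ${value}`;
  }
  return undefined;
}
```

With `numResults: 25`, the first function returns 10 without any signal to the user, while the second returns a message that could be surfaced when the config is loaded.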


@louiswalsh louiswalsh force-pushed the feat/exa-web-search-default branch from 3c0475f to 3081754 on March 4, 2026 23:44
@louiswalsh louiswalsh changed the title from "feat(web-search): add Exa as 6th web_search provider with auto-detection (EXA_API_KEY)" to "feat(web-search): set Exa as default provider via auto-detection (EXA_API_KEY activates Exa)" Mar 4, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30817541c5


url, // Keep raw for tool chaining
description: description ? wrapWebContent(description, "web_search") : "",
published: entry.publishedDate || undefined,
author: entry.author || undefined,

P1 Badge Wrap Exa author metadata as untrusted external content

runExaSearch marks the payload as externalContent.wrapped: true, but author is returned raw while title/description are wrapped. Exa author values come from web page metadata and can contain arbitrary text, so attacker-controlled pages can inject unwrapped instructions into web_search output despite the wrapper contract. This weakens the prompt-injection boundary for any workflow that consumes Exa search results.
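One way to close the gap is to pass `author` through the same wrapper as the other text fields. The sketch below uses a stand-in `wrapUntrusted` function, because the output format of the project's `wrapWebContent` is not shown in this PR; the marker strings and the `ExaEntry` shape are assumptions.

```typescript
// Stand-in wrapper: marks text as untrusted external content. The marker
// format is invented for illustration and is not the project's format.
function wrapUntrusted(text: string, source: string): string {
  return `<<<EXTERNAL_UNTRUSTED source="${source}">>>\n${text}\n<<<END_EXTERNAL_UNTRUSTED>>>`;
}

// Assumed subset of an Exa result entry.
interface ExaEntry {
  title?: string;
  url: string;
  author?: string;
  publishedDate?: string;
}

function mapEntry(entry: ExaEntry) {
  return {
    title: entry.title ? wrapUntrusted(entry.title, "web_search") : "",
    url: entry.url, // kept raw for tool chaining, as in the reviewed code
    published: entry.publishedDate || undefined,
    // Author comes from page metadata, so it is wrapped like title and description.
    author: entry.author ? wrapUntrusted(entry.author, "web_search") : undefined,
  };
}
```

Whether `published` also needs wrapping depends on whether the value is validated as a date upstream, which the review does not say.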


Comment on lines +667 to +669
function resolveExaApiKey(exa?: ExaSearchConfig): string | undefined {
const fromConfig = normalizeApiKey(exa?.apiKey);
if (fromConfig) {

P2 Badge Resolve SecretInput refs when reading Exa API key

tools.web.search.exa.apiKey is validated as SecretInputSchema, but resolveExaApiKey only normalizes plain strings. If a user configures this field as a SecretRef object (which validation allows), it is treated as empty and the code falls back to env vars, producing a false missing_exa_api_key error when EXA_API_KEY is unset. This is a runtime mismatch between accepted config shape and key resolution.
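The mismatch can be pictured with a resolver that handles both accepted shapes. This is a sketch under loud assumptions: the `SecretRef` shape below is invented for illustration (the project's actual `SecretInputSchema` is not shown in this PR), and `resolveApiKey` is a hypothetical helper rather than the project's `resolveExaApiKey`.

```typescript
// Hypothetical shapes: a secret is either a plain string or a reference that
// points at an environment variable.
type SecretRef = { source: "env"; id: string };
type SecretInput = string | SecretRef;

function resolveApiKey(
  input: SecretInput | undefined,
  env: Record<string, string | undefined>,
  fallbackEnvVar: string,
): string | undefined {
  if (typeof input === "string") {
    const trimmed = input.trim();
    if (trimmed) return trimmed;
  } else if (input) {
    // Resolve the reference instead of treating the object as empty.
    const fromRef = env[input.id]?.trim();
    if (fromRef) return fromRef;
  }
  // Nothing usable in config: fall back to the provider's env var.
  return env[fallbackEnvVar]?.trim() || undefined;
}
```

A resolver that only handles the string branch would skip the `else if` arm, which is how a valid reference ends up reported as a missing key.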


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5075f15bca


: undefined;
const paths = [
"apiKey",
"exa.apiKey",

P1 Badge Include exa in selected-provider gating for web search secrets

Adding exa.apiKey to the resolved-path list without also recognizing provider: "exa" in selectedProvider causes explicit Exa configs to be treated like auto-detect mode (selectedProvider === undefined). In that state, secret refs for other providers (for example grok.apiKey) are incorrectly marked active and resolved; if those refs are unresolved/missing, prepareSecretsRuntimeSnapshot will throw and block startup even though those providers are inactive for this config.
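The gating logic can be sketched as below. The provider list matches the ids named in this PR, but `parseSelectedProvider`, `activeSecretPaths`, and the uniform `<provider>.apiKey` path layout are simplifying assumptions, not the project's secrets runtime.

```typescript
// Provider ids named in the PR; the uniform path layout is a simplification.
const PROVIDERS = ["brave", "perplexity", "grok", "gemini", "kimi", "exa"] as const;
type SearchProvider = (typeof PROVIDERS)[number];

// Returns undefined for unrecognized values, which callers read as auto-detect.
// If "exa" were missing from PROVIDERS, provider: "exa" would land here too.
function parseSelectedProvider(raw: unknown): SearchProvider | undefined {
  return PROVIDERS.find((p) => p === raw);
}

function activeSecretPaths(selected: SearchProvider | undefined): string[] {
  // Auto-detect mode: any provider could be chosen, so every key is active.
  if (selected === undefined) return PROVIDERS.map((p) => `${p}.apiKey`);
  // Explicit mode: only the selected provider's key needs to resolve.
  return [`${selected}.apiKey`];
}
```

With `"exa"` recognized, an explicit Exa config activates only `exa.apiKey`, so an unresolved `grok.apiKey` reference no longer blocks startup.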


Adds Exa (exa.ai) as a bundled web search provider using the plugin
architecture. Uses highlights-based content retrieval matching the
Hermes agent call signature, with x-exa-integration: openclaw header.

- New extensions/exa/ directory with plugin config, entry point, and
  search provider implementation
- autoDetectOrder: 5 (first in auto-detect chain when EXA_API_KEY set)
- Supports date_after/date_before filtering via ISO-8601 date ranges
- Updates bundled plugin IDs, contract registry, and all related tests

Co-Authored-By: Louis Walsh <[email protected]>
@devin-ai-integration devin-ai-integration bot force-pushed the feat/exa-web-search-default branch from 6e8e5b0 to 8937b55 on March 19, 2026 06:09
@openclaw-barnacle openclaw-barnacle bot added the size: XL label and removed the docs, agents, and size: M labels Mar 19, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8937b55240


Comment on lines +8 to +9
register(api) {
api.registerWebSearchProvider(createExaWebSearchProvider());

P1 Badge Wire Exa into the live web_search tool path

Registering the Exa plugin here does not make it reachable in production yet. src/agents/openclaw-tools.ts:87-90 still exposes createWebSearchTool(), and that implementation only resolves brave|perplexity|grok|gemini|kimi (src/agents/tools/web-search.ts:261-268 and 1337-1376). In other words, an agent session will keep using the old hard-coded provider stack even when EXA_API_KEY is set, so the feature added by this commit never activates.


Comment on lines +249 to +252
id: "exa",
label: "Exa Search",
hint: "AI-native semantic search · date filters · highlights",
envVars: ["EXA_API_KEY"],

P2 Badge Add exa to the runtime web-search config schema

This introduces a new provider id, but the validated config surface still rejects tools.web.search.provider: "exa". src/config/zod-schema.agent-runtime.ts:261-269 and src/config/types.tools.ts:433-475 only allow the pre-existing five providers, so users cannot explicitly pin Exa in config or have config-writing flows persist that selection. The only way to use Exa is to leave provider unset and rely on auto-detection, which is a regression for anyone who already has another search provider configured.


@vincentkoc (Member) commented

Good call raising this follow-up.

This is related, but I am not treating it as the same change. #52617 landed the Exa bundled provider itself. This PR is about a different policy question: whether Exa should become the default auto-detect choice when EXA_API_KEY is present.

I'm leaving this separate for now because that selection policy diverges from the landed scope. If you want this folded differently, tell me the intended auto-detect rule and I can reassess it quickly.
