-
-
Notifications
You must be signed in to change notification settings - Fork 69.3k
Feature: Add Exa as a native web_search provider #20134
Description
Problem
The web_search tool currently supports three providers: Brave, Perplexity, and Grok. Each has tradeoffs:
- Perplexity — great for synthesized answers, poor date filtering (returns stale results even with
freshness: pd) - Brave — good date range filtering, raw search results
- Grok — X/Twitter integration, newer
Missing: Exa (exa.ai) — a neural/semantic search API that excels at precise date filtering and content retrieval. It's particularly valuable for:
- Reliable freshness —
startPublishedDate/endPublishedDatewith ISO-8601 timestamps actually work (unlike Perplexity's fuzzy freshness) - Neural search — finds semantically relevant content, not just keyword matches
- Built-in content extraction — returns article text in the same API call (no separate fetch needed)
- No hallucinated dates — returns real publication dates from the source
- Auto-type detection — neural vs keyword search selected automatically
Real-world use case
We run a daily cron that searches for GPU/AI infrastructure news. With Perplexity, it consistently returns articles that are weeks or months old despite freshness: pd. Switching to Exa via exec/curl immediately fixed this — every result was from the last 48 hours.
Currently working around this by having cron agents call Exa via curl in exec, which:
- Requires the agent to know the API format
- Burns tokens on shell command construction
- Loses the clean
web_searchinterface and its caching/rate-limiting
Proposed config
tools:
web:
search:
provider: "exa" # new option alongside brave/perplexity/grok
exa:
apiKey: "..." # or read from env/keychain
type: "auto" # auto | neural | keyword
numResults: 10
contents: # optional — request content extraction inline
text:
maxCharacters: 1000
highlights: trueAPI mapping
Exa's API maps cleanly to OpenClaw's existing web_search parameters:
| OpenClaw param | Exa equivalent |
|---|---|
query |
query |
count |
numResults |
freshness: pd |
startPublishedDate: <yesterday> / endPublishedDate: <tomorrow> |
freshness: pw |
startPublishedDate: <7 days ago> |
freshness: pm |
startPublishedDate: <30 days ago> |
freshness: py |
startPublishedDate: <365 days ago> |
freshness: YYYY-MM-DDtoYYYY-MM-DD |
startPublishedDate / endPublishedDate directly |
country |
includeDomains (approximate) |
Exa API reference
- Endpoint:
POST https://api.exa.ai/search - Auth:
x-api-keyheader - Docs: https://docs.exa.ai
- Pricing: 1,000 searches/month free, then $5/1000 searches. Content extraction: $1/1000 pages.
Implementation notes
- Response format is simple:
{ results: [{ url, title, publishedDate, text, highlights }] } - Content extraction is optional (via
contentsparameter) — if enabled, eliminates the need for a separateweb_fetchcall type: "auto"lets Exa decide between neural and keyword search per query- Category filtering available:
company,research paper,news,tweet,github,pdf, etc.
Impact
Agents that need reliable date-filtered search (daily news crons, market monitoring, research) would get significantly better results. The current workaround (Exa via curl in exec) works but is fragile and token-expensive.