Skip to content

Feature: Add Exa as a native web_search provider #20134

@jimmytherobot-ai

Description

@jimmytherobot-ai

Problem

The web_search tool currently supports three providers: Brave, Perplexity, and Grok. Each has tradeoffs:

  • Perplexity — great for synthesized answers, poor date filtering (returns stale results even with freshness: pd)
  • Brave — good date range filtering, raw search results
  • Grok — X/Twitter integration, newer

Missing: Exa (exa.ai) — a neural/semantic search API that excels at precise date filtering and content retrieval. It's particularly valuable for:

  1. Reliable freshnessstartPublishedDate / endPublishedDate with ISO-8601 timestamps actually work (unlike Perplexity's fuzzy freshness)
  2. Neural search — finds semantically relevant content, not just keyword matches
  3. Built-in content extraction — returns article text in the same API call (no separate fetch needed)
  4. No hallucinated dates — returns real publication dates from the source
  5. Auto-type detection — neural vs keyword search selected automatically

Real-world use case

We run a daily cron that searches for GPU/AI infrastructure news. With Perplexity, it consistently returns articles that are weeks or months old despite freshness: pd. Switching to Exa via exec/curl immediately fixed this — every result was from the last 48 hours.

Currently working around this by having cron agents call Exa via curl in exec, which:

  • Requires the agent to know the API format
  • Burns tokens on shell command construction
  • Loses the clean web_search interface and its caching/rate-limiting

Proposed config

tools:
  web:
    search:
      provider: "exa"       # new option alongside brave/perplexity/grok
      exa:
        apiKey: "..."        # or read from env/keychain
        type: "auto"         # auto | neural | keyword
        numResults: 10
        contents:            # optional — request content extraction inline
          text:
            maxCharacters: 1000
          highlights: true

API mapping

Exa's API maps cleanly to OpenClaw's existing web_search parameters:

OpenClaw param Exa equivalent
query query
count numResults
freshness: pd startPublishedDate: <yesterday> / endPublishedDate: <tomorrow>
freshness: pw startPublishedDate: <7 days ago>
freshness: pm startPublishedDate: <30 days ago>
freshness: py startPublishedDate: <365 days ago>
freshness: YYYY-MM-DDtoYYYY-MM-DD startPublishedDate / endPublishedDate directly
country includeDomains (approximate)

Exa API reference

Implementation notes

  • Response format is simple: { results: [{ url, title, publishedDate, text, highlights }] }
  • Content extraction is optional (via contents parameter) — if enabled, eliminates the need for a separate web_fetch call
  • type: "auto" lets Exa decide between neural and keyword search per query
  • Category filtering available: company, research paper, news, tweet, github, pdf, etc.

Impact

Agents that need reliable date-filtered search (daily news crons, market monitoring, research) would get significantly better results. The current workaround (Exa via curl in exec) works but is fragile and token-expensive.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions