
webclaw · The extraction engine

The web scraper your AI agent deserves

Name: webclaw
Author: Massi

Clean structured data for your agents. In milliseconds, not seconds.

118ms avg response · 90% fewer tokens · Drop-in Firecrawl replacement


Global nodes online: 12

One-command setup · MCP + CLI

Auto-detects your tools and configures everything.


Endpoints

Ten surfaces. One extraction engine.

Pick an endpoint to see what it does, how you'd call it, and where to dive into the reference docs.

/v1/scrape

Scrape

Single-page extraction

Fetch any URL and return clean markdown, JSON, HTML, or LLM-ready text. Chrome-grade TLS fingerprinting and automatic antibot escalation built-in.
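A minimal sketch of what a call to `/v1/scrape` might look like from Python, using only the standard library. Only the endpoint path comes from this page. The base URL `https://api.webclaw.example`, the `url` and `formats` body fields, and the Bearer header are assumptions modeled loosely on the Firecrawl-style API shape mentioned elsewhere on this page; check the reference docs for the real contract.

```python
import json
import urllib.request

# Hypothetical placeholder: the real base URL lives in the webclaw docs.
BASE_URL = "https://api.webclaw.example"


def build_scrape_request(target_url, api_key, fmt="markdown"):
    """Build (but do not send) a POST request to /v1/scrape.

    Assumed, not confirmed by the source: the JSON body fields `url`
    and `formats`, and Bearer-token authentication.
    """
    body = json.dumps({"url": target_url, "formats": [fmt]}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/scrape",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(target_url, api_key, fmt="markdown"):
    """Send the request and decode the JSON response body."""
    request = build_scrape_request(target_url, api_key, fmt)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating request construction from sending keeps the assumed parts (body shape, auth header) in one place, so they are easy to correct once checked against the docs.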



Every page.
Every defense.

Fast by default. Smart when needed.

118ms average for static pages. Firecrawl's published P95 is 3.4s. Multi-layer rendering pipeline for JS-heavy sites. The engine picks the fastest path automatically. You configure nothing.

Drop-in Firecrawl replacement.

Change your base URL. Keep your existing SDK code. The /v2 endpoints are fully compatible. Same API shape, same response format, no rewrite needed. Better extraction quality, faster response times.
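As an illustration of "change your base URL", the sketch below rewrites an existing endpoint URL onto a different host while leaving the path and query untouched. The webclaw host `https://api.webclaw.example` is a hypothetical placeholder, and the helper name `rebase` is invented for this example. In practice most SDKs expose a base-URL setting, so a rewrite like this is only needed for hand-built URLs.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical placeholder host; use the base URL from the webclaw docs.
WEBCLAW_BASE = "https://api.webclaw.example"


def rebase(endpoint, new_base=WEBCLAW_BASE):
    """Return `endpoint` with its scheme and host replaced by `new_base`.

    The path, query string, and fragment are preserved, which is what
    "same API shape" implies for a drop-in replacement.
    """
    old = urlsplit(endpoint)
    base = urlsplit(new_base)
    return urlunsplit((base.scheme, base.netloc, old.path, old.query, old.fragment))
```

For example, `rebase("https://api.firecrawl.dev/v2/scrape")` would yield the same `/v2/scrape` path on the placeholder host.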

Best-in-class bot protection.

Challenge pages, CAPTCHAs, browser fingerprinting, all handled transparently. No manual cookies, no config. Your requests just work, even on the hardest sites.

Every format, every extraction.

Markdown, JSON, plain text, LLM-optimized. Schema-based extraction, prompt-based extraction, summarization, brand identity, content diffing. 14 endpoints, one API key.

Built for AI agents.

MCP server with 12 tools for Claude, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP client. REST API for everything else. Web search, batch processing, crawling, sitemap discovery.

90% fewer tokens.

The LLM format runs a 9-step optimization pipeline that strips nav, ads, boilerplate, and repeated elements. Measured on 18 production sites, the median page drops 95% in token count while preserving content. Your agent gets more, spends less.

Agentic scraping.

Give a goal, get structured data. The AI agent reasons about page content, clicks buttons, navigates, and extracts exactly what you asked for. Powered by the best available models.

Deep content recovery.

Embedded JSON, structured data, server-rendered payloads, extracted even when the visible DOM is empty. Auto-detects PDFs, DOCX, XLSX. Multiple fallback strategies. If the content exists, webclaw finds it.
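To make "extracted even when the visible DOM is empty" concrete, here is a small sketch of one common recovery technique: pulling JSON payloads out of data-carrying `<script>` tags. This is an illustration with Python's standard `html.parser`, not webclaw's actual implementation, and the class and function names are invented for the example.

```python
import json
from html.parser import HTMLParser


class EmbeddedJSONParser(HTMLParser):
    """Collect JSON payloads from <script> tags that carry data, not code."""

    # Script types that usually hold structured data rather than JavaScript.
    DATA_TYPES = {"application/ld+json", "application/json"}

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._chunks = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in self.DATA_TYPES:
            self._capturing = True
            self._chunks = []

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self._capturing = False
            try:
                self.payloads.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # Skip malformed payloads instead of failing the page.


def extract_embedded_json(html):
    """Return every decodable JSON payload embedded in the page."""
    parser = EmbeddedJSONParser()
    parser.feed(html)
    parser.close()
    return parser.payloads
```

A page whose body is an empty `<div id="root"></div>` can still yield product or article data this way, because the payload sits in the markup rather than in the rendered DOM.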

FROM THE BLOG

Latest posts

VIEW ALL →

Jun 4, 2026

Apify Alternative for LLM Web Scraping and AI Agents

Compare Apify actors, the Apify marketplace, and Webclaw for any-URL markdown extraction, structured JSON, crawling, MCP access, and AI agent web tooling.

Jun 2, 2026

Bright Data Alternative for LLM Web Scraping

Compare Bright Data, Web Unlocker, and Webclaw for proxy infrastructure, markdown extraction, structured JSON, crawling, batching, and AI agent workflows.

May 28, 2026

Jina Reader Alternative for LLM Web Scraping

Compare Jina Reader, r.jina.ai, and Webclaw for URL to markdown, RAG input, crawling, batching, JavaScript rendering, anti-bot pages, and production extraction.

May 26, 2026

Crawl4AI vs Playwright for LLM Web Scraping

Compare Crawl4AI and Playwright for scraping dynamic sites, RAG input, markdown output, browser control, and production reliability.


One credit.
One page.

One pool covers every endpoint. Heavier operations like protected site access or LLM extract use a few extra credits. Research has its own counter so deep runs cannot drain your budget.

STARTER

$19/mo

CREDITS: 10,000/mo
RESEARCH: 3 RUNS/mo
MAX SOURCES: 10
CONCURRENCY: 5
SUPPORT: EMAIL

7-day trial. Card required. Cancel before billing.

GROWTH (POPULAR)

$49/mo

CREDITS: 100,000/mo
RESEARCH: 10 RUNS/mo
MAX SOURCES: 20
CONCURRENCY: 20
SUPPORT: PRIORITY

PRO

$99/mo

CREDITS: 250,000/mo
RESEARCH: 20 RUNS/mo
MAX SOURCES: 30
CONCURRENCY: 50
SUPPORT: PRIORITY

SCALE

$399/mo

CREDITS: 1,000,000/mo
RESEARCH: 60 RUNS/mo
MAX SOURCES: 100
CONCURRENCY: 100
SUPPORT: PRIORITY + SLACK

HOW CREDITS WORK

PLAIN PAGE: 1 CREDIT
JS RENDER: +2 CREDITS
ANTIBOT SOLVE: +9 CREDITS
SEARCH / 10 RESULTS: 2 CREDITS
SUMMARIZE: 10 CREDITS
BRAND: 5 CREDITS
DIFF: 2 CREDITS
LLM EXTRACT: 25 CREDITS

Research is metered separately as runs per month, with a per-tier cap on max sources so deep mode stays bounded.
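The credit arithmetic can be sketched as a small calculator. The per-operation numbers are taken from the list above. Two things are assumptions for the sake of the example: that the JS render and antibot surcharges stack on top of the 1-credit plain page, and that the other operations are flat totals. The function names are invented here.

```python
# Credit costs as listed on this page.
PLAIN_PAGE = 1
JS_RENDER_SURCHARGE = 2
ANTIBOT_SURCHARGE = 9
FLAT_COSTS = {
    "search": 2,        # per 10 results
    "summarize": 10,
    "brand": 5,
    "diff": 2,
    "llm_extract": 25,
}


def page_cost(js_render=False, antibot=False):
    """Credits for one scraped page.

    Assumption: surcharges are additive on top of the plain-page credit.
    """
    cost = PLAIN_PAGE
    if js_render:
        cost += JS_RENDER_SURCHARGE
    if antibot:
        cost += ANTIBOT_SURCHARGE
    return cost


def pages_per_month(monthly_credits, js_render=False, antibot=False):
    """Whole pages a monthly credit pool covers at a given page cost."""
    return monthly_credits // page_cost(js_render=js_render, antibot=antibot)
```

Under these assumptions, a Starter pool of 10,000 credits covers 10,000 plain pages but only 1,000 pages that each need an antibot solve (10 credits apiece), which is worth knowing before sizing a plan for protected sites.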

DEDICATED

Unlimited pages. Unlimited research. 200 concurrent. Single-tenant on your cloud, your proxies, your rules. Dedicated Slack channel + SLA.

OPEN SOURCE

Self-host forever. AGPL-3.0 license. CLI + server + MCP server. No limits on your hardware.


Common questions

FAQ

Webclaw is a web extraction toolkit that turns any website into clean, structured data. Output formats include Markdown, JSON, HTML, plain text, and an LLM-optimized mode that strips noise and cuts token count by around 90% vs raw HTML.

Webclaw uses HTTP with TLS fingerprint impersonation instead of spinning up a headless browser. Sub-200ms response times, zero browser overhead, no Selenium or Playwright dependency. Content extraction runs via readability scoring plus a 9-step pipeline, no browser needed for most pages.

Yes. Starter comes with a 7-day free trial. A card is required up front so we don't get drowned in throwaway signups, and you can cancel any time during the trial directly from the billing portal. No charge if you cancel before day 7. If you want to use Webclaw without ever paying, the open-source version (AGPL-3.0) runs locally with no limits on your hardware.

Yes. Webclaw is open source under AGPL-3.0. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available.

Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.

Webclaw ships a Model Context Protocol server binary that exposes 12 tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research, vertical_scrape, and list_extractors. Works with any MCP client (Claude Desktop, Claude Code, Cursor, Windsurf, Codex, Antigravity) over stdio.
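For clients that are configured by hand, a stdio MCP server is typically registered under an `mcpServers` key in the client's JSON config. The sketch below generates such an entry. The binary name `webclaw-mcp` and the `WEBCLAW_API_KEY` environment variable are hypothetical placeholders, not names confirmed by this page; the tool list is copied from the answer above.

```python
import json

# Tool names as listed on this page.
MCP_TOOLS = [
    "scrape", "crawl", "map", "batch", "extract", "summarize",
    "diff", "brand", "search", "research", "vertical_scrape",
    "list_extractors",
]


def mcp_server_entry(command="webclaw-mcp", api_key=None):
    """Build an `mcpServers` config fragment for a stdio MCP server.

    `command` and the WEBCLAW_API_KEY variable are assumed names;
    substitute whatever the installed binary and docs actually use.
    """
    entry = {"command": command, "args": []}
    if api_key is not None:
        entry["env"] = {"WEBCLAW_API_KEY": api_key}
    return {"mcpServers": {"webclaw": entry}}


if __name__ == "__main__":
    # Print a fragment that can be merged into a client config file.
    print(json.dumps(mcp_server_entry(), indent=2))
```

The one-command setup described on this page is meant to write this configuration for you, so a manual entry like this is mainly useful for clients it does not detect.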

Your extracted content is never stored or logged on our servers. Requests are processed in real-time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.

Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It chains through local Ollama first, then falls back to cloud providers.
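The "local first, then cloud" chaining can be sketched as an ordered fallback over provider callables. This is a generic pattern for illustration, not webclaw's code. The provider functions are stand-ins supplied by the caller, and names such as `run_with_fallback` are invented here.

```python
from typing import Callable, Sequence, Tuple

# A provider takes a prompt and returns the model's text output.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


def run_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, output).

    The first provider that returns without raising wins; errors from
    earlier providers are collected for the final exception message.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Putting the local Ollama provider first in the list means content only leaves the machine when the local model is unavailable or fails, which matches the ordering described above.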

Ready to build?

Start extracting.

7-day Starter trial, card required. Cancel before billing. Deploy in under a minute, or self-host forever.
