The web crawling and scraping API built for speed
One endpoint to crawl entire websites and extract structured data. Spider handles rendering, proxies, and anti-bot detection so you can focus on what you do with the data.
How web crawling works
Three steps from URL to structured data. No infrastructure to manage, no browsers to maintain.
Submit your URL
Send a target URL to the Spider API. Configure crawl depth, page limits, output format, and rendering mode in a single request.
Spider crawls the site
Spider discovers every page by following links, handles JavaScript rendering, rotates proxies, and bypasses anti-bot protections — automatically.
Get clean data back
Receive structured content in your chosen format — HTML, Markdown, plain text, JSON, screenshots, or PDF. Stream results or collect them all at once.
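As a rough sketch of that flow, the request below combines the knobs from step one. The url, limit, and return_format fields also appear in the full example further down this page; the depth parameter name is an assumption here, so confirm the exact spelling against the API reference.

# Sketch of a configured crawl: "depth" is an assumed parameter name for crawl
# depth; "url", "limit", and "return_format" match the example later on this page.
curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "depth": 2, "limit": 100, "return_format": "markdown"}'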
Choose your crawl mode
Pick the rendering strategy that fits your target. Switch modes per-request with a single parameter.
HTTP
Fastest
Direct HTTP fetching without a browser. Ideal for static sites, blogs, documentation, and any page that doesn't rely on client-side JavaScript.
~50ms per page

Smart
Balanced
Auto-detects whether a page needs JavaScript rendering. Falls back to browser mode only when necessary, saving time and credits on static pages.
~200ms per page

Browser
Full render
Headless Chrome renders every page with full JavaScript execution. Handles SPAs, lazy-loaded content, infinite scroll, and shadow DOM elements.
~1s per page
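One way that single parameter might look in practice, assuming the rendering mode is picked with a request field whose values are http, smart, and chrome; the field name and its values are assumptions to verify against the API reference.

# Same endpoint, different rendering strategy. The "request" field and its
# "http" / "smart" / "chrome" values are assumptions; confirm them in the docs.
curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "limit": 20, "request": "chrome"}'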
Start crawling in minutes
A single API call is all you need. Pick your language and start extracting data.
curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "return_format": "markdown"
  }'
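If you collect the whole crawl at once, a short pipeline like this can write every page to one file. It assumes the response is a JSON array of page objects with a content field; the field names are illustrative, so adjust the jq filter to the response you actually receive.

# Assumes the response is a JSON array of page objects with a "content" field
# (an illustrative shape, not a documented one); inspect a real response first.
curl -s -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "limit": 50, "return_format": "markdown"}' \
  | jq -r '.[].content' > example-site.md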
Every format your pipeline needs
One crawl, multiple output formats. Switch between them with a single parameter.
HTML
Raw or cleaned HTML with optional tag filtering
Markdown
Clean Markdown ideal for LLMs and RAG pipelines
Plain Text
Stripped text content with no markup
JSON
Structured extraction via AI-powered schemas
Screenshot
Full-page PNG or viewport captures
PDF
Browser-rendered PDF exports of any page
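A sketch of switching formats on the same crawl: only "markdown" is shown elsewhere on this page, so treat the other return_format values in the loop as assumptions and check the documentation for the exact identifiers for HTML, plain text, JSON, screenshots, and PDF.

# Re-run the same crawl with different output formats. "markdown" matches the
# example above; "raw" and "text" are assumed identifiers, verify them in the docs.
for fmt in markdown raw text; do
  curl -s -X POST https://api.spider.cloud/crawl \
    -H "Authorization: Bearer $SPIDER_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"https://example.com\", \"limit\": 10, \"return_format\": \"$fmt\"}" \
    > "example.$fmt.json"
done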
Built for scale
Enterprise-grade infrastructure that handles millions of pages without breaking a sweat.
Elastic concurrency
Spider auto-scales concurrent connections to match your crawl volume. Go from 10 pages to 10 million without changing a line of code.
Smart proxy rotation
Automatic IP rotation across residential and datacenter proxies. Spider selects the optimal proxy type per-domain to maximize success rates.
Anti-bot bypass
Built-in stealth profiles handle Cloudflare, Akamai, PerimeterX, and other bot detection systems. Fingerprints rotate automatically per request.
HTTP caching
Previously crawled pages are cached and served instantly on repeat requests. Reduce costs and latency for recurring crawl jobs.
Webhook delivery
Get notified when crawls complete with real-time webhook callbacks. Process pages as they're discovered instead of waiting for the full crawl.
Web crawling FAQ
What is web crawling?
Web crawling is the automated process of discovering and fetching web pages by following links. A web crawler starts from one or more seed URLs, downloads the page content, extracts links, and repeats the process. Spider handles this entire workflow through a single API call — you provide a URL and Spider returns structured data from every reachable page.
Is web scraping legal?
Web scraping publicly available data is generally legal in the United States, supported by the hiQ Labs v. LinkedIn ruling. However, you should always respect robots.txt directives, terms of service, and avoid scraping personal data without consent. Spider provides built-in robots.txt compliance and rate limiting to help you crawl responsibly.
How many pages can Spider crawl?
Spider can crawl millions of pages per job with no hard limit. Our infrastructure auto-scales to handle crawls of any size — from a single page to an entire domain with hundreds of thousands of URLs. Crawl speed depends on your plan and concurrency settings, with enterprise users reaching 500+ pages per second.
What's the difference between crawling and scraping?
Web crawling is about discovery — following links to find pages across a site. Web scraping is about extraction — pulling specific data from those pages. Spider does both: it crawls your target site to discover all pages, then scrapes each page to return clean, structured data in your chosen format (HTML, Markdown, JSON, and more).
Does Spider handle JavaScript-rendered pages?
Yes. Spider offers three rendering modes: HTTP mode for fast static page fetching, Smart mode that auto-detects whether JavaScript rendering is needed, and Browser mode that uses a full headless browser to render JavaScript-heavy single-page applications. Browser mode supports the same interactions a real user would perform.
Start web crawling today
Get 500 credits free — no credit card required. Crawl your first site in under a minute.