Web Crawling & Scraping

The web crawling and scraping API built for speed

One endpoint to crawl entire websites and extract structured data. Spider handles rendering, proxies, and anti-bot detection so you can focus on what you do with the data.

api.spider.cloud
POST /crawl
url: example.com · limit: 100 · format: markdown
100 pages · 2.1s · $0.10
100k+
Pages per second
99.5%
Success rate
$0
Monthly minimum

How web crawling works

Three steps from URL to structured data. No infrastructure to manage, no browsers to maintain.

01

Submit your URL

Send a target URL to the Spider API. Configure crawl depth, page limits, output format, and rendering mode in a single request.

02

Spider crawls the site

Spider discovers every page by following links, handles JavaScript rendering, rotates proxies, and bypasses anti-bot protections — automatically.

03

Get clean data back

Receive structured content in your chosen format — HTML, Markdown, plain text, JSON, screenshots, or PDF. Stream results or collect them all at once.
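
When the crawl completes, each discovered page comes back as one item in the response. The sketch below is illustrative only: the overall shape and the field names are assumptions, not the documented Spider schema, so check the API reference for the authoritative response format.

# Illustrative response item: shape and field names are assumed, not the official schema.
[
  {
    "url": "https://example.com/blog/first-post",
    "status": 200,
    "content": "# First Post\n\nClean Markdown extracted from the page..."
  }
]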

Choose your crawl mode

Pick the rendering strategy that fits your target. Switch modes per request with a single parameter.

HTTP

Fastest

Direct HTTP fetching without a browser. Ideal for static sites, blogs, documentation, and any page that doesn't rely on client-side JavaScript.

~50ms per page

Smart

Balanced

Auto-detects whether a page needs JavaScript rendering. Falls back to browser mode only when necessary, saving time and credits on static pages.

~200ms per page

Browser

Full render

Headless Chrome renders every page with full JavaScript execution. Handles SPAs, lazy-loaded content, infinite scroll, and shadow DOM elements.

~1s per page
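
Switching modes is one field in the request body. As a sketch, the call below asks for full browser rendering on a JavaScript-heavy site; the "request" parameter name and its "chrome" value are assumptions rather than confirmed names, so check the API reference for the exact spelling.

# Hypothetical mode switch: "request" and "chrome" are assumed names for
# the rendering-mode parameter and value; confirm them in the API docs.
curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "return_format": "markdown",
    "request": "chrome"
  }'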

Start crawling in minutes

A single API call is all you need. Pick your language and start extracting data.

curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "return_format": "markdown"
  }'
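
The response can be piped straight into your tooling. In the sketch below, the crawl results are assumed to come back as a JSON array of page objects, and the ".url" field name is an assumption rather than the documented schema.

# List the URLs of the pages that came back. The response is assumed to be
# a JSON array of page objects with a ".url" field (not the official schema).
curl -s -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "limit": 50, "return_format": "markdown"}' \
  | jq -r '.[].url'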

Every format your pipeline needs

One crawl, multiple output formats. Switch between them with a single parameter.


HTML

Raw or cleaned HTML with optional tag filtering


Markdown

Clean Markdown ideal for LLMs and RAG pipelines


Plain Text

Stripped text content with no markup


JSON

Structured extraction via AI-powered schemas

Screenshot

Full-page PNG or viewport captures

PDF

Browser-rendered PDF exports of any page
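
Switching formats is the same single parameter shown in the quickstart. In the sketch below, "markdown" matches the value used above, while "text" is an assumed spelling for the plain-text option, so confirm the exact format names in the API reference.

# Same crawl, different output: only "return_format" changes.
# "markdown" matches the quickstart; "text" is an assumed value for the
# plain-text format and should be checked against the docs.
curl -X POST https://api.spider.cloud/crawl \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "limit": 50, "return_format": "text"}'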

Built for scale

Enterprise-grade infrastructure that handles millions of pages without breaking a sweat.

01

Elastic concurrency

Spider auto-scales concurrent connections to match your crawl volume. Go from 10 pages to 10 million without changing a line of code.

02

Smart proxy rotation

Automatic IP rotation across residential and datacenter proxies. Spider selects the optimal proxy type per domain to maximize success rates.

03

Anti-bot bypass

Built-in stealth profiles handle Cloudflare, Akamai, PerimeterX, and other bot detection systems. Fingerprints rotate automatically per request.

04

HTTP caching

Previously crawled pages are cached and served instantly on repeat requests. Reduce costs and latency for recurring crawl jobs.

05

Webhook delivery

Get notified when crawls complete with real-time webhook callbacks. Process pages as they're discovered instead of waiting for the full crawl.

Web crawling FAQ

What is web crawling?

Web crawling is the automated process of discovering and fetching web pages by following links. A web crawler starts from one or more seed URLs, downloads the page content, extracts links, and repeats the process. Spider handles this entire workflow through a single API call — you provide a URL and Spider returns structured data from every reachable page.
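
As a rough illustration of that loop in plain shell (not the Spider API), the sketch below fetches a seed page, extracts its absolute links, and fetches each discovered page once; a production crawler would also deduplicate across iterations, respect robots.txt, and recurse to further depths.

# Conceptual one-level crawl loop in plain shell; not the Spider API.
# Fetch the seed, extract absolute links, then fetch each discovered page.
seed="https://example.com"
curl -s "$seed" \
  | grep -oE 'href="https?://[^"]+"' \
  | sed -e 's/^href="//' -e 's/"$//' \
  | sort -u \
  | while read -r page; do
      curl -s -o /dev/null -w "%{http_code} $page\n" "$page"
    done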

Is web scraping legal?

Web scraping publicly available data is generally legal in the United States, supported by the hiQ Labs v. LinkedIn ruling. However, you should always respect robots.txt directives, terms of service, and avoid scraping personal data without consent. Spider provides built-in robots.txt compliance and rate limiting to help you crawl responsibly.

How many pages can Spider crawl?

Spider can crawl millions of pages per job with no hard limit. Our infrastructure auto-scales to handle crawls of any size — from a single page to an entire domain with hundreds of thousands of URLs. Crawl speed depends on your plan and concurrency settings, with enterprise users reaching 500+ pages per second.

What's the difference between crawling and scraping?

Web crawling is about discovery — following links to find pages across a site. Web scraping is about extraction — pulling specific data from those pages. Spider does both: it crawls your target site to discover all pages, then scrapes each page to return clean, structured data in your chosen format (HTML, Markdown, JSON, and more).

Does Spider handle JavaScript-rendered pages?

Yes. Spider offers three rendering modes: HTTP mode for fast static page fetching, Smart mode that auto-detects whether JavaScript rendering is needed, and Browser mode that uses a full headless browser to render JavaScript-heavy single-page applications. Browser mode supports the same interactions a real user would perform.

Start web crawling today

Get 500 credits free — no credit card required. Crawl your first site in under a minute.