API Reference

The Spider API is based on REST. It is predictable, returns JSON-encoded responses, uses standard HTTP response codes, and requires authentication. The API supports bulk operations: you can work on multiple objects per request for the core endpoints.

Authentication

Include your API key in the Authorization header.

Authorization: Bearer sk-xxxx...

Response formats

Set the Content-Type header to select the response format:

  • application/json
  • application/xml
  • text/csv
  • application/jsonl

Prefix any path with v1 to lock the version. Requests on this page consume live credits.
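
For example, a minimal sketch (Python requests, mirroring the Crawl example further down) that pins the v1 path and selects JSONL output via the Content-Type header:

import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    # Content-Type selects the response format; application/jsonl streams one object per line
    'Content-Type': 'application/jsonl',
}

json_data = {"limit": 5, "url": "https://spider.cloud"}

# prefixing the path with /v1 pins the API version
response = requests.post('https://api.spider.cloud/v1/crawl',
  headers=headers, json=json_data)

for line in response.iter_lines():
    if line:
        print(line)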

Just getting started? Quickstart guide →

Not a developer? Use Spider's no-code options to get started without writing code.

Base URL
https://api.spider.cloud

Client libraries

  • OpenAPI Spec
  • llms.txt

Common Parameters

These parameters are shared across Crawl, Scrape, Unblocker, Search, Links, Screenshot, and Fetch. Click any parameter to jump to its full description in the Crawl section.

Advanced (35)
blacklist (array)

Blacklist a set of paths that you do not want to crawl. You can use regex patterns to help with the list.

block_ads (boolean, default: true)

Block advertisements when running the request as chrome or smart. This can greatly increase performance.

block_analytics (boolean, default: true)

Block analytics when running the request as chrome or smart. This can greatly increase performance.

block_stylesheets (boolean, default: true)

Block stylesheets when running the request as chrome or smart. This can greatly increase performance.

budget (object)

Object that has paths with a counter for limiting the amount of pages. Use {"*":1} for only crawling the root page. The wildcard matches all routes and you can set child paths to limit depth, e.g. { "/docs/colors": 10, "/docs/": 100 }.
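
A request-body sketch using budget (other fields as in the Crawl example below); the wildcard caps every route while a child path gets its own smaller counter:

json_data = {
    "url": "https://spider.cloud",
    # crawl at most 100 pages across all routes, but only 10 under /docs/colors
    "budget": {"*": 100, "/docs/colors": 10},
}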

chunking_alg (object)

Use a chunking algorithm to segment your content output. Pass an object like { "type": "bysentence", "value": 2 } to split the text into an array by every 2 sentences. Works well with markdown or text formats.
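
For example, to split text output into chunks of two sentences each:

json_data = {
    "url": "https://spider.cloud",
    "return_format": "text",
    # segment the returned text into an array, two sentences per chunk
    "chunking_alg": {"type": "bysentence", "value": 2},
}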

concurrency_limit (number)

Set the concurrency limit to help balance requests for slower websites. The default is unlimited.

crawl_timeout (object)

The crawl_timeout parameter allows you to put a maximum duration on the entire crawl. The default is 2 minutes.

The values for the timeout duration are in the object shape { secs: 300, nanos: 0 }.
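
For example, to cap the entire crawl at five minutes:

json_data = {
    "url": "https://spider.cloud",
    # duration objects use the { secs, nanos } shape
    "crawl_timeout": {"secs": 300, "nanos": 0},
}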

data_connectors (object)

Stream crawl results directly to cloud storage and data services. Configure one or more connectors to automatically receive page data as it is crawled. Supports S3, Google Cloud Storage, Google Sheets, Azure Blob Storage, and Supabase. { s3: { bucket, access_key_id, secret_access_key, region?, prefix?, content_type? }, gcs: { bucket, service_account_base64, prefix? }, google_sheets: { spreadsheet_id, service_account_base64, sheet_name? }, azure_blob: { connection_string, container, prefix? }, supabase: { url, anon_key, table }, on_find: bool, on_find_metadata: bool }
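
A sketch of an S3 connector that streams each page to a bucket as it is crawled; the bucket name and credentials are placeholders:

json_data = {
    "url": "https://spider.cloud",
    "data_connectors": {
        "s3": {
            "bucket": "my-crawl-bucket",        # placeholder
            "access_key_id": "AKIA...",         # placeholder
            "secret_access_key": "...",         # placeholder
            "region": "us-east-1",
            "prefix": "spider/",
        },
        # push results as pages are found
        "on_find": True,
    },
}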

depth (number, default: 25)

The crawl limit for maximum depth. If 0, no limit will be applied.

disable_intercept (boolean, default: false)

Disable request interception when running request as chrome or smart. This may help bypass pages that use third-party scripts or external domains.

event_tracker (object)

Track the request, response, and automation output when using browser rendering. Pass an object with the keys requests and responses to capture the network output of the page; automation sends detailed information, including a screenshot of each automation step used, under automation_scripts.
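
A sketch of event_tracker, assuming the keys named above (requests, responses, automation) are boolean toggles:

json_data = {
    "url": "https://spider.cloud",
    "request": "chrome",
    # assumed boolean flags for what to capture
    "event_tracker": {"requests": True, "responses": True, "automation": True},
}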

exclude_selector (string)

A CSS query selector to use for ignoring content from the markup of the response.

execution_scripts (object)

Run custom JavaScript on certain paths. Requires chrome or smart request mode. The values should be in the shape "/path_or_url": "custom js".
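
For example, to run a snippet on one path (keys are paths or URLs, values are the JavaScript to execute):

json_data = {
    "url": "https://spider.cloud",
    "request": "chrome",
    "execution_scripts": {
        # expand every collapsed <details> element before the content is captured
        "/docs": "document.querySelectorAll('details').forEach((d) => { d.open = true; });",
    },
}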

external_domains (array)

A list of external domains to treat as one domain. You can use regex paths to include the domains. Set one of the array values to * to include all domains.

full_resources (boolean)

Crawl and download all the resources for a website.

max_credits_allowed (number)

Set the maximum number of credits to use per run. This returns a blocked-by-client status if the initial response is empty. Credits are measured in decimal units, where 10,000 credits equal one dollar (100 credits per penny).

max_credits_per_page (number)

Set the maximum number of credits to use per page. Credits are measured in decimal units, where 10,000 credits equal one dollar (100 credits per penny).
metadata (boolean, default: false)

Collect metadata about the content found, such as the page title, description, and keywords. This can help improve AI interoperability.

preserve_host (boolean, default: false)

Preserve the default Host header for the client. This may help bypass pages that require a Host header, and when the TLS cannot be determined.

redirect_policy (string, default: Loose)

The network redirect policy to use when performing HTTP requests.

request (string, default: smart)

The request type to perform. Use smart to perform a plain HTTP request by default and switch to JavaScript rendering when it is needed for the HTML.

request_timeout (number, default: 60)

The timeout to use for requests. Timeouts can be from 5 to 255 seconds.

root_selector (string)

The root CSS query selector to use when extracting content from the markup for the response.

run_in_background (boolean, default: false)

Run the request in the background. Useful if you are storing data and want to trigger crawls that appear in the dashboard.

session (boolean, default: true)

Persist the session for the client that you use on a website. This allows the HTTP headers and cookies to be set like a real browser session.

sitemap (boolean, default: false)

Include the sitemap results in the crawl.

sitemap_only (boolean, default: false)

Only include the sitemap results in the crawl.

sitemap_path (string, default: sitemap.xml)

The sitemap URL to use when sitemap is enabled.

subdomains (boolean, default: false)

Allow subdomains to be included.

tld (boolean, default: false)

Allow TLDs to be included.

user_agent (string)

Add a custom HTTP user agent to the request. By default this is set to a random agent.

wait_for (object)

The wait_for parameter allows you to specify various waiting conditions for a website operation. If provided, it contains the following sub-parameters:

The key idle_network specifies the conditions to wait for the network request to be idle within a period. It can include an optional timeout value.

The key idle_network0 specifies the conditions to wait for the network request to be idle with a max timeout. It can include an optional timeout value.

The key almost_idle_network0 specifies the conditions to wait for the network request to be almost idle with a max timeout. It can include an optional timeout value.

The key selector specifies the conditions to wait for a particular CSS selector to be found on the page. It includes an optional timeout value, and the CSS selector to wait for.

The key dom specifies the conditions to wait for a particular element to stop updating for a duration on the page. It includes an optional timeout value, and the CSS selector to wait for.

The key delay specifies a delay to wait for, with an optional timeout value.

The key page_navigations, when set to true, waits for all page navigations to be handled.

If wait_for is not provided, the default behavior is to wait for the network to be idle for 500 milliseconds. All of the durations are capped at 60 seconds.

The values for the timeout duration are in the object shape { secs: 10, nanos: 0 }.
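
A wait_for sketch combining a selector wait with an idle-network condition; the exact sub-object field names are inferred from the descriptions above, so treat them as illustrative:

json_data = {
    "url": "https://spider.cloud",
    "request": "chrome",
    "wait_for": {
        # wait up to 10 seconds for the main content selector to appear
        "selector": {"selector": "main#content", "timeout": {"secs": 10, "nanos": 0}},
        # and for the network to go idle, capped at 5 seconds
        "idle_network": {"timeout": {"secs": 5, "nanos": 0}},
    },
}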

webhooks (object)

Use webhooks to get notified on events like credit depleted, new pages, metadata, and website status. { destination: string, on_credits_depleted: bool, on_credits_half_depleted: bool, on_website_status: bool, on_find: bool, on_find_metadata: bool }
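
A webhooks sketch following the shape above; the destination URL is a placeholder for your own receiver:

json_data = {
    "url": "https://spider.cloud",
    "webhooks": {
        "destination": "https://example.com/spider-webhook",   # placeholder
        "on_find": True,                # notify as new pages are found
        "on_find_metadata": True,
        "on_credits_depleted": True,
        "on_credits_half_depleted": False,
        "on_website_status": False,
    },
}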

whitelist (array)

Whitelist a set of paths that you want to crawl, ignoring all other routes that do not match the patterns. You can use regex patterns to help with the list.

Core (5)
disable_hints (boolean)

Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

limit (number, default: 0)

The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

lite_mode (boolean)

Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

network_blacklist (string[])

Blocks matching network requests from being fetched/loaded. Use this to reduce bandwidth and noise by preventing known-unneeded third-party resources from ever being requested.

Each entry is a string match pattern (commonly a hostname, domain, or URL substring). If both whitelist and blacklist are set, whitelist takes precedence.

  • Good targets: googletagmanager.com, doubleclick.net, maps.googleapis.com
  • Prefer specific domains over broad substrings to avoid breaking essential assets.
network_whitelist (string[])

Allows only matching network requests to be fetched/loaded. Use this for a strict "allowlist-first" approach: keep the crawl lightweight while still permitting the essential scripts/styles needed for rendering and JS execution.

Each entry is a string match pattern (commonly a hostname, domain, or URL substring). When set, requests not matching any whitelist entry are blocked by default.

  • Start with first-party: example.com, cdn.example.com
  • Add only what you observe you truly need (fonts/CDNs), then iterate.
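
A sketch combining both filters; when both are set, the whitelist takes precedence:

json_data = {
    "url": "https://example.com",
    # allow first-party assets only...
    "network_whitelist": ["example.com", "cdn.example.com"],
    # ...and explicitly drop heavy third-party trackers
    "network_blacklist": ["googletagmanager.com", "doubleclick.net"],
}
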
Output (16)
clean_html (boolean)

Clean the HTML of unwanted attributes.

css_extraction_map (object)

Use CSS or XPath selectors to scrape contents from the web page. Set the paths and the extraction object map to perform extractions per path or page.

encoding (string)

The type of encoding to use, such as UTF-8 or SHIFT_JIS.

filter_images (boolean)

Filter image elements from the markup.

filter_output_images (boolean)

Filter the images from the output.

filter_output_main_only (boolean, default: true)

Filter the nav, aside, and footer from the output.

filter_output_svg (boolean)

Filter the svg tags from the output.

filter_svg (boolean)

Filter SVG elements from the markup.

link_rewrite (json)

Optional URL rewrite rule applied to every discovered link before it's crawled. This lets you normalize or redirect URLs (for example, rewriting paths or mapping one host pattern to another).

The value must be a JSON object with a type field. Supported types:

  • "replace" – simple substring replacement.
    Fields:
    • host?: string (optional) – only apply when the link's host matches this value (e.g. "blog.example.com").
    • find: string – substring to search for in the URL.
    • replace_with: string – replacement substring.
  • "regex" – regex-based rewrite with capture groups.
    Fields:
    • host?: string (optional) – only apply for this host.
    • pattern: string – regex applied to the full URL.
    • replace_with: string – replacement string supporting $1, $2, etc.

Invalid or unsafe regex patterns (overly long, unbalanced parentheses, advanced lookbehind constructs, etc.) are rejected by the server and ignored.
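
Two link_rewrite sketches, one per supported type:

# simple substring replacement, scoped to one host
json_data = {
    "url": "https://example.com",
    "link_rewrite": {
        "type": "replace",
        "host": "blog.example.com",
        "find": "/amp/",
        "replace_with": "/",
    },
}

# regex rewrite with capture groups
json_data_regex = {
    "url": "https://example.com",
    "link_rewrite": {
        "type": "regex",
        "pattern": r"^https://m\.example\.com/(.*)$",
        "replace_with": r"https://www.example.com/$1",
    },
}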

readability (boolean, default: false)

Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage.

return_cookies (boolean, default: false)

Return the HTTP response cookies with the results.

return_embeddings (boolean, default: false)

Include OpenAI embeddings for title and description. Requires metadata to be enabled.

return_format (string | array, default: raw)

The format to return the data in. Possible values are markdown, commonmark, raw, text, xml, bytes, and empty. Use raw to return the default format of the page like HTML etc.

return_headers (boolean, default: false)

Return the HTTP response headers with the results.

return_json_data (boolean, default: false)

Return the JSON data found in scripts used for SSR.

return_page_links (boolean, default: false)

Return the links found on each page.

Config (7)
cookies (string)

Add HTTP cookies to use for the request.

fingerprint (boolean, default: true)

Use advanced fingerprint detection for chrome.

headers (object)

Forward HTTP headers to use for all requests. The object is expected to be a map of key value pairs.

proxy ('residential' | 'mobile' | 'isp')

Select the proxy pool for this request. Leave blank to disable proxy routing. Using this param overrides all other proxy_* shorthand configurations. See the pricing table for full details. Alternatively, use Proxy-Mode to route standard HTTP traffic through Spider's proxy endpoint.

proxy_enabled (boolean, default: false)

Enable premium high performance proxies to prevent detection and increase speed. You can also use Proxy-Mode to route requests through Spider's proxy front-end instead.

remote_proxy (string)

Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy.

stealth (boolean, default: true)

Use stealth mode for headless chrome requests to help prevent being blocked.

Performance (5)
cache (boolean | { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean }, default: true)

Use HTTP caching for the crawl to speed up repeated runs. Defaults to true.

Accepts either:

  • true / false
  • A cache control object:
  • maxAge (ms) — freshness window (default: 172800000 = 2 days). Set 0 to always fetch fresh.
    • allowStale — serve cached results even if stale.
    • period — RFC3339 timestamp cutoff (overrides maxAge), e.g. "2025-11-29T12:00:00Z"
    • skipBrowser — skip browser entirely if cached HTML exists. Returns cached HTML directly without launching Chrome for instant responses.

Default behavior by route type:

  • Standard routes (/crawl, /scrape, /unblocker) — cache is true with skipBrowser enabled by default. Cached pages return instantly without re-launching Chrome. To force a fresh browser fetch, set cache: false or { "skipBrowser": false }.
  • AI routes (/ai/crawl, /ai/scrape, etc.) — cache is true but skipBrowser is not enabled. AI routes always use the browser to ensure live page content for extraction.
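
For example, to force a fresh browser fetch on a standard route, or to accept anything cached after a cutoff timestamp:

# always launch the browser, even if cached HTML exists
json_data = {
    "url": "https://spider.cloud",
    "cache": {"skipBrowser": False},
}

# reuse results cached after the cutoff, even if stale by maxAge rules
json_data_cutoff = {
    "url": "https://spider.cloud",
    "cache": {"period": "2025-11-29T12:00:00Z", "allowStale": True},
}
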
delay (number, default: 0)

Add a crawl delay of up to 60 seconds, disabling concurrency. The delay must be given in milliseconds.

respect_robots (boolean, default: true)

Respect the robots.txt file for crawling.

service_worker_enabled (boolean, default: true)

Allow the website to use Service Workers as needed.

skip_config_checks (boolean, default: true)

Skip checking the database for website configuration. This will increase performance for requests that use limit=1.

Automation (4)
automation_scripts (object)

Run custom web automated tasks on certain paths. Requires chrome or smart request mode.

Below are the available actions for web automation:
  • Evaluate: Runs custom JavaScript code.
    { "Evaluate": "console.log('Hello, World!');" }
  • Click: Clicks on an element identified by a CSS selector.
    { "Click": "button#submit" }
  • ClickAll: Clicks on all elements matching a CSS selector.
    { "ClickAll": "button.loadMore" }
  • ClickPoint: Clicks at the position x and y coordinates.
    { "ClickPoint": { "x": 120.5, "y": 340.25 } }
  • ClickAllClickable: Clicks on common clickable elements (buttons/inputs/role=button/etc.).
    { "ClickAllClickable": true }
  • ClickHold: Clicks and holds on an element (via selector) for a duration in milliseconds.
    { "ClickHold": { "selector": "#sliderThumb", "hold_for_ms": 750 } }
  • ClickHoldPoint: Clicks and holds at a point for a duration in milliseconds.
    { "ClickHoldPoint": { "x": 250.0, "y": 410.0, "hold_for_ms": 750 } }
  • ClickDrag: Click-and-drag from one element to another (selector → selector) with optional modifier.
    { "ClickDrag": { "from": "#handle", "to": "#target", "modifier": 8 } }
  • ClickDragPoint: Click-and-drag from one point to another with optional modifier.
    { "ClickDragPoint": { "from_x": 100.0, "from_y": 200.0, "to_x": 500.0, "to_y": 220.0, "modifier": 0 } }
  • Wait: Waits for a specified duration in milliseconds.
    { "Wait": 2000 }
  • WaitForNavigation: Waits for the next navigation event.
    { "WaitForNavigation": true }
  • WaitFor: Waits for an element to appear identified by a CSS selector.
    { "WaitFor": "div#content" }
  • WaitForWithTimeout: Waits for an element to appear with a timeout (ms).
    { "WaitForWithTimeout": { "selector": "div#content", "timeout": 8000 } }
  • WaitForAndClick: Waits for an element to appear and then clicks on it, identified by a CSS selector.
    { "WaitForAndClick": "button#loadMore" }
  • WaitForDom: Waits for DOM updates to settle (quiet/stable) on a selector (or body) with timeout (ms).
    { "WaitForDom": { "selector": "main", "timeout": 12000 } }
  • ScrollX: Scrolls the screen horizontally by a specified number of pixels.
    { "ScrollX": 100 }
  • ScrollY: Scrolls the screen vertically by a specified number of pixels.
    { "ScrollY": 200 }
  • Fill: Fills an input element with a specified value.
    { "Fill": { "selector": "input#name", "value": "John Doe" } }
  • Type: Type a key into the browser with an optional modifier.
    { "Type": { "value": "John Doe", "modifier": 0 } }
  • InfiniteScroll: Scrolls the page until the end, for a certain duration in milliseconds.
    { "InfiniteScroll": 3000 }
  • Screenshot: Perform a screenshot on the page.
    { "Screenshot": { "full_page": true, "omit_background": true, "output": "out.png" } }
  • ValidateChain: Set this before a step to validate the prior action and break out of the chain if it fails.
    { "ValidateChain": true }
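
A sketch that chains several of the actions above for one path; it assumes each path maps to an ordered list of action steps, in the same path-keyed style as execution_scripts:

json_data = {
    "url": "https://spider.cloud",
    "request": "chrome",
    "automation_scripts": {
        # assumed shape: path -> ordered list of action objects
        "/": [
            {"WaitFor": "button#loadMore"},
            {"Click": "button#loadMore"},
            {"ScrollY": 1200},
            {"Wait": 2000},
            {"Screenshot": {"full_page": True, "omit_background": False, "output": "home.png"}},
        ],
    },
}
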
evaluate_on_new_document (string)

Set a custom script to evaluate on new document creation.

scroll (number)

Infinite scroll the page as new content loads, up to a duration in milliseconds. You may still need to use the wait_for parameters. Requires chrome request mode.

viewport (object)

Configure the viewport for chrome.

Geolocation (2)
country_code (string)

Set an ISO country code for proxy connections. View the locations list for available countries.

locale (string)

The locale to use for the request, for example en-US.

Per-endpoint notes

Scrape & Unblocker are single-page endpoints; they exclude limit, depth, and delay.

Screenshot excludes request, return_format, and readability. Returns image data.

Every endpoint below includes these parameters in its own parameter tabs with full descriptions. This section is a quick-reference index.

Crawl

Details

Start crawling website(s) to collect resources. You can pass an array of objects for the request body.

POSThttps://api.spider.cloud/crawl
  • urlCrawl API -
    stringrequired

    The URI resource to crawl. This can be a comma split list for multiple URLs.

    To reduce latency, enhance performance, and save on rate limits batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
  • limitCrawl API -
    number
    Default: 0

    The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
  • disable_hintsCrawl API -
    boolean

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/crawl', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "duration_elapsed_ms": 122,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]

Scrape

Details

Start scraping a single page on website(s) to collect resources. You can pass an array of objects for the request body. This endpoint is also available via Proxy-Mode.

POST https://api.spider.cloud/scrape
  • url (string, required)

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

    To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
  • disable_hints (boolean)

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
  • lite_mode (boolean)

    Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/scrape', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "duration_elapsed_ms": 122,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]

Unblocker

Details

Start unblocking challenging website(s) to collect data. You can pass an array of objects for the request body. Costs an additional 10-40 credits per success.

POST https://api.spider.cloud/unblocker
  • url (string, required)

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

    To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
  • disable_hints (boolean)

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
  • lite_mode (boolean)

    Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/unblocker', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "url": "https://spider.cloud",
    "status": 200,
    "cookies": {
        "a": "something",
        "b": "something2"
    },
    "headers": {
        "x-id": 123,
        "x-cookie": 123
    },
    "costs": {
        "ai_cost": 0.001,
        "ai_cost_formatted": "0.0010",
        "bytes_transferred_cost": 3.1649999999999997e-9,
        "bytes_transferred_cost_formatted": "0.0000000031649999999999997240",
        "compute_cost": 0.0,
        "compute_cost_formatted": "0",
        "file_cost": 0.000029291250000000002,
        "file_cost_formatted": "0.0000292912499999999997868372",
        "total_cost": 0.0010292944150000001,
        "total_cost_formatted": "0.0010292944149999999997865612",
        "transform_cost": 0.0,
        "transform_cost_formatted": "0"
    },
    "content": "<html>...</html>",
    "error": null
  },
  // more content...
]

Search

Details

Perform a Google search to gather a list of websites for crawling and resource collection, including fallback options if the query yields no results. You can pass an array of objects for the request body. This endpoint is also available via Proxy-Mode.

POST https://api.spider.cloud/search
  • limit (number, default: 0)

    The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"search":"sports news today","search_limit":3,"limit":5,"return_format":"markdown"}

response = requests.post('https://api.spider.cloud/search', 
  headers=headers, json=json_data)

print(response.json())
Response
{
  "content": [
      {
          "description": "Visit ESPN for live scores, highlights and sports news. Stream exclusive games on ESPN+ and play fantasy sports.",
          "title": "ESPN - Serving Sports Fans. Anytime. Anywhere.",
          "url": "https://www.espn.com/"
      },
      {
          "description": "Sports Illustrated, SI.com provides sports news, expert analysis, highlights, stats and scores for the NFL, NBA, MLB, NHL, college football, soccer,&nbsp;...",
          "title": "Sports Illustrated",
          "url": "https://www.si.com/"
      },
      {
          "description": "CBS Sports features live scoring, news, stats, and player info for NFL football, MLB baseball, NBA basketball, NHL hockey, college basketball and football.",
          "title": "CBS Sports - News, Live Scores, Schedules, Fantasy ...",
          "url": "https://www.cbssports.com/"
      },
      {
          "description": "Sport is a form of physical activity or game. Often competitive and organized, sports use, maintain, or improve physical ability and skills.",
          "title": "Sport",
          "url": "https://en.wikipedia.org/wiki/Sport"
      },
      {
          "description": "Watch FOX Sports and view live scores, odds, team news, player news, streams, videos, stats, standings &amp; schedules covering NFL, MLB, NASCAR, WWE, NBA, NHL,&nbsp;...",
          "title": "FOX Sports News, Scores, Schedules, Odds, Shows, Streams ...",
          "url": "https://www.foxsports.com/"
      },
      {
          "description": "Founded in 1974 by tennis legend, Billie Jean King, the Women's Sports Foundation is dedicated to creating leaders by providing girls access to sports.",
          "title": "Women's Sports Foundation: Home",
          "url": "https://www.womenssportsfoundation.org/"
      },
      {
          "description": "List of sports · Running. Marathon · Sprint · Mascot race · Airsoft · Laser tag · Paintball · Bobsleigh · Jack jumping · Luge · Shovel racing · Card stacking&nbsp;...",
          "title": "List of sports",
          "url": "https://en.wikipedia.org/wiki/List_of_sports"
      },
      {
          "description": "Stay up-to-date with the latest sports news and scores from NBC Sports.",
          "title": "NBC Sports - news, scores, stats, rumors, videos, and more",
          "url": "https://www.nbcsports.com/"
      },
      {
          "description": "r/sports: Sports News and Highlights from the NFL, NBA, NHL, MLB, MLS, and leagues around the world.",
          "title": "r/sports",
          "url": "https://www.reddit.com/r/sports/"
      },
      {
          "description": "The A-Z of sports covered by the BBC Sport team. Find all the latest live sports coverage, breaking news, results, scores, fixtures, tables,&nbsp;...",
          "title": "AZ Sport",
          "url": "https://www.bbc.com/sport/all-sports"
      }
  ]
}

Links

Details

Start crawling a website(s) to collect links found. You can pass an array of objects for the request body. This endpoint can save on latency if you only need to index the content URLs. Also available via Proxy-Mode.

POST https://api.spider.cloud/links
Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/links', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "url": "https://spider.cloud",
    "status": 200,
    "duration_elapsed_ms": 112,
    "error": null
  },
  // more content...
]

Screenshot

Details

Take screenshots of a website to base64 or binary encoding. You can pass an array of objects for the request body. This endpoint is also available via Proxy-Mode.

POST https://api.spider.cloud/screenshot
  • url (string, required)

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

    To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
  • limit (number, default: 0)

    The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
  • disable_hints (boolean)

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"limit":5,"url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/screenshot', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "content": "<resource>...",
    "error": null,
    "status": 200,
    "duration_elapsed_ms": 122,
    "costs": {
      "ai_cost": 0,
      "compute_cost": 0.00001,
      "file_cost": 0.00002,
      "bytes_transferred_cost": 0.00002,
      "total_cost": 0.00004,
      "transform_cost": 0.0001
    },
    "url": "https://spider.cloud"
  },
  // more content...
]

Transform HTML

Details

Transform HTML into Markdown or plain text quickly. Each HTML transformation starts at 0.1 credits, while PDF transformations can cost up to 10 credits per page. You can submit up to 10 MB of data per request. The Transform API is also integrated into the /crawl endpoint via the return_format parameter.

POST https://api.spider.cloud/transform
  • data (object, required)

    A list of HTML data to transform. Each object in the list takes the keys html and url. The url key is optional and only used when readability is enabled.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/json',
}

json_data = {"return_format":"markdown","data":[{"html":"<html><body>\n<h1>Example Website</h1>\n<p>This is some example markup to use to test the transform function.</p>\n<p><a href=\"https://spider.cloud/guides\">Guides</a></p>\n</body></html>","url":"https://example.com"}]}

response = requests.post('https://api.spider.cloud/transform', 
  headers=headers, json=json_data)

print(response.json())
Response
{
    "content": [
      "# Example Website
This is some example markup to use to test the transform function.
[Guides](https://spider.cloud/guides)"
    ],
    "cost": {
        "ai_cost": 0,
        "compute_cost": 0,
        "file_cost": 0,
        "bytes_transferred_cost": 0,
        "total_cost": 0,
        "transform_cost": 0.0001
    },
    "error": null,
    "status": 200
  }

Proxy-Mode

Spider also offers a proxy front-end to the service. The Spider proxy handles requests just like any standard proxy, with the option to use high-performance and residential proxies at up to 10 GB/s. Take a look at all of our proxy locations to see if we support the country.

Proxy-Mode works with all core endpoints: Crawl, Scrape, Screenshot, Search, and Links. Pass API parameters in the password field to configure rendering, proxies, and more.

**HTTP address**: proxy.spider.cloud:80
**HTTPS address**: proxy.spider.cloud:443
**Username**: YOUR-API-KEY
**Password**: PARAMETERS
  • Residential — real-user IPs across 100+ countries. High anonymity, up to 1 GB/s. $1–4/GB
  • ISP — stable datacenter IPs with ISP-grade routing. Highest performance, up to 10 GB/s. $1/GB
  • Mobile — real 4G/5G device IPs for maximum stealth. $2/GB

Use country_code to set geolocation and proxy to select the pool type.

Proxy Type | Price | Multiplier | Description
residential | $2.00/GB | ×2–×4 | Entry-level residential pool
mobile | $2.00/GB | ×2 | 4G/5G mobile proxies for stealth
isp | $1.00/GB | ×1 | ISP-grade residential routing
Example proxy request
import requests, os


# Proxy configuration
proxies = {
    'http': f"http://{os.getenv('SPIDER_API_KEY')}:[email protected]:8888",
    'https': f"https://{os.getenv('SPIDER_API_KEY')}:[email protected]:8889"
}

# Function to make a request through the proxy
def get_via_proxy(url):
    try:
        response = requests.get(url, proxies=proxies)
        response.raise_for_status()
        print('Response HTTP Status Code: ', response.status_code)
        print('Response HTTP Response Body: ', response.content)
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None

# Example usage
if __name__ == "__main__":
     get_via_proxy("https://www.example.com")
     get_via_proxy("https://www.example.com/community")

Browser

Spider Browser is a Rust-based cloud browser for automation, scraping, and AI extraction. Connect via the browser.spider.cloud WebSocket endpoint using any Playwright or Puppeteer compatible client, or use the spider-browser TypeScript library for a higher-level API with built-in AI actions.

**WebSocket endpoint**: wss://browser.spider.cloud/v1/browser
**Authentication**: ?token=YOUR-API-KEY
**Protocol**: CDP / WebDriver BiDi
  • AI extraction & actions — extract structured data or perform actions with natural language. Vision models handle complex pages.
  • Stealth & proxies — automatic fingerprint rotation, residential proxies, and a retry engine that recovers sessions on its own.
  • 100 concurrent browsers — per user on all plans. Pass stealth, browser, and country query params to configure each session.

Sessions can be recorded and replayed from the dashboard. See the spider-browser repo for full documentation and examples.

Basic usage — AI extract & act
import { SpiderBrowser } from "spider-browser"

const browser = new SpiderBrowser({
  apiKey: process.env.SPIDER_API_KEY!,
})
await browser.init()
await browser.page.goto("https://example.com")

// extract structured data with AI
const prices = await browser.extract("Get all product prices")

// perform actions with natural language
await browser.act("Add the cheapest item to the cart")

// take a screenshot
const screenshot = await browser.page.screenshot()

await browser.close()
Scrape & interact
import { SpiderBrowser } from "spider-browser"

const browser = new SpiderBrowser({
  apiKey: process.env.SPIDER_API_KEY!,
})
await browser.init()

// navigate and interact with the page
await browser.page.goto("https://example.com/search")
await browser.page.fill("input[name=q]", "web scraping")
await browser.page.press("Enter")
await browser.page.waitForSelector(".results")

// extract structured fields from the DOM
const data = await browser.page.extractFields({
  title: "h1",
  description: ".description",
  image: { selector: "img.hero", attribute: "src" },
})

await browser.close()
Session recording
import { SpiderBrowser } from "spider-browser"

// Enable session recording — replay later in the dashboard
const browser = new SpiderBrowser({
  apiKey: process.env.SPIDER_API_KEY!,
  record: true, // screencast + interaction capture
})
await browser.init()

await browser.page.goto("https://example.com")
await browser.act("Click the login button")
await browser.act("Fill in the email field with [email protected]")

// Recording is automatically saved when the session ends
await browser.close()
// View recordings at spider.cloud/account/recordings

Queries

Query the data that you collect during crawling and scraping. Add dynamic filters for extracting exactly what is needed.

Logs

Get the last 24 hours of logs.

GET https://api.spider.cloud/data/crawl_logs
  • url (string)

    Filter a single url record.

  • limit (string)

    The limit of records to get.

  • domain (string)

    Filter a single domain record.

  • page (number)

    The current page to get.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/jsonl',
}

response = requests.get('https://api.spider.cloud/data/crawl_logs?limit=5&return_format=markdown&url=https%253A%252F%252Fspider.cloud', 
  headers=headers)

print(response.json())
Response
{
  "data": {
    "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh",
    "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
    "domain": "spider.cloud",
    "url": "https://spider.cloud",
    "links": 1,
    "credits_used": 3,
    "mode": 2,
    "crawl_duration": 340,
    "message": null,
    "request_user_agent": "Spider",
    "level": "UI",
    "status_code": 0,
    "created_at": "2024-04-21T01:21:32.886863+00:00",
    "updated_at": "2024-04-21T01:21:32.886863+00:00"
  },
  "error": null
}

Credits

Get the remaining credits available.

GET https://api.spider.cloud/data/credits
Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/jsonl',
}

response = requests.get('https://api.spider.cloud/data/credits?limit=5&return_format=markdown&url=https%253A%252F%252Fspider.cloud', 
  headers=headers)

print(response.json())
Response
{
  "data": {
    "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891",
    "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
    "credits": 53334,
    "created_at": "2024-04-21T01:21:32.886863+00:00",
    "updated_at": "2024-04-21T01:21:32.886863+00:00"
  }
}

Scraper Configs Alpha

Browse optimized scraper configs for popular websites. Each config defines extraction rules (selectors, AI prompts, stealth settings, and more) curated for the best results out of the box.

Scraper Directory Alpha

Browse optimized scraper configs for popular websites. Filter by domain, category, or search term. Each config is curated to deliver the best extraction results out of the box. No authentication required.

GET https://api.spider.cloud/data/scraper-directory
  • url (string)

    Filter a single url record.

  • limit (string)

    The limit of records to get.

  • domain (string)

    Filter a single domain record.

  • page (number)

    The current page to get.

Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/jsonl',
}

response = requests.get('https://api.spider.cloud/data/scraper-directory?limit=5&return_format=markdown&url=https%253A%252F%252Fspider.cloud', 
  headers=headers)

print(response.json())
Response
{
  "data": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "domain": "example.com",
      "path_pattern": "/blog/*",
      "display_name": "Example Blog Scraper",
      "description": "Extracts blog posts with title, author, and content.",
      "category": "news",
      "tags": ["blog", "articles"],
      "confidence_score": 0.95,
      "validation_count": 12,
      "slug": "example-com-blog",
      "created_at": "2025-12-01T10:00:00+00:00",
      "updated_at": "2026-01-15T08:30:00+00:00"
    }
  ],
  "total": 1,
  "page": 1,
  "limit": 20,
  "total_pages": 1
}

Fetch API Alpha

Per-website scraper endpoints that auto-configure themselves. POST /fetch/{domain}/{path} — AI discovers optimal CSS selectors, extraction schemas, and request settings on the first request, then caches and reuses them for fast, consistent structured data. Full documentation →

POST https://api.spider.cloud/fetch/example.com/
  • url (string, required)

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

    To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
  • limit (number, default: 0)

    The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

    It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
  • disable_hints (boolean)

    Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes.

    Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.

    If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
Request
import requests

headers = {
    'Authorization': 'Bearer $SPIDER_API_KEY',
    'Content-Type': 'application/jsonl',
}

json_data = {"limit":5,"return_format":"markdown","url":"https://spider.cloud"}

response = requests.post('https://api.spider.cloud/fetch/example.com/', 
  headers=headers, json=json_data)

print(response.json())
Response
[
  {
    "url": "https://example.com/",
    "status": 200,
    "content": "{\n  \"title\": \"Example Domain\",\n  \"description\": \"This domain is for use in illustrative examples.\",\n  \"links\": [\"https://www.iana.org/domains/example\"]\n}",
    "error": null,
    "costs": {
      "total_cost": 0.001,
      "total_cost_formatted": "0.0010"
    }
  }
]