API Reference
The Spider API is based on REST. Our API is predictable, returns JSON-encoded responses, uses standard HTTP response codes, and requires authentication. The API supports bulk updates: you can work on multiple objects per request for the core endpoints.
Authentication
Include your API key in the authorization header.
Authorization: Bearer sk-xxxx...
Response formats
Set the content-type header to shape the response.
Prefix any path with v1 to lock the version. Requests on this page consume live credits.
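For example, a minimal authenticated call might look like the following sketch, assuming Python with the requests library (the endpoint and body values are illustrative):

```python
import requests

API_KEY = "sk-xxxx"  # your Spider API key

headers = {
    "Authorization": f"Bearer {API_KEY}",   # API key in the authorization header
    "Content-Type": "application/json",     # the content-type header shapes the response format
}

# Prefixing the path with /v1 locks the API version.
resp = requests.post(
    "https://api.spider.cloud/v1/crawl",
    headers=headers,
    json={"url": "https://example.com", "limit": 1},
)
resp.raise_for_status()
print(resp.json())
```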
Just getting started? Quickstart guide →
Not a developer? Use Spider's no-code options to get started without writing code.
https://api.spider.cloud
Common Parameters
These parameters are shared across Crawl, Scrape, Unblocker, Search, Links, Screenshot, and Fetch. Click any parameter to jump to its full description in the Crawl section.
Advanced(35)
| Name | Type | Default | Description |
|---|---|---|---|
| blacklist | array | — | Blacklist a set of paths that you do not want to crawl. You can use regex patterns to help with the list. |
| block_ads | boolean | true | Block advertisements when running the request as chrome. |
| block_analytics | boolean | true | Block analytics scripts when running the request as chrome. |
| block_stylesheets | boolean | true | Block stylesheets when running the request as chrome. |
| budget | object | — | Object that maps paths to a counter for limiting the amount of pages crawled per path. Use a wildcard path to apply a limit across all routes. |
| chunking_alg | object | — | Use a chunking algorithm to segment your content output. Pass an object that specifies the chunking type and segment size. |
| concurrency_limit | number | — | Set the concurrency limit to help balance requests for slower websites. The default is unlimited. |
| crawl_timeout | object | — | The maximum time to allow the crawl to run. The values for the timeout duration are given as an object with seconds and nanoseconds. |
| data_connectors | object | — | Stream crawl results directly to cloud storage and data services. Configure one or more connectors to automatically receive page data as it is crawled. Supports S3, Google Cloud Storage, Google Sheets, Azure Blob Storage, and Supabase. |
| depth | number | 25 | The crawl limit for maximum depth. If set to 0, no depth limit is applied. |
| disable_intercept | boolean | false | Disable request interception when running the request as chrome. |
| event_tracker | object | — | Track the event requests, responses, and automation output when using browser rendering. Pass in an object with the events to track, such as requests and responses. |
| exclude_selector | string | — | A CSS query selector to use for ignoring content from the markup of the response. |
| execution_scripts | object | — | Run custom JavaScript on certain paths. Requires the chrome request type. |
| external_domains | array | — | A list of external domains to treat as one domain. You can use regex paths to include the domains. Set one of the array values to a wildcard to include all external domains. |
| full_resources | boolean | — | Crawl and download all the resources for a website. |
| max_credits_allowed | number | — | Set the maximum number of credits to use per run. The request returns a "blocked by client" error if the initial response is empty. Credits are measured in decimal units, where 10,000 credits equal one dollar (100 credits per penny). |
| max_credits_per_page | number | — | Set the maximum number of credits to use per page. Credits are measured in decimal units, where 10,000 credits equal one dollar (100 credits per penny). |
| metadata | boolean | false | Collect metadata about the content found, such as the page title, description, and keywords. This can help improve AI interoperability. |
| preserve_host | boolean | false | Preserve the default HOST header for the client. This may help with pages that require a HOST header or when the TLS cannot be determined. |
| redirect_policy | string | Loose | The network redirect policy to use when performing HTTP requests. |
| request | string | smart | The request type to perform. Use http for plain HTTP requests, chrome for browser rendering, or smart to decide automatically per page. |
| request_timeout | number | 60 | The timeout to use for the request, in seconds. |
| root_selector | string | — | The root CSS query selector to use when extracting content from the markup of the response. |
| run_in_background | boolean | false | Run the request in the background. Useful when storing data and triggering crawls from the dashboard. |
| session | boolean | true | Persist the session for the client that you use on a website. This allows the HTTP headers and cookies to be set like a real browser session. |
| sitemap | boolean | false | Include the sitemap results to crawl. |
| sitemap_only | boolean | false | Only include the sitemap results to crawl. |
| sitemap_path | string | sitemap.xml | The sitemap URL to use when sitemap crawling is enabled. |
| subdomains | boolean | false | Allow subdomains to be included. |
| tld | boolean | false | Allow other TLD variants of the domain to be included. |
| user_agent | string | — | Add a custom HTTP user agent to the request. By default this is set to a random agent. |
| wait_for | object | — | Wait for conditions before returning the page content when using browser rendering. Each key configures a wait condition, such as an idle network, a CSS selector becoming present, a DOM update, or a fixed delay. The values for the timeout duration are given as an object with seconds and nanoseconds. |
| webhooks | object | — | Use webhooks to get notified on events like credit depleted, new pages, metadata, and website status. |
| whitelist | array | — | Whitelist a set of paths that you want to crawl, ignoring all other routes that do not match the patterns. You can use regex patterns to help with the list. |
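As a rough illustration, several of these advanced parameters might be combined in a single request body like the sketch below (the budget shape and the regex path patterns are assumptions, not verified values):

```python
# Illustrative crawl payload combining advanced parameters.
payload = {
    "url": "https://example.com",
    "limit": 50,
    "metadata": True,                      # collect page title, description, keywords
    "subdomains": True,                    # include subdomains in the crawl
    "blacklist": ["/login", "/admin/.*"],  # paths to skip (regex patterns allowed)
    "budget": {"*": 50, "/blog": 20},      # assumed shape: path -> max page count
    "user_agent": "MyCrawler/1.0",
}
```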
Core(5)
| Name | Type | Default | Description |
|---|---|---|---|
| disable_hints | boolean | — | Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes. Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs. |
| limit | number | 0 | The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. |
| lite_mode | boolean | — | Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections. |
| network_blacklist | string[] | — | Blocks matching network requests from being fetched/loaded. Use this to reduce bandwidth and noise by preventing known-unneeded third-party resources from ever being requested. Each entry is a string match pattern (commonly a hostname, domain, or URL substring). If both whitelist and blacklist are set, whitelist takes precedence. |
| network_whitelist | string[] | — | Allows only matching network requests to be fetched/loaded. Use this for a strict "allowlist-first" approach: keep the crawl lightweight while still permitting the essential scripts/styles needed for rendering and JS execution. Each entry is a string match pattern (commonly a hostname, domain, or URL substring). When set, requests not matching any whitelist entry are blocked by default. |
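A sketch of the allowlist-first approach described above; the hostname-style patterns are assumptions about the match syntax:

```python
# Keep the crawl lightweight: only first-party resources and one CDN are fetched.
payload = {
    "url": "https://example.com",
    "limit": 10,
    "network_whitelist": ["example.com", "cdn.example.com"],
    # If a blacklist were also set, the whitelist would take precedence.
}
```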
Output(16)
| Name | Type | Default | Description |
|---|---|---|---|
| clean_html | boolean | — | Clean the HTML of unwanted attributes. |
| css_extraction_map | object | — | Use CSS or XPath selectors to scrape contents from the web page. Set the paths and the extraction object map to perform extractions per path or page. |
| encoding | string | — | The type of encoding to use, such as UTF-8 or SHIFT_JIS. |
| filter_images | boolean | — | Filter image elements from the markup. |
| filter_output_images | boolean | — | Filter the images from the output. |
| filter_output_main_only | boolean | true | Filter the nav, aside, and footer from the output. |
| filter_output_svg | boolean | — | Filter the svg tags from the output. |
| filter_svg | boolean | — | Filter SVG elements from the markup. |
| link_rewrite | json | — | Optional URL rewrite rule applied to every discovered link before it's crawled. This lets you normalize or redirect URLs (for example, rewriting paths or mapping one host pattern to another). The value must be a JSON object with a regex pattern and its replacement. Invalid or unsafe regex patterns (overly long, unbalanced parentheses, advanced lookbehind constructs, etc.) are rejected by the server and ignored. |
| readability | boolean | false | Use readability to pre-process the content for reading. This may drastically improve the content for LLM usage. |
| return_cookies | boolean | false | Return the HTTP response cookies with the results. |
| return_embeddings | boolean | false | Include OpenAI embeddings for the page content. |
| return_format | string or array | raw | The format to return the data in, such as markdown, text, or raw. Multiple formats can be requested by passing an array. |
| return_headers | boolean | false | Return the HTTP response headers with the results. |
| return_json_data | boolean | false | Return the JSON data found in scripts used for SSR. |
| return_page_links | boolean | false | Return the links found on each page. |
Config(7)
| Name | Type | Default | Description |
|---|---|---|---|
| cookies | string | — | Add HTTP cookies to use for the request. |
| fingerprint | boolean | true | Use advanced fingerprint detection for chrome. |
| headers | object | — | Forward HTTP headers to use for all requests. The object is expected to be a map of key value pairs. |
| proxy | string | — | Select the proxy pool for this request: 'residential', 'mobile', or 'isp'. Leave blank to disable proxy routing. Using this param overrides other proxy settings. |
| proxy_enabled | boolean | false | Enable premium high performance proxies to prevent detection and increase speed. You can also use Proxy-Mode to route requests through Spider's proxy front-end instead. |
| remote_proxy | string | — | Use a remote external proxy connection. You also save 50% on data transfer costs when you bring your own proxy. |
| stealth | boolean | true | Use stealth mode for headless Chrome requests to help prevent being blocked. |
Performance(5)
| Name | Type | Default | Description |
|---|---|---|---|
| cache | boolean or object | true | Use HTTP caching for the crawl to speed up repeated runs. Defaults to true. Accepts either a boolean to toggle caching or an object of the shape { maxAge?: number; allowStale?: boolean; period?: string; skipBrowser?: boolean } to fine-tune cache behavior. Default behavior varies by route type. |
| delay | number | 0 | Add a crawl delay of up to 60 seconds, disabling concurrency. The delay is specified in milliseconds. |
| respect_robots | boolean | true | Respect the robots.txt file for crawling. |
| service_worker_enabled | boolean | true | Allow the website to use Service Workers as needed. |
| skip_config_checks | boolean | true | Skip checking the database for website configuration. This will increase performance for requests that use limit=1. |
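For instance, a cautious performance setup might look like the sketch below; the cache object keys follow the type shown in the table, but the maxAge units and the skipBrowser semantics are assumptions:

```python
payload = {
    "url": "https://example.com",
    "limit": 25,
    "respect_robots": True,
    "delay": 1000,  # 1 second between pages, in milliseconds (disables concurrency)
    "cache": {
        "maxAge": 3600,       # assumed to be seconds
        "allowStale": False,  # never serve stale cache entries
        "skipBrowser": True,  # assumption: bypass the cache for browser-rendered requests
    },
}
```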
Automation(4)
| Name | Type | Default | Description |
|---|---|---|---|
| automation_scripts | object | — | Run custom web automation tasks on certain paths. Requires the chrome request type. Pass an object that maps paths to a list of web automation actions to perform. |
| evaluate_on_new_document | string | — | Set a custom script to evaluate on new document creation. |
| scroll | number | — | Infinite scroll the page as new content loads, up to a duration in milliseconds. You may still need to pair this with a wait condition for content that loads late. |
| viewport | object | — | Configure the viewport for Chrome. |
Geolocation(2)
| Name | Type | Default | Description |
|---|---|---|---|
| country_code | string | — | Set an ISO country code for proxy connections. View the locations list for available countries. |
| locale | string | — | The locale to use for the request, for example en-US. |
Per-endpoint notes
Scrape & Unblocker exclude limit, depth, and delay; they are single-page endpoints.
Screenshot excludes request, return_format, and readability; it returns image data.
Every endpoint below includes these parameters in its own parameter tabs with full descriptions. This section is a quick-reference index.
Crawl
Start crawling website(s) to collect resources. You can pass an array of objects for the request body.
https://api.spider.cloud/crawl
- url (string, required)
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
- limit (number, default: 0)
The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
- disable_hints (boolean)
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes. Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
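A minimal request against this endpoint might look like the following sketch, assuming Python's requests library (the parameter values are illustrative):

```python
import requests

resp = requests.post(
    "https://api.spider.cloud/crawl",
    headers={
        "Authorization": "Bearer sk-xxxx",
        "Content-Type": "application/json",
    },
    json={"url": "https://spider.cloud", "limit": 5, "return_format": "markdown"},
)
resp.raise_for_status()
# The response is a list of page objects, as shown below.
for page in resp.json():
    print(page["url"], page["status"], page["costs"]["total_cost"])
```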
[ { "content": "<resource>...", "error": null, "status": 200, "duration_elapsed_ms": 122, "costs": { "ai_cost": 0, "compute_cost": 0.00001, "file_cost": 0.00002, "bytes_transferred_cost": 0.00002, "total_cost": 0.00004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Scrape
Start scraping a single page on website(s) to collect resources. You can pass an array of objects for the request body. This endpoint is also available via Proxy-Mode.
https://api.spider.cloud/scrape
- url (string, required)
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
- disable_hints (boolean)
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes. Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
- lite_mode (boolean)
Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.
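Since the request body can be an array of objects, a batched scrape might look like this sketch (again assuming Python's requests library; the return_format value is illustrative):

```python
import requests

batch = [
    {"url": "https://example.com", "return_format": "markdown"},
    {"url": "https://spider.cloud", "return_format": "markdown", "lite_mode": True},
]
resp = requests.post(
    "https://api.spider.cloud/scrape",
    headers={"Authorization": "Bearer sk-xxxx", "Content-Type": "application/json"},
    json=batch,  # one call scrapes both pages
)
for page in resp.json():
    print(page["url"], page["status"])
```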
[ { "content": "<resource>...", "error": null, "status": 200, "duration_elapsed_ms": 122, "costs": { "ai_cost": 0, "compute_cost": 0.00001, "file_cost": 0.00002, "bytes_transferred_cost": 0.00002, "total_cost": 0.00004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Unblocker
Start unblocking challenging website(s) to collect data. You can pass an array of objects for the request body. Costs an additional 10-40 credits per success.
https://api.spider.cloud/unblocker
- url (string, required)
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
- disable_hints (boolean)
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes. Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
- lite_mode (boolean)
Lite mode reduces data transfer costs by 50%, with trade-offs in speed, accuracy, geo-targeting, and reliability. It’s best suited for non-urgent data collection or when targeting websites with minimal anti-bot protections.
[ { "url": "https://spider.cloud", "status": 200, "cookies": { "a": "something", "b": "something2" }, "headers": { "x-id": 123, "x-cookie": 123 }, "status": 200, "costs": { "ai_cost": 0.001, "ai_cost_formatted": "0.0010", "bytes_transferred_cost": 3.1649999999999997e-9, "bytes_transferred_cost_formatted": "0.0000000031649999999999997240", "compute_cost": 0.0, "compute_cost_formatted": "0", "file_cost": 0.000029291250000000002, "file_cost_formatted": "0.0000292912499999999997868372", "total_cost": 0.0010292944150000001, "total_cost_formatted": "0.0010292944149999999997865612", "transform_cost": 0.0, "transform_cost_formatted": "0" }, "content": "<html>...</html>", "error": null }, // more content... ]
Search
Perform a Google search to gather a list of websites for crawling and resource collection, including fallback options if the query yields no results. You can pass an array of objects for the request body. This endpoint is also available via Proxy-Mode.
https://api.spider.cloud/search
- search (string, required)
The search query you want to search for.
- limit (number, default: 0)
The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
- quick_search (boolean)
Prioritize speed over output quantity.
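A search request sketch, assuming Python's requests library (the query and limits are illustrative):

```python
import requests

resp = requests.post(
    "https://api.spider.cloud/search",
    headers={"Authorization": "Bearer sk-xxxx", "Content-Type": "application/json"},
    json={"search": "sports news", "quick_search": True, "limit": 5},
)
# Results arrive under the "content" key, as shown in the response below.
for result in resp.json()["content"]:
    print(result["title"], "-", result["url"])
```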
{ "content": [ { "description": "Visit ESPN for live scores, highlights and sports news. Stream exclusive games on ESPN+ and play fantasy sports.", "title": "ESPN - Serving Sports Fans. Anytime. Anywhere.", "url": "https://www.espn.com/" }, { "description": "Sports Illustrated, SI.com provides sports news, expert analysis, highlights, stats and scores for the NFL, NBA, MLB, NHL, college football, soccer, ...", "title": "Sports Illustrated", "url": "https://www.si.com/" }, { "description": "CBS Sports features live scoring, news, stats, and player info for NFL football, MLB baseball, NBA basketball, NHL hockey, college basketball and football.", "title": "CBS Sports - News, Live Scores, Schedules, Fantasy ...", "url": "https://www.cbssports.com/" }, { "description": "Sport is a form of physical activity or game. Often competitive and organized, sports use, maintain, or improve physical ability and skills.", "title": "Sport", "url": "https://en.wikipedia.org/wiki/Sport" }, { "description": "Watch FOX Sports and view live scores, odds, team news, player news, streams, videos, stats, standings & schedules covering NFL, MLB, NASCAR, WWE, NBA, NHL, ...", "title": "FOX Sports News, Scores, Schedules, Odds, Shows, Streams ...", "url": "https://www.foxsports.com/" }, { "description": "Founded in 1974 by tennis legend, Billie Jean King, the Women's Sports Foundation is dedicated to creating leaders by providing girls access to sports.", "title": "Women's Sports Foundation: Home", "url": "https://www.womenssportsfoundation.org/" }, { "description": "List of sports · Running. Marathon · Sprint · Mascot race · Airsoft · Laser tag · Paintball · Bobsleigh · Jack jumping · Luge · Shovel racing · Card stacking ...", "title": "List of sports", "url": "https://en.wikipedia.org/wiki/List_of_sports" }, { "description": "Stay up-to-date with the latest sports news and scores from NBC Sports.", "title": "NBC Sports - news, scores, stats, rumors, videos, and more", "url": "https://www.nbcsports.com/" }, { "description": "r/sports: Sports News and Highlights from the NFL, NBA, NHL, MLB, MLS, and leagues around the world.", "title": "r/sports", "url": "https://www.reddit.com/r/sports/" }, { "description": "The A-Z of sports covered by the BBC Sport team. Find all the latest live sports coverage, breaking news, results, scores, fixtures, tables, ...", "title": "AZ Sport", "url": "https://www.bbc.com/sport/all-sports" } ] }
Links
Start crawling a website(s) to collect links found. You can pass an array of objects for the request body. This endpoint can save on latency if you only need to index the content URLs. Also available via Proxy-Mode.
https://api.spider.cloud/links
- url (string, required)
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
- limit (number, default: 0)
The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
- disable_hints (boolean)
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes. Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
[ { "url": "https://spider.cloud", "status": 200, "duration_elasped_ms": 112 "error": null }, // more content... ]
Screenshot
Take screenshots of a website to base64 or binary encoding. You can pass an array of objects for the request body. This endpoint is also available via Proxy-Mode.
https://api.spider.cloud/screenshot
- url (string, required)
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
- limit (number, default: 0)
The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
- disable_hints (boolean)
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes. Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
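A sketch of requesting a screenshot and saving it to disk, assuming Python's requests library and base64-encoded image content (the output encoding and image format are assumptions):

```python
import base64
import requests

resp = requests.post(
    "https://api.spider.cloud/screenshot",
    headers={"Authorization": "Bearer sk-xxxx", "Content-Type": "application/json"},
    json={"url": "https://spider.cloud", "limit": 1},
)
for index, shot in enumerate(resp.json()):
    # Assuming base64-encoded image bytes in "content"; adjust if binary is returned.
    with open(f"screenshot-{index}.png", "wb") as handle:
        handle.write(base64.b64decode(shot["content"]))
```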
[ { "content": "<resource>...", "error": null, "status": 200, "duration_elapsed_ms": 122, "costs": { "ai_cost": 0, "compute_cost": 0.00001, "file_cost": 0.00002, "bytes_transferred_cost": 0.00002, "total_cost": 0.00004, "transform_cost": 0.0001 }, "url": "https://spider.cloud" }, // more content... ]
Transform HTML
Transform HTML into Markdown or plain text quickly. Each HTML transformation starts at 0.1 credits, while PDF transformations can cost up to 10 credits per page. You can submit up to 10 MB of data per request. The Transform API is also integrated into the /crawl endpoint via the return_format parameter.
https://api.spider.cloud/transform
- data (object, required)
A list of HTML data to transform. The object list takes the keys html and url. The url key is optional and is only used when readability is enabled.
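A transform request sketch, assuming Python's requests library; whether the output format is selected with a return_format field on this endpoint is an assumption based on the /crawl integration noted above:

```python
import requests

resp = requests.post(
    "https://api.spider.cloud/transform",
    headers={"Authorization": "Bearer sk-xxxx", "Content-Type": "application/json"},
    json={
        "data": [
            {
                "html": "<html><body><h1>Example Website</h1><p>Some markup.</p></body></html>",
                # "url" is optional and only used when readability is enabled.
                "url": "https://example.com",
            }
        ],
        "return_format": "markdown",  # assumption: mirrors the /crawl return_format values
    },
)
print(resp.json()["content"])
```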
{ "content": [ "# Example Website This is some example markup to use to test the transform function. [Guides](https://spider.cloud/guides)" ], "cost": { "ai_cost": 0, "compute_cost": 0, "file_cost": 0, "bytes_transferred_cost": 0, "total_cost": 0, "transform_cost": 0.0001 }, "error": null, "status": 200 }
Proxy-Mode
Spider also offers a proxy front-end to the service. The Spider proxy handles requests just like any standard proxy, with the option to use high-performance and residential proxies at up to 10 GB/s. Take a look at all of our proxy locations to see if we support the country.
Proxy-Mode works with all core endpoints: Crawl, Scrape, Screenshot, Search, and Links. Pass API parameters in the password field to configure rendering, proxies, and more.
**HTTP address**: proxy.spider.cloud:80
**HTTPS address**: proxy.spider.cloud:443
**Username**: YOUR-API-KEY
**Password**: PARAMETERS
- Residential — real-user IPs across 100+ countries. High anonymity, up to 1 GB/s. $1–4/GB
- ISP — stable datacenter IPs with ISP-grade routing. Highest performance, up to 10 GB/s. $1/GB
- Mobile — real 4G/5G device IPs for maximum stealth. $2/GB
Use country_code to set geolocation and proxy to select the pool type.
| Proxy Type | Price | Multiplier | Description |
|---|---|---|---|
| residential | $2.00/GB | ×2–×4 | Entry-level residential pool |
| mobile | $2.00/GB | ×2 | 4G/5G mobile proxies for stealth |
| isp | $1.00/GB | ×1 | ISP-grade residential routing |
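A proxy-mode sketch, assuming Python's requests library; the query-string style used to pack parameters into the password field is an assumption:

```python
import requests

API_KEY = "sk-xxxx"
# Parameters ride in the proxy password field (assumed query-string encoding).
params = "proxy=residential&country_code=us"
proxies = {
    "http": f"http://{API_KEY}:{params}@proxy.spider.cloud:80",
    "https": f"http://{API_KEY}:{params}@proxy.spider.cloud:443",
}
resp = requests.get("https://example.com", proxies=proxies, timeout=90)
print(resp.status_code, len(resp.text))
```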
Browser
Spider Browser is a Rust-based cloud browser for automation, scraping, and AI extraction. Connect via the browser.spider.cloud WebSocket endpoint using any Playwright or Puppeteer compatible client, or use the spider-browser TypeScript library for a higher-level API with built-in AI actions.
**WebSocket endpoint**: wss://browser.spider.cloud/v1/browser
**Authentication**: ?token=YOUR-API-KEY
**Protocol**: CDP, WebDriver BiDi
- AI extraction & actions — extract structured data or perform actions with natural language. Vision models handle complex pages.
- Stealth & proxies — automatic fingerprint rotation, residential proxies, and a retry engine that recovers sessions on its own.
- 100 concurrent browsers — per user on all plans. Pass stealth, browser, and country query params to configure each session.
Sessions can be recorded and replayed from the dashboard. See the spider-browser repo for full documentation and examples.
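A connection sketch using Playwright's Python bindings over CDP; the stealth and country query-parameter values are assumptions about the expected format:

```python
from playwright.sync_api import sync_playwright

# CDP endpoint with token auth plus optional per-session configuration params.
ws_endpoint = "wss://browser.spider.cloud/v1/browser?token=YOUR-API-KEY&stealth=true&country=us"

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_endpoint)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```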
Queries
Query the data that you collect during crawling and scraping. Add dynamic filters for extracting exactly what is needed.
Logs
Get the last 24 hours of logs.
https://api.spider.cloud/data/crawl_logs
- url (string)
Filter a single url record.
- limit (string)
The limit of records to get.
- domain (string)
Filter a single domain record.
- page (number)
The current page to get.
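A sketch of querying the logs, assuming the filters above are passed as query parameters on a GET request:

```python
import requests

resp = requests.get(
    "https://api.spider.cloud/data/crawl_logs",
    headers={"Authorization": "Bearer sk-xxxx"},
    params={"domain": "spider.cloud", "limit": "25", "page": 0},
)
print(resp.json()["data"])
```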
{ "data": { "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "domain": "spider.cloud", "url": "https://spider.cloud", "links": 1, "credits_used": 3, "mode": 2, "crawl_duration": 340, "message": null, "request_user_agent": "Spider", "level": "UI", "status_code": 0, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" }, "error": null }
Credits
Get the remaining credits available.
https://api.spider.cloud/data/credits
{ "data": { "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "credits": 53334, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" } }
Scraper Configs Alpha
Browse optimized scraper configs for popular websites. Each config defines extraction rules (selectors, AI prompts, stealth settings, and more) curated for the best results out of the box.
Scraper Directory Alpha
Browse optimized scraper configs for popular websites. Filter by domain, category, or search term. Each config is curated to deliver the best extraction results out of the box. No authentication required.
https://api.spider.cloud/data/scraper-directory
- url (string)
Filter a single url record.
- limit (string)
The limit of records to get.
- domain (string)
Filter a single domain record.
- page (number)
The current page to get.
{ "data": [ { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "domain": "example.com", "path_pattern": "/blog/*", "display_name": "Example Blog Scraper", "description": "Extracts blog posts with title, author, and content.", "category": "news", "tags": ["blog", "articles"], "confidence_score": 0.95, "validation_count": 12, "slug": "example-com-blog", "created_at": "2025-12-01T10:00:00+00:00", "updated_at": "2026-01-15T08:30:00+00:00" } ], "total": 1, "page": 1, "limit": 20, "total_pages": 1 }
Fetch API Alpha
Per-website scraper endpoints that auto-configure themselves. POST /fetch/{domain}/{path} — AI discovers optimal CSS selectors, extraction schemas, and request settings on the first request, then caches and reuses them for fast, consistent structured data. Full documentation →
https://api.spider.cloud/fetch/example.com/
- url (string, required)
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
To reduce latency, enhance performance, and save on rate limits, batch multiple URLs into a single call. For large websites with high page limits, it's best to run requests individually.
- limit (number, default: 0)
The maximum amount of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages. It is better to set a limit upfront on websites where you do not know the size. Re-crawling can effectively use cache to keep costs low as new pages are found.
- disable_hints (boolean)
Disables service-provided hints that automatically optimize request types, geo-region selection, and network filters (for example, updating network_blacklist/network_whitelist recommendations based on observed request-pattern outcomes). Hints are enabled by default for all smart request modes. Enable this if you want fully manual control over filtering behavior, are debugging request load order/coverage, or need deterministic behavior across runs.
If you're tuning filters, keep hints enabled and pair with event_tracker to see the complete URL list; once stable, you can flip disable_hints on to lock behavior.
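A fetch request sketch, assuming Python's requests library; the domain and path segments in the endpoint URL identify the site to auto-configure:

```python
import requests

# POST /fetch/{domain}/{path} - here the domain is example.com at its root path.
resp = requests.post(
    "https://api.spider.cloud/fetch/example.com/",
    headers={"Authorization": "Bearer sk-xxxx", "Content-Type": "application/json"},
    json={"url": "https://example.com/", "limit": 1},
)
for item in resp.json():
    print(item["url"], item["content"])
```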
[ { "url": "https://example.com/", "status": 200, "content": "{\n \"title\": \"Example Domain\",\n \"description\": \"This domain is for use in illustrative examples.\",\n \"links\": [\"https://www.iana.org/domains/example\"]\n}", "error": null, "costs": { "total_cost": 0.001, "total_cost_formatted": "0.0010" } } ]