Turn Any Website into an MCP Server Easily

If you’re working with Claude, Cursor, or other Model Context Protocol (MCP) clients, you’ve probably faced the challenge of connecting external knowledge sources. SiteMCP solves this problem by letting you fetch entire websites and use them as MCP servers, with no complex setup required.

SiteMCP is an open-source TypeScript library, forked from sitefetch by egoist, that does one thing really well: it fetches an entire website (or parts of it) and spins up a local MCP server based on that content. This means you can directly query the site’s information from your MCP-compatible client.

Features

Website crawling: Automatically explores a site and collects its pages
Selectable crawling patterns: Target specific sections of websites using micromatch patterns
Configurable concurrency: Adjust the number of simultaneous connections for faster scraping
Custom content selection: Use CSS selectors to precisely target the content you need
Tool name customization: Configure how the tool appears in your MCP client
Local caching: Stores fetched content locally to avoid redundant downloads
Multiple package manager support: Works with npm, pnpm, and bun

Use Cases

Querying Documentation: This is the big one for me. Point sitemcp at a framework’s docs (like https://xxx.com/guide/) and ask questions about specific APIs or configurations directly within Cursor or Claude Desktop. No more endless tab switching and searching.
Referencing Component Libraries: Similar to docs, if you’re working with something like DaisyUI (https://daisyui.com/components/), you can quickly ask for examples or usage details for specific components.
Integrating Internal Wikis: If your team has an internal knowledge base accessible via HTTP, you could use sitemcp to make it queryable within your AI tools (assuming no complex authentication is required).
Targeted Research: Need info only from a specific blog section or product category on a website? Use the --match flag to grab just that, creating a focused knowledge source for your AI.

How To Use It

1. Installation. You can run it directly without a global install using npx, bunx, or pnpx:

# Pick one:
npx sitemcp https://example.com
bunx sitemcp https://example.com
pnpx sitemcp https://example.com

Or, install it globally if you plan to use it often:

# Pick one:
npm install -g sitemcp
bun install -g sitemcp
pnpm install -g sitemcp

2. Basic Usage. Point it at the URL you want to fetch:

sitemcp https://scriptbyai.com

This will start fetching the site and make it available as an MCP server.

3. Speeding Things Up (Concurrency). For larger sites, fetching can take a while. The default concurrency is conservative (check the project README for the exact value), so you can try raising it to fetch more pages in parallel:

sitemcp https://scriptbyai.com --concurrency 10

Be mindful not to overload the target server, though.
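
To make the trade-off concrete, here is a minimal sketch of what a concurrency limit means in practice. It is not sitemcp’s actual implementation; mapWithConcurrency and the stand-in worker are hypothetical names used only for illustration. The idea is that a fixed number of "lanes" each pull the next page from a shared queue, so no more than that many requests are ever in flight.

```typescript
// Illustrative only: run `worker` over `items` with at most `limit` calls in flight.
// This is NOT sitemcp's code; it just shows what a concurrency cap does.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane keeps claiming the next unclaimed item until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as `items`
}

// Usage with a stand-in worker; a real crawler would fetch the page here.
const paths = ["/a", "/b", "/c", "/d", "/e"];
let inFlight = 0;
let peak = 0; // highest number of simultaneous workers observed
const crawl = mapWithConcurrency(paths, 2, async (path) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await Promise.resolve(); // simulate waiting on the network
  inFlight--;
  return path.toUpperCase();
});
```

With a limit of 2, the five stand-in "pages" are processed two at a time. Raising the limit shortens the total wait but puts more simultaneous load on the target server, which is why a modest value is the safer starting point.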

4. Controlling the Tool Name. By default, sitemcp derives the names of the tools it exposes from the site’s domain (e.g., indexOfScriptByAi, getDocumentOfScriptByAi). You can change this strategy with the -t flag:

# Use domain (default)
sitemcp https://scriptbyai.com -t domain
# Use subdomain: indexOfSubDomain / getDocumentOfSubDomain
sitemcp https://sub-domain.scriptbyai.com/ -t subdomain
# Use first pathname segment: indexOfFreeAi / getDocumentOfFreeAi
sitemcp https://scriptbyai.com/free-ai/ -t pathname

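You can also cap the length of the content returned to the client with -l. According to the project README, this flag (--max-length) limits content length rather than the tool name, and clients other than Claude may need a different value:

sitemcp https://scriptbyai.com -l 10000 # Max 10,000 characters of content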

5. Fetching Specific Pages. This is useful for large sites. Use the --match flag (you can use it multiple times) with micromatch patterns:

# Fetch only blog and tools sections from scriptbyai.com
sitemcp https://scriptbyai.com -m "/blog/**" -m "/tools/**"

Patterns are tested against the URL’s pathname only, not the full URL. Check the micromatch docs for pattern syntax.
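
To show what "tested against the pathname" means, here is a rough sketch. It uses a deliberately simplified stand-in for micromatch that understands only * (within one path segment) and ** (across segments); the real library supports far more syntax and may treat edge cases differently. The function names globToRegExp and matchesAny are hypothetical.

```typescript
// Simplified stand-in for micromatch, for illustration only.
// "*" matches within a single path segment; "**" matches across segments.
function globToRegExp(glob: string): RegExp {
  const source = glob
    .split("**")
    .map((part) =>
      part
        .split("*")
        // Escape regex metacharacters in the literal pieces.
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*"),
    )
    .join(".*");
  return new RegExp(`^${source}$`);
}

// True if the URL's pathname matches at least one pattern,
// mirroring how repeated -m flags widen the selection.
function matchesAny(url: string, patterns: readonly string[]): boolean {
  const { pathname } = new URL(url); // host and query string are ignored
  return patterns.some((pattern) => globToRegExp(pattern).test(pathname));
}

const patterns = ["/blog/**", "/tools/**"];
console.log(matchesAny("https://scriptbyai.com/blog/2024/post", patterns)); // true
console.log(matchesAny("https://scriptbyai.com/about", patterns)); // false
```

Because only the pathname is compared, the same patterns work no matter which domain or query string the URL carries.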

6. Refining Content Extraction. Sometimes, the automatic content extraction grabs sidebars or footers. If you inspect the page and find the main content lives inside an element with a specific class or ID (e.g., <main class="content">), you can tell sitemcp to look there:

sitemcp https://scriptbyai.com --content-selector "main.content"

7. Configuring Your MCP Client. The key is to tell your MCP client how to run sitemcp. Here’s how you might configure Claude Desktop (the specific JSON structure might vary slightly for other clients like Cursor or Windsurf). In this example, -y tells npx to run without a confirmation prompt, -m limits the fetch to component pages, and --concurrency raises the number of parallel requests. You can swap npx for bunx or pnpx, or use the direct path if sitemcp is installed globally. JSON does not allow comments, so keep the file free of them:

{
  "mcpServers": {
    "daisy-ui-components": {
      "command": "npx",
      "args": [
        "-y",
        "sitemcp",
        "https://daisyui.com",
        "-m",
        "/components/**",
        "--concurrency",
        "8"
      ]
    }
  }
}
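
If you maintain several of these entries, one option is to generate the file instead of editing it by hand. The sketch below is a hypothetical helper (sitemcpServerEntry and SiteOptions are made-up names, not part of sitemcp) that maps the flags used in the example onto the args array and serializes the result with JSON.stringify, which always produces strict, comment-free JSON.

```typescript
// Hypothetical helper for illustration; not part of sitemcp or any MCP client.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface SiteOptions {
  url: string;
  match?: string[]; // each pattern becomes its own -m flag
  concurrency?: number; // passed through as --concurrency
}

function sitemcpServerEntry(opts: SiteOptions): McpServerEntry {
  // "-y" lets npx run the package without a confirmation prompt.
  const args = ["-y", "sitemcp", opts.url];
  for (const pattern of opts.match ?? []) {
    args.push("-m", pattern);
  }
  if (opts.concurrency !== undefined) {
    args.push("--concurrency", String(opts.concurrency));
  }
  return { command: "npx", args };
}

const config = {
  mcpServers: {
    "daisy-ui-components": sitemcpServerEntry({
      url: "https://daisyui.com",
      match: ["/components/**"],
      concurrency: 8,
    }),
  },
};

// Paste this output into your client's MCP configuration file.
const configJson = JSON.stringify(config, null, 2);
console.log(configJson);
```

Note that every argument ends up as its own string in args, including numeric values such as the concurrency, because the client passes them to the command as separate command-line arguments.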

Important Tip: Fetching a large site can take time. It’s often better to run sitemcp once manually in your terminal with the desired flags (sitemcp https://yoursite.com -m "/docs/**"). Let it run and build the cache (~/.cache/sitemcp). Then, configure your MCP client to run the same command. The client will start the server much faster because the content is already cached.

Pros

No API key requirements – Works with any public website
Simple integration – Easy to set up with popular MCP clients
Selective crawling – Fetch only the content you need
Smart caching – Speeds up repeated access to the same sites
Customizable output – Control how content is presented to AI tools
Open-source foundation – Built on proven web crawling technology

Cons

Performance limitations – Large sites can take time to fetch initially
Content extraction quality varies – Some websites may not render correctly
No automatic updates – You must manually refresh the cache for new content
Limited by website structure – Heavily JavaScript-dependent sites may not work well
Client compatibility – Works only with MCP-compatible AI tools

Related Resources

Mozilla Readability – The content extraction library used by SiteMCP
Micromatch – Pattern matching library used for page selection
sitefetch – The original project that SiteMCP forked from
Claude Desktop – A compatible MCP client
What is Model Context Protocol (MCP) – Learn more about the underlying protocol
Curated MCP servers – A directory of curated & open-source Model Context Protocol servers

FAQs

Q: How is sitemcp different from just scraping a site with Python?
A: While both fetch web content, sitemcp is specifically designed to serve that content over MCP, making it directly usable by compatible AI clients. It also includes features tailored for this, like tool naming and integration commands, and it uses Readability to extract cleaner content focused on the main text. It’s less about raw data extraction and more about creating a queryable knowledge source for AI.

Q: Does it work with sites that heavily rely on JavaScript to render content?
A: It might struggle. sitemcp primarily fetches the HTML source and then uses Readability. If critical content is only rendered after complex JavaScript execution in the browser, sitemcp might not capture it accurately. It works best with sites where the main content is present in the initial HTML or rendered fairly simply.
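
One rough way to check a site before pointing sitemcp at it is to see whether a phrase you can read in the browser also appears in the raw HTML (for example, the output of "view source"). The sketch below is only a heuristic under that assumption: visibleText and isInInitialHtml are hypothetical helpers, the tag stripping is regex-based rather than a real HTML parser, and HTML entities are not decoded.

```typescript
// Heuristic sketch: crude text extraction, not a real HTML parser.
function visibleText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style bodies
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// True if `phrase` appears in the text of the initial HTML (case-insensitive).
function isInInitialHtml(html: string, phrase: string): boolean {
  return visibleText(html).toLowerCase().includes(phrase.toLowerCase());
}

// A server-rendered page carries its content in the HTML itself...
const staticPage = "<main><h1>Install Guide</h1><p>Run the installer.</p></main>";
// ...while a client-rendered shell only carries the script that builds it.
const spaShell = '<div id="root"></div><script>render("Install Guide")</script>';

console.log(isInInitialHtml(staticPage, "install guide")); // true
console.log(isInInitialHtml(spaShell, "install guide")); // false
```

If the phrase is missing from the raw HTML, the page is probably rendered client-side, and the fetched content is likely to come back thin or empty.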

Q: Where exactly is the cache stored, and can I clear it?
A: By default, it’s in ~/.cache/sitemcp on Linux/macOS (likely C:\Users\YourUser\AppData\Local\sitemcp\Cache on Windows, though check the tool’s output). You can simply delete this directory to clear the cache. You can also run sitemcp with the --no-cache flag to bypass caching for a specific run.

Q: Can I use this for sites that require login/authentication?
A: Generally, no. sitemcp fetches publicly accessible URLs. It doesn’t include mechanisms for handling login forms, cookies, or authentication headers out of the box. It’s best suited for public documentation, blogs, or websites.