Ultimate Block List to Stop AI Bots
More than you might think, AI (Artificial Intelligence) and ML (Machine Learning) bots are crawling your site and scraping your content. They collect your data and use it to train software like ChatGPT, DeepSeek, and thousands of other AI creations. Whether anyone approves of all this is beyond the scope of this post. The focus here is website owners who want to stop AI bots from crawling their web pages, as much as possible. To help, I’ve been collecting data and researching AI bots for many months now, and have put together a “Mega Block List” to help stop AI bots from devouring your content.
The ultimate block list for stopping AI bots from crawling your site.
Contents
- Block AI Bots via BBQ Pro
- Block AI Bots via robots.txt
- Block AI Bots via Apache/.htaccess
- Block AI Bots via Nginx
- Download plain-text list
- Notes
- Changelog
- Disclaimer
- Show Support
- References
- Feedback
If you can edit a file, you can block a ton of AI bots.
Block AI Bots via BBQ Pro
🔥 Users of my BBQ Pro firewall plugin can add the Ultimate AI Block List with just a few clicks »
Block AI Bots via robots.txt
The easiest way for most website owners to block AI bots is to append the following list to their site’s robots.txt file. There are many resources explaining the robots.txt file, and I encourage anyone not familiar to take a few moments to learn more.
In a nutshell, the robots.txt file contains rules for bots to obey. So you can add rules that limit where bots can crawl, whether individual pages or the entire site. Once you have added some rules, simply upload the robots.txt file to the public root directory of your website. For example, here is my robots.txt for Perishable Press.
To block AI bots via your site’s robots.txt file, append the following rules. Understand that bots are not required to obey robots.txt; the rules are merely suggestions. Good bots will follow them, bad bots will ignore them and do whatever they want. To force compliance, you can add blocking rules via Apache/.htaccess. With that in mind, here are the robots.txt rules to block AI bots:
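To illustrate the structure, here is a minimal robots.txt sketch (the directory path and bot name are hypothetical examples, not part of the block list):

```
# Allow all bots everywhere except the /private/ directory
User-agent: *
Disallow: /private/

# Block one specific bot from the entire site
User-agent: ExampleBot
Disallow: /
```

Each group starts with one or more User-agent lines and ends with the Disallow rule(s) that apply to those agents, which is exactly the structure used by the block list below.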
Blocks 600+ AI bots and user agents.
Block list for robots.txt
Before using, read the Notes and Disclaimer.
# Ultimate AI Block List v1.7 20250924
# https://perishablepress.com/ultimate-ai-block-list/
# Allow all other bots full access
User-agent: *
Disallow:
# Block AI bots from all access
User-agent: .ai
User-agent: -ai
User-agent: _ai
User-agent: ai.
User-agent: ai-
User-agent: ai_
User-agent: ai=
User-agent: AddSearchBot
User-agent: Agentic
User-agent: AgentQL
User-agent: Agent 3
User-agent: Agent API
User-agent: AI Agent
User-agent: AI Article Writer
User-agent: AI Chat
User-agent: AI Content Detector
User-agent: AI Detection
User-agent: AI Dungeon
User-agent: AI Journalist
User-agent: AI Legion
User-agent: AI RAG
User-agent: AI Search
User-agent: AI SEO Crawler
User-agent: AI Training
User-agent: AI Web
User-agent: AI Writer
User-agent: AI2
User-agent: AIBot
User-agent: aiHitBot
User-agent: AIMatrix
User-agent: AISearch
User-agent: AITraining
User-agent: Alexa
User-agent: Alice Yandex
User-agent: AliGenie
User-agent: AliyunSec
User-agent: Alpha AI
User-agent: AlphaAI
User-agent: Amazon
User-agent: Amelia
User-agent: AndersPinkBot
User-agent: AndiBot
User-agent: Anonymous AI
User-agent: Anthropic
User-agent: AnyPicker
User-agent: Anyword
User-agent: Applebot
User-agent: Aria AI
User-agent: Aria Browse
User-agent: Articoolo
User-agent: Ask AI
User-agent: AutoGen
User-agent: AutoGLM
User-agent: Automated Writer
User-agent: AutoML
User-agent: Autonomous RAG
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: AWS Trainium
User-agent: Azure
User-agent: BabyAGI
User-agent: BabyCatAGI
User-agent: BardBot
User-agent: Basic RAG
User-agent: Bedrock
User-agent: Big Sur
User-agent: Bigsur
User-agent: Botsonic
User-agent: Brightbot
User-agent: Browser MCP Agent
User-agent: Browser Use
User-agent: Bytebot
User-agent: ByteDance
User-agent: Bytespider
User-agent: CarynAI
User-agent: CatBoost
User-agent: CC-Crawler
User-agent: CCBot
User-agent: Chai
User-agent: Character
User-agent: Charstar AI
User-agent: Chatbot
User-agent: ChatGLM
User-agent: Chatsonic
User-agent: ChatUser
User-agent: Chinchilla
User-agent: Claude
User-agent: ClearScope
User-agent: Clearview
User-agent: Cognitive AI
User-agent: Cohere
User-agent: Common Crawl
User-agent: CommonCrawl
User-agent: Content Harmony
User-agent: Content King
User-agent: Content Optimizer
User-agent: Content Samurai
User-agent: ContentAtScale
User-agent: ContentBot
User-agent: Contentedge
User-agent: ContentShake
User-agent: Conversion AI
User-agent: Copilot
User-agent: CopyAI
User-agent: Copymatic
User-agent: Copyscape
User-agent: CoreWeave
User-agent: Corrective RAG
User-agent: Cotoyogi
User-agent: CRAB
User-agent: Crawl4AI
User-agent: CrawlQ AI
User-agent: Crawlspace
User-agent: Crew AI
User-agent: CrewAI
User-agent: Crushon AI
User-agent: DALL-E
User-agent: DarkBard
User-agent: DataFor
User-agent: DataProvider
User-agent: Datenbank Crawler
User-agent: DeepAI
User-agent: Deep AI
User-agent: DeepL
User-agent: DeepMind
User-agent: Deep Research
User-agent: DeepResearch
User-agent: DeepSeek
User-agent: Devin
User-agent: Diffbot
User-agent: Doubao AI
User-agent: DuckAssistBot
User-agent: DuckDuckGo Chat
User-agent: DuckDuckGo-Enhanced
User-agent: Echobot
User-agent: Echobox
User-agent: Elixir
User-agent: FacebookBot
User-agent: FacebookExternalHit
User-agent: Factset
User-agent: Falcon
User-agent: FIRE-1
User-agent: Firebase
User-agent: Firecrawl
User-agent: Flux
User-agent: Flyriver
User-agent: Frase AI
User-agent: FriendlyCrawler
User-agent: Gato
User-agent: Gemini
User-agent: Gemma
User-agent: Gen AI
User-agent: GenAI
User-agent: Generative
User-agent: Genspark
User-agent: Gentoo-chat
User-agent: Ghostwriter
User-agent: GigaChat
User-agent: GLM
User-agent: GodMode
User-agent: Goose
User-agent: GPT
User-agent: Grammarly
User-agent: Grendizer
User-agent: Grok
User-agent: GT Bot
User-agent: GTBot
User-agent: GTP
User-agent: Hemingway Editor
User-agent: Hetzner
User-agent: Hugging
User-agent: Hunyuan
User-agent: Hybrid Search RAG
User-agent: Hypotenuse AI
User-agent: iAsk
User-agent: ICC-Crawler
User-agent: ImageGen
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: imgproxy
User-agent: INK Editor
User-agent: INKforall
User-agent: Instructor
User-agent: IntelliSeek
User-agent: Inferkit
User-agent: ISSCyberRiskCrawler
User-agent: Janitor AI
User-agent: Jasper
User-agent: Jenni AI
User-agent: Julius AI
User-agent: Kafkai
User-agent: Kaggle
User-agent: Kangaroo
User-agent: Keyword Density AI
User-agent: Kimi
User-agent: Knowledge
User-agent: KomoBot
User-agent: Kruti
User-agent: LangChain
User-agent: Le Chat
User-agent: Lensa
User-agent: Lightpanda
User-agent: LinerBot
User-agent: LLaMA
User-agent: LLM
User-agent: Local RAG Agent
User-agent: Lovable
User-agent: Magistral
User-agent: magpie-crawler
User-agent: Manus
User-agent: MarketMuse
User-agent: Meltwater
User-agent: Meta-AI
User-agent: Meta-External
User-agent: Meta-Webindexer
User-agent: Meta AI
User-agent: MetaAI
User-agent: MetaTagBot
User-agent: Middleware
User-agent: Midjourney
User-agent: Mini AGI
User-agent: MiniMax
User-agent: Mintlify
User-agent: Mistral
User-agent: Mixtral
User-agent: model-training
User-agent: Monica
User-agent: Narrative
User-agent: NeevaBot
User-agent: netEstate
User-agent: Neural Text
User-agent: NeuralSEO
User-agent: NinjaAI
User-agent: NodeZero
User-agent: Nova Act
User-agent: NovaAct
User-agent: OAI-SearchBot
User-agent: OAI SearchBot
User-agent: OASIS
User-agent: Olivia
User-agent: Omgili
User-agent: Open AI
User-agent: Open Interpreter
User-agent: OpenAGI
User-agent: OpenAI
User-agent: OpenBot
User-agent: OpenPi
User-agent: OpenRouter
User-agent: OpenText AI
User-agent: Operator
User-agent: Outwrite
User-agent: Page Analyzer AI
User-agent: PanguBot
User-agent: Panscient
User-agent: Paperlibot
User-agent: Paraphraser.io
User-agent: peer39_crawler
User-agent: Perflexity
User-agent: Perplexity
User-agent: Petal
User-agent: Phind
User-agent: PiplBot
User-agent: PoeBot
User-agent: PoeSearchBot
User-agent: ProWritingAid
User-agent: Proximic
User-agent: Puppeteer
User-agent: Python AI
User-agent: Qualified
User-agent: Quark
User-agent: QuillBot
User-agent: Qopywriter
User-agent: Qwen
User-agent: RAG Agent
User-agent: RAG Azure AI
User-agent: RAG Chatbot
User-agent: RAG Database
User-agent: RAG IS
User-agent: RAG Pipeline
User-agent: RAG Search
User-agent: RAG with
User-agent: RAG-
User-agent: RAG_
User-agent: Raptor
User-agent: React Agent
User-agent: Redis AI RAG
User-agent: RobotSpider
User-agent: Rytr
User-agent: SaplingAI
User-agent: SBIntuitionsBot
User-agent: Scala
User-agent: Scalenut
User-agent: Scrap
User-agent: ScriptBook
User-agent: Seekr
User-agent: SEObot
User-agent: SEO Content Machine
User-agent: SEO Robot
User-agent: SemrushBot
User-agent: Sentibot
User-agent: Serper
User-agent: ShapBot
User-agent: Sidetrade
User-agent: Simplified AI
User-agent: Sitefinity
User-agent: Skydancer
User-agent: SlickWrite
User-agent: SmartBot
User-agent: Sonic
User-agent: Sora
User-agent: Spider/2
User-agent: SpiderCreator
User-agent: Spin Rewrite
User-agent: Spinbot
User-agent: Stability
User-agent: StableDiffusionBot
User-agent: Sudowrite
User-agent: SummalyBot
User-agent: Super Agent
User-agent: Superagent
User-agent: SuperAGI
User-agent: Surfer AI
User-agent: TerraCotta
User-agent: Text Blaze
User-agent: TextCortex
User-agent: Thinkbot
User-agent: Thordata
User-agent: TikTokSpider
User-agent: Timpibot
User-agent: Tinybird
User-agent: Together AI
User-agent: Traefik
User-agent: TurnitinBot
User-agent: uAgents
User-agent: VelenPublicWebCrawler
User-agent: Venus Chub AI
User-agent: Vidnami AI
User-agent: Vision RAG
User-agent: WebSurfer
User-agent: WebText
User-agent: Webzio
User-agent: WeChat
User-agent: Whisper
User-agent: WordAI
User-agent: Wordtune
User-agent: WPBot
User-agent: Writecream
User-agent: WriterZen
User-agent: Writescope
User-agent: Writesonic
User-agent: xAI
User-agent: xBot
User-agent: YaML
User-agent: YandexAdditional
User-agent: YouBot
User-agent: Zendesk
User-agent: Zero
User-agent: Zhipu
User-agent: Zhuque AI
User-agent: Zimm
Disallow: /
Block AI Bots via Apache/.htaccess
To actually enforce the “Ultimate AI Block List”, you can add the following rules to your Apache configuration or main .htaccess file. Like many others, I’ve written extensively about Apache and .htaccess. So if you’re unfamiliar, there are plenty of great resources, including my book .htaccess made easy.
In a nutshell, you can add rules via Apache/.htaccess to customize the functionality of your website. For example, you can add directives that help control traffic, optimize caching, improve performance, and even block bad bots. These rules operate at the server level, so while bots may ignore rules added via robots.txt, they can’t ignore rules added via Apache/.htaccess (unless they falsify their user agent).
To block AI bots via Apache/.htaccess, add the following rules to either your server configuration file or your main (public root) .htaccess file. Before making any changes, be on the safe side and make a backup of your files, so you can easily roll back if anything unexpected happens. With that in mind, here are the Apache rules to block AI bots:
Blocks 600+ AI bots and user agents.
Block list for Apache/.htaccess
Before using, read the Notes and Disclaimer.
# Ultimate AI Block List v1.7 20250924
# https://perishablepress.com/ultimate-ai-block-list/
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (\.ai|-ai|_ai|ai\.|ai-|ai_|ai=|AddSearchBot|Agentic|AgentQL|Agent\ 3|Agent\ API|AI\ Agent|AI\ Article\ Writer|AI\ Chat|AI\ Content\ Detector|AI\ Detection|AI\ Dungeon|AI\ Journalist|AI\ Legion) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AI\ RAG|AI\ Search|AI\ SEO\ Crawler|AI\ Training|AI\ Web|AI\ Writer|AI2|AIBot|aiHitBot|AIMatrix|AISearch|AITraining|Alexa|Alice\ Yandex|AliGenie|AliyunSec|Alpha\ AI|AlphaAI|Amazon|Amelia) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AndersPinkBot|AndiBot|Anonymous\ AI|Anthropic|AnyPicker|Anyword|Applebot|Aria\ AI|Aria\ Browse|Articoolo|Ask\ AI|AutoGen|AutoGLM|Automated\ Writer|AutoML|Autonomous\ RAG|AwarioRssBot|AwarioSmartBot|AWS\ Trainium|Azure) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (BabyAGI|BabyCatAGI|BardBot|Basic\ RAG|Bedrock|Big\ Sur|Bigsur|Botsonic|Brightbot|Browser\ MCP\ Agent|Browser\ Use|Bytebot|ByteDance|Bytespider|CarynAI|CatBoost|CC-Crawler|CCBot|Chai|Character) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Charstar\ AI|Chatbot|ChatGLM|Chatsonic|ChatUser|Chinchilla|Claude|ClearScope|Clearview|Cognitive\ AI|Cohere|Common\ Crawl|CommonCrawl|Content\ Harmony|Content\ King|Content\ Optimizer|Content\ Samurai|ContentAtScale|ContentBot|Contentedge) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ContentShake|Conversion\ AI|Copilot|CopyAI|Copymatic|Copyscape|CoreWeave|Corrective\ RAG|Cotoyogi|CRAB|Crawl4AI|CrawlQ\ AI|Crawlspace|Crew\ AI|CrewAI|Crushon\ AI|DALL-E|DarkBard|DataFor|DataProvider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Datenbank\ Crawler|DeepAI|Deep\ AI|DeepL|DeepMind|Deep\ Research|DeepResearch|DeepSeek|Devin|Diffbot|Doubao\ AI|DuckAssistBot|DuckDuckGo\ Chat|DuckDuckGo-Enhanced|Echobot|Echobox|Elixir|FacebookBot|FacebookExternalHit|Factset) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Falcon|FIRE-1|Firebase|Firecrawl|Flux|Flyriver|Frase\ AI|FriendlyCrawler|Gato|Gemini|Gemma|Gen\ AI|GenAI|Generative|Genspark|Gentoo-chat|Ghostwriter|GigaChat|GLM|GodMode) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Goose|GPT|Grammarly|Grendizer|Grok|GT\ Bot|GTBot|GTP|Hemingway\ Editor|Hetzner|Hugging|Hunyuan|Hybrid\ Search\ RAG|Hypotenuse\ AI|iAsk|ICC-Crawler|ImageGen|ImagesiftBot|img2dataset|imgproxy) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (INK\ Editor|INKforall|Instructor|IntelliSeek|Inferkit|ISSCyberRiskCrawler|Janitor\ AI|Jasper|Jenni\ AI|Julius\ AI|Kafkai|Kaggle|Kangaroo|Keyword\ Density\ AI|Kimi|Knowledge|KomoBot|Kruti|LangChain|Le\ Chat) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Lensa|Lightpanda|LinerBot|LLaMA|LLM|Local\ RAG\ Agent|Lovable|Magistral|magpie-crawler|Manus|MarketMuse|Meltwater|Meta-AI|Meta-External|Meta-Webindexer|Meta\ AI|MetaAI|MetaTagBot|Middleware|Midjourney) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Mini\ AGI|MiniMax|Mintlify|Mistral|Mixtral|model-training|Monica|Narrative|NeevaBot|netEstate|Neural\ Text|NeuralSEO|NinjaAI|NodeZero|Nova\ Act|NovaAct|OAI-SearchBot|OAI\ SearchBot|OASIS|Olivia) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Omgili|Open\ AI|Open\ Interpreter|OpenAGI|OpenAI|OpenBot|OpenPi|OpenRouter|OpenText\ AI|Operator|Outwrite|Page\ Analyzer\ AI|PanguBot|Panscient|Paperlibot|Paraphraser\.io|peer39_crawler|Perflexity|Perplexity|Petal) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Phind|PiplBot|PoeBot|PoeSearchBot|ProWritingAid|Proximic|Puppeteer|Python\ AI|Qualified|Quark|QuillBot|Qopywriter|Qwen|RAG\ Agent|RAG\ Azure\ AI|RAG\ Chatbot|RAG\ Database|RAG\ IS|RAG\ Pipeline|RAG\ Search) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (RAG\ with|RAG-|RAG_|Raptor|React\ Agent|Redis\ AI\ RAG|RobotSpider|Rytr|SaplingAI|SBIntuitionsBot|Scala|Scalenut|Scrap|ScriptBook|Seekr|SEObot|SEO\ Content\ Machine|SEO\ Robot|SemrushBot|Sentibot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Serper|ShapBot|Sidetrade|Simplified\ AI|Sitefinity|Skydancer|SlickWrite|SmartBot|Sonic|Sora|Spider/2|SpiderCreator|Spin\ Rewrite|Spinbot|Stability|StableDiffusionBot|Sudowrite|SummalyBot|Super\ Agent|Superagent) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SuperAGI|Surfer\ AI|TerraCotta|Text\ Blaze|TextCortex|Thinkbot|Thordata|TikTokSpider|Timpibot|Tinybird|Together\ AI|Traefik|TurnitinBot|uAgents|VelenPublicWebCrawler|Venus\ Chub\ AI|Vidnami\ AI|Vision\ RAG|WebSurfer|WebText) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Webzio|WeChat|Whisper|WordAI|Wordtune|WPBot|Writecream|WriterZen|Writescope|Writesonic|xAI|xBot|YaML|YandexAdditional|YouBot|Zendesk|Zero|Zhipu|Zhuque\ AI|Zimm) [NC]
RewriteRule (.*) - [F,L]
</IfModule>
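Before deploying, it can help to sanity-check which user agents the patterns will catch. Here is a small Python sketch (not from the list itself) that mirrors the case-insensitive matching of the Apache [NC] and Nginx ~* rules, using just a handful of sample patterns; the real list has hundreds more:

```python
import re

# A small sample of patterns, escaped the same way as in the
# Apache/Nginx rules above; extend with the full list as needed.
PATTERNS = [r"GPT", r"Claude", r"CCBot", r"Bytespider", r"AI\ Agent"]

# One combined alternation, case-insensitive like [NC] / ~*
BLOCK_RE = re.compile("(" + "|".join(PATTERNS) + ")", re.IGNORECASE)

def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent matches any block pattern."""
    return BLOCK_RE.search(user_agent) is not None

print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.2)"))  # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0) Firefox/130.0"))  # False
```

Running suspicious user agents from your access logs through a checker like this is a quick way to verify the rules behave as expected before touching the live server config.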
Block AI Bots via Nginx
As with the previous Apache rules, you can go beyond robots.txt and actually enforce the “Ultimate AI Block List” by adding the following rules to your Nginx configuration file. Like Apache, Nginx directives operate at the server level. So while bots may ignore rules added via robots.txt, they can’t ignore rules added via Nginx configuration (unless they falsify their user agent).
To block AI bots via Nginx, add the following rules to your main server configuration file. Before making any changes, be on the safe side and make a backup of your files, so you can easily roll back if anything unexpected happens. With that in mind, here are the Nginx-formatted rules to block AI bots:
Blocks 600+ AI bots and user agents.
Block list for Nginx
Before using, read the Notes and Disclaimer.
# Ultimate AI Block List v1.7 20250924
# https://perishablepress.com/ultimate-ai-block-list/
if ($http_user_agent ~* "(\.ai|-ai|_ai|ai\.|ai-|ai_|ai=|AddSearchBot|Agentic|AgentQL|Agent\ 3|Agent\ API|AI\ Agent|AI\ Article\ Writer|AI\ Chat|AI\ Content\ Detector|AI\ Detection|AI\ Dungeon|AI\ Journalist|AI\ Legion)") {
return 403;
}
if ($http_user_agent ~* "(AI\ RAG|AI\ Search|AI\ SEO\ Crawler|AI\ Training|AI\ Web|AI\ Writer|AI2|AIBot|aiHitBot|AIMatrix|AISearch|AITraining|Alexa|Alice\ Yandex|AliGenie|AliyunSec|Alpha\ AI|AlphaAI|Amazon|Amelia)") {
return 403;
}
if ($http_user_agent ~* "(AndersPinkBot|AndiBot|Anonymous\ AI|Anthropic|AnyPicker|Anyword|Applebot|Aria\ AI|Aria\ Browse|Articoolo|Ask\ AI|AutoGen|AutoGLM|Automated\ Writer|AutoML|Autonomous\ RAG|AwarioRssBot|AwarioSmartBot|AWS\ Trainium|Azure)") {
return 403;
}
if ($http_user_agent ~* "(BabyAGI|BabyCatAGI|BardBot|Basic\ RAG|Bedrock|Big\ Sur|Bigsur|Botsonic|Brightbot|Browser\ MCP\ Agent|Browser\ Use|Bytebot|ByteDance|Bytespider|CarynAI|CatBoost|CC-Crawler|CCBot|Chai|Character)") {
return 403;
}
if ($http_user_agent ~* "(Charstar\ AI|Chatbot|ChatGLM|Chatsonic|ChatUser|Chinchilla|Claude|ClearScope|Clearview|Cognitive\ AI|Cohere|Common\ Crawl|CommonCrawl|Content\ Harmony|Content\ King|Content\ Optimizer|Content\ Samurai|ContentAtScale|ContentBot|Contentedge)") {
return 403;
}
if ($http_user_agent ~* "(ContentShake|Conversion\ AI|Copilot|CopyAI|Copymatic|Copyscape|CoreWeave|Corrective\ RAG|Cotoyogi|CRAB|Crawl4AI|CrawlQ\ AI|Crawlspace|Crew\ AI|CrewAI|Crushon\ AI|DALL-E|DarkBard|DataFor|DataProvider)") {
return 403;
}
if ($http_user_agent ~* "(Datenbank\ Crawler|DeepAI|Deep\ AI|DeepL|DeepMind|Deep\ Research|DeepResearch|DeepSeek|Devin|Diffbot|Doubao\ AI|DuckAssistBot|DuckDuckGo\ Chat|DuckDuckGo-Enhanced|Echobot|Echobox|Elixir|FacebookBot|FacebookExternalHit|Factset)") {
return 403;
}
if ($http_user_agent ~* "(Falcon|FIRE-1|Firebase|Firecrawl|Flux|Flyriver|Frase\ AI|FriendlyCrawler|Gato|Gemini|Gemma|Gen\ AI|GenAI|Generative|Genspark|Gentoo-chat|Ghostwriter|GigaChat|GLM|GodMode)") {
return 403;
}
if ($http_user_agent ~* "(Goose|GPT|Grammarly|Grendizer|Grok|GT\ Bot|GTBot|GTP|Hemingway\ Editor|Hetzner|Hugging|Hunyuan|Hybrid\ Search\ RAG|Hypotenuse\ AI|iAsk|ICC-Crawler|ImageGen|ImagesiftBot|img2dataset|imgproxy)") {
return 403;
}
if ($http_user_agent ~* "(INK\ Editor|INKforall|Instructor|IntelliSeek|Inferkit|ISSCyberRiskCrawler|Janitor\ AI|Jasper|Jenni\ AI|Julius\ AI|Kafkai|Kaggle|Kangaroo|Keyword\ Density\ AI|Kimi|Knowledge|KomoBot|Kruti|LangChain|Le\ Chat)") {
return 403;
}
if ($http_user_agent ~* "(Lensa|Lightpanda|LinerBot|LLaMA|LLM|Local\ RAG\ Agent|Lovable|Magistral|magpie-crawler|Manus|MarketMuse|Meltwater|Meta-AI|Meta-External|Meta-Webindexer|Meta\ AI|MetaAI|MetaTagBot|Middleware|Midjourney)") {
return 403;
}
if ($http_user_agent ~* "(Mini\ AGI|MiniMax|Mintlify|Mistral|Mixtral|model-training|Monica|Narrative|NeevaBot|netEstate|Neural\ Text|NeuralSEO|NinjaAI|NodeZero|Nova\ Act|NovaAct|OAI-SearchBot|OAI\ SearchBot|OASIS|Olivia)") {
return 403;
}
if ($http_user_agent ~* "(Omgili|Open\ AI|Open\ Interpreter|OpenAGI|OpenAI|OpenBot|OpenPi|OpenRouter|OpenText\ AI|Operator|Outwrite|Page\ Analyzer\ AI|PanguBot|Panscient|Paperlibot|Paraphraser\.io|peer39_crawler|Perflexity|Perplexity|Petal)") {
return 403;
}
if ($http_user_agent ~* "(Phind|PiplBot|PoeBot|PoeSearchBot|ProWritingAid|Proximic|Puppeteer|Python\ AI|Qualified|Quark|QuillBot|Qopywriter|Qwen|RAG\ Agent|RAG\ Azure\ AI|RAG\ Chatbot|RAG\ Database|RAG\ IS|RAG\ Pipeline|RAG\ Search)") {
return 403;
}
if ($http_user_agent ~* "(RAG\ with|RAG-|RAG_|Raptor|React\ Agent|Redis\ AI\ RAG|RobotSpider|Rytr|SaplingAI|SBIntuitionsBot|Scala|Scalenut|Scrap|ScriptBook|Seekr|SEObot|SEO\ Content\ Machine|SEO\ Robot|SemrushBot|Sentibot)") {
return 403;
}
if ($http_user_agent ~* "(Serper|ShapBot|Sidetrade|Simplified\ AI|Sitefinity|Skydancer|SlickWrite|SmartBot|Sonic|Sora|Spider/2|SpiderCreator|Spin\ Rewrite|Spinbot|Stability|StableDiffusionBot|Sudowrite|SummalyBot|Super\ Agent|Superagent)") {
return 403;
}
if ($http_user_agent ~* "(SuperAGI|Surfer\ AI|TerraCotta|Text\ Blaze|TextCortex|Thinkbot|Thordata|TikTokSpider|Timpibot|Tinybird|Together\ AI|Traefik|TurnitinBot|uAgents|VelenPublicWebCrawler|Venus\ Chub\ AI|Vidnami\ AI|Vision\ RAG|WebSurfer|WebText)") {
return 403;
}
if ($http_user_agent ~* "(Webzio|WeChat|Whisper|WordAI|Wordtune|WPBot|Writecream|WriterZen|Writescope|Writesonic|xAI|xBot|YaML|YandexAdditional|YouBot|Zendesk|Zero|Zhipu|Zhuque\ AI|Zimm)") {
return 403;
}
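As an aside, if you prefer to avoid long chains of if blocks (which Nginx generally discourages outside of simple return/rewrite usage), the same check can be expressed with a map directive in the http context. This is only a sketch with a few sample patterns, not a drop-in replacement for the full list:

```
# In the http {} context: map the user agent to a flag
map $http_user_agent $block_ai {
    default                            0;
    "~*(GPT|Claude|CCBot|Bytespider)"  1;
}

# Then in the server {} or location {} context:
if ($block_ai) {
    return 403;
}
```

The map approach keeps the pattern list in one place and the variable is evaluated lazily per request, which some admins find easier to maintain than many sequential if blocks.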
Download plain-text list
Here is a plain-text version of the list. This list contains only the user-agent names, nothing else :)
Notes
Note: The four block lists above (robots.txt, Apache, Nginx, and Plain Text) are synchronized and include/block the same set of AI bots.
Note: Numerous user agents are omitted from the block lists because the names are matched in wild-card fashion. Here is a list showing wild-card blocked AI bots.
Note: The block lists focus on AI-related bots. Some of those bots are used by giant corporations like Apple, Amazon, and Facebook. So please keep this in mind and feel free to remove any bots that you think should be allowed access to your site. Also be sure to check the list of wild-card blocked AI bots.
Note: Each of the differently formatted block lists are case-insensitive. The robots.txt rules are case-insensitive by default, the Apache rules are case-insensitive due to the inclusion of the [NC] flag, and the Nginx rules are case-insensitive due to the tilde and asterisk ~*. So don’t worry about mixed-case bot names, their user agents will be blocked, whether uppercase, lowercase, or mIxeD cAsE.
Note: If you don’t care about search results, you can add the following rules to your robots.txt file. All search-related AI bots were removed from the block list in version 1.4 (see changelog below). Include the following only if you want to block major search engines like Bing, Google, and DuckDuckGo.
User-agent: Applebot
User-agent: BingAI
User-agent: Bingbot-chat
User-agent: Duck
User-agent: Google Bard AI
User-agent: Google-CloudVertexBot
User-agent: Google-Extended
User-agent: Google Gemini
User-agent: GoogleOther
User-agent: MSBot
Disallow: /
Changelog
Tip: The changelog below is an overview of the main changes; it does not chronicle every change to patterns, bots, and other details. You can use a free online diff tool to compare the current block list with previous versions, to see which bots have been added, removed, or otherwise changed.
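If you keep local copies of the plain-text list, you can also diff versions without an online tool. A quick Python sketch (in real usage you would read each version’s file with open(...).read().splitlines(); the lists below are tiny stand-ins):

```python
import difflib

def diff_lists(old_lines, new_lines):
    """Return only the added (+) and removed (-) lines between versions."""
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="")
    return [l for l in diff
            if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]

# Stand-in example mirroring the v1.7 change from Apple to Applebot
old = ["CCBot", "Apple", "GPTBot"]
new = ["CCBot", "Applebot", "GPTBot"]
print(diff_lists(old, new))  # ['-Apple', '+Applebot']
```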
v1.7 – 2025/09/24
- Removes Apple, adds Applebot
- Removes Brave Leo (Brave Browser Search)
- Adds 65 new patterns, matching many more user agents
- Streamlines numerous patterns, removes some patterns
- Improves pattern matching for more accurate blocking and better performance
- Updates the wild-card blocked AI bots plain-text list
v1.6 – 2025/07/18
- Adds many new AI bots
- Makes better use of wildcard matching to streamline the list while keeping false positives near zero
- Adds robots.txt rules to allow all other bots full access
- Improves alphabetization and formatting
- Replaces AI2Bot and AI2 Labs with AI2
- Replaces all specific Amazon bot names with Amazon
- Replaces Applebot with Apple
- Replaces LLMs with LLM
- Replaces WormsGTP with GTP
- Replaces Zero GTP and Zerochat with Zero
v1.5 – 2025/06/03
- Adds Applebot
- Changes Phindbot to Phind
- Changes AI Search Engine to AI Search
- Adds: AI Chat, ai-proxy, aiHitBot, AndiBot, AutoGLM, AutoML, BabyAGI, Brightbot, chatbot, Factset, imgproxy, Lensa, Lightpanda, Manus, Monica, NovaAct, Puppeteer, Qualified, Qwen, SemrushBot, TikTokSpider, Traefik, VelenPublicWebCrawler, YaML
v1.4 – 2025/04/17
- Removes Applebot
- Removes both Bing agents
- Removes all four Google agents
- Changes PerplexityBot to Perplexity
- Adds: Azure, Falcon, Genspark, GLM, ImageGen, Knowledge, LLMs, Nova Act, Operator, Sitefinity, Sonic, Super Agent, Zhipu
- To restore blocking of search-related AI bots, check the notes above for a list
Previous versions
- Version 1.3 – 2025/03/10 – Adds more AI bots, refines list to make better use of wild-card pattern matching of user-agent names.
- Version 1.2 – 2025/02/12 – Adds 73 AI bots (Thanks to Robert DeVore)
- Version 1.1 – 2025/02/11 – Replaces REQUEST_URI with HTTP_USER_AGENT
- Version 1.0 – 2025/02/11 – Initial release
Disclaimer
The information shared on this page is provided “as-is”, with the intention of helping people protect their sites against AI bots. The block lists (robots.txt, Apache/.htaccess, Nginx, and plain text) are open-source and free to use and modify without condition. By using any of the block lists, you assume all risk and responsibility for anything that happens. So use wisely, test thoroughly, and enjoy the benefits of my work :)
Support my work
I spend countless hours digging through server logs, researching user agents, and compiling block lists to stop AI and other unwanted bots. I share my work freely with the hope that it will help make the Web a more secure place for everyone.
If you benefit from my work and want to show support, please make a donation or buy one of my books, such as .htaccess made easy. You’ll get a complete guide to .htaccess and a ton of awesome techniques for optimizing and securing your site.
Of course, tweets, likes, links, and shares also are super helpful and very much appreciated. Your generous support enables me to continue developing AI block lists and other awesome resources for the community. Thank you kindly :)
References
Thanks to the following resources for sharing their work on identifying and blocking AI bots.
- Dark Visitors ▸ Agents
- Blockin’ bots.
- Block the Bots that Feed AI Models by Scraping Your Website
- Go ahead and block AI web crawlers
- I’m blocking AI-crawlers
- GitHub ▸ Tina Ponting’s AI Robots + Scrapers
- GitHub ▸ Robert DeVore’s Block AI Crawlers
- GitHub ▸ ai.robots.txt
- Overview of OpenAI Crawlers
- How to stop your data from being used for AI training
- How to Block OpenAI ChatGPT From Using Your Website Content
- AI haters build tarpits to trap and trick AI scrapers
- Block AI Bots from Crawling Websites Using Robots.txt
- Blocking AI web crawlers
- Understanding the Bots Blocked by AI Scrape Protect
- Understanding and Blocking Abusive AI Website Crawlers
- Abusive IP Database
- Discover AI applications
- Perishable Press ▸ How to Block Bad Bots
- Perishable Press ▸ Apache Archive
- Perishable Press ▸ .htaccess Archive
- Perishable Press ▸ Blacklist Archive
- Perishable Press ▸ Bots Archive
- Perishable Press ▸ nG Firewall Archive
Feedback
Got more? Leave a comment below with your favorite AI bots to block. Or send privately via my contact form. Cheers! :)
50 responses to “Ultimate Block List to Stop AI Bots”
The User-Agent string “mai-IN” indicates that the request is coming from a mobile device in India. The “mai” likely refers to a mobile application, while “IN” specifies the country code for India. User-Agent strings are used by websites to identify the browser, operating system, and device type of the user, allowing them to tailor the content and layout accordingly.
Thank you Kristina, will include this information in the note about mai-IN.
Thanks Kristina for the update!
I’m getting hit by “Barkrowler” quite a lot right now.
It doesn’t look like AI/LLM but easy enough to block. Their own page explains how to block/slow via robots.txt.
Babbar does not obey robots.txt and can be a “pain in the ass” — block via BBQ Pro or .htaccess.
AhrefsBot is another pain.
Hi, blocking “Apple” seems to be way too restrictive, as “AppleWebKit” is part of the user agent string of many browsers (Chrome and Chrome-based: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36) and crawlers like Googlebot (Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)). With the current version v1.6 (htaccess version), ALL of them are blocked.
Yes this already is reported and on the list for next update. Until then remove Apple manually.
Sorry, overlooked that, thanks!
Hi Jeff, thanks for the snippets to block those AI’s.
With regards to the .htaccess variant: would adding `?:` (non-capturing group) at the beginning of the search groups not be a further performance optimization? In the end we don’t do anything with the found result, so why capture it? Or is the difference just too little to notice?
So turning each `RewriteCond %{HTTP_USER_AGENT} (` into `RewriteCond %{HTTP_USER_AGENT} (?:`
You definitely could add it, but yeah, probably a negligible difference in performance.
Blocking Apple does not block AppleWebKit in any way! It only blocks Apple AI, Siri, content training! I know, I’ve got 250 affiliate blogs for now.
Kruti[dot]ai.
Add their crawlers to the list
Thanks will confirm and add to the list.
Scratch my last comment submission. v1.6 is also blocking my Miniflux now (fetched it from Wayback Machine). But same issue — when I remove the AI blacklist rules, Miniflux is allowed. There may have been some other change which is causing Miniflux to be blocked when I have AI blocklist enabled, but I am not sure what it would be because I do not think I made any obvious changes to my site in the last day or two.
(Update to previous submissions)
I added version 1.7 of the rules to .htaccess. By default, they were blocking my Miniflux feed reader instance: “Mozilla/5.0 (compatible; Miniflux/2.2.13; +https://miniflux.app)” (UA)
Removing Flux resolves the issue for me.
I am not sure if that was expected behavior, but if possible — rules should not inadvertently target the Miniflux UA. Miniflux is a self-hosted open source RSS/ATOM/JSON/RDF feed reader with no AI functionality.
Thank you for the feedback, I will remove “Flux” from the list next update. In the meantime removing manually as you have done is recommended. Thank you, Nicholas.
Update: Just learned that Miniflux is an AI bot that scrapes content, so will remain blocked. You definitely are free to remove it manually.
I respectfully disagree for the following reasons (just my two cents, obviously your call as the project maintainer).
1. Miniflux is an open source feed reader with no recommendation system, much less AI (see official repo: https://github.com/miniflux/v2). The Miniflux crawler fetches RSS/ATOM/JSON feeds. The official repo has 8.2k stars on GitHub and it is one of the better-known self-hosted feed readers.
2. I had never heard of the miniflux-ai project noted by commenter Kristina, but from the GitHub repo in the comment, it appears to be a third-party plugin for people who run their own instances, which generates AI summaries of articles fetched in a given account. The repo has just 132 stars and 15 total issues. It requires an API key. While I would not use this plugin/extension, I do not think a single open source extension for Miniflux for generating AI summaries with an API key makes Miniflux an AI bot. Moreover, I believe this extension would not work with the blocklist anyway, since the AI bots it uses would most likely be blocked.
3. I will venture the vast majority of Miniflux bots are from people like me using it as a normal feed reader, with relatively few using a small third-party GitHub extension to generate AI summaries of feed items.
4. I think blocking a legitimate open source feed reader by default because someone made an AI plugin/extension is too aggressive. I am sure there are AI plugins/extensions/forks for or of other open source feed readers, and I do not think most webmasters who offer feeds would want them all blocked.
5. Some commercial feed readers such as Feedly (https://feedly.com/) and Inoreader (https://www.inoreader.com/pricing) have AI baked into their paid product. Neither are blocked by the AI blocklist (I don’t think they should be, just noting that they heavily advertise AI features).
6. If the list stays as is, I would suggest including a note in the docs that “Flux” blocks Miniflux, so sites like mine which promote RSS/ATOM/JSON feeds can make an informed decision about whether to use the list as is or with a modification.
Regardless of your final decision, thank you for maintaining the list and taking the time to look into the issue.
Thanks for the detailed information. When it’s time to update the list again, I will dig in to all the miniflux stuff and make a decision. Basically my hardline rule for the AI Block List: if it is an AI bot or scraper or anything AI-related, then I block it. EXCEPT for Google and other major search engines, for reasons explained in the post. Once again, I appreciate you taking the time to explain further, cheers.
miniflux-ai is AI but also a feed reader / steals your content :( https://github.com/Qetesh/miniflux-ai
Ah, interesting.. the plot thickens.. will investigate further and probably leave “flux” on the list. I didn’t realize it was AI.
https://huggingface.co/rain1011/pyramid-flow-miniflux
https://datadome.co/bots/miniflux/
https://www.minimax.io/
@Jeff Starr: I could easily run your AI list against really big HTTP log files from one website and give you the matched counts by user agent. I’m talking about daily log files ranging from 1 to 4GB in size.
Send me a PM so I can send you the results, which are too big to post here.
Feel free to reach me anytime via my contact form, thank you.