Crawler Index

A large sample of crawlers that are blocked by websites.

71.5%

of websites have at least a partial disallow command.

% of websites blocking user agent explicitly	% of websites blocking user agent explicitly or via * wildcard	Company	Purpose	User Agent
2.4%	62.0%	OpenAI	GPT	GPTBot
2.0%	62.0%	Common Crawl Foundation	Public Web Archive	CCBot
1.5%	61.9%	Google	Bard/Gemini/PaLM/Bison	Google-Extended
0.6%	61.2%	OpenAI	ChatGPT	chatgpt-user
1.7%	63.4%	Amazon	Alexa	amazonbot
0.5%	61.5%	Meta AI	LLaMA	FacebookBot
0.5%	64.4%	Brandwatch	Magpie Crawler	magpie-crawler
1.7%	64.9%	ByteDance	ByteDance LLM N/A	Bytespider
0.5%	61.3%	Anthropic	Claude	Anthropic-AI
1.5%	63.2%	Anthropic	Claude	claudebot
0.3%	61.6%	Anthropic	Claude	claude-web
0.4%	61.6%	Perplexity	Chatbot	perplexitybot
0.3%	61.6%	Cohere	Cohere Command	Cohere-AI
1.1%	62.3%	Apple	Apple's foundational models	Applebot-Extended
0.3%	64.1%	Apple	Siri	Applebot
0.2%	64.2%	Diffbot	training data	diffbot
1.3%	63.3%	Meta	All Meta AI	meta-externalagent
0.2%	64.0%	OpenAI	SearchGPT	oai-searchbot
0.2%	64.2%	Timpi	Wilson AI	timpibot
0.1%	64.2%	webz.io	webzio-extended	webzio-extended
0.1%	64.1%	Google	Bard/Gemini/PaLM/Bison	googleother
0.01%	64.4%	Perplexity	perplexity-ai	perplexity-ai
0.1%	64.2%	Meta	All Meta AI	meta-externalfetcher

% of websites blocking user agent explicitly	% of websites blocking user agent explicitly or via * wildcard	Company	Purpose	User Agent
20.9%	84.3%	OpenAI	ChatGPT	gptbot
17.3%	85.1%	Common Crawl Foundation	Public Web Archive	ccbot
13.6%	85.9%	Google	Bard/Gemini/PaLM/Bison	google-extended
12.1%	84.3%	OpenAI	ChatGPT	chatgpt-user
12.5%	84.6%	Anthropic	Claude	anthropic-ai
14.0%	84.3%	Anthropic	Claude	claudebot
10.6%	84.7%	Anthropic	Claude	claude-web
8.9%	84.4%	Meta	LLaMA	facebookbot
12.3%	85.0%	ByteDance	ByteDance LLM N/A	bytespider
11.5%	84.0%	Perplexity	Chatbot	perplexitybot
10.1%	85.1%	Cohere	Cohere Command	cohere-ai
8.8%	85.2%	Apple	Apple's foundational models	applebot-extended
4.8%	86.8%	Brandwatch	Magpie Crawler	magpie-crawler
7.4%	84.8%	Amazon	Alexa	amazonbot
3.4%	86.1%	Apple	Siri	applebot
1.8%	86.6%	Google	Bard/Gemini/PaLM/Bison	googleother
2.8%	86.6%	webz.io	webzio-extended	webzio-extended
4.1%	86.4%	Timpi	Wilson AI	timpibot
1.4%	87.3%	Perplexity	perplexity-ai	perplexity-ai
4.6%	86.6%	Meta	All Meta AI	meta-externalfetcher
7.1%	84.7%	OpenAI	SearchGPT	OAI-searchbot
8.8%	85.7%	Meta	All Meta AI	meta-externalagent

Methodology

Bright Data scrapes the world’s most sought-after public web data on billions of top websites. Through our compliance product, Bright Shield, we collect the allow and disallow rules for user agents from the robots.txt files of the websites we scrape. Our current sample contains 8,427,430 websites, and we have collected about 33,000 unique user agents.

Our research team has identified, for each user agent of interest, the percentage of websites in our sample that block it explicitly by name and the percentage that block it either explicitly or through the wildcard (*) rule. We also track the overall percentage of websites that disallow all crawlers. Each user agent is identified to the best of our ability by company, use, and a link that includes additional information such as how to block it.
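
As an illustration only, the two percentages described above could be computed along the following lines. This is a minimal sketch and not Bright Data's actual pipeline: the function names (`parse_groups`, `classify`, `block_rates`) are hypothetical, any non-empty Disallow path is assumed to count as a block (a partial disallow), and a site that gives an agent its own rule group is assumed not to apply the * group to that agent, which follows the usual most-specific-group reading of robots.txt.

```python
from collections import Counter, defaultdict


def parse_groups(robots_txt: str) -> dict[str, list[str]]:
    """Map each lowercased user-agent token to its non-empty Disallow paths."""
    rules = defaultdict(list)
    current_agents = []   # agents named by the group being read
    in_rules = False      # True once the current group has an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            agent = value.lower()
            current_agents.append(agent)
            rules[agent]  # register the agent even if it has no Disallow
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # empty Disallow blocks nothing
                for agent in current_agents:
                    rules[agent].append(value)
    return dict(rules)


def classify(robots_txt: str, agent: str) -> str:
    """Return 'explicit', 'wildcard', or 'none' for one robots.txt file."""
    groups = parse_groups(robots_txt)
    agent = agent.lower()
    if groups.get(agent):
        return "explicit"  # the agent is named and has a Disallow path
    if agent not in groups and groups.get("*"):
        return "wildcard"  # the agent is only covered by the * group
    return "none"


def block_rates(robots_files: list[str], agent: str) -> tuple[float, float]:
    """Return (explicit share, explicit-or-wildcard share) over a sample."""
    total = len(robots_files)
    if total == 0:
        return 0.0, 0.0
    counts = Counter(classify(text, agent) for text in robots_files)
    explicit = counts["explicit"] / total
    combined = (counts["explicit"] + counts["wildcard"]) / total
    return explicit, combined


# Hypothetical four-site sample, invented for this example.
sample = [
    "User-agent: GPTBot\nDisallow: /\n",
    "User-agent: *\nDisallow: /private/\n",
    "User-agent: *\nDisallow:\n",
    "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nDisallow: /\n",
]
explicit_share, combined_share = block_rates(sample, "GPTBot")
print(f"{explicit_share:.1%} explicit, {combined_share:.1%} explicit or wildcard")
```

In this made-up sample, the third site blocks nothing because its Disallow is empty, and the fourth site is not counted for GPTBot because its GPTBot group only allows. Counting a full-site `Disallow: /` separately from partial paths would be a straightforward variation.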

Comments on user agents? Email comments to [email protected]

Last updated November 12, 2025