Citegrove · Reddit-first AI citation outreach

AI Crawler Audit

Before you worry about getting cited in ChatGPT or Perplexity, make sure those AIs can actually crawl your site. Paste any domain — we'll fetch your robots.txt and check 14 known AI crawler tokens (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, ChatGPT-User, claude-web, and more), showing exactly which are allowed, blocked, or partially restricted. No login, no spam — just the truth from your robots.txt.

How it works

Three steps, fourteen crawlers

  1. Fetch your robots.txt

    We GET https://yourdomain/robots.txt directly — no API, no third-party.

  2. Parse User-agent groups

    Standard RFC 9309 parser. Specific tokens win over wildcards (User-agent: *).

  3. Match against the AI list

    Each of the 14 known AI crawlers gets a verdict: allowed, blocked, partial (some paths blocked), or unknown.
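The three steps can be sketched with Python's standard-library robots.txt parser. A caveat: urllib.robotparser predates RFC 9309 and matches User-agent tokens by case-insensitive substring rather than exact token, so this is an approximation of the audit, not its implementation. The crawler subset and sample robots.txt below are illustrative.

```python
from urllib import robotparser

# Illustrative subset of the 14 audited tokens.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot",
               "Google-Extended", "ChatGPT-User", "claude-web"]

# Example file; in production you would GET https://yourdomain/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

def audit(robots_txt, probe_paths=("/", "/private/page")):
    """Return a verdict per crawler: 'allowed', 'blocked', or 'partial'.

    The real audit also reports 'unknown' (e.g. when the fetch fails),
    which a parse-only sketch cannot observe.
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    verdicts = {}
    for token in AI_CRAWLERS:
        results = [rp.can_fetch(token, path) for path in probe_paths]
        if all(results):
            verdicts[token] = "allowed"   # every probed path fetchable
        elif not any(results):
            verdicts[token] = "blocked"   # nothing fetchable
        else:
            verdicts[token] = "partial"   # some paths blocked
    return verdicts

print(audit(ROBOTS_TXT))
```

With this sample file, GPTBot comes back blocked (its specific group disallows everything, overriding the wildcard), while the other tokens fall through to the * group and come back partial because /private/ is off-limits.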

FAQ

Frequently asked

Why does this matter?

If your robots.txt blocks GPTBot or ClaudeBot, ChatGPT and Claude can't crawl your site for training. Block ChatGPT-User or Perplexity-User and the live AI assistants can't fetch your page when answering a user — meaning you literally cannot be cited.

What's the difference between training and live-search crawlers?

Training crawlers (GPTBot, ClaudeBot, Google-Extended) ingest your content for the next model. Live-search crawlers (ChatGPT-User, Perplexity-User, claude-web) fetch your page in real time when a user asks a question. Blocking live-search hurts more than blocking training.

Should I block training but allow search?

If you don't want your content used for AI training but DO want to be cited in answers, block GPTBot/ClaudeBot/Google-Extended and allow ChatGPT-User/Perplexity-User/claude-web. The audit shows you both columns so you can see the split.
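As a sketch, a robots.txt implementing that split could look like the following (the tokens are the vendors' documented user-agents; adjust paths to taste):

```
# Opt out of model training
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Stay fetchable for live answers (and citations)
User-agent: ChatGPT-User
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: claude-web
Allow: /
```

The explicit Allow groups are redundant when nothing else restricts those tokens, but they guard against a later blanket User-agent: * / Disallow: /, since under RFC 9309 a crawler that matches a specific group ignores the wildcard group entirely.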

My site has no robots.txt. Is that bad?

No — missing robots.txt means everything is implicitly allowed. We report all 14 crawlers as "allowed". You can still write one to opt OUT of specific crawlers.
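You can see this default with Python's stdlib parser: parsing an empty file (the moral equivalent of a missing robots.txt) leaves every path fetchable for every token.

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([])  # no robots.txt: zero rules parsed

# With no Disallow groups, any crawler may fetch any path.
print(rp.can_fetch("GPTBot", "/"))              # True
print(rp.can_fetch("ClaudeBot", "/blog/post"))  # True
```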

Why these 14 crawlers?

They're the publicly documented user-agents used by ChatGPT, Claude, Perplexity, Gemini/Google AI, Apple Intelligence, Meta AI, ByteDance, and Common Crawl (which most open LLMs train on). New ones appear occasionally — we update the list as they're announced.

Do you store my domain?

No — domains aren't persisted. We log a hashed IP for rate limiting (15 audits per IP per 24 hours).
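A hashed-IP rate limiter of the kind described could be sketched as follows; the salt, the in-memory store, and the sliding-window bookkeeping are all assumptions for illustration, not Citegrove's actual implementation.

```python
import hashlib
import time

WINDOW = 24 * 3600  # 24-hour sliding window, in seconds
LIMIT = 15          # audits per hashed IP per window
SALT = b"server-side-secret"  # hypothetical salt; only the hash is stored

_hits = {}  # hashed IP -> list of request timestamps

def allow_audit(ip, now=None):
    """Return True if this IP may run another audit right now."""
    now = time.time() if now is None else now
    key = hashlib.sha256(SALT + ip.encode()).hexdigest()  # never store the raw IP
    # Keep only timestamps still inside the window.
    recent = [t for t in _hits.get(key, []) if now - t < WINDOW]
    if len(recent) >= LIMIT:
        _hits[key] = recent
        return False
    recent.append(now)
    _hits[key] = recent
    return True
```

Hashing with a server-side salt means the log can enforce the limit without ever holding a raw IP address.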