Citegrove · Reddit-first AI citation outreach
AI Crawler Audit
Before you worry about getting cited in ChatGPT or Perplexity, make sure those AIs can actually crawl your site. Paste any domain — we'll fetch your robots.txt and check 14 known AI crawler tokens (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, ChatGPT-User, claude-web, and more), showing exactly which are allowed, blocked, or partially restricted. No login, no spam — just the truth from your robots.txt.
How it works
Three steps, fourteen crawlers
1. Fetch your robots.txt
   We GET https://yourdomain/robots.txt directly, with no API and no third party in between.
2. Parse User-agent groups
   We parse the file into User-agent groups following RFC 9309. A group that names a crawler's token takes precedence over the wildcard group (User-agent: *).
3. Match against the AI list
   Each of the 14 known AI crawlers gets a verdict: allowed, blocked, partial (some paths blocked), or unknown.
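The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the audit's actual code: the names `fetch_robots`, `parse_groups`, and `verdict` are hypothetical, `AI_TOKENS` lists only the six tokens named on this page, and the verdict logic is deliberately simplified (it ignores wildcard paths and longest-match precedence, and never returns "unknown" because this page does not say when that verdict applies).

```python
from urllib.request import Request, urlopen

# Illustrative subset: only the tokens named on this page, not the full list of 14.
AI_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot",
             "Google-Extended", "ChatGPT-User", "claude-web"]


def fetch_robots(domain: str, timeout: float = 10.0) -> str:
    """Step 1: GET https://<domain>/robots.txt directly."""
    req = Request(f"https://{domain}/robots.txt",
                  headers={"User-Agent": "audit-sketch"})  # placeholder UA
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_groups(text: str) -> dict[str, list[tuple[str, str]]]:
    """Step 2: map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []
    in_rules = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
    return groups


def verdict(groups: dict[str, list[tuple[str, str]]], token: str) -> str:
    """Step 3: a specific group wins; otherwise fall back to the wildcard group."""
    rules = groups.get(token.lower())
    if rules is None:
        rules = groups.get("*", [])
    disallows = [path for kind, path in rules if kind == "disallow" and path]
    allows = [path for kind, path in rules if kind == "allow" and path]
    if not disallows:
        return "allowed"
    # Simplification: "blocked" only when the root is disallowed and no Allow
    # rule re-opens any path; every other mix is reported as "partial".
    if "/" in disallows and not allows:
        return "blocked"
    return "partial"


SAMPLE = """
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
User-agent: PerplexityBot
Allow: /
"""

groups = parse_groups(SAMPLE)
report = {token: verdict(groups, token) for token in AI_TOKENS}
```

For the sample file, GPTBot comes out as blocked, ClaudeBot and PerplexityBot as allowed, and the remaining tokens fall back to the wildcard group and come out as partial. To audit a live site you would pass `fetch_robots("yourdomain")` to `parse_groups` instead of `SAMPLE`.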
FAQ
Frequently asked
Why does this matter?
If your robots.txt blocks GPTBot or ClaudeBot, OpenAI and Anthropic can't crawl your site for model training. Block ChatGPT-User or Perplexity-User and the live AI assistants can't fetch your page when answering a user, which means that page cannot be cited in the answer.
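What's the difference between training and live-search crawlers?
Training tokens (GPTBot, ClaudeBot, Google-Extended) govern whether your content is ingested for future models. Google-Extended is a robots.txt control token rather than a separate crawler, but it is matched the same way. Live-search crawlers (ChatGPT-User, Perplexity-User, claude-web) fetch your page in real time when a user asks a question. If your goal is to be cited, blocking live-search crawlers hurts more than blocking training crawlers.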
Should I block training but allow search?
If you don't want your content used for AI training but DO want to be cited in answers, block GPTBot/ClaudeBot/Google-Extended and allow ChatGPT-User/Perplexity-User/claude-web. The audit shows you both columns so you can see the split.
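As a rough illustration of that split, the sketch below holds one possible robots.txt in a Python string and checks it with the standard library's `urllib.robotparser`. The file contents, the `is_allowed` helper, and the example.com URL are assumptions for illustration; whether each vendor honors its token is governed by that vendor's own documentation.

```python
from urllib.robotparser import RobotFileParser

# One possible robots.txt for "block training, allow live search".
SPLIT_POLICY = """\
# Opt out of training
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

# Keep live-search fetchers allowed
User-agent: ChatGPT-User
User-agent: Perplexity-User
User-agent: claude-web
Allow: /
"""

parser = RobotFileParser()
parser.parse(SPLIT_POLICY.splitlines())


def is_allowed(token: str, url: str = "https://example.com/any-page") -> bool:
    """Hypothetical helper: ask the stdlib parser whether `token` may fetch `url`."""
    return parser.can_fetch(token, url)


training = {t: is_allowed(t) for t in ("GPTBot", "ClaudeBot", "Google-Extended")}
live = {t: is_allowed(t) for t in ("ChatGPT-User", "Perplexity-User", "claude-web")}
```

Here every entry in `training` is `False` and every entry in `live` is `True`, which is the two-column split the audit reports.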
My site has no robots.txt. Is that bad?
No. A missing robots.txt means everything is implicitly allowed, so we report all 14 crawlers as "allowed". You can still add one to opt out of specific crawlers.
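A small sketch of how a fetch result could map to blanket verdicts. RFC 9309 treats any 4xx response for robots.txt as "no restrictions", which is what the 4xx branch follows; whether the audit treats codes other than 404 this way, and the choice to report "unknown" on a 5xx, are assumptions. The function name `verdicts_for_status` is hypothetical.

```python
from typing import Optional


def verdicts_for_status(status: int, tokens: list[str]) -> Optional[dict[str, str]]:
    """Return blanket verdicts when there is no robots.txt body to parse, else None."""
    if 400 <= status < 500:
        # No robots.txt (for example a 404): nothing is restricted.
        return {token: "allowed" for token in tokens}
    if status >= 500:
        # Assumption: a server error tells us nothing, so report "unknown".
        return {token: "unknown" for token in tokens}
    return None  # otherwise, parse the response body instead


missing = verdicts_for_status(404, ["GPTBot", "ClaudeBot"])
```

`missing` maps both tokens to "allowed", matching the behavior described above for a site with no robots.txt.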
Why these 14 crawlers?
They're the publicly documented user-agents used by ChatGPT, Claude, Perplexity, Gemini/Google AI, Apple Intelligence, Meta AI, ByteDance, and Common Crawl (which most open LLMs train on). New ones appear occasionally — we update the list as they're announced.
Do you store my domain?
No. Domains aren't persisted. We log only a hashed IP, for rate limiting (15 audits per IP per 24 hours).
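A limit like this can be sketched as a sliding window keyed on a hashed IP. Only the numbers (15 audits, 24 hours) and the fact that the IP is hashed come from the answer above; the salted SHA-256 hash, the in-memory store, the sliding-window policy, and the names `hash_ip` and `allow_audit` are assumptions for illustration.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 24 * 60 * 60
MAX_AUDITS = 15
SALT = b"replace-with-a-secret"  # hypothetical; keeps raw IPs from being guessable

# Hashed IP -> timestamps of recent audits. No domain is stored anywhere.
_hits: dict[str, deque] = defaultdict(deque)


def hash_ip(ip: str) -> str:
    """Store only a salted digest, never the raw address."""
    return hashlib.sha256(SALT + ip.encode("utf-8")).hexdigest()


def allow_audit(ip: str, now: Optional[float] = None) -> bool:
    """Return True and record the audit if this IP is under the limit."""
    now = time.time() if now is None else now
    hits = _hits[hash_ip(ip)]
    while hits and now - hits[0] >= WINDOW_SECONDS:
        hits.popleft()  # forget audits older than the window
    if len(hits) >= MAX_AUDITS:
        return False
    hits.append(now)
    return True
```

Passing `now` explicitly makes the window easy to exercise in tests; a production service would more likely keep the counters in a shared store than in process memory.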