AI Trainer · Common Crawl
Last verified: 2026-06-13 · maintained by Unsourced
CCBot belongs to Common Crawl, a non-profit whose open web archive is a major training-data source for many AI models. Blocking CCBot removes you from a dataset many labs train on.
There is no published IP-range feed or documented reverse-DNS footprint for CCBot, so its identity can't be verified by IP. Treat the User-Agent as an unverified claim.
A User-Agent string is just a claim: anyone can send CCBot in a header. In general, a crawler's identity is confirmed two ways:
forward-confirmed reverse DNS (the IP resolves to a Common Crawl hostname, and that hostname resolves back to the same IP), and, where published,
an IP inside the operator's official ranges. Where a check is available and the request fails it, it's an impostor wearing the badge, not CCBot.
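The two checks described above can be sketched in Python with only the standard library. This is a minimal illustration, not Common Crawl's or Unsourced's method: the function names `fcrdns_ok` and `in_ranges` are hypothetical, and the hostname suffix and CIDR list are inputs you would have to supply yourself from whatever the operator documents.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Set


def reverse_lookup(ip: str) -> str:
    """PTR lookup: the hostname the IP address claims to be."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> Set[str]:
    """A/AAAA lookup: every address the hostname resolves to."""
    # Drop any IPv6 scope suffix such as "%eth0" before comparing.
    return {info[4][0].split("%")[0] for info in socket.getaddrinfo(host, None)}


def fcrdns_ok(
    ip: str,
    allowed_suffixes: Iterable[str],  # assumption: operator domains you trust
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Set[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    # Match on a label boundary so "notcommoncrawl.org" cannot pass.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        addresses = forward(host)
    except OSError:
        return False
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == target for a in addresses)


def in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if the IP falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)
```

The resolver functions are passed in as parameters so the logic can be exercised without live DNS. A request would be treated as verified only when whichever checks you can actually run come back true; with no suffix or range list to compare against, neither function can confirm anything.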
Recommended: keep. CCBot feeds model training, and blocking it can quietly remove you from future AI answers.
If you do choose to act in robots.txt (which is advisory: well-behaved crawlers honour it, but nothing enforces it):
# CCBot: recommended to ALLOW; blocking can cost you AI visibility
User-agent: CCBot
Disallow:
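An empty Disallow line means "nothing is disallowed", which is easy to misread. As a quick sanity check, the rule above can be parsed with Python's standard `urllib.robotparser`. The helper name `ccbot_allowed` and the sample paths are hypothetical, and this only shows how a compliant parser reads the rule, not how CCBot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The allow rule from this page, one directive per line.
ALLOW_RULES = [
    "# CCBot: recommended to ALLOW",
    "User-agent: CCBot",
    "Disallow:",
]

# The blocking variant, for comparison.
BLOCK_RULES = [
    "User-agent: CCBot",
    "Disallow: /",
]


def ccbot_allowed(robots_lines, path, agent="CCBot"):
    """Return True if a compliant parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(ccbot_allowed(ALLOW_RULES, "/articles/example"))
    print(ccbot_allowed(BLOCK_RULES, "/articles/example"))
```

Running the same helper against your live robots.txt lines before deploying a change is a cheap way to catch a stray `/` that would turn an allow into a block.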
Is CCBot really from Common Crawl?
CCBot is Common Crawl's crawler, but a User-Agent header can be spoofed, so the claim alone isn't proof. Where the operator documents them, confirm it with forward-confirmed reverse DNS and a match against Common Crawl's official IP ranges.
What are CCBot's IP ranges?
There's no published IP-range feed or documented reverse-DNS footprint for CCBot, so it can't be verified by IP. Treat its User-Agent as an unverified claim.
Should I block CCBot?
Our recommendation: keep. CCBot feeds model training, and blocking it can quietly remove you from future AI answers.