How Unsourced measures AI visibility
No black box. Here’s exactly how we check whether AI models cite your content — the questions we ask, how we decide a citation is real, and how confident we are in each one.
Crawled isn’t cited. Cited isn’t understood.
A bot visiting your site tells you that you were crawled. It doesn’t tell you whether an AI actually recommends you when someone asks a real question — or whether it understood you well enough to cite you with confidence.
Unsourced measures the whole chain: the questions real users ask, which AI models surface your site in their answers, and — crucially — how much evidence there is that the model drew from your content rather than simply knowing your name. That last part is what our confidence score captures.
1. The questions we ask
Every week we ask seven AI models — Claude, ChatGPT, ChatGPT with live search, Gemini, Gemini with Google grounding, Grok and Perplexity — a set of natural questions a real person in your niche would type. Pro sites also get a lighter daily check.
The questions are generated from your real article topics — discovered from your sitemap, feed, and the pages AI bots actually visited — not from your brand name. We deliberately never name your site in the question. If we asked “what is [your brand]?”, any model would parrot it back and we’d log a citation that means nothing. By asking the way a customer would, a citation has to be earned.
Questions span intent types — informational, comparison, commercial and local — so you can see not just whether AI cites you, but for which kinds of queries.
2. How we detect a real citation
We scan each answer for your domain and name — but only count it when it appears in a genuinely positive context. Answers that mention your name but say “I don’t have information about this site”, or that use a word like “unsourced” in its ordinary sense, are filtered out as false positives.
When you aren’t cited, we capture which domains the AI recommended instead — so your competitor intelligence is built from the same checks.
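The filtering described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the production detector: the disclaimer patterns, the function names, and the case-sensitive brand match (a crude way to keep the ordinary word "unsourced" from counting as the brand "Unsourced") are all hypothetical.

```python
import re

# Hypothetical disclaimer patterns; a real filter would need a broader list.
NEGATIVE_PATTERNS = [
    r"\bI (?:don't|do not) have (?:any )?information\b",
    r"\bI(?:'m| am) not (?:familiar with|aware of)\b",
    r"\bcould(?: not|n't) find\b",
]


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_citation(answer, domain, brand):
    """Return the first sentence naming the site outside a disclaimer, else None."""
    answer = answer.replace("\u2019", "'")  # normalise curly apostrophes
    domain_re = re.compile(re.escape(domain), re.IGNORECASE)
    # Assumption: brand is matched case-sensitively as a whole word.
    brand_re = re.compile(r"\b" + re.escape(brand) + r"\b")
    for sentence in split_sentences(answer):
        if not (domain_re.search(sentence) or brand_re.search(sentence)):
            continue  # site not named in this sentence
        if any(re.search(p, sentence, re.IGNORECASE) for p in NEGATIVE_PATTERNS):
            continue  # named, but only inside a disclaimer
        return sentence
    return None
```

A case-sensitive match still misfires when the ordinary word opens a sentence, so a real filter would need more context than this sketch uses.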
3. How confident we are
Not every citation is equal. A passing name-drop is weaker evidence than an answer that quotes your page word-for-word. So every citation carries a confidence score, shown on your dashboard as one of three tiers:
Confirmed
A verified AI crawler fetched your page and the answer reproduced your content, or the answer quotes your page word-for-word. Strong evidence the model drew from your actual content — not inference.
Likely sourced
Partial evidence — measurable content overlap, or a citation from a model that searches the live web — suggesting your page informed the answer.
Mention only
Your site or brand was named, but with no corroborating evidence that the model used your actual content. Still a real mention — just unproven sourcing.
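One way to read the three tiers is as a small decision rule over the recorded signals. The sketch below is an assumption-laden illustration: the field names, the `SIMILARITY_THRESHOLD` value, and the choice to treat "reproduced your content" as overlap above that threshold are hypothetical, not the actual scoring logic.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    verbatim_match: bool   # answer quotes the page word-for-word
    similarity: float      # vocabulary overlap with the page, in [0, 1]
    verified_crawl: bool   # a verified AI crawler fetched the page
    live_search: bool      # the citing model searches the live web


SIMILARITY_THRESHOLD = 0.3  # hypothetical cut-off for "measurable overlap"


def confidence_tier(s: Signals) -> str:
    overlap = s.similarity >= SIMILARITY_THRESHOLD
    # Confirmed: word-for-word quote, or verified fetch plus reproduced content.
    if s.verbatim_match or (s.verified_crawl and overlap):
        return "Confirmed"
    # Likely sourced: partial evidence only.
    if overlap or s.live_search:
        return "Likely sourced"
    # Named, but nothing corroborates that the content was used.
    return "Mention only"
```

Note that a verified fetch with no matching content falls through to "Mention only" here: access alone is not treated as sourcing.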
The signals behind the score
Verbatim phrase match
Exact phrases from your published page found word-for-word in the AI’s answer — the clearest sign your content, not just your name, shaped the response.
Content similarity
How much of the vocabulary in the answer overlaps with your page. High overlap without an exact quote still points to your content as a likely source.
Origin Signal (live crawl)
We log when a reverse-DNS verified AI crawler fetches your page. When that fetch lines up with an answer that reproduces your content, the chain proves the model drew from your live page — not its older training data.
Live-search grounding
Whether the model that cited you searches the live web (Perplexity, ChatGPT with live search, Gemini with Google grounding) or answers purely from training data. A live-search citation reflects current web presence.
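The two text-based signals, verbatim phrase match and content similarity, can be approximated with simple token comparisons. This is a minimal sketch: the five-word phrase length, the tokenizer, and the function names are assumptions chosen for illustration, and a real system may normalise text or weight words differently.

```python
import re


def tokens(text):
    # Lowercase and keep runs of letters and digits only.
    return re.findall(r"[a-z0-9]+", text.lower())


def verbatim_matches(page, answer, n=5):
    """Return n-word phrases from the page that appear word-for-word in the answer."""
    page_toks, answer_toks = tokens(page), tokens(answer)
    answer_ngrams = {
        tuple(answer_toks[i:i + n]) for i in range(len(answer_toks) - n + 1)
    }
    found, seen = [], set()
    for i in range(len(page_toks) - n + 1):
        gram = tuple(page_toks[i:i + n])
        if gram in answer_ngrams and gram not in seen:
            seen.add(gram)
            found.append(" ".join(gram))
    return found


def vocabulary_overlap(page, answer):
    """Share of the answer's distinct words that also occur on the page."""
    page_vocab, answer_vocab = set(tokens(page)), set(tokens(answer))
    if not answer_vocab:
        return 0.0
    return len(answer_vocab & page_vocab) / len(answer_vocab)
```

Overlapping phrases are reported separately here; merging them into the longest continuous quote would make the evidence easier to read.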
The Origin Signal
Most citation checks can’t distinguish between an AI that read your site today and one repeating what it learned months ago. The Origin Signal is how we tell the difference: it joins two facts we record independently.
First, your installed worker or beacon logs every AI crawler that fetches your pages, and we confirm each one by reverse DNS to the operator (so a GPTBot hit really resolves to OpenAI and is not a spoof). Second, when an answer from that same provider reproduces your page’s wording, the two link into a chain: this crawler fetched your page, and this answer used it. That is a confirmed live crawl, and it’s the strongest tier in the confidence score. A verified fetch on its own proves only access, so we label it honestly as a “Verified crawl”, distinct from confirmed sourcing. We inject nothing into your site; the proof is your own content plus the logs.
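Reverse-DNS verification is commonly done as a forward-confirmed lookup: resolve the IP to a hostname, check the hostname belongs to the expected operator, then resolve the hostname back and confirm it includes the original IP. The sketch below shows that general technique only. The `allowed_suffixes` values are placeholders (the real operator hostnames are not given here), and the injectable `reverse_lookup` and `forward_lookup` parameters are hypothetical conveniences for testing.

```python
import socket


def verify_crawler(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    # Defaults use the standard library; gethostbyname_ex is IPv4-only.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record
    # The hostname must be the operator's domain or a subdomain of it.
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        # Forward confirmation: the hostname must map back to the same IP.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

The forward step matters because anyone who controls the reverse zone for their own IP range can publish a PTR record naming any hostname they like.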
What this is — and isn’t
- We sample real, representative questions; we don’t claim to test every query a person could ask.
- AI answers are probabilistic: the same question can vary run to run. That’s why we track trends over weeks, not single results.
- We can’t see inside a model’s training data. “Mention only” means we couldn’t prove sourcing, not that none happened.
- Confidence is evidence strength, not a verdict on your content. A real citation with little corroborating evidence is still a real citation.
- Nothing here is a legal determination. The Evidence Report organises the raw data; how you use it is your call.
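Because single runs vary, a trend is more informative than any one result. As a rough illustration of tracking results over weeks, the sketch below groups checks by ISO week and reports the share that produced a citation. The input shape and function name are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def weekly_citation_rate(checks):
    """checks: iterable of (date, cited) pairs. Returns {(iso_year, iso_week): rate}."""
    totals = defaultdict(lambda: [0, 0])  # week -> [citations, runs]
    for day, cited in checks:
        iso = day.isocalendar()
        key = (iso[0], iso[1])
        totals[key][0] += int(cited)
        totals[key][1] += 1
    return {week: hits / runs for week, (hits, runs) in sorted(totals.items())}
```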
Everything is kept as evidence
Every check is stored with the question asked, an excerpt of the AI’s answer, the matched text, and the signals behind its confidence score. You can export it all as an Evidence Report — verbatim phrase matches, similarity scores, and timestamped live-crawl records included — so the numbers on your dashboard are always traceable back to source.
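To make the idea of a traceable record concrete, here is a sketch of what one stored check could look like when exported. The field names and the JSON layout are hypothetical; the actual Evidence Report format is not specified here.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class EvidenceRecord:
    question: str            # the question asked
    model: str               # which AI model answered
    answer_excerpt: str      # excerpt of the AI's answer
    matched_text: str        # the text that triggered the citation
    tier: str                # "Confirmed", "Likely sourced" or "Mention only"
    signals: dict = field(default_factory=dict)  # e.g. similarity, verbatim phrases
    checked_at: str = ""     # ISO 8601 timestamp of the check


def export_report(records):
    # One JSON array, one object per stored check.
    return json.dumps([asdict(r) for r in records], indent=2, ensure_ascii=False)
```

Keeping the raw excerpt and matched text alongside the tier means anyone reading the report can re-check how the score was reached.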
See it in practice
Here’s what a high-confidence citation looks like once the checks run — the question a real person asked, the model that answered, and the evidence behind the score. Nothing abstract: this is the same card you’d see on your dashboard.
“What’s the best budget mirrorless camera for a beginner in the UK?”
Citation found in response
“…for first-time buyers on a budget, Tech Reviews UK recommends it as the best value option, highlighting its image stabilisation and beginner-friendly controls…”
Every check is stored with this evidence (the excerpt, the matched phrase, the model) and exported in your Evidence Report.