Methodology

How Unsourced measures AI visibility

No black box. Here’s exactly how we check whether AI assistants cite your content — the questions we ask, how we decide a citation is real, and how confident we are in each one. We measure verified evidence, not vanity scores.

Crawled isn’t cited. Cited isn’t understood.

A bot visiting your site tells you that you were crawled. It doesn’t tell you whether an AI actually recommends you when someone asks a real question — or whether it understood you well enough to cite you with confidence.

Unsourced measures the whole chain: the questions real users ask, which AI assistants surface your site in their answers, and — crucially — how much evidence there is that the model drew from your content rather than simply knowing your name. That last part is what our confidence tier captures.

1. The questions we ask

Every week we ask seven AI models — Claude, ChatGPT, ChatGPT with live search, Gemini with live Google Search, Grok, Llama and Perplexity — a set of natural questions a real person in your niche would type. Studio sites also get a lighter daily check that still includes Gemini with live Google Search, so live citations can surface every day, not only in the weekly run.

The questions are generated from your real article topics — discovered from your sitemap, feed, and the pages AI bots actually visited — not from your brand name. We deliberately never name your site in the question. If we asked “what is [your brand]?”, any model would parrot it back and we’d log a citation that means nothing. By asking the way a customer would, a citation has to be earned.

Questions span intent types — informational, comparison, commercial and local — so you can see not just whether AI cites you, but for which kinds of queries.
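As a rough sketch of the "never name your site" rule, the Python below builds questions from article topics across the four intent types and drops any question that mentions the brand or domain. The templates, the `build_questions` and `names_site` helpers, and the use of fixed templates at all are hypothetical illustrations. The real questions may well be generated quite differently.

```python
import re

# Hypothetical templates, one per intent type; real questions are phrased more naturally.
TEMPLATES = {
    "informational": "What should I know about {topic}?",
    "comparison": "How do the main options for {topic} compare?",
    "commercial": "What are the best {topic} to buy?",
    "local": "Where can I get {topic} near me?",
}


def names_site(question: str, brand: str, domain: str) -> bool:
    """True if the question would leak the site's identity to the model."""
    q = question.lower()
    return domain.lower() in q or re.search(rf"\b{re.escape(brand.lower())}\b", q) is not None


def build_questions(topics: list[str], brand: str, domain: str) -> list[tuple[str, str]]:
    """Return (intent, question) pairs, skipping any question that names the site."""
    questions = []
    for topic in topics:
        for intent, template in TEMPLATES.items():
            question = template.format(topic=topic)
            if not names_site(question, brand, domain):
                questions.append((intent, question))
    return questions
```

For example, a topic list of `["budget wireless earbuds", "SoundWave earbuds"]` with the brand "SoundWave" keeps only the four questions about budget wireless earbuds, because every question built from the second topic would name the site.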

2. How we detect a real citation

We scan each answer for your domain and name, but only count a match when the answer genuinely refers to your site. Answers that mention your name only to say "I don’t have information about this site", or that use a word like "unsourced" in its ordinary sense, are filtered out as false positives.
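Here is a minimal sketch of that filter, assuming simple pattern rules. A domain match always counts. A brand match must be capitalised and word-bounded, so ordinary lowercase uses of the word are skipped. Answers that admit they know nothing about the site are discarded. The `NEGATIVE_PATTERNS` list and `is_real_citation` are hypothetical. Capitalisation alone won't catch a brand word that happens to start a sentence, so a real filter would need more context than this.

```python
import re

# Hypothetical phrases signalling the model does not actually know the site.
NEGATIVE_PATTERNS = [
    r"i don[’']t have (any )?information about",
    r"i[’']m not familiar with",
    r"i couldn[’']t find",
]


def is_real_citation(answer: str, domain: str, brand: str) -> bool:
    """Crude check: does the answer genuinely refer to the site?"""
    text = answer.lower()
    # A domain string such as "unsourced.com" is unambiguous.
    domain_hit = domain.lower() in text
    # A brand that doubles as an ordinary word must appear capitalised and word-bounded.
    brand_hit = re.search(rf"\b{re.escape(brand)}\b", answer) is not None
    if not (domain_hit or brand_hit):
        return False
    # Drop answers that name the site only to say they know nothing about it.
    return not any(re.search(pattern, text) for pattern in NEGATIVE_PATTERNS)
```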

When you aren’t cited, we capture which domains the AI recommended instead — so your competitor intelligence is built from the same checks.
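One way to collect competing domains is to pull every domain-shaped string out of the answers and count them, skipping your own. The regular expression and the `competitor_domains` helper below are illustrative assumptions. A pattern this simple will also catch things like file names, so treat it as a sketch.

```python
import re
from collections import Counter

# Rough pattern for domain-like strings such as "example.com" or "shop.example.co.uk".
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE)


def competitor_domains(answers: list[str], own_domain: str) -> Counter:
    """Count how many answers mention each domain other than your own."""
    counts: Counter = Counter()
    own = own_domain.lower().removeprefix("www.")
    for answer in answers:
        # A set per answer, so repeated mentions within one answer count once.
        found = {m.lower().removeprefix("www.") for m in DOMAIN_RE.findall(answer)}
        found.discard(own)
        counts.update(found)
    return counts
```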

3. How confident we are

Not every citation is equal. A passing name-drop is weaker evidence than an answer that quotes your page word-for-word. So every citation is graded into one of three confidence tiers on your dashboard:

Confirmed

A verified AI crawler fetched your page and the answer reproduced your content, or the answer quotes your page word-for-word. Strong evidence the model drew from your actual content — not inference.

Likely sourced

Partial evidence — measurable content overlap, or a citation from a model that searches the live web — suggesting your page informed the answer.

Mention only

Your site or brand was named, but with no corroborating evidence that the model used your actual content. Still a real mention — just unproven sourcing.
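The three tiers can be read as a small decision rule over the signals described on this page. The sketch below encodes that reading, under several assumptions that are not published parameters: the shape of the `Evidence` fields, an `OVERLAP_THRESHOLD` of 0.3, and the idea that "reproduced your content" means either a verbatim match or overlap above that threshold.

```python
from dataclasses import dataclass

# Hypothetical cut-off for "measurable content overlap" (0.0 to 1.0 scale).
OVERLAP_THRESHOLD = 0.3


@dataclass
class Evidence:
    verbatim_match: bool      # an exact phrase from the page appears in the answer
    similarity: float         # vocabulary overlap between answer and page, 0.0 to 1.0
    verified_crawl: bool      # a verified crawler from the answering provider fetched the page
    live_search_model: bool   # the answering model searches the live web


def confidence_tier(e: Evidence) -> str:
    overlap = e.similarity >= OVERLAP_THRESHOLD
    # Confirmed: a word-for-word quote, or a verified fetch plus an answer that reproduces the page.
    if e.verbatim_match or (e.verified_crawl and overlap):
        return "Confirmed"
    # Likely sourced: partial evidence from overlap or from a live-search model.
    if overlap or e.live_search_model:
        return "Likely sourced"
    # Mention only: named, but no corroborating evidence of sourcing.
    return "Mention only"
```

Note that a verified crawl with no overlap still lands in "Mention only". Access alone is not treated as proof of sourcing.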

The signals behind the score

Verbatim phrase match

Exact phrases from your published page found word-for-word in the AI’s answer — the clearest sign your content, not just your name, shaped the response.
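A common way to find word-for-word matches is to compare word n-grams (runs of n consecutive words) from the page against those in the answer. This sketch assumes a default n of 8 and normalises case and punctuation first. The window size and the `verbatim_phrases` helper are hypothetical, not production settings.

```python
import re


def _words(text: str) -> list[str]:
    # Lowercase and split on non-word characters so punctuation doesn't block a match.
    return re.findall(r"\w+", text.lower())


def verbatim_phrases(page: str, answer: str, n: int = 8) -> list[str]:
    """Return each n-word phrase from the page that appears word-for-word in the answer."""
    page_words, answer_words = _words(page), _words(answer)
    answer_grams = {tuple(answer_words[i:i + n]) for i in range(len(answer_words) - n + 1)}
    matches, seen = [], set()
    for i in range(len(page_words) - n + 1):
        gram = tuple(page_words[i:i + n])
        if gram in answer_grams and gram not in seen:
            seen.add(gram)
            matches.append(" ".join(gram))
    return matches
```

A long quote shows up as several overlapping matches. Merging consecutive matches into one span would be a natural refinement.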

Content similarity

How much of the vocabulary in the answer overlaps with your page. High overlap without an exact quote still points to your content as a likely source.
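Read literally, "how much of the vocabulary in the answer overlaps with your page" is the share of the answer's distinct content words that also appear on the page. The sketch below computes exactly that. The tiny stopword list and the function names are placeholders, and the real score may weight words differently.

```python
import re

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "for", "to", "in", "on", "is", "are", "with", "it"}


def vocabulary(text: str) -> set[str]:
    """Distinct lowercase content words in the text."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def content_similarity(answer: str, page: str) -> float:
    """Fraction of the answer's content words that also appear on the page (0.0 to 1.0)."""
    answer_vocab = vocabulary(answer)
    if not answer_vocab:
        return 0.0
    return len(answer_vocab & vocabulary(page)) / len(answer_vocab)
```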

Origin Signal (live crawl)

We log each fetch of your page by an AI crawler whose identity we have verified by reverse DNS. When that fetch lines up with an answer that reproduces your content, the chain proves the model drew from your live page rather than its older training data.

Live web search

Whether the model that cited you searches the live web (Perplexity, ChatGPT with search, Gemini with live search) or answers purely from training data. A live-search citation reflects current web presence.

The Origin Signal

A citation on its own can’t tell you whether an AI read your site today or is repeating what it learned months ago. The Origin Signal is how we tell the difference: by joining two facts we record independently.

First, the Cloudflare Worker, WordPress plugin, or server-side snippet you install logs every AI crawler that fetches your pages, and we confirm each one by reverse DNS to the operator (so a GPTBot hit really resolves to OpenAI, not a spoof). Second, when an answer from that same provider reproduces your page’s wording, the two facts link into a chain: this crawler fetched your page, and this answer used it. That is a confirmed live crawl, and it earns the strongest confidence tier.

A verified fetch on its own proves access, not sourcing, so we label it honestly as a "Verified crawl", distinct from confirmed sourcing. The proof is your own content plus the logs.
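The join can be sketched as a filter over the crawl log. It keeps verified fetches of the same page, made by the same provider before the answer, and then checks whether the answer reproduced the page. The record shapes, the 30-day `MAX_GAP` window, and the `origin_signal` function are hypothetical. The methodology doesn't state how close in time a fetch and an answer must be.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical window linking a fetch to a later answer.
MAX_GAP = timedelta(days=30)


@dataclass
class CrawlHit:
    provider: str          # operator the crawler was verified to belong to, e.g. "OpenAI"
    url: str
    fetched_at: datetime
    verified: bool         # passed published-range or reverse-DNS verification


@dataclass
class Answer:
    provider: str          # operator of the model that answered
    url: str               # page whose wording the answer is checked against
    answered_at: datetime
    reproduces_page: bool  # verbatim or high-overlap match with that page


def origin_signal(hits: List[CrawlHit], answer: Answer) -> Optional[str]:
    # Only verified fetches of the same page, by the same provider, before the answer count.
    linked = [
        h for h in hits
        if h.verified
        and h.provider == answer.provider
        and h.url == answer.url
        and timedelta(0) <= answer.answered_at - h.fetched_at <= MAX_GAP
    ]
    if not linked:
        return None
    # A linked fetch plus reproduced wording completes the chain; otherwise it only proves access.
    return "Live crawl confirmed" if answer.reproduces_page else "Verified crawl"
```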

Verified, impostor, or unverifiable

Bot Trust: 3 authorized crawlers verified, 1 spoofer caught, 1 unverifiable

Every crawler announces who it is — but that announcement is just a claim, and a claim is easy to fake. Anyone can send a request that says “I’m GPTBot.” So we don’t take the label at face value; we check the address it actually came from.

We verify true identity two ways. First, the major AI operators (OpenAI, Perplexity, Google) publish the IP ranges their crawlers use, and we check each hit against the operator’s own authoritative list. Second, forward-confirmed reverse DNS: we look up the hostname behind the IP, check that it belongs to the operator’s domain, then confirm that hostname resolves back to the very same address. A spoofer can’t pass this check.
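Both checks can be sketched with Python's standard library. `ipaddress` handles the published-range test, and a reverse lookup followed by a forward lookup handles forward-confirmed reverse DNS. The resolver functions are injectable, so the logic can be exercised without network access. The operator domains you pass in are an assumption; you would fill them in from each operator's own documentation.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def in_published_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if the IP falls inside any of the operator's published CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def forward_confirmed_rdns(
    ip: str,
    operator_domains: Iterable[str],
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Reverse-resolve the IP, require an operator hostname, then resolve it forward again."""
    try:
        host = reverse(ip).rstrip(".").lower()
        # The PTR hostname must sit under a domain the operator controls...
        domains = [d.lower() for d in operator_domains]
        if not any(host == d or host.endswith("." + d) for d in domains):
            return False
        # ...and that hostname must resolve back to the very same address.
        target = ipaddress.ip_address(ip)
        return any(ipaddress.ip_address(a) == target for a in forward(host))
    except OSError:
        # No PTR record or a failed lookup: the crawler can't prove its identity this way.
        return False
```

The domain check matters. Without it, anyone who controls the reverse DNS for their own IP could make the round trip succeed under a hostname of their choosing.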

That sorts every crawler into three honest buckets:

—Verified — proven to belong to the operator it claims. Its IP is in that operator’s published range, or its reverse DNS forward-confirms. The name is earned.
—Impostor — it claims a brand but the evidence contradicts it: the IP sits outside that operator’s published range, or the address belongs to someone else entirely. We headline the real owner; the claimed name becomes a footnote.
—Unverifiable — there’s no published footprint to check it against. We mark it honestly as unverifiable — we never accuse a crawler just because it can’t prove itself.
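Put together, the three buckets reduce to a short rule. In this sketch, `has_footprint` means the operator publishes IP ranges or a reverse-DNS domain to check against. `ip_owner` is a hypothetical input from some IP-ownership lookup. The field names and labels are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    bucket: str                     # "Verified", "Impostor" or "Unverifiable"
    headline: str                   # name shown first on the dashboard
    footnote: Optional[str] = None  # the claimed name, demoted for impostors


def classify(claimed: str, has_footprint: bool, in_range: bool, fcrdns_ok: bool, ip_owner: str) -> Verdict:
    if not has_footprint:
        # Nothing published to check against: mark it unverifiable, never accuse.
        return Verdict("Unverifiable", claimed)
    if in_range or fcrdns_ok:
        # Either check passing earns the claimed name.
        return Verdict("Verified", claimed)
    # The operator publishes a footprint and this request falls outside it.
    return Verdict("Impostor", ip_owner, footnote=f"claimed to be {claimed}")
```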

We surface what we find and hand you honest, ready-made block rules — but the choice stays yours. We expose; we don’t steer.

What this is — and isn’t

—We sample real, representative questions — we don’t claim to test every query a person could ask.
—AI answers are probabilistic: the same question can vary run to run. That’s why we track trends over weeks, not single results.
—We can’t see inside a model’s training data. “Mention only” means we couldn’t prove sourcing — not that none happened.
—Confidence is evidence strength, not a verdict on your content. A real citation with little corroborating evidence is still a real citation.
—A confirmed lift after a fix shows that a citation we couldn’t measure before now appears: strong evidence the change helped, not proof it was the only cause.
—Nothing here is a legal determination. The Evidence Report organises the raw data; how you use it is your call.

Everything is kept as evidence

AI Visibility Audit — sample report cover

Every check is stored with the question asked, an excerpt of the AI’s answer, the matched text, and the signals behind its confidence tier. You can export it all as an Evidence Report — verbatim phrase matches, similarity scores, and timestamped live-crawl records included — so the numbers on your dashboard are always traceable back to source.
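To make the shape of a stored check concrete, here is a hypothetical record with the fields listed above, exported as JSON Lines (one JSON object per line). The field names and the JSON Lines format are assumptions for illustration. The methodology doesn't specify the Evidence Report's file format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class CheckRecord:
    question: str         # the question asked
    model: str            # the model that answered
    answer_excerpt: str   # excerpt of the AI's answer
    matched_text: str     # the text that matched your page
    tier: str             # "Confirmed", "Likely sourced" or "Mention only"
    signals: dict = field(default_factory=dict)  # e.g. verbatim match, similarity score
    checked_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def export_evidence(records: List[CheckRecord]) -> str:
    """Serialise records as JSON Lines so each check stays individually traceable."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)
```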

See it in practice

Here’s what a high-confidence citation looks like once the checks run: the question, phrased the way a real person would ask it, the model that answered, and the evidence behind the score. Nothing abstract: this is the same card you’d see on your dashboard.

Illustrative example: soundwavehq.com

Perplexity · Commercial · Confirmed · Cited

“What are the best budget wireless earbuds for a beginner in the UK?”

Citation found in response

“…for first-time buyers on a budget, SoundWave Audio recommends them as the best value pick, highlighting their long battery life and beginner-friendly controls…”

Verbatim phrase matched · High content similarity · Live web search

Every check is stored with this evidence — the excerpt, the matched phrase, the model — and exported in your Evidence Report. See the comprehension test we ran on our own site →

Proving a fix worked

Finding a gap is only half the job. When you act on a recommendation and mark it as done, Unsourced doesn’t just take your word for it — it goes back and checks.

About a week after you mark a fix as done, we automatically re-run the checks for that specific topic — the same questions, the same seven models. If the change still hasn’t shown up, we try once more a little later. We only ever report a win when there’s measured citation evidence: the topic that wasn’t cited before is cited now.
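The re-check flow behaves like a small state machine. This sketch assumes a first re-check 7 days after the fix is marked done and a single retry at 14 days. The second interval is only a placeholder for "a little later", and the status codes are hypothetical names for the states a fix can be in.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed schedule: about a week after the fix, then one retry (14 days is a placeholder).
RECHECK_OFFSETS = [timedelta(days=7), timedelta(days=14)]


def fix_status(recheck_results: List[bool]) -> str:
    """Each entry records whether the previously uncited topic was cited in that re-check."""
    if any(recheck_results):
        return "confirmed"               # a citation was measured after the change
    if not recheck_results:
        return "rechecking"              # waiting for the first re-check
    return "not_confirmed_watching"      # re-checked, no citation yet


def next_recheck(marked_done: datetime, recheck_results: List[bool]) -> Optional[datetime]:
    """When the next re-check is due, or None if confirmed or the schedule is used up."""
    if any(recheck_results) or len(recheck_results) >= len(RECHECK_OFFSETS):
        return None
    return marked_done + RECHECK_OFFSETS[len(recheck_results)]
```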

Until then you’ll see an honest status — “Re-checking your fix” while we wait, or “Re-checked — not confirmed yet, still watching” if it hasn’t moved. A confirmed result means we measured a citation appear after your change — not a promise that the change alone caused it. The evidence is the citation itself, recorded and timestamped like every other check.

Know where you stand in AI search

10-day free trial. No credit card required.
