Tracking Your Brand Across ChatGPT, Claude, and Perplexity

A working guide to measuring AI brand mentions, citations, and share of voice when there are no impressions, no rankings, and no logs to pull.

Updated on: 2026-06-15

A client asked me last month why their inbound demos suddenly mentioned three competitors by name in the first sales call. Not in a comparison context. In the "we were told to look at you and these others" context. None of those competitors outranked them on Google. Two of them barely had a blog. But when we ran the same buyer prompts through ChatGPT, Claude, and Perplexity, the pattern was obvious: our client was getting named in maybe 18 percent of relevant answers. The top competitor was at 64 percent.

That gap is the thing nobody has a dashboard for yet. And it is the thing this article is about.

Why this is harder than it sounds

There is no impressions log for an LLM. No Search Console. No "your brand was shown 14,200 times this week." The model generates an answer, the user reads it, the conversation closes. Whatever happened in that exchange is invisible to you unless you went and looked.

So tracking AI brand citations is fundamentally a sampling problem, not a counting problem. You build a representative set of prompts your buyers would actually ask, run them across the assistants that matter, capture the full answer plus any citations, and measure what came back. Then you do it again next week. The pattern over time is the signal. Any single answer is noise.

This is closer to how TV ratings used to work with panels than how Google Analytics works. The Signal AI team has been making this point for a while, and their argument for treating AI citations as a panel-based metric holds up well in practice. You are estimating, not counting.
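To make "estimating, not counting" concrete, here is a minimal sketch that puts a rough uncertainty range around a mention rate. It uses a Wilson score interval, which is my choice for illustration and not something the sources above prescribe. The 9-of-50 figure is a made-up sample that happens to equal 18 percent, and the interval treats prompts as independent draws, which real prompt sets only approximate.

```python
import math

def wilson_interval(mentions: int, prompts: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a mention rate."""
    if prompts <= 0:
        raise ValueError("prompts must be positive")
    p = mentions / prompts
    denom = 1 + z**2 / prompts
    centre = (p + z**2 / (2 * prompts)) / denom
    half = z * math.sqrt(p * (1 - p) / prompts + z**2 / (4 * prompts**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical sample: 9 of 50 answers named the brand (18 percent).
low, high = wilson_interval(9, 50)
print(f"observed 18%, plausible range {low:.0%} to {high:.0%}")
```

With a sample this small the range is wide, which is the practical reason to read the trend across several runs and not react to one week's number.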

Once you accept that, the rest becomes design work.

The two things you are actually tracking

People conflate these constantly, and it causes real confusion in reporting.

Brand mentions in the generated text. The model writes a paragraph recommending tools, and your brand name appears in it. This is what most executives mean when they ask "are we showing up in ChatGPT." It is about being named, ranked, described, and positioned inside the answer itself.

Source citations and links. The model points to a URL as evidence. Sometimes that URL is yours. Sometimes it is a review site, a Reddit thread, a comparison article, or a press piece that talks about you. Perplexity surfaces these prominently. ChatGPT shows them when browsing is active. Claude cites in some modes and not others.

These two metrics diverge more than you would expect. A model can recommend your product enthusiastically and cite none of your pages. Or it can link to your documentation while describing a competitor as the better choice. You need to track both, and you need to report them separately, because the fixes are different. Mention problems are about brand presence in third-party content the model trusts. Citation problems are usually about your own site structure.
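A small sketch of why the two checks have to stay separate in your data. It assumes you have already saved the answer text and the list of cited URLs for each run. The brand name, domain, and URLs are placeholders, and whole-word matching is a simplification that will miss misspellings and abbreviations.

```python
import re
from urllib.parse import urlparse

def brand_mentioned(answer_text: str, brand: str) -> bool:
    """True if the brand name appears as a whole word, case-insensitively."""
    pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
    return re.search(pattern, answer_text, flags=re.IGNORECASE) is not None

def domain_cited(cited_urls: list[str], domain: str) -> bool:
    """True if any cited URL is on the domain or one of its subdomains."""
    domain = domain.lower()
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False

# Placeholder data: the answer names the brand but cites only a review site.
answer = "For small teams, Acme CRM is a strong pick; see the linked roundup."
citations = ["https://www.reviewsite.example/best-crm"]

mentioned = brand_mentioned(answer, "Acme CRM")
cited = domain_cited(citations, "acme.example")
print(mentioned, cited)
```

Storing the two results as separate columns keeps "recommended but not cited" and "cited but not recommended" visible as distinct cases.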

Conductor's framing of mention and citation tracking as two layers is one of the cleaner explanations I have seen, and it lines up with what we keep finding in audits.

Building the prompt set that matters

This is where most tracking efforts quietly fail. People pick 20 obvious prompts, run them weekly, and watch a flat line. The flat line is not the truth. It is the consequence of a thin prompt set.

A useful prompt universe usually has four layers:

Category prompts. "Best CRM for early-stage SaaS." "Top AI visibility platforms for agencies." These are the head terms. They will be the most competitive and the slowest to move.

Comparison and alternative prompts. "HubSpot alternatives under 200 a month." "What is similar to Ahrefs but for AI search." These often surface different brands than the head category prompts, which is the whole point of including them.

Problem-led prompts. "How do I track where my brand is mentioned in ChatGPT answers." "Why am I losing organic traffic to AI summaries." These are the prompts where your educational content can pull you into the answer even when category prompts do not.

Follow-up and refinement prompts. "Who else?" "Any cheaper options?" "Which of those is best for a small agency?" The brands that surface on the second turn are often not the ones that surfaced on the first. Qoulomb's write-up on tracking brand mentions across AI search in 2026 makes this point well, and it has changed how I scope tracking sets for clients.

For a serious B2B brand, I think 25 prompts is the realistic floor for a directional read. Fifty gets you to something you can report on with a straight face. A hundred lets you slice by intent, geography, and funnel stage. SEOforGPT's plans roughly mirror this curve, with prompt tracking scaling from 25 on Launch to 100 on Scale, which matches what I have seen work in practice rather than what looks good on a pricing page.

One more thing on prompt design: include local and language variations if your buyers are not all in one market. "Best email marketing tool" in English and the same prompt in Spanish or German will pull different brand lists. If you sell in those markets and you are only tracking English, you are missing half the picture.
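One way to keep a prompt set honest is to tag every prompt with its layer and locale, then check coverage before you start running anything. This is a sketch of that bookkeeping, not a prescribed schema. The field names and the Spanish prompt are my own illustration.

```python
from collections import Counter
from dataclasses import dataclass

LAYERS = ("category", "comparison", "problem", "follow_up")

@dataclass(frozen=True)
class TrackedPrompt:
    text: str
    layer: str          # one of LAYERS
    locale: str = "en"  # language or market tag

def layer_coverage(prompts: list[TrackedPrompt]) -> dict[str, int]:
    """Count prompts per layer, reporting zero for layers with no prompts."""
    counts = Counter(p.layer for p in prompts)
    unknown = set(counts) - set(LAYERS)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    return {layer: counts.get(layer, 0) for layer in LAYERS}

prompts = [
    TrackedPrompt("Best CRM for early-stage SaaS", "category"),
    TrackedPrompt("What is similar to Ahrefs but for AI search", "comparison"),
    TrackedPrompt("Why am I losing organic traffic to AI summaries", "problem"),
    TrackedPrompt("Any cheaper options?", "follow_up"),
    TrackedPrompt("Mejor CRM para SaaS en fase inicial", "category", locale="es"),
]
coverage = layer_coverage(prompts)
print(coverage)
```

A layer that comes back with zero or one prompt is usually the reason a tracking line stays flat.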

The metrics that hold up under scrutiny

After running this kind of tracking for a while, a few metrics consistently survive the "what does this actually tell us" test:

Mention rate. Of the prompts in your set, what percentage of answers name your brand at all. Simple, blunt, and the first thing any executive will ask about.

Share of voice. Of all the brand mentions across your prompt set, what percentage are yours versus each competitor. This is the metric that tells you whether you are gaining or losing ground, not whether the category is growing.

Position or prominence. When you are mentioned, are you the first recommendation, in the main list, or in the "other options" trailing paragraph. Being mentioned last in a list of seven is not the same as being the top pick.

Sentiment and context. How does the model describe you. "Best for enterprise teams that need compliance" is different from "an option, though some users report a steep learning curve." This is the second-order reputation layer Signal AI has been writing about, and it matters more than people realize because it shapes the buyer's mental model before they ever see your site.

Citation rate. When the model cites sources, how often is your domain among them. And separately: how often are third-party sites that describe you favorably being cited.

Source mix. Which publishers, review sites, and communities are the models actually pulling from when they describe your category. This is usually where the most actionable insight hides.

Frizerly's framework and TrackMyVisibility's breakdown of citation tracking both organize these similarly, which suggests the industry is converging on something like a standard. Good. It was getting confusing.
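Mention rate, share of voice, and position reduce to simple arithmetic once each answer has been tagged with the brands it named. The sketch below assumes a record format of my own invention (a list of brands in order of appearance per answer) and uses made-up data. Counting each brand at most once per answer for share of voice is one reasonable convention, not the only one.

```python
from collections import Counter

# Each record is one tagged answer: the brands it named, in order of appearance.
# Brand names and data are invented for illustration.
answers = [
    {"prompt": "p1", "brands": ["Rival A", "Us", "Rival B"]},
    {"prompt": "p2", "brands": ["Rival A"]},
    {"prompt": "p3", "brands": ["Rival A", "Rival B"]},
    {"prompt": "p4", "brands": []},
]

def mention_rate(answers, brand):
    """Share of answers that name the brand at least once."""
    if not answers:
        return 0.0
    return sum(brand in a["brands"] for a in answers) / len(answers)

def share_of_voice(answers):
    """Each brand's share of all brand mentions, counted once per answer."""
    counts = Counter(b for a in answers for b in set(a["brands"]))
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}

def average_position(answers, brand):
    """Mean 1-based position when mentioned, or None if never mentioned."""
    positions = [a["brands"].index(brand) + 1 for a in answers if brand in a["brands"]]
    return sum(positions) / len(positions) if positions else None

print(mention_rate(answers, "Us"))
print(share_of_voice(answers))
print(average_position(answers, "Us"))
```

Note that mention rate is computed over answers while share of voice is computed over mentions, so the two can move in different directions when competitors are added to or dropped from the answers.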

How models actually decide who to mention

This is the part most tracking conversations skip, and it is the part that determines whether your numbers will move.

LLMs are not pulling from your homepage. They are pulling from a mix of:

Training data, which is months or years stale and which you cannot edit
Retrieval over the live web, which can include your site but more often pulls from intermediaries
Whatever is in the model's context window during the conversation

The practical implication: most of the time, the model is describing you based on what other people have written about you. Review aggregators. Roundup posts. Reddit threads. Comparison articles on competitor blogs. Trade press. Your own content matters, but it competes with all of that, and the model often weights third-party sources more heavily for brand descriptions because they look less self-promotional.

This is why "just write more blog posts" is not a strategy. You need to be present and well-described in the sources the model already trusts, and you need your own content structured clearly enough that when it is pulled in, the model can extract the right facts. Ahrefs has a good piece on monitoring and winning brand mentions in AI answers that gets into the earned side of this.

A working setup, in plain terms

If you are trying to stand this up without buying anything yet, here is roughly what works:

Write 30 to 50 prompts your real buyers would ask. Mix category, comparison, problem, and follow-up.
Pick the assistants that matter for your category. For most B2B, that is ChatGPT, Claude, and Perplexity. Add Gemini if your buyers are in Google-heavy enterprise environments.
Run the prompts. Save the full answer text and any citations. A spreadsheet works fine to start.
Tag each answer: was your brand mentioned, in what position, with what description, and were you cited.
Do the same for your top three competitors.
Repeat weekly or biweekly. Look at the trend, not the snapshot.
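If the spreadsheet starts to feel slow, the tagging and logging steps above can be sketched in a few lines. This assumes you paste or load the saved answer text yourself; it deliberately does not call any assistant's API. Order of first appearance stands in for position, which is a rough proxy, and the brand names, assistant label, and column layout are placeholders.

```python
import csv
import io
import re
from datetime import date
from typing import Optional

def tag_answer(answer_text: str, brands: list[str]) -> dict[str, Optional[int]]:
    """Map each brand to its 1-based rank by first appearance, or None if absent."""
    first_seen = {}
    for brand in brands:
        pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
        match = re.search(pattern, answer_text, re.IGNORECASE)
        if match:
            first_seen[brand] = match.start()
    ranked = sorted(first_seen, key=first_seen.get)
    return {b: (ranked.index(b) + 1 if b in first_seen else None) for b in brands}

def append_rows(out, run_date, assistant, prompt, tags):
    """Write one CSV row per brand for a single saved answer."""
    writer = csv.writer(out)
    for brand, position in tags.items():
        writer.writerow([run_date.isoformat(), assistant, prompt, brand,
                         position is not None, position or ""])

answer = "Most teams start with Rival A. Acme is a solid alternative for agencies."
tags = tag_answer(answer, ["Acme", "Rival A", "Rival B"])

buffer = io.StringIO()  # stands in for a file opened in append mode
append_rows(buffer, date(2026, 6, 15), "assistant-x", "Best CRM for agencies", tags)
print(buffer.getvalue())
```

One row per brand per answer per run date is enough to rebuild every trend line later. The description and sentiment tags still need a human read.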

This is workable for one brand and one analyst. It falls apart at scale, which is why platforms exist. SEOforGPT was built to automate exactly this loop, including the part most people skip: closing the gap by generating structured content for the prompts where you are absent and publishing it to your CMS without a manual handoff. That last piece, the publishing automation into WordPress, Webflow, Ghost, and the rest, is what turns tracking from a reporting exercise into something that actually moves the mention rate over a quarter.

For agencies, white-label reporting matters more than the tracking itself. Clients do not pay you to send them screenshots. They pay you for a branded monthly report that shows their AI share of voice moving up and a competitor's moving down. That is the artifact that justifies the retainer.

What I would do first

If you are starting from zero on this, do not buy a tool in week one. Do this instead.

Spend two days writing a prompt set. Make it specific to your buyers, not generic. Run those prompts manually across ChatGPT, Claude, and Perplexity. Read the answers. Note where you are mentioned, where you are not, who is mentioned instead, and what sources the models are citing. This will tell you more about your real positioning than any dashboard, because you will see the language the models use about your category.

Then decide whether you have a presence problem (you are not mentioned), a description problem (you are mentioned but framed weakly), or a citation problem (your content is not being pulled). The fix is different for each. Buying tracking software before you know which problem you have is how budgets get wasted.
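The three-way triage can be written down as a rule so that it gets applied the same way each month. This is only a sketch: the 30 percent floor is an arbitrary placeholder rather than a benchmark, and "favorable rate" (the share of your mentions that frame you positively) is a name I am introducing for illustration.

```python
def diagnose(mention_rate: float, favorable_rate: float, citation_rate: float,
             floor: float = 0.30) -> list[str]:
    """Return which problems apply, using a single hypothetical floor."""
    problems = []
    if mention_rate < floor:
        # Too few mentions to judge framing, so flag presence first.
        problems.append("presence")
    elif favorable_rate < floor:
        problems.append("description")
    if citation_rate < floor:
        problems.append("citation")
    return problems or ["none flagged"]

print(diagnose(0.18, 0.50, 0.05))
print(diagnose(0.60, 0.20, 0.50))
```

The description check only runs when the mention rate clears the floor, because a framing rate computed from a handful of mentions is not worth acting on.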

After that, automate. Because doing this manually every week is the kind of work that quietly stops happening by month three.

FAQ

How often should I run AI visibility tracking?

Weekly is enough for most brands. Daily is mostly noise unless you are actively running a campaign or watching a launch. Individual answers vary from run to run, but the underlying pattern moves slowly, and your prompt set is a sample, so running it more often buys you very little extra precision.

Can I just look at my Google Analytics referral traffic from ChatGPT?

You can, and you should, but it only captures the people who clicked through. Most AI answer consumption ends in the answer itself. The user never visits your site. Referral traffic massively understates your exposure, which is part of why dedicated tracking exists.

Do AI mentions actually drive pipeline?

In the audits we have run, yes, but indirectly. Buyers arrive at sales calls already having a shortlist that came from an AI conversation. They do not always tell you that is where they got the names. The honest read is that AI mentions are upper-funnel and hard to attribute cleanly, similar to brand PR. Treat them that way in reporting and you will be right more often than not.

Is this just SEO with a new name?

No, and the people insisting it is have not run the prompts. The optimization targets are different (answer inclusion, not ranking), the trust signals are different (third-party citations and entity clarity matter more), and the measurement is fundamentally sampling-based rather than log-based. It rhymes with SEO. It is not the same instrument.
