AI Visibility Audit Workflow: How to Measure and Improve Brand Mentions in ChatGPT, Claude

A working method for measuring how ChatGPT, Claude, and Perplexity talk about your brand, then fixing what they get wrong or skip entirely.

Updated on: 2026-06-16

The first time I ran a proper AI visibility audit for a client, I expected to find that they were "missing" from ChatGPT. What I actually found was stranger. They were being mentioned, but only on generic prompts. The moment a prompt smelled like a buyer ready to pull out a credit card, three competitors got named and they vanished. Same industry. Same products. Same content footprint, roughly. The model just didn't trust them enough to cite them when it mattered.

That gap, between "we get mentioned" and "we get recommended to people who are about to buy," is the whole game now. And most of the audits I see floating around still treat AI visibility like a checkbox: ask ChatGPT once, screenshot it, call it a day.

Here's the workflow I actually use, with the parts that took the longest to learn.

Start with prompts, not pages

Traditional SEO audits start with the site. AI visibility audits start with prompts, because the model's answer is the unit of measurement, not the ranking of a URL.

Build a prompt set in three layers:

Category prompts. "What are the best tools for X?" "Who are the leading providers of Y?" These tell you general awareness.
Comparison prompts. "Brand A vs Brand B for [use case]." "Alternatives to [incumbent]." These reveal where you live in the model's mental map.
High-intent prompts. "Best [thing] for a 12-person agency under $300/month." "Which [solution] integrates with Webflow and offers white-label reporting?" These are the ones that drive pipeline.

If you only test category prompts, you'll feel better than you should. The pattern I keep seeing is that brands look strong on generic queries and weak on specific buying ones. PBJ Marketing's framing of citation share by intent bucket captures this well. It's not about how often you appear, it's about where on the intent curve you appear.
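To make the intent-bucket idea concrete, here is a minimal sketch that computes mention rate per prompt layer. The layer names, the sample rows, and the function name `mention_rate_by_layer` are illustrative assumptions, not output from any particular tool.

```python
from collections import defaultdict

# Hypothetical audit rows: (intent layer, prompt, was the brand mentioned?)
results = [
    ("category", "What are the best tools for X?", True),
    ("category", "Who are the leading providers of Y?", True),
    ("comparison", "Alternatives to [incumbent]", True),
    ("comparison", "Brand A vs Brand B for [use case]", False),
    ("high_intent", "Best [thing] for a 12-person agency under $300/month", False),
    ("high_intent", "Which [solution] integrates with Webflow?", False),
]

def mention_rate_by_layer(rows):
    """Return {layer: share of prompts in that layer that mentioned the brand}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for layer, _prompt, mentioned in rows:
        totals[layer] += 1
        hits[layer] += int(mentioned)
    return {layer: hits[layer] / totals[layer] for layer in totals}

rates = mention_rate_by_layer(results)
for layer, rate in rates.items():
    print(f"{layer:12s} {rate:.0%}")
```

With this made-up data the brand looks fine on category prompts and disappears on high-intent ones, which is exactly the shape a category-only test would hide.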

A reasonable starting set is 25 to 100 prompts depending on category breadth. Inside SEOforGPT, the Launch plan tracks 25 prompts, Growth tracks 50, and Scale handles 100. That spread isn't arbitrary. Below 25 you don't see the distribution clearly. Above 100, for most SMB brands, you're adding noise.

Run the same prompts across multiple assistants

ChatGPT, Claude, and Perplexity don't agree. Sometimes they don't even draw from the same source ecosystem. A brand can be cited heavily in Perplexity (which surfaces links explicitly) and barely mentioned in Claude (which leans on its training distribution and reasoning).

This is why single-engine "audits" mislead. The reality is closer to what Meltwater describes as multi-surface tracking: each model has its own behavior, citation logic, and freshness window. You need the same prompts run against each one, then compared.

What I track per prompt, per assistant:

Was the brand mentioned at all
Position in the answer (first, middle, footnote, "also worth considering")
Was a URL cited, and was it ours or a third party's
What attributes did the model assign to the brand (accurate, vague, wrong)
Which competitors appeared alongside

The citation column is the one most audits skip and it's the most useful, as long as you record which third-party sources were cited and not just whether your own URL appeared. Knowing G2 reviews and a specific Reddit thread are the dominant sources for your category tells you where to put your next ten hours of work.
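The fields tracked per prompt, per assistant, can be held in one record type and indexed into a matrix. This is a sketch under my own naming: `Observation`, `build_matrix`, and every field name are assumptions chosen for illustration, and the sample answers are invented.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Observation:
    prompt: str
    assistant: str                    # e.g. "ChatGPT", "Claude", "Perplexity"
    mentioned: bool                   # was the brand mentioned at all
    position: Optional[str] = None    # "first", "middle", "footnote", ...
    cited_urls: list = field(default_factory=list)   # every URL cited in the answer
    own_url_cited: bool = False       # was one of those URLs ours
    attributes: Optional[str] = None  # "accurate", "vague", or "wrong"
    competitors: list = field(default_factory=list)  # brands named alongside

def build_matrix(observations):
    """Index observations as matrix[prompt][assistant] for side-by-side comparison."""
    matrix = {}
    for obs in observations:
        matrix.setdefault(obs.prompt, {})[obs.assistant] = obs
    return matrix

# Invented example: same prompt, two assistants, different outcomes.
prompt = "Alternatives to [incumbent]"
matrix = build_matrix([
    Observation(prompt, "Perplexity", True, "middle",
                ["https://www.g2.com/hypothetical-category"], False, "vague",
                ["CompetitorA"]),
    Observation(prompt, "Claude", False, competitors=["CompetitorA", "CompetitorB"]),
])
print(matrix[prompt]["Claude"].mentioned)
```

Keeping `cited_urls` as a full list, not a yes/no flag, is what lets you work out later which sources dominate the category.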

Diagnose the gap before you fix anything

Once you have the matrix, three patterns usually emerge. Each one needs a different fix, and confusing them is how teams waste quarters.

Pattern 1: You're invisible. The model doesn't know you exist, or knows you so weakly it never surfaces you. This is almost always an authority problem, not a content problem. More blog posts won't help. You need earned mentions in the sources the models trust for your category: industry publications, review sites, podcasts with transcripts, comparison roundups.

Pattern 2: You're mentioned but not cited. The model name-drops you but doesn't link to your site. This is a content structure problem. Your pages aren't parseable as definitive sources on the questions being asked. The usual culprits are unclear entities, weak factual statements, and no clear "this is what we do, for whom, with what proof" scaffolding.

Pattern 3: You're cited but described wrong. The model recommends you but gets your pricing, ICP, or feature set wrong. This is a freshness and clarity problem. The model is drawing from old content (your own or third parties') and you haven't updated the surface area enough for it to refresh.

The reason this matters: the fix for Pattern 1 is PR and partnerships. The fix for Pattern 2 is content architecture. The fix for Pattern 3 is ongoing publishing with tight factual structure. Same symptom (low recommendation rate), three different remediation paths.
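The three patterns can be written as a small decision function, which forces you to check them in order. This is only a sketch: the function name `diagnose`, its boolean inputs, and the fourth "healthy" outcome are my assumptions, added so the function always returns something.

```python
def diagnose(mentioned: bool, own_url_cited: bool, description_accurate: bool):
    """Map one prompt's result to (pattern, remediation path)."""
    if not mentioned:
        # Pattern 1: an authority problem, not a content problem.
        return ("invisible", "PR and partnerships")
    if not own_url_cited:
        # Pattern 2: named, but the site is not treated as a source.
        return ("mentioned but not cited", "content architecture")
    if not description_accurate:
        # Pattern 3: the model is working from stale or unclear facts.
        return ("cited but described wrong",
                "ongoing publishing with tight factual structure")
    # Assumption: not one of the three patterns, nothing to fix yet.
    return ("healthy", "keep monitoring")

print(diagnose(mentioned=True, own_url_cited=False, description_accurate=True))
```

The ordering matters: there is no point asking whether a description is accurate for a prompt where the brand never appears.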

What an audit actually looks like in practice

Here's roughly what I work through, in order, on a real engagement:

Stage	What I do	What I'm looking for
Prompt design	Build 30 to 80 prompts across category, comparison, high-intent	Coverage of the buyer's actual question set
Multi-engine run	Execute across ChatGPT, Claude, Perplexity	Cross-model visibility matrix
Citation analysis	Catalog which URLs the models cite per prompt	Who owns the source ecosystem in your category
Competitor benchmark	Compare share of voice and citation share	Where competitors get named and you don't
Content gap mapping	Match weak prompts to missing or thin pages	Pages to create, pages to restructure
Authority gap mapping	Identify third-party sources cited by competitors but not for you	PR and partnership targets
Technical AI audit	Check indexability, schema, entity clarity	Whether models can even parse your site cleanly
Reporting baseline	Lock in current scores so improvement is measurable	What "better" will look like in 90 days

This is roughly the structure SEOforGPT is built around, which is partly why I use it for client work. The Brand Visibility test plus Prompt analysis plus Competitor Intelligence covers the first six stages without me building spreadsheets. The Technical AI Audit (Launch plan and up) handles the parseability layer.

You can absolutely do this manually. I did, for about eighteen months. It just stops being practical past about 20 prompts and 2 competitors, and it's impossible to track week over week.
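If you do go manual, the competitor benchmark is the easiest stage to script. The sketch below assumes one common definition of share of voice (a brand's mentions divided by all brand mentions across the answer set); vendors define it differently, and the brand names and answers here are invented.

```python
from collections import Counter

# Hypothetical: the brands named in each answer, one list per prompt run.
answers = [
    ["CompetitorA", "CompetitorB", "OurBrand"],
    ["CompetitorA", "CompetitorB"],
    ["CompetitorA", "OurBrand"],
    ["CompetitorB", "CompetitorC"],
]

def share_of_voice(answers):
    """Each brand's mentions as a fraction of all brand mentions."""
    counts = Counter(brand for brands in answers for brand in set(brands))
    total = sum(counts.values())
    return {brand: n / total for brand, n in counts.items()}

def gap_prompts(answers, brand):
    """Indexes of answers where competitors were named and `brand` was not."""
    return [i for i, brands in enumerate(answers) if brands and brand not in brands]

sov = share_of_voice(answers)
gaps = gap_prompts(answers, "OurBrand")
print(round(sov["OurBrand"], 3), gaps)
```

The `gaps` list is the useful output: those are the prompts where competitors get named and you don't.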

The publishing loop most teams get wrong

Once you know your gaps, you're going to want to publish content. This is where I watch teams burn budget.

The bad version: identify 30 prompts you're weak on, brief 30 articles to a writer, publish them, wait. Three months later, no movement. Why? Because the articles are generic, the structure isn't AI-native, and the publishing cadence isn't fast enough to compete with whoever is updating their stuff weekly.

The version that works: generate AI-native content (which means explicit entity statements, structured comparisons, clear factual claims, sources where claims need them), publish into your CMS, then re-test the same prompts on a schedule. If a prompt's visibility doesn't move within 4 to 8 weeks of publishing, the article isn't doing its job and needs surgery, not another article on a different topic.
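The re-test rule can be expressed as a simple check. This is a sketch under stated assumptions: `needs_surgery` is a hypothetical helper, visibility is assumed to be a 0-to-1 score for the target prompt, and the 8-week default is the upper end of the 4-to-8-week window above.

```python
from datetime import date, timedelta

def needs_surgery(published: date, baseline: float, latest: float,
                  today: date, window_weeks: int = 8, min_gain: float = 0.0) -> bool:
    """True if the window has elapsed and the prompt's visibility has not improved."""
    window_elapsed = (today - published) >= timedelta(weeks=window_weeks)
    no_movement = (latest - baseline) <= min_gain
    return window_elapsed and no_movement

# Invented example: published nine weeks ago, visibility flat at 0.2.
print(needs_surgery(date(2026, 1, 5), baseline=0.2, latest=0.2, today=date(2026, 3, 9)))
```

Raising `min_gain` above zero is one way to ignore small fluctuations between runs; the right threshold depends on how noisy your own scores are.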

This is the loop SEOforGPT automates. Content generation tied to specific weak prompts, direct publish to WordPress, Webflow, Notion, Ghost, or Wix, then weekly visibility re-testing to see if the work is actually shifting the model's behavior. The Growth plan generates 15 articles a month with 8 visibility tests, which is roughly the right cadence for a brand trying to move from "occasionally mentioned" to "consistently recommended" inside one or two quarters.

The trap to avoid: don't publish AI-generated content that reads like AI-generated content. Thin, templated AI copy tends to get ignored or down-weighted as a source, and that filtering appears to be getting stricter. Structure should be machine-readable. Substance should still come from someone who knows the category.

What I would do first

If you're starting from zero on AI visibility and you have one week, do this in order:

Pick 20 prompts that map to real buying behavior. Not "what is X." Things your sales team has heard prospects say.
Run them across ChatGPT, Claude, and Perplexity. Note every brand mentioned and every URL cited.
Pick your three closest competitors. Run the same prompts focused on how they're described. This is your benchmark.
Identify the top five third-party sources cited across the prompts. Those are your authority targets.
Pick the three weakest high-intent prompts where you should appear but don't. Those are your first content pieces.

That's the audit. Everything after is execution.

The reason to start small: most teams over-design the prompt set, then never finish the analysis. Twenty prompts you actually review beats two hundred prompts in a dashboard you skim.
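Finding the top five third-party sources is a counting exercise once the cited URLs are collected. A minimal sketch, assuming you have pasted every cited URL into one list; `top_sources`, the sample URLs, and the `ourbrand.example` domain are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def top_sources(cited_urls, own_domain, n=5):
    """Count cited domains, excluding our own, and return the n most frequent."""
    domains = Counter()
    for url in cited_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one source
        if host and host != own_domain:
            domains[host] += 1
    return domains.most_common(n)

# Invented URLs standing in for what the assistants cited.
urls = [
    "https://www.g2.com/hypothetical-category",
    "https://g2.com/hypothetical-product/reviews",
    "https://www.reddit.com/r/hypothetical/thread",
    "https://ourbrand.example/pricing",
]
targets = top_sources(urls, own_domain="ourbrand.example")
print(targets)
```

Counting by domain is a simplification. For a source like Reddit you will likely want the specific thread as well, since that is where the work has to happen.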

Reporting it so someone signs off on the work

This is the part that gets skipped and then bites you. If you're inside an agency, your client needs to see movement. If you're a growth lead, your CEO needs a number. If you're a founder, you need to know if the work was worth it.

The reporting that actually lands shows three things together:

Baseline vs current visibility score across the prompt set
Competitor delta (are you gaining share of voice, are they losing it)
Specific prompts that moved, with the before/after answer

That third one is what convinces skeptical stakeholders. Watching a prompt go from "competitor recommended, you not mentioned" to "you recommended first, with citation" is more persuasive than any aggregate score. SEOforGPT's exportable reports and Public Report Sharing (on Growth and Scale) are built around this, partly because agencies kept asking for something they could put in front of clients without rebuilding it in Notion.
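For a manual version of that report, the baseline-versus-current comparison is a few lines. This sketch assumes the simplest possible score (the share of prompts where the brand was mentioned); the function names and prompt labels are illustrative, and real tools compute their scores differently.

```python
def score(results):
    """Share of prompts where the brand was mentioned."""
    return sum(results.values()) / len(results)

def report(baseline, current):
    """Compare two runs of the same prompt set: {prompt: mentioned?}."""
    moved = sorted(p for p in baseline if p in current and current[p] != baseline[p])
    return {
        "baseline_score": score(baseline),
        "current_score": score(current),
        "delta": score(current) - score(baseline),
        "moved_prompts": moved,  # pull the before/after answers for these
    }

# Invented runs of a four-prompt set.
baseline = {"p1": False, "p2": False, "p3": True, "p4": False}
current = {"p1": True, "p2": False, "p3": True, "p4": True}
summary = report(baseline, current)
print(summary)
```

Running the same two functions on each competitor's results gives you the competitor delta on the same scale.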

A few honest tensions in this work

The field is still messy. A few things I'd flag:

There's no industry-standard visibility metric yet. Every vendor has a slightly different score. Pick one, stick with it, and measure deltas instead of absolute numbers. The trend matters more than the number.

AI-generated content quality is a real concern, and the models are getting better at filtering it. The right move is structured, factually dense content that happens to be drafted by AI and reviewed by a human who knows the category, not unsupervised volume.

Model behavior shifts. A prompt that worked last quarter may behave differently this quarter because the underlying model was updated. Weekly or biweekly testing isn't paranoid, it's the only way to know if your work is holding.

And the obvious one: this is still a smaller channel than organic search for most categories. But it's the one growing fastest, and the brands building citation share now are the ones being recommended by default in 18 months. That's the bet.

FAQ

How often should I re-run an AI visibility audit? Weekly on a stable prompt set while you're actively optimizing, and refresh the prompt set itself at each quarterly strategy review. Model behavior changes too often to audit once a year and trust the result.

Do I need to optimize for every AI assistant? Probably not. Look at where your buyers actually use AI. B2B SaaS buyers skew toward ChatGPT and Perplexity. Research-heavy categories show up more in Claude. Optimize for the two your audience uses, not every assistant on the market.

Can I just do this with ChatGPT prompts manually? For 10 prompts and 1 competitor, yes. Past that, you'll lose track of which answer you saw on which day from which model. The work isn't the prompting, it's the longitudinal tracking.

Will AI-generated content hurt my visibility? Depends entirely on what's in it. Thin AI content with no factual scaffolding gets ignored or down-weighted. Structured content with clear entities, accurate claims, and useful specificity gets cited, regardless of how it was drafted.

Is this replacing SEO? Not yet. It's a parallel channel that overlaps with SEO at the content layer and diverges at the measurement and authority layers. Treat it as adjacent, not as a replacement, until your own data tells you otherwise.
