How We Measure Success: The Metrics Behind Your AI Visibility Program

One of the first questions we hear from new clients is some version of: "This all makes sense, but how will I know it's working?" It is the right question. AI visibility is a genuinely new discipline, and the metrics that defined success in traditional SEO — keyword rankings, organic sessions, click-through rates — do not translate cleanly to a world where buyers ask ChatGPT for vendor recommendations and never visit a search results page at all.
Every client engagement at Windgrove AI is backed by a defined set of metrics, a consistent query set, and monthly reporting that shows exactly where your brand stands across the LLMs that matter.
Executive Summary
- We track AI visibility through four core metrics: Citation Rate, AI Share of Voice, Sentiment Score, and Branded Query Volume Lift — each one measuring a different dimension of how AI engines perceive and recommend your brand.
- Monthly reports cover your performance across ChatGPT, Perplexity, Claude, Gemini, and Grok, with Share of Voice calculated per platform so you can see where you lead and where you lag.
- We select queries based on buyer intent, prioritising both informational prompts ("how to choose...") and high-intent comparison queries ("X vs Y"), because both influence purchase decisions at different stages of the funnel.
- Success is defined as directional momentum: a rising citation rate, improving Share of Voice relative to competitors, and AI-referred traffic that converts at measurably higher rates than organic search.
Why Traditional Metrics No Longer Tell the Full Story
Traditional SEO gave us a clean scoreboard: rank position 1 to 100, organic sessions, impressions, click-through rate. Those numbers are still worth tracking, but they have a significant blind spot. AI Overviews now appear in approximately 48% of all Google searches, up from 34.5% in late 2025. ChatGPT has reached 883 million monthly users. Gemini's market share grew from 5% to 21% year over year.
The uncomfortable reality: only 38% of pages cited in AI Overviews also rank in the top 10 for the same query. You can hold position one on Google and still be completely invisible when a buyer asks an AI assistant for a recommendation in your category.
We see this pattern regularly with new clients. Their traditional SEO metrics look healthy. But when we run their first query audit across ChatGPT, Perplexity, and Gemini, they are absent from the conversation entirely. Their competitors are being named. They are not. That gap is the problem we fix, and it requires a different set of instruments to measure.
Traditional rank trackers were never designed to capture LLM citations. They cannot tell you whether ChatGPT recommended your brand this week, whether Perplexity mentioned you positively or as a cautionary example, or whether you are appearing first or fifth in a competitive category answer. For that, you need a purpose-built measurement framework.
The Four Metrics in Your Monthly Report
Every monthly report we produce for clients is built around four core metrics. Each one answers a distinct question about your AI visibility.
1. Citation Rate
Citation Rate is your North Star metric. It measures the percentage of tracked queries where at least one AI platform mentions your brand in its response. The formula is straightforward:
Citation Rate (%) = (Queries where your brand appears ÷ Total queries tested) × 100
If we test 100 buyer-intent questions across ChatGPT, Claude, and Perplexity, and your brand appears in 18 of those responses, your Citation Rate is 18%. For context, strong B2B companies typically target 10-15% citation rates on category queries, while market leaders exceed 30%. A rising Citation Rate is the clearest signal that our content and authority-building work is producing results.
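For readers who want to see the mechanics, here is a minimal sketch of the calculation, assuming each tracked query has been logged with a flag for whether your brand appeared. The data structure and field names are illustrative, not our actual tooling:

```python
# Minimal sketch: Citation Rate from one month's logged query results.
# In practice, each entry would come from a recorded LLM response.
results = [
    {"query": "best freight management platform", "brand_mentioned": True},
    {"query": "how to choose a B2B payments provider", "brand_mentioned": False},
    {"query": "top enterprise AP automation tools", "brand_mentioned": True},
    # ... one entry per tracked query
]

def citation_rate(results: list[dict]) -> float:
    """Percentage of tracked queries where the brand appears in a response."""
    cited = sum(1 for r in results if r["brand_mentioned"])
    return 100 * cited / len(results)

print(f"Citation Rate: {citation_rate(results):.1f}%")  # 66.7% for this toy set
```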
2. AI Share of Voice (AI SOV)
Citation Rate tells you whether you are in the conversation. Share of Voice tells you how prominently. When an AI lists five vendors in a category response, are you mentioned first or last? Do you appear in 40% of those answers or 8%?
AI SOV (%) = (Your brand's mentions ÷ Total mentions of all brands in category) × 100
We track Share of Voice separately for each platform, because your position varies significantly across LLMs. You might capture 35% of mentions in ChatGPT but only 12% in Perplexity. Each platform pulls from different sources and weights authority differently. A complete picture requires tracking across all of them.
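A minimal sketch of the per-platform calculation, assuming each brand mention has been logged against the platform that produced it. The brand and platform names here are placeholders:

```python
from collections import Counter, defaultdict

# Hypothetical mention log: one entry per brand named in an AI response,
# tagged with the platform that produced it.
mentions = [
    ("chatgpt", "YourBrand"), ("chatgpt", "CompetitorA"), ("chatgpt", "YourBrand"),
    ("perplexity", "CompetitorA"), ("perplexity", "CompetitorB"),
]

def share_of_voice(mentions, brand="YourBrand"):
    """AI SOV per platform: the brand's mentions over all brand mentions."""
    per_platform = defaultdict(Counter)
    for platform, name in mentions:
        per_platform[platform][name] += 1
    return {
        platform: round(100 * counts[brand] / sum(counts.values()), 1)
        for platform, counts in per_platform.items()
    }

print(share_of_voice(mentions))  # {'chatgpt': 66.7, 'perplexity': 0.0}
```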
3. Sentiment Score
Not all citations are equal. An AI response that says "Brand X is the category leader for mid-market logistics companies" is worth far more than one that says "Brand X is one option, though several alternatives exist." We classify every citation as positive (recommended), neutral (listed), or negative (cautionary), and track how that distribution shifts month over month.
Optimising for citation volume while ignoring sentiment is a common mistake. A brand with a 25% Citation Rate but predominantly neutral mentions is losing ground to a competitor with 15% but overwhelmingly positive framing.
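As an illustration of the month-over-month tracking, here is how the distribution shift might be tallied once each citation has been labelled. The labels below are made up for the example:

```python
from collections import Counter

# Illustrative labels: each citation is classified as positive (recommended),
# neutral (listed), or negative (cautionary) before tallying.
last_month = ["neutral", "neutral", "neutral", "positive"]
this_month = ["positive", "neutral", "neutral", "positive", "negative"]

def distribution(labels):
    """Percentage breakdown of citation sentiment for one reporting period."""
    counts = Counter(labels)
    return {k: round(100 * counts[k] / len(labels), 1)
            for k in ("positive", "neutral", "negative")}

print("last month:", distribution(last_month))
print("this month:", distribution(this_month))  # shift toward positive framing
```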
4. Branded Query Volume Lift
AI citations that do not generate an immediate click still produce a measurable downstream effect. When a buyer encounters your brand in a Perplexity or ChatGPT response, they often search for your brand name directly in the days that follow. We monitor branded query volume in Google Search Console as a lagging indicator of AI visibility. A sustained increase in branded impressions and clicks, correlated with improvements in your Citation Rate, provides concrete evidence that AI recommendations are driving real awareness.
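A small sketch of the lift calculation, assuming branded impressions have been exported from Search Console as a CSV with one row per month. The file name and column names are hypothetical:

```python
import csv

# Sketch of the lift calculation over a Search Console export. The column
# names ('month', 'branded_impressions') and file name are illustrative;
# any export of branded impressions by month works the same way.

def branded_lift(path: str) -> float:
    """Percentage change in branded impressions from first to last month."""
    with open(path, newline="") as f:
        rows = sorted(csv.DictReader(f), key=lambda r: r["month"])
    first = int(rows[0]["branded_impressions"])
    last = int(rows[-1]["branded_impressions"])
    return 100 * (last - first) / first

# Assumes 'month' values sort chronologically as strings (e.g. "2025-04").
print(f"Branded impression lift: {branded_lift('gsc_branded.csv'):+.1f}%")
```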
| Metric | What It Measures | Reporting Frequency |
|---|---|---|
| Citation Rate | % of queries where your brand appears | Monthly |
| AI Share of Voice | Your mentions vs. competitors, per platform | Monthly |
| Sentiment Score | Positive / neutral / negative citation breakdown | Monthly |
| Branded Query Volume Lift | Downstream brand search growth in Google Search Console | Monthly |
How We Select Which Queries to Track
The queries we track are the foundation of everything. If we test the wrong prompts, the metrics are meaningless — we would be measuring visibility for questions your buyers are not actually asking. Query selection is one of the most consequential decisions we make at the start of each engagement, and we approach it with a clear methodology.
The Two Query Types We Prioritise
We build every client's query set around two categories of buyer intent:
Informational queries are the "how to" and "what is" questions buyers ask during the research phase. Examples: "how do I choose a logistics software platform for a mid-size operation?" or "what should I look for in a B2B payments provider?" These queries capture buyers early in the funnel. Appearing in these answers builds category authority and plants your brand name before the comparison stage begins.
High-intent comparison queries are the "X vs Y" and "best [category] for [use case]" questions buyers ask when they are close to a decision. Examples: "which freight management platform is best for cross-border shipping?" or "what are the best options for enterprise AP automation?" These queries have the highest conversion proximity. Research shows that 27% of visitors arriving from AI engine citations become sales-qualified leads, compared to the typical 2-5% from organic search. Winning these answers is where AI visibility converts directly to pipeline.
Where the Queries Come From
We do not guess. We source queries from four places:
- Your sales team's call recordings and CRM notes — the exact language your buyers use when they are evaluating options
- Google Search Console data — your top organic queries, which we then reformulate as conversational AI prompts
- Competitor content audits — the questions your competitors are optimising for, which reveal what the market considers important
- Synthetic prompt generation — we expand your core query list with natural-language variants, an approach that research from Profound's pilot programmes shows improves visibility coverage by 42% compared to relying on organic keywords alone
The result is a query set of 50-150 prompts that genuinely reflects how your buyers research in AI-powered search. We test this same set consistently every month, so month-over-month comparisons are meaningful rather than distorted by query drift.
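For illustration, the shape of such a query set might look like the sketch below. The prompts, tags, and field names are examples, not our production schema:

```python
# Illustrative shape of a tracked query set: a fixed list of prompts, each
# tagged with buyer intent and its source, re-tested unchanged every month.
QUERY_SET = [
    {"prompt": "how do I choose a logistics software platform for a mid-size operation?",
     "intent": "informational", "source": "sales_call_notes"},
    {"prompt": "which freight management platform is best for cross-border shipping?",
     "intent": "comparison", "source": "gsc_reformulated"},
    # ... 50-150 prompts in total
]

PLATFORMS = ["chatgpt", "perplexity", "claude", "gemini", "grok"]

def monthly_run(query_set, platforms):
    """Yield every (prompt, platform) pair to be tested this month."""
    for query in query_set:
        for platform in platforms:
            yield query["prompt"], platform

# The identical set runs every month, so metric movement reflects visibility
# changes rather than query drift.
for prompt, platform in monthly_run(QUERY_SET, PLATFORMS):
    pass  # send the prompt to the platform and log whether the brand is cited
```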
Why We Do Not Track Every Query
Breadth without focus produces noise. AI platforms like ChatGPT and Perplexity typically cite 2-7 domains per response. The competitive surface is narrow. We concentrate your query set on the prompts where winning a citation has the highest business impact, not the longest list of tangentially related questions. Quality of query selection matters more than volume.
What "Good Progress" Actually Looks Like
A question we get often: "What should I expect to see in the first few months?" The honest answer is that AI visibility compounds over time, and the trajectory matters more than the absolute number at any given point.
The benchmark progression we target for clients:
- Months 1-2 (Baseline): We establish your starting Citation Rate across the full query set. Most new clients begin in the 0-8% range on category queries. This is the number we are working to move.
- Months 3-4 (Early gains): Structured content, schema improvements, and initial authority signals begin to register. A move from 5% to 10-12% Citation Rate in this window is a strong early result.
- Months 5-6+ (Compounding): Share of Voice begins to shift. You start appearing in comparison queries, not just informational ones. Branded search volume shows a measurable lift.
The benchmark that matters most is relative momentum. A brand moving from 8% to 14% AI SOV over 60 days is gaining ground. A brand stuck at 22% while a competitor climbs from 10% to 19% is losing competitive position, even with a higher raw number. We report both.
The revenue signal to watch: AI-sourced traffic converts at a fundamentally different rate from traditional organic. ChatGPT referrals convert at 15.9% compared to Google Organic at 1.76%, according to Seer Interactive's case study data. That gap is why Citation Rate is not just a vanity metric. Every percentage point of improvement represents a meaningful shift in the quality of inbound attention your brand receives.
We connect AI visibility gains to pipeline contribution by tracking AI-referred sessions in GA4 with custom channel groups, monitoring form submissions from cited pages, and mapping Citation Rate improvements to changes in demo requests over the same period. The goal is always to close the loop between "we appeared in more AI answers" and "here is what that produced commercially."
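As an illustration of the channel-grouping step, the matching logic behind an "AI" channel might look like the sketch below. The hostname list is indicative rather than exhaustive, and in practice the grouping is configured inside GA4 itself rather than in code:

```python
import re

# Sketch of the referrer matching behind an "AI" custom channel group.
# The hostnames are illustrative; GA4 channel groups are defined in the
# GA4 UI, but the classification logic has this shape.
AI_REFERRERS = re.compile(
    r"(chatgpt\.com|chat\.openai\.com|perplexity\.ai|gemini\.google\.com|"
    r"claude\.ai|grok\.com)$"
)

def classify_channel(referrer_host: str) -> str:
    """Bucket a session's referrer hostname into the AI channel or not."""
    return "AI" if AI_REFERRERS.search(referrer_host) else "Other"

print(classify_channel("www.perplexity.ai"))  # AI
print(classify_channel("www.google.com"))     # Other
```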
Frequently Asked Questions
Will my monthly reports include Share of Voice across different LLMs, or just one platform?
Yes, we report Share of Voice broken down by platform: ChatGPT, Perplexity, Claude, Gemini, and Grok are tracked separately. Your position varies across each one because they pull from different sources and weigh authority signals differently. A consolidated view would obscure where you are winning and where you have gaps. Platform-level breakdowns are what allow us to make targeted decisions about where to direct content and authority-building efforts each month.
How long before I see meaningful movement in my Citation Rate?
Most clients see initial movement between months 3 and 4, with more significant shifts by month 6. AI visibility is not a switch that flips; it compounds as structured content accumulates, third-party citations build, and LLMs update their understanding of your brand's authority. The first two months are primarily about establishing a clean baseline and correcting any entity conflicts or structural issues that are suppressing your visibility.
Do you track branded queries or only category-level queries?
Both. Category queries ("best [solution] for [use case]") measure your competitive visibility against alternatives. Branded queries ("how does [your brand] work?" or "[your brand] vs [competitor]") measure depth of brand understanding within the LLMs. Branded query performance is particularly important because it affects how accurately and favourably AI engines describe your product when buyers specifically ask about you.
What happens if an AI platform changes how it surfaces citations?
It happens, and we account for it. Each major LLM updates its retrieval and synthesis behaviour periodically. When we detect a significant shift in citation patterns across a platform, we flag it in your monthly report with context on what changed and how it affects your numbers. We adjust our content and authority strategies accordingly. The query set itself remains consistent so that platform-level changes are identifiable rather than confused with performance changes.
Can AI visibility metrics be gamed, and how do we ensure the data is reliable?
Short answer: no more than traditional SEO can be gamed sustainably. LLMs assess authority based on the breadth and quality of third-party citations, content depth, and entity consistency across the web. Superficial tactics do not move these signals. Our approach is systematic and durable: structured content that genuinely answers buyer questions, earned mentions in credible publications, and schema markup that helps AI systems understand what your brand does and for whom. The data we report is based on consistent, reproducible query testing, not one-off spot checks, which means the numbers reflect real, sustained performance.