The Technical AEO Audit: 12 Things Your Website Needs Before AI Tools Will Cite You

Spencer Duke | June 19, 2026 | 10 min read

Most founders who discover their company isn't showing up in ChatGPT assume it's a content problem. They think they need more blog posts, a PR push, or a stronger social presence. Sometimes that's true. But more often, the issue is more fundamental: AI crawlers can't properly read your site, don't know who you are as an entity, or can't extract a clean answer from your pages.

Content and citation work can't move the needle on a technically broken foundation. This is the part most AEO conversations skip over. Before you write a single piece of AI-optimised content, your site needs to pass a basic infrastructure test.

Here are the 12 things we check in every audit, what breaking any one of them costs you in citations, and how to check where you stand today.

TL;DR

Before any content or citation work can move the needle, your site needs to be technically readable by LLMs. Most sites aren't. The 12 infrastructure items below are what unlock AI citation. Check each one before you spend a dollar on content.

Why Technical Infrastructure Comes First

AI engines don't browse your site the way a human does. They crawl it, parse structured signals, cross-reference your entity against third-party sources, and extract answers from your content. If any part of that chain is broken, your content never enters the citation pool, regardless of how good it is.

Think of it this way: you can write the most authoritative, well-structured answer to a target query. But if your robots.txt is blocking the crawler, or your canonical tags are pointing in the wrong direction, or your content has no author attribution, the LLM either can't access it or doesn't trust it enough to cite it.

The uncomfortable truth: most companies that are invisible in AI-generated answers have a technical problem, not a content problem.

The 12 items below fall into four categories:

Crawlability: Can AI systems actually access your site and the right pages on it?

Structured data: Have you told AI engines what your content is, who wrote it, and what organisation it belongs to?

Content readability: Is your content structured so that an LLM can extract a clean, citable answer?

Entity consistency: Does your brand present itself identically across every platform an AI engine might cross-reference?

Fix these before anything else. The content work compounds on top of a clean foundation. Without it, you're building on sand.

The 12-Item Technical AEO Audit

1. robots.txt

What it is: The file that tells crawlers which pages they can and can't access.

What breaking it costs you: If your robots.txt is blocking the crawlers that AI platforms use, your entire site is invisible to them. It doesn't matter what's on those pages.

How to audit it: Go to yourdomain.com/robots.txt. Look for any Disallow rules that might be blocking your key pages or entire directories. Check in particular for a blanket Disallow: / under User-agent: *, which shuts out every crawler that has no rule of its own, and for rules aimed at user-agents you don't recognise. Many sites have legacy rules that made sense years ago and now block modern crawlers by accident.

What good looks like: Your key pages and content directories are fully accessible. You're not running blanket Disallow rules without understanding what they're blocking.
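
As a rough sketch of this check, Python's standard-library robots.txt parser can report which crawlers a given rule set blocks. The user-agent names below (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) are an assumed starting list, and the sample rules and example.com URL are illustrative only. Confirm current crawler names against each platform's own documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI-related user-agents to test; extend it for your own audit.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt: str, url: str, agents=AI_USER_AGENTS) -> list[str]:
    """Return the user-agents that the given robots.txt text blocks from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Illustrative rules: one crawler is shut out entirely, everyone else only from /admin/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_agents(SAMPLE, "https://example.com/blog/post"))  # ['GPTBot']
```

To run it against a live site, fetch the text of yourdomain.com/robots.txt and pass it in place of SAMPLE.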


2. llms.txt

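What it is: A newer, proposed file that sits at your site root like robots.txt, but is written for LLMs rather than search crawlers. Where robots.txt controls access, llms.txt is a Markdown file that points to the content on your site you consider authoritative, citable, and intended for AI consumption.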

What breaking it costs you: Without it, LLMs have no explicit signal about which of your pages to prioritise. They make their own inferences, which aren't always correct.

How to audit it: Go to yourdomain.com/llms.txt. If you get a 404, you don't have one. Most sites don't.

What good looks like: The file exists, points to your most important pages and content, and is structured according to the llms.txt specification.
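
For orientation, here is a minimal sketch that renders a file in the general shape the llms.txt proposal describes: a title line, a short summary, then sections of links. The helper name build_llms_txt, the company details, and the example.com URLs are all hypothetical. Check the current proposal before relying on the exact layout.

```python
def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render Markdown text: H1 title, blockquote summary, then H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site details, for illustration only.
text = build_llms_txt(
    "Acme Inc.",
    "Acme builds scheduling software for dental practices.",
    {"Key pages": [("Pricing", "https://example.com/pricing", "Plans and costs")]},
)
print(text)
```

The output would be saved as llms.txt at the site root, so that yourdomain.com/llms.txt no longer returns a 404.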


3. XML Sitemap Completeness

What it is: The map that tells search engines and AI crawlers which pages exist on your site.

What breaking it costs you: Pages missing from your sitemap are pages that LLMs may never discover. Even if those pages are technically accessible, they're less likely to be crawled and indexed.

How to audit it: Go to yourdomain.com/sitemap.xml. Cross-reference the pages listed against the pages you actually want indexed. Look for missing blog posts, landing pages, or service pages. Also check that the sitemap is submitted in Google Search Console.

What good looks like: Every page you want cited is in the sitemap. The sitemap is current and has no broken URLs.
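
The cross-referencing step can be sketched in a few lines of Python. This assumes a plain sitemap (not a sitemap index file) and a list of the URLs you want indexed. The sample XML and example.com URLs are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def missing_from_sitemap(sitemap_xml: str, wanted_urls: set[str]) -> set[str]:
    """Return the wanted URLs that the sitemap does not list."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }
    return wanted_urls - listed


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

wanted = {"https://example.com/", "https://example.com/blog/aeo-audit"}
print(missing_from_sitemap(SAMPLE_SITEMAP, wanted))  # {'https://example.com/blog/aeo-audit'}
```

The comparison is an exact string match, so a trailing-slash difference counts as a missing page. That strictness is deliberate, since it also surfaces URL inconsistencies.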


4. Canonical URL Resolution

What it is: Canonical tags tell crawlers which version of a URL is the "official" one when duplicates exist.

What breaking it costs you: www vs non-www conflicts, URL parameters, and canonical mismatches cause LLMs to treat your site as multiple separate entities. This splits your citation authority across versions of the same page and dilutes your overall signal.

How to audit it: Use a tool like Screaming Frog to crawl your site and flag canonical conflicts. Check that your www and non-www versions redirect to a single canonical. Look for pages where the canonical tag points somewhere unexpected.

What good looks like: One canonical version per page, consistently enforced. No conflicting signals across URL variants.
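
If you don't have a crawler to hand, a single page can be spot-checked with a short script. The sketch below uses only the Python standard library. The class and function names are made up for this example, and the HTML is a stand-in for a page you have fetched yourself.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_issue(html: str, expected_url: str) -> Optional[str]:
    """Return a short description of the problem, or None if the tag is as expected."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "no canonical tag"
    if len(set(finder.canonicals)) > 1:
        return "conflicting canonical tags"
    if finder.canonicals[0] != expected_url:
        return f"canonical points to {finder.canonicals[0]}"
    return None


PAGE = '<html><head><link rel="canonical" href="https://www.example.com/pricing"></head></html>'
print(canonical_issue(PAGE, "https://example.com/pricing"))
# canonical points to https://www.example.com/pricing
```

Here the page declares the www version as canonical while the audit expects the non-www version, which is the kind of conflict described above.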


5. FAQ Schema

What it is: Structured data markup that labels individual question-and-answer pairs on your pages so AI engines can extract and cite them directly.

What breaking it costs you: Without FAQ schema, an LLM has to infer which part of your page answers a given query. With it, you're handing the engine a pre-packaged answer it can pull verbatim. This is one of the highest-leverage schema types for AI citation.

How to audit it: Use Google's Rich Results Test on your key pages. If FAQ schema is present and valid, it will show up. If not, it won't.

What good looks like: Every page targeting a question-based query has valid FAQ schema. Each Q&A pair is self-contained, specific, and directly answers the question it's paired with.
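
To make the markup concrete, this sketch generates FAQPage JSON-LD from a list of question-and-answer pairs. The function name faq_jsonld is hypothetical, and the sample pair is taken from this article's own FAQ. The output is meant to be placed in a script tag of type application/ld+json and then validated.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD text from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


markup = faq_jsonld([
    (
        "Can I do a technical AEO audit myself?",
        "Yes. Everything in this checklist is something a technically capable "
        "founder or developer can work through independently.",
    ),
])
print(markup)
```

Generating the markup from the same source as the visible Q&A text is one way to keep the two from drifting apart.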


6. Organisation Schema

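What it is: Structured data that tells AI engines who you are as a company: your name, URL, logo, contact information, social profiles, and founding details. Note that the Schema.org type itself is spelled Organization, with a z, whatever spelling your site copy uses.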

What breaking it costs you: Without it, LLMs are assembling a picture of your organisation from scattered signals. That picture is often incomplete or inconsistent, which reduces citation confidence.

How to audit it: Use Schema.org's validator or Google's Rich Results Test on your homepage. Check whether Organisation schema is present and whether it includes your name, URL, logo, and sameAs links to your social profiles and directories.

What good looks like: Valid Organisation schema on your homepage. All key fields populated. sameAs links pointing to your LinkedIn, Crunchbase, Google Business Profile, and any other authoritative directory listings.
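
A quick way to check the fields listed above is to load the homepage's JSON-LD and look for gaps. In this sketch the field list mirrors this checklist rather than any official requirement, and the company details and example.com URLs are hypothetical.

```python
import json

# Fields this checklist looks for; not an official Schema.org requirement list.
CHECKED_FIELDS = ("name", "url", "logo", "sameAs")


def missing_org_fields(jsonld_text: str) -> list[str]:
    """Return the checked fields that are absent or empty in Organization JSON-LD."""
    data = json.loads(jsonld_text)
    if data.get("@type") != "Organization":
        return ["@type"]
    return [field for field in CHECKED_FIELDS if not data.get(field)]


# Hypothetical homepage markup with the logo left out.
SAMPLE_ORG = json.dumps({
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme Inc.",
    "url": "https://example.com",
    "sameAs": ["https://www.linkedin.com/company/example"],
})

print(missing_org_fields(SAMPLE_ORG))  # ['logo']
```

Markup that uses the British spelling as its type comes back as ['@type'], which catches the most common mistake with this schema.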


7. Article Schema with Author Attribution

What it is: Structured data on your blog posts and articles that identifies the piece as an article, names the author, and includes the publication date.

What breaking it costs you: AI engines look for authorship signals when deciding whether to cite a piece of content. Anonymous content with no author attribution is treated as lower-trust. Content with a named author, credentials, and a publication date is treated as more citable.

How to audit it: Run your blog posts through Google's Rich Results Test. Check whether Article schema is present. Look for the author field and the datePublished field specifically.

What good looks like: Every article has Article schema. The author field points to a real person with a bio page. datePublished is accurate and present.
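
The same check can be scripted for a batch of posts. The sketch below pulls JSON-LD blocks out of a page's HTML and looks for the author and datePublished fields. It is deliberately narrow: it only recognises a top-level object of type Article with a single author object, so markup using @graph, BlogPosting, or a list of authors would need extra handling. The class and function names are made up for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def article_problems(html: str) -> list[str]:
    """List what is missing from the first Article block found in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    articles = [b for b in collector.blocks
                if isinstance(b, dict) and b.get("@type") == "Article"]
    if not articles:
        return ["no Article schema"]
    article, problems = articles[0], []
    author = article.get("author")
    if not (isinstance(author, dict) and author.get("name")):
        problems.append("missing author name")
    if not article.get("datePublished"):
        problems.append("missing datePublished")
    return problems


PAGE = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "The Technical AEO Audit",
 "author": {"@type": "Person", "name": "Spencer Duke"}}
</script></head><body></body></html>"""

print(article_problems(PAGE))  # ['missing datePublished']
```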


8. H1 Direct Answers

What it is: The practice of placing a direct, concise answer to the page's target question right at the top of the page, immediately after the H1 and within the first 100 words of the content.

What breaking it costs you: LLMs look near the top of a page for extractable answers. If your content buries the answer 500 words in after a long introduction, the LLM either misses it or passes over your page in favour of one that answers the question immediately.

How to audit it: Open your key pages and read the first paragraph. Ask yourself: if someone searched for the query this page targets, is the answer clearly stated in the opening lines? Or does the page start with background, context, and scene-setting before getting to the point?

What good looks like: The first 100 words of every key page contain a direct, specific answer to the target query. No long wind-ups. No "In this article, we'll explore..." openings.
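
Reading every opening paragraph by hand doesn't scale, so a crude first pass can be automated. This sketch pulls the first 100 words of a page's text and flags wind-up openings. The phrase list is an assumption made for illustration, and a heuristic like this only narrows down which pages to read. It can't judge whether the opening actually answers the query.

```python
# Assumed wind-up phrases; tune this list to your own content.
WIND_UP_PHRASES = ("in this article", "in this post", "in today's", "we'll explore")


def opening_words(text: str, limit: int = 100) -> str:
    """Return the first `limit` whitespace-separated words of the text."""
    return " ".join(text.split()[:limit])


def opens_with_wind_up(text: str, limit: int = 100) -> bool:
    """True if the opening words contain one of the wind-up phrases."""
    opening = opening_words(text, limit).lower()
    return any(phrase in opening for phrase in WIND_UP_PHRASES)


print(opens_with_wind_up("In this article, we'll explore what an AEO audit is."))  # True
print(opens_with_wind_up("A technical AEO audit checks crawlability, structured data, "
                         "content readability, and entity consistency."))  # False
```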


9. Publication Dates on Content

What it is: Visible and schema-tagged publication dates on your articles and blog posts.

What breaking it costs you: AI engines use publication dates to assess content freshness and relevance. Undated content is treated as lower-confidence. For queries where recency matters, undated content often loses to dated content even when the quality is higher.

How to audit it: Go through your blog and check whether each post displays a publication date visibly on the page. Then verify that the datePublished field is present in the Article schema.

What good looks like: Every piece of content has a visible, accurate publication date. The same date appears in the Article schema. If you've updated a piece, the dateModified field is also present.
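
The date comparison can also be scripted. This sketch assumes the visible date is written like "June 19, 2026" and that the schema dates start with an ISO 8601 date (YYYY-MM-DD). Both formats are assumptions about your site, and the function name is hypothetical.

```python
from datetime import date, datetime


def date_problems(visible_date: str, schema: dict) -> list[str]:
    """Compare the on-page date with datePublished and dateModified in the schema."""
    published_raw = schema.get("datePublished")
    if not published_raw:
        return ["missing datePublished"]
    problems = []
    # Only the date part is compared; any time or timezone suffix is ignored.
    published = date.fromisoformat(published_raw[:10])
    shown = datetime.strptime(visible_date, "%B %d, %Y").date()
    if shown != published:
        problems.append("visible date differs from datePublished")
    modified_raw = schema.get("dateModified")
    if modified_raw and date.fromisoformat(modified_raw[:10]) < published:
        problems.append("dateModified is earlier than datePublished")
    return problems


print(date_problems("June 19, 2026", {"datePublished": "2026-06-18"}))
# ['visible date differs from datePublished']
```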


10. Internal Linking

What it is: The network of links between your own pages that signals content relationships and helps crawlers navigate your site.

What breaking it costs you: A site with weak internal linking is a collection of isolated pages from an AI crawler's perspective. LLMs use internal link patterns to understand which pages are most important and how topics relate to each other. Orphaned pages (pages with no inbound internal links) are often invisible in practice even if they're technically indexed.

How to audit it: Use Screaming Frog to map your internal link structure. Look for pages with zero or very few inbound internal links. Check whether your most important content pages are linked from multiple relevant locations across the site.

What good looks like: Key pages are linked from multiple relevant locations. Topic clusters are connected logically. No important pages are sitting as orphans.
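
Once you have a crawl export, finding orphans is a small counting exercise. The sketch below assumes you already have a mapping from each page to the internal pages it links to. The paths shown are hypothetical, and the function names are made up for this example.

```python
def inbound_link_counts(links: dict[str, list[str]]) -> dict[str, int]:
    """Count how many distinct other pages link to each page."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # a page linking to itself doesn't count
                counts[target] = counts.get(target, 0) + 1
    return counts


def orphan_pages(links: dict[str, list[str]], home: str = "/") -> list[str]:
    """Return pages with no inbound internal links, excluding the homepage."""
    counts = inbound_link_counts(links)
    return sorted(page for page, n in counts.items() if n == 0 and page != home)


# Hypothetical site: each key is a page, each value the internal pages it links to.
SITE = {
    "/": ["/pricing", "/blog"],
    "/blog": ["/blog/aeo-audit", "/"],
    "/blog/aeo-audit": ["/pricing"],
    "/pricing": [],
    "/case-study": ["/pricing"],
}

print(orphan_pages(SITE))  # ['/case-study']
```

The same counts can be sorted in ascending order to list pages with very few inbound links, not just those with none.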


11. Google Business Profile Consistency

What it is: Your Google Business Profile (GBP) listing, which AI engines use as a primary signal for local and category-based queries.

What breaking it costs you: An incomplete, inconsistent, or unclaimed GBP means you're invisible for category-level queries, not just location-specific ones. LLMs pull from GBP data when generating recommendations for "best [service type] in [city]" and similar queries.

How to audit it: Search for your business name in Google Maps. Check that the listing is claimed, that the business name matches exactly what's on your website, that the description is accurate and complete, and that your hours, phone number, and URL are current.

What good looks like: Claimed, complete, and consistent. Business name, description, and contact details match your website exactly. Categories are accurate and specific.


12. Entity Alignment Across Directories

What it is: The consistency of your brand's name, description, URL, and contact details across every platform an AI engine might cross-reference: LinkedIn, Crunchbase, Apple Maps, Bing Places, and any vertical-specific directories relevant to your industry.

What breaking it costs you: AI engines cross-reference your brand across the web before deciding how confidently to cite you. If LinkedIn says one thing, Crunchbase says another, and your website says a third, the LLM's confidence in your entity drops. Inconsistency reads as unreliability.

How to audit it: Search your brand name across the major directories. Check that your business name is identical everywhere (not "Acme Inc." in one place and "Acme" in another). Verify that your URL, description, and founding details are consistent.

What good looks like: Identical name, URL, and description across every listing. No outdated information sitting on platforms you forgot you signed up for three years ago.
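
If you record what each directory says in a spreadsheet or dictionary, spotting the mismatches is simple to automate. This sketch compares each listing against your website's details field by field. The platform entries and values are hypothetical, and collecting the data from each directory is left as a manual step.

```python
def entity_mismatches(reference: dict[str, str],
                      listings: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each platform to the fields that differ from the reference details."""
    mismatches = {}
    for platform, details in listings.items():
        differing = [
            field for field, expected in reference.items()
            if details.get(field, "").strip() != expected.strip()
        ]
        if differing:
            mismatches[platform] = differing
    return mismatches


# Hypothetical details: the website is treated as the source of truth.
WEBSITE = {"name": "Acme Inc.", "url": "https://example.com"}
LISTINGS = {
    "LinkedIn": {"name": "Acme Inc.", "url": "https://example.com"},
    "Crunchbase": {"name": "Acme", "url": "https://example.com"},
}

print(entity_mismatches(WEBSITE, LISTINGS))  # {'Crunchbase': ['name']}
```

The comparison is exact apart from surrounding whitespace, so "Acme" and "Acme Inc." count as a mismatch, in line with the standard described above.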

How to Prioritise When You Find Problems

Most sites that go through this audit find issues in multiple areas. The question is where to start.

The answer depends on severity, but there's a general order that makes sense for most sites:

Fix crawlability first. robots.txt, sitemap, and canonical issues are foundational. If crawlers can't access your site or are getting confused about which URL is canonical, nothing else matters. These are also usually the fastest to fix.

Implement structured data next. Organisation schema, Article schema, and FAQ schema have the highest direct impact on citation probability. They're the difference between an LLM having to guess what your content is about and being told explicitly.

Then address content readability. H1 direct answers and publication dates are content-level changes. They require going through your existing pages and reformatting. This takes time but compounds quickly once done.

Entity consistency is ongoing. Directory alignment isn't a one-time fix. As you add new listings and update your business details, consistency needs to be maintained.

The single highest-impact fix for most sites: FAQ schema on key pages, combined with H1 direct answers in the opening paragraph. These two changes alone can move citation rates meaningfully for sites that previously had neither.

Frequently Asked Questions

Can I do a technical AEO audit myself?

Yes. Everything in this checklist is something a technically capable founder or developer can work through independently. The tools referenced here (Screaming Frog, Google's Rich Results Test, Schema.org's validator, Google Search Console) are all free or low-cost. The audit itself isn't the hard part. The hard part is fixing everything you find, especially the structured data implementation, which requires adding and validating JSON-LD across your entire site. If you have a developer comfortable with schema markup, the technical fixes are manageable. If you don't, the implementation phase is where most teams get stuck.

How long does it take to implement AEO technical fixes?

For a site with 50 to 200 pages, a thorough technical AEO implementation typically takes three to four weeks when done properly. Crawlability fixes (robots.txt, sitemap, canonicals) can be done in a day or two. Structured data implementation across all key pages is the most time-consuming part, particularly if your CMS doesn't have native schema support. Entity alignment across directories can be done in parallel and usually takes a week of focused work. Plan for four weeks total if you're doing this alongside other priorities.

Which technical AEO fix has the biggest impact?

FAQ schema, consistently. It's the fix that most directly changes how AI engines interact with your content. When you add valid FAQ schema to a page, you're giving LLMs a pre-packaged, extractable answer they can cite verbatim. Without it, they have to infer the answer from unstructured prose. That inference is less reliable and less likely to result in a citation. Pair FAQ schema with H1 direct answers in the opening paragraph and you've addressed the two most common reasons AI engines pass over otherwise good content.

We Handle All 12 of These in Month 1

This checklist is the foundation layer of every Windgrove engagement. Before we write a single piece of content or build a single citation, we go through all 12 of these items on your site. We fix what's broken, implement what's missing, and verify everything is working before the content engine starts.

Here is what that looks like in practice:

Week 1: robots.txt audit and rewrite, sitemap review, canonical conflict identification and resolution, llms.txt configuration.

Week 2: Organisation schema, Article schema, and FAQ schema implemented across your key pages. Rich Results Test validation on every page touched.

Week 3: H1 direct answer restructuring on existing content, publication date audit and correction, author attribution added to all articles.

Week 4: Google Business Profile optimisation, entity alignment across LinkedIn, Crunchbase, Apple Maps, Bing Places, and any vertical-specific directories. Internal link architecture review and gap-filling.

By the end of Month 1, your site is technically ready for AI citation. Month 2 is when the content work begins, and it compounds on a foundation that's actually solid.

If you want to see what this looks like for your specific site, we're happy to take a look. We'll tell you exactly where you stand across all 12 items before any commitment.