DANYLO PRAVDA

FIELD NOTE — GENERATIVE ENGINE OPTIMIZATION — 2026-06-16

How to get cited by AI answer engines in 2026

A field-tested playbook for getting a site found and cited by ChatGPT Search, Perplexity, Claude, and Google's AI Overviews in 2026 — the full set of levers ranked by impact-for-effort, every number traced to its source, the myths that waste your time, and a teardown of how I applied each lever to this very site.

When a B2B buyer asks ChatGPT, Perplexity, or Google's AI Overview "who can automate this for me," the engine answers with a handful of cited sources. Getting into that handful is a different game from classic SEO — it's Generative Engine Optimization (GEO). This is the full playbook I assembled while wiring every piece into this site, with the numbers attributed to their sources and ranked by impact-for-effort.

CH.01

What actually gets you cited (the short version)

Four levers do most of the work in 2026: (1) get indexed in Bing and Google fast, (2) let AI crawlers actually read your HTML, (3) shape a small set of pages so the answer is liftable, and (4) build entity authority. Everything below ranks roughly in that order; llms.txt and caching are real but secondary.

The mental model that matters: answer engines don't pick pages, they pick passages. They expand a query into sub-questions, retrieve passages that cleanly answer them, then rank by how authoritative and quotable the source is. Optimize for "a model can lift this paragraph and attribute it to you," not "this page ranks #1."

CH.02

Get indexed fast: IndexNow + the consoles

The fastest legitimate path for a new domain is IndexNow plus the search consoles, and Bing matters far more than its market share suggests, because ChatGPT Search and Microsoft Copilot lean heavily on Bing's index. If you're not in Bing, those engines have little to retrieve you from.

Concretely: verify in Google Search Console (submit the sitemap; use URL Inspection → Request Indexing for key URLs) and Bing Webmaster Tools (submit sitemap), and implement IndexNow so every publish pings the index. Bing reports 80M+ sites using IndexNow, ~5B URLs submitted per day, and 17–22% of newly-clicked URLs arriving via IndexNow — so it materially shapes what Bing knows. Google still recommends sitemaps + URL Inspection for fast recrawl of new sites. Effort: small, then automatic.
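The "ping on every publish" step can be sketched as a small deploy hook. This is a minimal sketch, assuming the shared api.indexnow.org endpoint and the JSON bulk-submission body described in the IndexNow protocol; the key value and example.com URLs are hypothetical placeholders, and the key must also be hosted as a text file on your site for the submission to be accepted.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request

# Shared IndexNow endpoint; participating engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(urls, key, key_location=None):
    """Build (but do not send) one IndexNow bulk submission."""
    urls = list(urls)
    if not urls:
        raise ValueError("urls must not be empty")
    hosts = {urlparse(u).netloc for u in urls}
    if len(hosts) != 1:
        raise ValueError("all URLs in one submission must share a host")
    payload = {"host": hosts.pop(), "key": key, "urlList": urls}
    if key_location:
        # Only needed when the key file is not at the site root.
        payload["keyLocation"] = key_location
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Hypothetical key and URLs, for illustration only.
req = build_indexnow_request(
    ["https://example.com/notes/geo", "https://example.com/notes/other"],
    key="hypothetical-key-123",
)
```

In a deploy script you would pass `req` to `urllib.request.urlopen` and log the status code. Building the request separately from sending it keeps the payload easy to inspect in a dry run.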

CH.03

Let the crawlers actually read you: robots, WAF, and no-JS

The most common reason a page "can't be fetched" isn't a block — it's a JavaScript-only shell, a WAF challenge, or a redirect. Independent benchmarks (Vercel + Merj, searchVIU) consistently find that GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot and CCBot do not execute JavaScript — they read raw HTML and stop. Only Googlebot/Gemini and Applebot render JS reliably.

So: keep core content server-rendered (SSG/ISR/SSR), with all text and schema in the initial HTML, never client-only. Allow the AI user-agents in robots.txt. Confirm your CDN/WAF isn't silently challenging them or their IP ranges. When something returns 200 in your browser but an engine "fails to fetch," it's almost always a JavaScript-only shell, a WAF challenge, or a redirect. Test each key URL with the bot's user-agent and confirm a 200 response whose HTML contains the body text. Treat SSR as non-negotiable; regressing to client-only rendering makes you invisible to most AI crawlers. Effort: small if you're already SSR; large to fix later.
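Two of those checks can be run offline against a saved robots.txt and a saved copy of the raw HTML. The sketch below assumes the user-agent tokens named above and uses only the standard library; the sample robots.txt and HTML strings are made up for illustration. It does not detect WAF challenges, which still need a live request sent with the bot's user-agent.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User",
             "ClaudeBot", "PerplexityBot", "CCBot")


def blocked_agents(robots_txt, url, agents=AI_AGENTS):
    """Return the AI user-agents that robots.txt disallows for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [a for a in agents if not parser.can_fetch(a, url)]


class _VisibleText(HTMLParser):
    """Collect text outside <script>/<style>, as a non-rendering bot sees it."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def text_in_raw_html(html, phrase):
    """True if the phrase is present in the HTML without running any JS."""
    collector = _VisibleText()
    collector.feed(html)
    collector.close()
    visible = " ".join(" ".join(collector.chunks).split())
    return phrase in visible


# Illustrative inputs only.
ROBOTS = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
SSR_HTML = "<html><body><h1>Pricing starts here</h1></body></html>"
SPA_HTML = ('<html><body><div id="root"></div>'
            '<script>render("Pricing starts here")</script></body></html>')
```

With these inputs, `blocked_agents(ROBOTS, "https://example.com/notes/geo")` reports only GPTBot, and the phrase check passes for the server-rendered page but fails for the client-only shell.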

CH.04

Shape pages so the answer is liftable

Open every section with a 40–60 word "answer capsule" that directly answers the question in its heading; keep sections to one idea in ~120–180 words; use question-style H2/H3s; and add tables, an FAQ, and dated statistics with sources. This is the highest-ROI editorial change you can make.

The evidence is unusually concrete. The Princeton GEO study (Aggarwal et al., KDD 2024) found that adding citations, quotations, and statistics raised a source's visibility in generative answers by up to roughly 40%. Industry analyses report that pages built from short answer blocks + question headings + structured Q&A get 2–3× higher citation rates than dense prose, and FAQ blocks with schema correlate with roughly 40–44% higher AI-citation rates. You don't need this everywhere; pick the handful of pages that map to real buyer questions. Effort: medium, very compounding.
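The length targets above are easy to lint before publishing. A minimal sketch, assuming sections arrive as a heading plus body text with paragraphs separated by blank lines; the function name and the exact thresholds are illustrative defaults taken from the ranges in this chapter, not a standard.

```python
def audit_section(heading, body, capsule_range=(40, 60), section_range=(120, 180)):
    """Return a list of editorial issues for one section (empty list = OK)."""
    paragraphs = [p for p in body.split("\n\n") if p.strip()]
    issues = []
    if not heading.rstrip().endswith("?"):
        issues.append("heading is not phrased as a question")
    # The first paragraph is treated as the answer capsule.
    capsule_words = len(paragraphs[0].split()) if paragraphs else 0
    if not capsule_range[0] <= capsule_words <= capsule_range[1]:
        issues.append(f"capsule is {capsule_words} words")
    total_words = sum(len(p.split()) for p in paragraphs)
    if not section_range[0] <= total_words <= section_range[1]:
        issues.append(f"section is {total_words} words")
    return issues
```

Word counts are a crude proxy, so treat the output as a prompt for an editor rather than a pass/fail gate.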

CH.05

Build the pages buyers actually ask for

Answer engines need query-aligned landing pages, not just a strong homepage narrative. Define the 3–5 jobs a buyer would phrase to an AI ("who can build AI agents to automate reporting for a B2B SaaS?") and build a page per job.

Each page: an H1 in the language of the query; a top capsule answering who / what / for whom / where / how-fast in 40–60 words; one flagship case-study excerpt with hard metrics linking to the full study; a short FAQ (cost, speed, how to start); and Service/Product JSON-LD with areaServed, serviceType, and provider linking back to your Person/Organization entity. AI Overviews and ChatGPT Search expand a query into fan-out sub-questions and prefer pages that clearly match intent over a generic homepage. Effort: medium — mostly repackaging material you already have.
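The Service markup described above can be generated per landing page rather than hand-written. This is a sketch under assumptions: the `@id`, names, and URLs are hypothetical, and only the schema.org properties named in this chapter (`areaServed`, `serviceType`, `provider`) plus `name` and `url` are emitted.

```python
import json

PROVIDER_ID = "https://example.com/#person"  # hypothetical stable entity @id


def service_jsonld(name, service_type, area_served, url, provider_id=PROVIDER_ID):
    """Build Service JSON-LD whose provider points back at the site entity."""
    return {
        "@context": "https://schema.org",
        "@type": "Service",
        "name": name,
        "serviceType": service_type,
        "areaServed": area_served,
        "url": url,
        # Reference by @id so every page resolves to the same entity.
        "provider": {"@id": provider_id},
    }


def as_script_tag(data):
    """Serialize for the initial HTML; escape "</" so the script tag cannot be closed early."""
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


# Hypothetical landing page.
service = service_jsonld(
    name="AI agents for B2B SaaS reporting",
    service_type="AI automation",
    area_served="Worldwide",
    url="https://example.com/services/reporting-agents",
)
tag = as_script_tag(service)
```

Emit the tag during server rendering so it is present in the raw HTML that non-rendering crawlers read.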

CH.06

Structured data, done right

Use Article/BlogPosting (with author + publisher + dates), FAQPage for Q&A, and HowTo for genuine step-by-step workflows — and for gated content, mark the page free and only the locked block gated. The shape that works: isAccessibleForFree: true on the article, with a hasPart WebPageElement carrying isAccessibleForFree: false and a cssSelector pointing at just the registration wall.
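The gated-content shape described above looks like this when built programmatically. A minimal sketch, assuming a hypothetical `.registration-wall` selector and example.com URL; it mirrors the shape this note describes and omits the author, publisher, and date fields for brevity.

```python
import json


def article_jsonld(headline, url, gated_selector=None):
    """Article marked free, with only the locked block flagged as gated."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "isAccessibleForFree": True,
    }
    if gated_selector:
        data["hasPart"] = {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            # Must match the element that actually wraps the gated block.
            "cssSelector": gated_selector,
        }
    return data


# Hypothetical note with one locked region.
gated = article_jsonld(
    "How to get cited by AI answer engines in 2026",
    "https://example.com/notes/geo",
    gated_selector=".registration-wall",
)
print(json.dumps(gated, indent=2))
```

A fully free article simply omits `gated_selector`, which leaves out `hasPart` entirely.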

The myth to kill is that a paywall flag by itself gets you dropped. Google's own guidance treats isAccessibleForFree as anti-cloaking metadata, not a reason to exclude you, and correctly marked paywalled publishers still get cited. The real mistake is setting isAccessibleForFree: false on a whole article that is mostly free, which tells engines the page is paywalled when it isn't. For passages you genuinely don't want quoted, use data-nosnippet / max-snippet / X-Robots-Tag, not the paywall flag. Effort: medium.

CH.07

Caching for freshness (don't overthink it)

Serve public pages with shared-cache-friendly headers (Cache-Control: public, plus ETag/Last-Modified); avoid private/no-store on public URLs. no-store is about caching, not indexing — engines can still index it — but several AI-crawler references note that crawlers use ETag, Last-Modified, and Cache-Control to prioritize freshness and avoid needless refetches, and private on a public page confuses that logic. Keep private, no-store only for genuinely authenticated endpoints. Effort: small.
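The header policy above fits in one small function. This is a sketch, not a framework API: the function names are hypothetical, the 300-second `max-age` is an arbitrary assumption, and the ETag is simply a truncated content hash.

```python
import hashlib


def cache_headers(is_public, body, max_age=300):
    """Pick response headers: shared-cacheable for public pages, locked down otherwise."""
    if not is_public:
        return {"Cache-Control": "private, no-store"}
    # Content-derived validator: unchanged bytes give an unchanged ETag.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}


def is_not_modified(request_headers, etag):
    """True when the client's validator matches, so a 304 can be returned."""
    return request_headers.get("If-None-Match") == etag
```

A crawler that revisits with `If-None-Match` then gets a cheap 304 instead of a full refetch, which is the freshness signal this chapter is about.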

CH.08

llms.txt: a developer guide, not a ranking factor

Keep an llms.txt, but treat it as an AI-oriented sitemap, not a lever. A 300K-domain study found ~10% adoption and no statistically significant citation lift once you control for content quality and authority; Prerender's 2026 guide is blunt that it "does not control indexing." Worth shipping — list your primary sections and point to clean Markdown exports — because it's nearly free and dev-facing AI tooling does read it, but don't expect ranking magic. Effort: small.
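Generating the file from your route list keeps it from going stale. A minimal sketch, assuming the commonly used llms.txt layout (an H1 title, a blockquote summary, then H2 sections of Markdown links); the site name, section name, and URL are hypothetical.

```python
def build_llms_txt(title, summary, sections):
    """Render llms.txt from {section name: [(link text, url, note), ...]}."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for section, links in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site with one section pointing at a Markdown export.
llms_txt = build_llms_txt(
    "Example Site",
    "Field notes.",
    {"Notes": [("GEO playbook", "https://example.com/notes/geo/md", "Markdown export")]},
)
```

Write the result to `/llms.txt` at build time, alongside the sitemap.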

CH.09

Authority is the real moat

Entity clarity and brand mentions predict AI citations more strongly than backlinks do. One Ahrefs-based synthesis put the correlation of brand mentions with AI-citation probability at ≈0.664 versus ≈0.218 for backlinks; analyses of Google AI Overviews find ~96% of citations come from sources with strong E-E-A-T signals (with an r≈0.8 correlation between E-E-A-T proxies and selection).

For a solo brand: stable @id entities in your JSON-LD with sameAs to every profile (GitHub, LinkedIn, talks, interviews); a consistent name and brand everywhere; a real author bio; and a slow accumulation of third-party mentions that name you as the builder. Phrase your own claims so they're quotable and attributable ("in our btc-bot system we processed N events/day at Y µs"). Non-promotional, evidence-dense writing correlates positively with citation; salesy copy correlates negatively. Write like a source, not a brochure. Effort: medium–large, but the most durable lever.
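The "stable @id with sameAs to every profile" advice can be enforced in one place so every page emits the same entity. A sketch with hypothetical profile URLs; stripping trailing slashes before de-duplicating is an assumption about how your profile links vary, not a schema.org requirement.

```python
def person_jsonld(entity_id, name, url, profiles):
    """Build one canonical Person entity with de-duplicated, sorted sameAs links."""
    same_as = sorted({p.rstrip("/") for p in profiles})
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "@id": entity_id,  # keep this identical on every page
        "name": name,
        "url": url,
        "sameAs": same_as,
    }


# Hypothetical author entity.
person = person_jsonld(
    "https://example.com/#person",
    "Example Author",
    "https://example.com/",
    [
        "https://www.linkedin.com/in/example/",
        "https://github.com/example",
        "https://github.com/example/",
    ],
)
```

Service and Article markup elsewhere on the site can then reference the entity by `@id` instead of repeating its fields.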

CH.10

Measure it, or you're guessing

AI citations drift 40–60% month over month, so one-off "did ChatGPT mention me" checks are useless — you need continuous tracking. Tools like Otterly.AI or Peec AI track which engines cite which of your URLs across ChatGPT, Perplexity, AI Overviews, Gemini and Copilot (Profound is the enterprise tier). Pair that with server-log analysis of AI user-agents so you can see crawl → citation, and periodically sanity-check against AI-crawler benchmark reports (Vercel, Promagen) so a stack change hasn't broken visibility. Define a fixed prompt set around your real buyer queries and run it weekly. Effort: medium to set up, small to run.
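The server-log half of that loop needs nothing more than a counter. A minimal sketch, assuming access logs in the common "combined" format and the user-agent tokens named earlier in this note; the sample log lines and user-agent strings are invented for illustration.

```python
import re
from collections import Counter

AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User",
             "ClaudeBot", "PerplexityBot", "CCBot")

# request line, status, bytes, referer, user-agent (combined log format)
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def ai_crawler_hits(log_lines):
    """Count requests per (agent, path, status) for known AI user-agents."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        ua = match.group("ua").lower()
        for agent in AI_AGENTS:
            if agent.lower() in ua:
                key = (agent, match.group("path"), int(match.group("status")))
                hits[key] += 1
                break
    return hits


# Invented sample lines.
SAMPLE = [
    '203.0.113.5 - - [16/Jun/2026:10:00:00 +0000] "GET /notes/geo HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; GPTBot/1.0"',
    '203.0.113.9 - - [16/Jun/2026:10:01:00 +0000] "GET /notes/geo HTTP/1.1" 403 0 "-" "Mozilla/5.0; compatible; PerplexityBot/1.0"',
    '198.51.100.2 - - [16/Jun/2026:10:02:00 +0000] "GET /notes/geo HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Macintosh) Safari/605.1"',
]
hits = ai_crawler_hits(SAMPLE)
```

Non-200 statuses for a bot on a page that loads fine in a browser are the pattern to look for, since they usually point at a WAF rule or redirect. User-agent strings can be spoofed, so treat the counts as a signal, not proof.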

CH.11

The myths I'm ignoring

  • "llms.txt is a ranking factor." ~10% adoption, no measurable independent effect once you control for content quality — a convenience file, not a lever.
  • "Rank #1 in Google → AI Overviews cite you." The share of AI-Overview citations from top-10 organic results fell from ~70% in 2025 to roughly 17–38% in 2026; structure and authority now matter as much as rank.
  • "Backlinks are the main driver." Brand/entity mentions and factual density correlate more strongly than raw link volume (≈0.664 vs ≈0.218).
  • "AI crawlers render JS like Googlebot." They mostly don't — SPA-only content is invisible to them.
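  • "Blocking GPTBot doesn't affect ChatGPT Search." OpenAI documents GPTBot (training) and OAI-SearchBot (search) as separately controllable, but a block written broadly enough to catch OAI-SearchBot or ChatGPT-User as well will cost you search visibility too. Check which agents your rule actually matches.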
  • "isAccessibleForFree: false kills your AI chances." Google indexes and features correctly-marked paywalled content; the failure mode is flagging a mostly-free page as entirely gated.
  • "robots.txt reliably blocks AI training." Many training bots ignore it; real enforcement needs WAF-level controls.

CH.12

Applied: auditing and fixing this very site

Build-in-public, so here's the dogfood — I ran the playbook above against pravda.systems itself. The trigger was Perplexity claiming it "couldn't fetch" a page that returned a clean 200 in a browser. A six-dimension audit (crawlability, structured data, metadata, AI-discoverability, indexing, Core Web Vitals) found that access was never the problem — the signals were:

  • The structured data was telling AI not to cite it. Every note carried isAccessibleForFree: false — Google's paywall flag — even though the articles are free. Fixed: true on the page, with the gated marker only on the actual confidential regions of each note.
  • The pages most meant for AI were the least cacheable. The field notes were force-dynamic with Cache-Control: private, no-store (a cold fetch every crawl), while the static case studies were edge-cached. Fixed: public notes prerender via generateStaticParams and edge-cache; only gated entries stay dynamic.
  • The clean Markdown exports were undiscoverable. Every page has a /md export, but nothing advertised it. Fixed: llms.txt lists them all, each page exposes a text/markdown alternate link, and /md is noindex (no duplicate content) but still AI-fetchable.
  • The boring wins. A generated OpenGraph card (it was blank), BreadcrumbList + enriched article schema, og:type=article + dates, an Atom feed, IndexNow on every deploy, and LCP images no longer hidden behind an opacity:0 reveal.
  • What no script can do. Verifying the domain in Google Search Console + Bing Webmaster — done; both sitemaps submitted, all 14 URLs discovered.

The lesson matched the theory: for a content site, "can the AI fetch it" is rarely the real question — it's "is it indexed," "do the signals say it's free and authoritative," and "is the answer easy to lift." The audit didn't find a locked door; it found a site politely telling visitors it was closed. Still on the list: the query-aligned service pages, more entity authority, and a citation tracker. If this note gets cited by an answer engine, the experiment worked.

Tags: geo, seo, ai-search, answer-engines, perplexity, chatgpt