SEO & AEO Basics

SEO is the discipline of getting your pages found and ranked by traditional search engines. AEO — Answer Engine Optimization — is the newer discipline of getting your content surfaced by AI answer engines. They overlap heavily but are not the same. This page walks through every check our scanner runs, what to do about it, and links to the canonical sources from Google, schema.org, MDN, and others.

What is AEO?

Answer Engine Optimization is the practice of structuring content so AI systems — ChatGPT, Perplexity, Claude, Google's AI Overviews, Gemini — can extract, cite, and recommend it. These systems do not behave like a classic search ranking. They read pages, decide which sources are trustworthy and well-structured, and synthesize an answer. Pages that win in AEO tend to share a few traits: clear schema, named authors, fresh dates, question-style headings, short summary paragraphs, and outbound citations.

SEO checks the scanner runs

HTTPS

Search engines distrust unencrypted sites. If your site is on HTTP, migrate to HTTPS — Let's Encrypt offers free certificates. Google has confirmed HTTPS as a ranking signal since 2014.

Title tag

The <title> tag is one of the strongest on-page SEO signals. Aim for 30–60 characters: Google truncates title links to fit the available display width, so titles much past 60 characters tend to get cut off in search results. Reference: Google Search Central — Title links in Google Search.

Meta description

Not a direct ranking factor, but it shapes click-through rate. Aim for 120–160 characters. Write it for humans, not keywords. Reference: Google Search Central — Snippets.
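
The title and description guidance above can be sketched as follows. This is a minimal, illustrative head section: the product name is borrowed from the heading example on this page, and the description wording is an assumption made up for the example. Both values sit inside the recommended length ranges.

```html
<head>
  <!-- Title: inside the 30-60 character range -->
  <title>AssetLocker Pro Smart Asset Tracker | AssetLocker</title>
  <!-- Description: inside the 120-160 character range, written for readers -->
  <meta name="description" content="Track tools, vehicles, and equipment in real time with AssetLocker Pro, a GPS asset tracker built for long battery life and simple installation.">
</head>
```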

Canonical URL

Tells search engines which version of a URL is authoritative. Critical for sites that serve the same content under multiple URLs (with/without trailing slash, with tracking params, etc.). Reference: Google Search Central — Consolidate duplicate URLs.
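
As a rough sketch, a page reachable under several URLs can point at one preferred version with a link element in its head. The example.com URLs are placeholders, not real endpoints.

```html
<!-- Served at https://www.example.com/products/tracker?utm_source=newsletter -->
<head>
  <!-- Declares the clean URL, without tracking parameters, as the authoritative one -->
  <link rel="canonical" href="https://www.example.com/products/tracker">
</head>
```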

Robots meta

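Controls indexing on a per-page basis. noindex tells search engines to keep the page out of their results, so use it only on pages you genuinely don't want ranked. Crawlers must be able to fetch the page to see the directive, so don't also block that page in robots.txt. Reference: Google Search Central — Robots meta tag.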
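
A minimal example, assuming a hypothetical internal search results page that should stay out of the index while its links remain followable:

```html
<head>
  <!-- Keep this page out of search results, but let crawlers follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```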

H1 and heading hierarchy

Headings (<h1> through <h6>) are how search engines, AI engines, and screen readers parse the document outline: the topic, the major sections, and the subsections. Treat them as semantic structure, not as a way to make text bigger or smaller (use CSS for that).

The rules

  • Exactly one <h1> per page, and it should describe the whole page's topic. Multiple H1s confuse crawlers about what the page is primarily about.
  • The first heading on the page should be the H1. If your page starts with an <h2> or <h3> with no H1 above it, search engines have to guess at the page's subject. This is one of the most common problems we find on CMS-built pages where the template wraps content in lower-level headings.
  • Use a new <h2> every time the page enters a new top-level section. Unlike H1, you can (and should) have many H2s — one for each major topic on the page (e.g. Overview, Specifications, Installation, FAQs). After you've used H3s under one H2, returning to a fresh H2 tells crawlers "a new top-level section starts here." That's how the document outline gets built.
  • Use <h3> for subsections inside an H2 (and <h4> for subsections inside those, etc.). Each level represents nesting under the level above it.
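  • Don't skip levels. After an <h2>, the next nested heading should be <h3>, not <h4>. Skipped levels (H2 → H4) break the outline and make the page harder to navigate with assistive technology. WCAG does not list a skipped level as an outright failure, but the W3C WAI headings tutorial advises against it, and the related success criteria are 1.3.1 (Info and Relationships) and 2.4.6 (Headings and Labels).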
  • Use headings to outline content, not to style text. If you want bold or larger text without semantic meaning, use <strong> or CSS — promoting visual emphasis to a heading pollutes the outline.

What good structure looks like

<h1>AssetLocker Pro Smart Asset Tracker</h1>
  <h2>Overview</h2>
  <h2>Specifications</h2>
    <h3>Hardware</h3>
    <h3>Battery life</h3>
  <h2>Installation</h2>
  <h2>FAQs</h2>
    <h3>How accurate is the GPS?</h3>
    <h3>Does it work indoors?</h3>

Common mistakes our scanner catches

  • No H1 anywhere on the page — flagged as FAIL. Crawlers can't reliably identify the page's topic.
  • First heading is H3 (or lower) instead of H1 — flagged as WARN with the actual heading text. Common with HubSpot, Webflow, Squarespace templates that wrap product/post titles in styled H3s.
  • Multiple H1s (e.g., from a header banner + a content title) — flagged as WARN. Pick one.
  • Skipped levels (H2 → H4, H1 → H3) — flagged as WARN with the specific transitions called out.
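
For contrast with the good structure shown earlier, here is a deliberately flawed outline that combines several of the mistakes listed above. It is illustrative only. The comments describe the problems in the terms this page uses and do not reproduce the scanner's actual output.

```html
<h3>Smart Asset Tracker</h3>   <!-- first heading is an H3, not the H1 -->
<h1>AssetLocker</h1>           <!-- H1 coming from a header banner -->
<h1>AssetLocker Pro</h1>       <!-- second H1 coming from the content title -->
<h2>Specifications</h2>
<h4>Hardware</h4>              <!-- skipped level: H2 to H4 -->
```

Fixing it means promoting the product title to the single H1, removing the banner H1, and changing the H4 to an H3.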

Why it matters for AEO specifically

AI answer engines (ChatGPT, Perplexity, Claude, Google AI Overviews) use heading hierarchy to extract the question/answer pairs they cite. A page with the structure H1 → H2 (question) → H3 (sub-question) gives the engine a clean tree it can pull from. A page with chaotic heading levels gets parsed as a flat blob and is much less likely to be cited.

References: MDN — Heading elements · W3C WAI — Headings tutorial · Google Developer Documentation Style Guide — Headings · HTML Living Standard — Headings and sections.

Image alt text

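Every meaningful image needs descriptive alt text; purely decorative images should carry an empty alt="" so screen readers skip them. Alt text is an accessibility requirement and a ranking signal for image search. Reference: Google Search Central — Google Images SEO best practices and W3C WAI — Images tutorial.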
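
A short sketch of the two common cases. The file paths and the alt wording are hypothetical.

```html
<!-- Meaningful image: describe what it shows -->
<img src="/images/tracker-mounted.jpg" alt="AssetLocker Pro tracker mounted on a forklift mast">

<!-- Decorative image: empty alt so assistive technology skips it -->
<img src="/images/divider.svg" alt="">
```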

Open Graph tags

og:title, og:description, og:image, og:url control how your URL appears when shared on social platforms. Missing OG tags lead to ugly previews. Reference: The Open Graph protocol — ogp.me.
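
A minimal sketch of the four tags named above, using placeholder example.com URLs and an illustrative description. The Open Graph protocol also lists og:type among its basic properties, so it is included here as well.

```html
<head>
  <meta property="og:title" content="AssetLocker Pro Smart Asset Tracker">
  <meta property="og:description" content="Track tools, vehicles, and equipment in real time.">
  <!-- Use an absolute URL so the sharing platform can fetch the image -->
  <meta property="og:image" content="https://www.example.com/images/tracker-og.jpg">
  <meta property="og:url" content="https://www.example.com/products/tracker">
  <!-- Also a basic property according to ogp.me -->
  <meta property="og:type" content="website">
</head>
```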

Mobile viewport and language

<meta name="viewport"> is required for pages to render correctly on mobile devices, and Google uses mobile-first indexing, meaning it primarily crawls and indexes the mobile version of a page. <html lang> tells crawlers and AI systems what language the page is in. Reference: MDN — Viewport meta tag.
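
Both settings fit in a few lines of page skeleton. This sketch assumes an English-language page; the title and body are placeholders.

```html
<!DOCTYPE html>
<html lang="en"> <!-- declares the page language -->
<head>
  <meta charset="utf-8">
  <!-- Match the layout width to the device width and start at 100% zoom -->
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Placeholder title</title>
</head>
<body>
  <h1>Placeholder heading</h1>
</body>
</html>
```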

robots.txt and sitemap.xml

/robots.txt controls crawler access. /sitemap.xml helps search engines discover all your URLs. Reference your sitemap from robots.txt with a Sitemap: line. Canonical specs: RFC 9309 — Robots Exclusion Protocol and sitemaps.org — XML sitemap protocol.
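
A minimal robots.txt along these lines, assuming the sitemap lives at the placeholder URL shown:

```text
# Applies to every crawler: nothing is disallowed
User-agent: *
Allow: /

# Absolute URL of the sitemap
Sitemap: https://www.example.com/sitemap.xml
```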

AEO checks the scanner runs

JSON-LD structured data

This is the single highest-leverage AEO investment. Add <script type="application/ld+json"> blocks describing your page as an Article, FAQPage, Organization, etc. AI engines lean heavily on schema to understand and cite content. References: schema.org (the canonical vocabulary) and Google Search Central — Structured data introduction.

FAQPage schema

If your page contains question-and-answer content, mark it up with FAQPage schema. AI engines often pull FAQ entries directly into answers. Reference: Google Search Central — FAQPage.
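
As a sketch, the two FAQ questions from the heading example on this page could be marked up like this. The answer text is placeholder wording, not real product data.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How accurate is the GPS?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Placeholder: state the accuracy figure in one or two sentences."
      }
    },
    {
      "@type": "Question",
      "name": "Does it work indoors?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Placeholder: answer yes or no first, then add the key caveat."
      }
    }
  ]
}
</script>
```

The questions and answers in the markup should match the text that is visible on the page.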

Article schema

For content pages, add Article or BlogPosting JSON-LD with headline, author, datePublished, dateModified, and publisher. Reference: Google Search Central — Article.
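
A minimal Article block with the five properties listed above might look like this. The author, publisher, and dates are hypothetical placeholders.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "SEO & AEO Basics",
  "author": { "@type": "Person", "name": "Jane Example" },
  "datePublished": "2025-01-15",
  "dateModified": "2025-03-02",
  "publisher": { "@type": "Organization", "name": "Example Co" }
}
</script>
```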

Author and E-E-A-T

E-E-A-T stands for Experience, Expertise, Authoritativeness, Trustworthiness. Add an author entity to your schema with sameAs links to social profiles, and a real author bio on the page. AI engines weight perceived author authority. Reference: Google's Search Quality Rater Guidelines (PDF) — the document where E-E-A-T is defined.
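
One way to express an author entity with sameAs links is a Person object, either on its own or nested as the author of an Article. Every name and URL below is a hypothetical placeholder.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Example",
  "jobTitle": "Technical SEO Lead",
  "url": "https://www.example.com/authors/jane-example",
  "sameAs": [
    "https://www.linkedin.com/in/jane-example",
    "https://github.com/jane-example"
  ]
}
</script>
```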

Question-shaped headings

People ask AI engines questions. Headings phrased as questions ("How does X work?", "What is Y?") match the way users query and how AI engines extract answers. Aim for at least 20% of your H2 and H3 headings to be questions.

Answer-shaped content

Lead each section with a 1–3 sentence summary that directly answers the heading. Use lists for steps and feature comparisons. AI engines extract these as answer chunks.
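
Put together with a question-shaped heading, a section following this pattern could be structured as below. All text is placeholder content, shown only to illustrate the shape.

```html
<h2>How do I install the tracker?</h2>
<!-- Lead with a one-to-three sentence summary that answers the heading -->
<p>Placeholder summary: installation takes three steps, listed below.</p>
<!-- Steps go in an ordered list so each one can be extracted separately -->
<ol>
  <li>Placeholder step one.</li>
  <li>Placeholder step two.</li>
  <li>Placeholder step three.</li>
</ol>
```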

Content freshness

AI engines favor recent content. Always include datePublished and dateModified in your schema, and update dateModified when you make substantive edits.

Outbound citations

Cite authoritative sources. Outbound links are a trust signal both for traditional search and for AI systems that try to evaluate source quality. (This guide is itself an example.)

llms.txt

A new convention: a /llms.txt file at your domain root summarizing your site's purpose and key URLs in a format optimized for LLMs. Spec: llmstxt.org. Already adopted by Anthropic, Mistral, FastAPI, Cursor, Hugging Face, and others.

AI crawler access via robots.txt

If you want to appear in AI answer engines, your robots.txt must allow their crawlers. Blocking a crawler removes you from that engine's index for AI features.
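
A hedged sketch of what allowing AI crawlers can look like. The user-agent tokens below are ones their vendors have published, but the set is illustrative rather than exhaustive, and tokens change over time, so check each vendor's documentation before relying on it.

```text
# Illustrative only: one group that explicitly allows several AI crawlers
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /
```

A crawler follows the most specific group that matches its token, so a restrictive "User-agent: *" group elsewhere in the file does not override this group for the crawlers named here.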

Where to start

Run the scanner on your most important page. Fix anything that comes back as FAIL first, then move to WARN items. The single highest-leverage fixes for AEO are typically: add JSON-LD schema (use our Schema Validator), add a real author with sameAs links, and ensure AI crawlers are not blocked in robots.txt (use our Robots.txt Tester to verify).

Further reading