Ships 19 tools that audit why AI engines do or don't cite your pages. You get schema validation against citation best practices, robots.txt parsing for GPTBot and friends, llms.txt linting and generation, canonical URL hygiene checks, and scoring functions that predict AI Overview eligibility and answer extractability per content block. The rewrite tools restructure pages for AEO and GEO formats. Runs as an MCP server inside Claude or any stdio client, doubles as a GitHub Action to gate PRs on minimum citation scores, and includes headless rendering for SPAs. No API keys. The audit response hands you ranked findings with exact fixes and estimated impact, not opaque scores.
Public tool metadata for what this MCP can expose to an agent.

| Tool | Description | Parameters |
|---|---|---|
| `audit.page` | Full AI-SEO audit of a single URL: returns categorized findings (info/warning/error) with severity, fix instructions, and a 0-100 composite score plus per-dimension subscores. Read-only. Fetches the URL once and runs every sub-audit (schema, robots, technical, sitemap, AI-Over... | `url` (string), `render` (string: static or headless; default static), `respect_robots` (boolean), `generate_report` (boolean), `include_raw_html` (boolean) |
| `audit.schema` | Validate JSON-LD structured data against Schema.org rules and AI-citation best practices. Accepts either a URL (fetched) or a raw JSON string (parsed directly). Read-only when given `url` (one HTTP GET). Zero network when given `schema_json`. No writes. Deterministic, rule-bas... | `url` (string), `schema_json` (string), `respect_robots` (boolean) |
| `audit.canonical` | Audit a page's canonical link integrity: presence, self-reference, cross-domain mismatches, trailing-slash hygiene, and og:url consistency. Read-only. One HTTP GET to fetch the HEAD section. Deterministic, rule-based; no LLM. When to use: a focused canonical-only audit (e.g. d... | `url` (string), `respect_robots` (boolean) |
| `check.robots` | Fetch and parse a domain's robots.txt; report per-crawler allow/disallow posture for every known AI training crawler (GPTBot, CCBot, Anthropic-AI, Google-Extended, etc.), AI search crawlers (ChatGPT-User, PerplexityBot, OAI-SearchBot), and user-triggered fetchers. Read-only. O... | `domain` (string) |
| `check.sitemap` | Validate a domain's XML sitemap: presence, accessibility, URL count, lastmod freshness, sitemap-index handling, and image/video sitemap extensions. Read-only. Issues N+1 HTTP GETs: one for robots.txt + sitemap, then up to `max_urls_to_check` HEADs against sampled URLs. Determi... | `domain` (string), `max_urls_to_check` (integer) |
| `check.technical` | Audit a page's HEAD section for technical signals relevant to AI crawlers: HTTPS, canonical, OpenGraph, Twitter Card, hreflang, noindex, and title-vs-H1 hygiene. Read-only. One HTTP GET, inspects HEAD only (body is not parsed). Deterministic, rule-based; no LLM. When to use: w... | `url` (string), `respect_robots` (boolean) |
| `score.ai_overview_eligibility` | Score a page's probability of appearing in Google AI Overviews. Returns an overall 0-100 score plus six factor subscores: semantic completeness, structured data, E-E-A-T signals, entity density, freshness, and technical hygiene. Read-only. One HTTP GET. Deterministic, rule-bas... | `url` (string), `respect_robots` (boolean) |
| `score.agentic_browsing` | Score a page against the four signals Google added to the Lighthouse "Agentic Browsing" category in May 2026: presence of an llms.txt, WebMCP integration, accessibility-tree integrity, and layout stability. Returns an overall 0-100 score, a letter grade, and a per-factor break... | `url` (string), `html` (string), `render` (string: static or headless; default static), `check_llms_txt` (boolean), `respect_robots` (boolean) |
| `llms_txt.generate` | Generate a spec-compliant llms.txt (and optionally llms-full.txt) for a domain by reading its sitemap, sampling up to `max_pages` pages, and synthesizing a grouped, sectioned summary. Read-only. Issues one HTTP GET for the sitemap then one per sampled page. Deterministic; no L... | `domain` (string), `max_pages` (integer), `site_name` (string), `include_full` (boolean), `site_description` (string) |
| `pricing.generate` | Generate a machine-readable /pricing.md for AI shopping/agent flows. Finds the site's pricing page (or uses `pricing_url`), extracts named tiers and price lines, and returns the file content as a string. Read-only. Issues a few HTTP GETs probing common pricing paths. Determini... | `domain` (string), `pricing_url` (string) |
| `llms_txt.validate` | Validate an existing llms.txt or llms-full.txt against the spec: structure, section ordering, link format, and (optionally) broken-link detection. Read-only. One HTTP GET when given `url`; zero network when given `content`. Optional link-check issues HEAD requests against each... | `url` (string), `content` (string), `check_links` (boolean) |
| `score.citation_worthiness` | Score how citable a page or text block is for AI engines (ChatGPT, Claude, Perplexity, Google AI Overviews). Evaluates BLUF (bottom-line-up-front) opening, FAQ patterns, statistic density, entity clarity, and answer-shape fit for the optional `target_query`. Also returns `extr... | `url` (string), `text` (string), `target_query` (string), `respect_robots` (boolean) |
| `rewrite.aeo` | Rewrite a content block for Answer Engine Optimization. Adds a BLUF opening, FAQ structure, schema additions, and concise question-shaped headings tuned for ChatGPT / Perplexity / Google AI Overviews. Read-only when given `url` (one HTTP GET). Zero network when given `text`. T... | `url` (string), `text` (string), `format` (string: article, faq, howto, or comparison; default article), `max_words` (integer), `target_query` (string), `respect_robots` (boolean) |
| `rewrite.geo` | Rewrite a content block for Generative Engine Optimization: entity-rich, comparison-ready, synthesis-friendly. Tuned for surfaces that summarize across sources (Perplexity, Google AI Mode, Claude search). Read-only on input. Does NOT write back to the source URL - returns the... | `url` (string), `text` (string), `max_words` (integer), `target_query` (string), `respect_robots` (boolean), `add_comparison_table` (boolean) |
| `extract.entities` | Extract named entities, linked concepts, and sameAs graph nodes from a page's content and structured data. Combines body-text NER with JSON-LD `@type` / `sameAs` walking. Read-only when given `url` (one HTTP GET). Zero network when given `text`. Primary path: MCP sampling - th... | `url` (string), `text` (string), `render` (string: static or headless; default static), `respect_robots` (boolean) |
| `score.test_citation` | Simulate `would an AI engine cite this page for this query?`. The host LLM role-plays the chosen engine (chatgpt / claude / perplexity / google_ai_overviews / any), reads the page content, and returns a cite/no-cite verdict with the verbatim excerpt it would surface plus ranke... | `url` (string), `text` (string), `engine` (string: chatgpt, claude, perplexity, google_ai_overviews, or any; default any), `target_query` (string), `respect_robots` (boolean) |
| `diff.pages` | Compare two URLs for AI citation-worthiness and return a structured breakdown of which page is more likely to be cited and why. Typical use: your page (url_a) vs a competitor's page (url_b). Read-only. Runs audit.page on both URLs in parallel (2 HTTP fetches per URL), then dif... | `query` (string), `url_a` (string), `url_b` (string), `respect_robots` (boolean) |
| `audit.site` | Single-call site sweep: runs audit.page (homepage), check.robots, check.sitemap, and audit.schema in parallel and returns an overall grade (A–F) plus top-5 highest-impact fixes. Read-only. Issues several HTTP GETs against the domain (homepage fetch, robots.txt, sitemap.xml, an... | `domain` (string), `respect_robots` (boolean) |
| `audit.sitemap` | Site-wide content audit: discovers the sitemap, samples N URLs by deterministic uniform stride, runs audit.page on each, and returns score distribution + worst pages + most-common findings. Read-only. One HTTP GET for sitemap discovery, optionally a few more for sitemap-index... | `domain` (string), `concurrency` (integer), `sample_size` (integer), `respect_robots` (boolean) |
| `report.save` | Render an audit.page or audit.site result as a Markdown report and write it to a file under MCP_WORKSPACE_ROOT (defaults to cwd). | `path` (string), `overwrite` (boolean), `audit_result` (value) |

AI Citation Toolkit for the Model Context Protocol
Audit why AI systems do or do not cite your pages. MCP server. No API keys.
Works inside Claude, Cursor, Windsurf, Codex, and any MCP client that speaks stdio.
- robots.txt
- llms.txt - present, spec-compliant, links alive
- sameAs coverage that help AI systems identify the subject
- og:url, hreflang, noindex traps
- lastmod signals that tell crawlers the page is current

You: Run an AI-SEO audit on https://automatelab.tech/launching-the-ai-seo-mcp/.
Result (truncated):
{
"url": "https://automatelab.tech/launching-the-ai-seo-mcp/",
"score": 61,
"grade": "C",
"dimension_scores": {
"schema": 45, "technical": 80, "structure": 40,
"robots": 90, "freshness": 85, "authority": 40,
"entity_density": 21, "sitemap": 100
},
"findings": [
{
"severity": "critical",
"category": "structure",
"message": "No FAQ structure found (no FAQPage schema or H3 question headings).",
"fix": "Add FAQ H3 headings ending in '?' with answer paragraphs, and a FAQPage JSON-LD block.",
"estimated_impact": "high"
},
{
"severity": "warning",
"category": "authority",
"message": "Low authority signals - missing Organization or author Person schema.",
"fix": "Add Organization JSON-LD and Article.author as a Person node with sameAs links.",
"estimated_impact": "high"
}
]
}
Each finding names the exact fix. No opaque scores, no guesswork.
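
Because the result is plain JSON, a client can triage it before showing it to anyone. The sketch below is one possible way to do that, not part of the toolkit: it orders findings by severity and estimated impact and lists the weakest dimensions. The field names come from the sample above; the severity ordering and the `medium` / `low` impact values are assumptions, since the sample only shows `critical`, `warning`, and `high`.

```python
import json

# Assumed rankings: only "critical", "warning" and "high" appear in the sample.
SEVERITY_RANK = {"critical": 0, "error": 1, "warning": 2, "info": 3}
IMPACT_RANK = {"high": 0, "medium": 1, "low": 2}


def rank_findings(result: dict) -> list:
    """Sort findings by severity first, then by estimated impact."""
    return sorted(
        result.get("findings", []),
        key=lambda f: (
            SEVERITY_RANK.get(f.get("severity"), len(SEVERITY_RANK)),
            IMPACT_RANK.get(f.get("estimated_impact"), len(IMPACT_RANK)),
        ),
    )


def weakest_dimensions(result: dict, count: int = 3) -> list:
    """Return the `count` lowest-scoring dimensions, lowest first."""
    scores = result.get("dimension_scores", {})
    return sorted(scores.items(), key=lambda item: item[1])[:count]


# Abbreviated copy of the sample result shown above.
sample = json.loads("""
{
  "score": 61,
  "grade": "C",
  "dimension_scores": {
    "schema": 45, "technical": 80, "structure": 40,
    "robots": 90, "freshness": 85, "authority": 40,
    "entity_density": 21, "sitemap": 100
  },
  "findings": [
    {"severity": "critical", "category": "structure", "estimated_impact": "high"},
    {"severity": "warning", "category": "authority", "estimated_impact": "high"}
  ]
}
""")

for finding in rank_findings(sample):
    print(finding["severity"], finding["category"])
print(weakest_dimensions(sample))
```

Unknown severity or impact values sort last rather than raising, so the sketch keeps working if the server reports a level not listed here.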
npx -y @automatelab/ai-seo-mcp
Requires Node 20 or later.
Add to %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"ai-seo": {
"command": "npx",
"args": ["-y", "@automatelab/ai-seo-mcp"]
}
}
}
Restart Claude Desktop. Any MCP client that supports stdio transport works - same command / args pattern.
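
If the config file already lists other servers, the entry has to be merged in, not pasted over the file. A minimal sketch of that merge follows; the helper names `add_server` and `update_config_file` are hypothetical, and the only values taken from this README are the `mcpServers` key, the `ai-seo` name, and the `npx` command and args.

```python
import copy
import json
from pathlib import Path
from typing import Optional

# Entry copied from the config snippet above.
AI_SEO_ENTRY = {"command": "npx", "args": ["-y", "@automatelab/ai-seo-mcp"]}


def add_server(config: dict, name: str = "ai-seo", entry: Optional[dict] = None) -> dict:
    """Return a copy of `config` with the server entry added under mcpServers."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers[name] = copy.deepcopy(entry if entry is not None else AI_SEO_ENTRY)
    return merged


def update_config_file(path: Path) -> None:
    """Read the config (or start from an empty one), add the entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_server(config), indent=2) + "\n", encoding="utf-8")


existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_server(existing), indent=2))
```

The merge works on a deep copy, so the input dictionary is left untouched and other servers already in the file are preserved.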
By default audit_page reads raw HTML — fast, but misses content on React/Vue/Angular SPAs. Pass render: "headless" to spin up Chromium and audit the rendered DOM (adds 3-10s per audit).
One-time install:
npm install playwright-core
npx playwright install chromium
Then call audit_page with render: "headless". Use static for everything else — most marketing sites and docs render fine without it.
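
For clients that talk to the server directly, passing `render: "headless"` means putting it in the tool arguments of an MCP `tools/call` request. The sketch below only builds that JSON-RPC message as a single line, which is how the stdio transport frames messages. It assumes the tool is exposed under the name `audit_page` as in the tool table; check the name your client actually lists. A real session also needs the MCP `initialize` handshake first, which is omitted here, and the target URL is a placeholder.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Compact separators keep the message on a single line for stdio framing.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Placeholder URL; the tool name is an assumption taken from the README table.
line = build_tool_call(
    1,
    "audit_page",
    {"url": "https://example.com/app", "render": "headless"},
)
print(line, end="")
```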
This repo doubles as a GitHub Action. Drop it in a workflow to fail a PR when any page regresses below an AI-citation score - the same audit engine, gated on every change.
- uses: actions/checkout@v4
- name: AI-SEO audit
uses: AutomateLab-tech/ai-seo-mcp@v0.5.0
with:
urls: "https://example.com,https://example.com/pricing"
min-score: "70" # fail if any URL scores below this
respect-robots: "true" # set false for staging / sites you own
report-path: "ai-seo-report.md" # optional Markdown report artifact
fail-on-regression: "true"
The Action builds the auditor from the pinned ref, runs audit_page on each URL, writes a scorecard to the job summary, and exits non-zero if any URL falls below min-score (when fail-on-regression is true). Outputs: min_score_observed, urls_audited, report_path. Full example: examples/github-action-usage.yml.
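
The gating rule described above reduces to a small comparison. The following sketch illustrates that rule only; it is not the Action's source. The function name `gate` and the scores passed to it are hypothetical, while `min-score`, `fail-on-regression`, `min_score_observed`, and `urls_audited` mirror the inputs and outputs named in this section.

```python
def gate(scores: dict, min_score: int, fail_on_regression: bool = True) -> int:
    """Return an exit code: 1 if gating is on and any URL scores below min_score."""
    if not scores:
        raise ValueError("no URLs were audited")
    failing = {url for url, score in scores.items() if score < min_score}
    for url, score in sorted(scores.items()):
        status = "FAIL" if url in failing else "ok"
        print(f"{status:4} {score:3d}  {url}")
    print(f"min_score_observed={min(scores.values())} urls_audited={len(scores)}")
    return 1 if (failing and fail_on_regression) else 0


# Hypothetical scores for the two URLs from the workflow example.
exit_code = gate(
    {"https://example.com": 74, "https://example.com/pricing": 61},
    min_score=70,
)
print("exit code:", exit_code)
```

With `fail_on_regression=False` the same call still prints the scorecard but returns 0, which matches the behaviour described for `fail-on-regression`.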
| Tool | Purpose |
|---|---|
| `audit_page` | Composite AI-SEO audit with 8-dimension scoring (schema, technical, structure, robots, freshness, authority, entity density, sitemap). |
| `audit_schema` | Validate JSON-LD against Schema.org rules and AI-citation best practice. Flags deprecated patterns. |
| `audit_canonical` | Canonical link integrity, trailing-slash hygiene, og:url consistency. |
| `audit_site` | Single-call site sweep: audit_page + check_robots + check_sitemap + audit_schema with overall grade and top-5 fixes. |
| `audit_sitemap` | Site-wide content audit: stride-sample N URLs from the sitemap, run audit_page on each, return distribution + worst pages + top findings. |
| `check_robots` | Parse robots.txt and report per-crawler allow/disallow for all known AI crawlers. Surfaces the GPTBot-blocked-but-OAI-SearchBot-allowed trap. |
| `check_sitemap` | Validate XML sitemaps: presence, URL count, lastmod freshness, image/video extensions. |
| `check_technical` | HEAD tag audit: canonical, OpenGraph, Twitter Card, hreflang, HTTPS, noindex, title hygiene. |
| `score_ai_overview_eligibility` | Score a page's probability of appearing in Google AI Overviews using current correlation factors. |
| `score_citation_worthiness` | Score how citable a page or text block is for Perplexity, ChatGPT, Google AI Overviews, and Claude. Includes per-section `chunk_analysis` / `extractability_score`: how cleanly an LLM can lift a standalone answer from each heading. |
| `score_agentic_browsing` | Score a page against the Lighthouse "Agentic Browsing" category (May 2026): llms.txt, WebMCP, accessibility-tree integrity, and layout stability. |
| `score_test_citation` | Simulate "would an AI engine cite this for this query?" via MCP sampling, with deterministic heuristic fallback. |
| `llms_txt_generate` | Generate llms.txt and optionally llms-full.txt from a domain's sitemap. |
| `llms_txt_validate` | Lint an existing llms.txt for spec compliance and broken links. |
| `rewrite_aeo` | Rewrite content for Answer Engine Optimization (BLUF structure, FAQ format, schema additions). |
| `rewrite_geo` | Rewrite content for Generative Engine Optimization (entity definitions, comparison tables, synthesis-ready structure). |
| `extract_entities` | Extract named entities, sameAs links, and citation-density score from a page's content and structured data. |
| `diff_pages` | Compare two URLs for AI citation-worthiness: side-by-side dimension scores, gap analysis, and prioritized fix recommendations for url_a. |
| `report_save` | Render an audit_page / audit_site result as a Markdown report and write it to disk under MCP_WORKSPACE_ROOT. |
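
To make the per-crawler robots.txt report concrete, here is a rough sketch of the same idea using Python's standard `urllib.robotparser`. It is an illustration under stated assumptions, not the toolkit's parser: the crawler list is only the subset named in this README, the sample robots.txt and the `crawler_posture` helper are made up, and the standard-library matcher may differ from the toolkit's rules in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Subset of crawler names mentioned in this README; not an exhaustive list.
AI_CRAWLERS = [
    "GPTBot", "CCBot", "Anthropic-AI", "Google-Extended",
    "ChatGPT-User", "PerplexityBot", "OAI-SearchBot",
]


def crawler_posture(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Map each crawler name to True (allowed) or False (disallowed) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


# Made-up robots.txt: blocks GPTBot by name, allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

posture = crawler_posture(SAMPLE)
for agent, allowed in posture.items():
    print(f"{agent:16} {'allow' if allowed else 'disallow'}")

# The mixed posture the table calls out: GPTBot blocked, OAI-SearchBot allowed.
if not posture["GPTBot"] and posture["OAI-SearchBot"]:
    print("GPTBot is blocked while OAI-SearchBot is allowed")
```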
v0.4.0 renamed tools from flat `snake_case` to dot-notation (`audit_page`, `check_robots`, …) for a navigable hierarchy. Update any saved invocations.
Environment variables: see ENV.md.
Bug reports, feature ideas, and PRs welcome. See CONTRIBUTING.md.
To report a vulnerability, see SECURITY.md.
MIT - see LICENSE.
Built by automatelab.tech