Citation Intelligence

3 · 24 tools · STDIO · registry active

Summary

Gives your agent real citation data from Perplexity, Claude, ChatGPT, Gemini, and Google AI Overviews without subscribing to a $300/mo dashboard. Self-hosted via stdio; bring your own API keys. Tools are grouped into seven namespaces. Highlights include citations.provenance for cross-engine consensus on who cites what, domain.am_i_cited to check whether your site shows up, competitors.compete for side-by-side visibility, and audit.schema for structured data diagnostics. It also surfaces Google Search Console gaps, Wikipedia backlinks, and bot access checks. Designed for SEO practitioners who want programmatic citation tracking in their workflow, and for agents that need to know which sources LLMs already trust.

Tools

Public tool metadata for what this MCP can expose to an agent.

24 tools

citations.check (4 params)

Return URLs cited by an AI engine (Perplexity, Claude, ChatGPT, Gemini, or Bing) for a query. Use this when an agent or user wants to see what sources an AI search engine grounds answers on. Requires at least one engine API key; auto-picks the first available.

Parameters (* = required)

query (string)

The search query to test (what would a user ask an AI?)

engine (string)

Engine to query. One of perplexity · claude · openai · gemini · bing_serp · brave_serp. Default: auto.
• perplexity / google_ai_mode: consumer_scrape, closest to real product behavior.
• claude / openai / gemini: api_proxy, an API-tier call that may differ from the consumer product.
• bing_serp / brave_serp: web_rank, traditional SERP rank, NOT LLM citation.
• 'auto' prefers SerpAPI (google_ai_mode) → Perplexity → LLM adapters → web_rank.

max_results (integer)

Maximum citations to return. Default: 10.

perplexity_model (string)

Perplexity model override (e.g. 'sonar', 'sonar-pro', 'sonar-reasoning'). Only used when engine='perplexity'. Defaults to 'sonar-pro'.

domain.am_i_cited (3 params)

Check whether a domain is cited by an AI engine across a cluster of queries. Returns per-query presence, rank, and a citation-rate summary. Use to measure visibility for a brand, product, or content site in AI search.

Parameters (* = required)

domain (string)

Domain to check, e.g. 'automatelab.tech' (without protocol).

engine (string)

LLM engine to check for citations. 'auto' runs all available LLM engines and returns per-engine breakdown + cross-engine consensus. Pin to a specific engine to reduce cost. 'bing_serp' and 'brave_serp' measure web rank, not LLM citations; use check_citations for those. One of perplexity · claude · openai · gemini · bing_serp · brave_serp. Default: auto.

queries (array)

Queries to test the domain against. 1-20 queries per call.
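The per-query presence, rank, and citation-rate summary described above can be sketched as follows. This is a minimal illustration, not the server's code: the function names, the input shape (cited URLs per query, in rank order), and the output keys are all assumptions made for the example.

```python
from urllib.parse import urlparse


def host_matches(url: str, domain: str) -> bool:
    """True if the URL's host is the domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def summarize_citations(domain: str, cited_by_query: dict) -> dict:
    """Build per-query presence/rank rows plus an overall citation rate.

    cited_by_query maps each query to the URLs an engine cited, in rank order
    (hypothetical input shape).
    """
    rows = []
    for query, urls in cited_by_query.items():
        # Rank is the 1-based position of the first URL on the domain, if any.
        rank = next(
            (i for i, u in enumerate(urls, start=1) if host_matches(u, domain)),
            None,
        )
        rows.append({"query": query, "cited": rank is not None, "rank": rank})
    cited = sum(1 for r in rows if r["cited"])
    rate = cited / len(rows) if rows else 0.0
    return {"domain": domain, "citation_rate": rate, "per_query": rows}
```

Matching on the hostname suffix means 'www.example.com' counts as a citation of 'example.com'; whether the real tool treats subdomains this way is not stated in the description.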

signals.ai_overview (3 params)

Check whether Google shows an AI Overview for a query, and which URLs it cites. Uses SerpAPI (free tier: 100/month). Set SERPAPI_KEY.

Parameters (* = required)

hl (string)

Language code. Default: en.

query (string)

Search query to check for Google AI Overview.

location (string)

Location string, e.g. 'United States'. Affects AI Overview eligibility.

domain.cited_for (4 params)

List queries that the given domain has been cited for, served from the local cache. Build up a corpus by calling check_citations or am_i_cited first; cited_for queries it without spending API budget.

Parameters (* = required)

limit (integer)

Maximum results. Default: 50.

since (string)

ISO date floor, e.g. '2026-01-01'. Only return entries fetched on or after this date.

domain (string)

Domain to look up, e.g. 'automatelab.tech'.

engine (string)

Filter by engine. Omit to include all. One of perplexity · claude · openai · gemini · bing_serp · brave_serp.

citations.predict (1 param)

Score citation likelihood for a URL from public signals (Wikipedia link presence, schema.org markup, /llms.txt, GitHub and Reddit references, canonical hygiene, HTTPS). No LLM fired - all heuristic. Returns 0-100 score, grade, signal breakdown, and ranked fixes.

Parameters (* = required)

url (string)

URL to score for citation likelihood. Must be absolute http(s).

panel.track (4 params)

Save, load, or list named query panels. A panel is a persisted set of queries you want to monitor over time (e.g. editorial-watchlist). Use action=save with queries[] to create, action=load to read, action=list to enumerate. Panels live under <config>/panels/<name>.json.

Parameters (* = required)

name (string)

Panel name, e.g. 'editorial-watchlist'. Used to save and recall the query set.

action (string)

'save' writes the panel, 'load' returns an existing panel, 'list' enumerates all panels. One of save · load · list. Default: save.

domain (string)

Default domain to track for this panel, e.g. 'automatelab.tech'.

queries (array)

Queries to save under this panel. Omit to read the existing panel.

panel.run (3 params)

Run a saved panel through am_i_cited and append a timestamped snapshot. Snapshots live under <config>/snapshots/<panel>/<iso>.json. Feeds citation_trend.

Parameters (* = required)

name (string)

Panel name previously saved via track_queries.

domain (string)

Override the panel's default domain for this run.

engine (string)

AI engine to query. Use bing_serp/brave_serp for web_rank comparison only; am_i_cited will refuse them. One of perplexity · claude · openai · gemini · bing_serp · brave_serp. Default: auto.

citations.trend (2 params)

Report citation rate over time for a panel from stored snapshots. Returns the series of citation_rate per snapshot plus per-query deltas (gained/lost/unchanged) between first and last snapshot.

Parameters (* = required)

panel (string)

Panel name to report on.

since (string)

ISO date floor, e.g. '2026-01-01'. Only include snapshots on or after.
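As a rough sketch of the trend computation described above: one citation_rate per snapshot, then gained/lost/unchanged deltas between the first and last snapshot. The snapshot shape used here (a 'taken_at' ISO string plus a 'cited' map of query to boolean) is a hypothetical stand-in; the real on-disk snapshot format is not documented in this listing.

```python
def citation_trend(snapshots: list) -> dict:
    """snapshots: [{"taken_at": iso_str, "cited": {query: bool}}], in any order.

    The snapshot layout is an assumption made for this example.
    """
    # ISO timestamps in one consistent format sort correctly as plain strings.
    ordered = sorted(snapshots, key=lambda s: s["taken_at"])

    series = []
    for snap in ordered:
        flags = list(snap["cited"].values())
        rate = sum(flags) / len(flags) if flags else 0.0
        series.append({"taken_at": snap["taken_at"], "citation_rate": rate})

    deltas = {"gained": [], "lost": [], "unchanged": []}
    if ordered:
        first, last = ordered[0]["cited"], ordered[-1]["cited"]
        for query in sorted(set(first) | set(last)):
            before, after = first.get(query, False), last.get(query, False)
            if after and not before:
                deltas["gained"].append(query)
            elif before and not after:
                deltas["lost"].append(query)
            else:
                deltas["unchanged"].append(query)
    return {"series": series, "deltas": deltas}
```

A query missing from one of the two snapshots is treated as not cited in that snapshot, which is a simplification chosen for the sketch.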

competitors.compare (1 param)

Run predict_citation on 2-10 URLs and return a side-by-side signal table plus a list of signals where the URLs diverge. Use to compare your URL to top-cited competitors for the same query.

Parameters (* = required)

urls (array)

URLs to compare side-by-side. 2-10 URLs. One is typically yours and the rest are cited competitors.

signals.wikipedia (3 params)

List Wikipedia articles that reference the given domain. Wikipedia citation is the highest-lift signal for LLM training corpora. Zero keys required.

Parameters (* = required)

lang (string)

Wikipedia language subdomain, e.g. 'en', 'de', 'fr'. Default: en.

limit (integer)

Maximum mention rows to return. Default: 20.

domain (string)

Domain to search for, e.g. 'automatelab.tech' (without protocol).

audit.sitemap (3 params)

Fetch a sitemap.xml (or sitemap index) and run predict_citation on every URL. Returns results sorted worst-score-first. Surfaces systemic issues across a whole site in one pass. Zero engine keys needed.

Parameters (* = required)

limit (integer)

Max URLs to score. Sitemap is sliced after parsing. Default: 50.

concurrency (integer)

Parallel predict_citation calls. Higher is faster but more rate-limit risk. Default: 3.

sitemap_url (string)

URL of sitemap.xml (or a sitemap index). Nested sitemaps are followed.

competitors.compete (4 params)

End-to-end competitive snapshot for a single query. Calls check_citations to get the cited URLs, then runs compare_domains on your_url vs the top cited competitors. Returns your score, the average competitor score, and the gap.

Parameters (* = required)

query (string)

Search query to test (what would a user ask an AI?).

engine (string)

AI engine to query for the citation set. 'auto' picks the first available key. One of perplexity · claude · openai · gemini · bing_serp · brave_serp. Default: auto.

your_url (string)

Your URL to benchmark against the cited competitors.

max_competitors (integer)

How many cited URLs to compare against your_url. Capped at 9 (compare_domains accepts max 10 URLs total including yours). Default: 4.

citations.freshness (3 params)

Score how recent the pages cited for a query are. Calls check_citations, then collects dateModified for each cited URL, returns a 0-100 recency_score (halflife=365d) plus per-URL freshness bucket (fresh/current/stale/ancient/unknown). Surfaces queries where AI cites old conten...

Parameters (* = required)

query (string)

Search query whose cited URLs to score for freshness.

engine (string)

AI engine to query for the citation set. One of perplexity · claude · openai · gemini · bing_serp · brave_serp. Default: auto.

max_results (integer)

How many cited URLs to inspect. Default: 10.
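A 0-100 score with a 365-day half-life is most naturally read as exponential decay, and the sketch below takes that reading. Only the half-life and the bucket names come from the description above; the exact decay formula and every bucket cut-off are assumptions for illustration, and both function names are hypothetical.

```python
from datetime import date
from typing import Optional

HALF_LIFE_DAYS = 365  # stated in the tool description


def recency_score(modified: date, today: date) -> float:
    """Exponential decay: 100 at age 0, 50 after one half-life, 25 after two."""
    age_days = max((today - modified).days, 0)
    return 100 * 0.5 ** (age_days / HALF_LIFE_DAYS)


def freshness_bucket(modified: Optional[date], today: date) -> str:
    """Assign a bucket; the cut-offs below are illustrative assumptions."""
    if modified is None:
        return "unknown"  # page exposed no dateModified
    age_days = (today - modified).days
    if age_days <= 90:
        return "fresh"
    if age_days <= 365:
        return "current"
    if age_days <= 3 * 365:
        return "stale"
    return "ancient"
```

Under this reading, a page last modified exactly one year ago scores 50 and one modified two years ago scores 25.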

domain.cited_for_diff (4 params)

Diff cited_for between two time windows for a domain. Returns queries gained (cited now, not before baseline_until) and queries lost (cited before, not since current_since). Cache-only, no API spend. Use to track citation drift over time after publishing or migrating content.

Parameters (* = required)

domain (string)

Domain to diff, e.g. 'automatelab.tech'.

engine (string)

Filter by engine. Omit to include all. One of perplexity · claude · openai · gemini · bing_serp · brave_serp.

current_since (string)

ISO date floor for the 'current' window. Defaults to baseline_until.

baseline_until (string)

ISO date (or ISO datetime). Baseline window = all cache entries fetched on or before this timestamp.
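The two windows reduce to a pair of set differences. The sketch below assumes a simplified cache shape (one entry per query in which the domain was cited, with a 'fetched_at' ISO date); the real cache.json layout is not described in this listing, so treat the field names as hypothetical.

```python
from typing import Optional


def cited_for_diff(entries: list, baseline_until: str,
                   current_since: Optional[str] = None) -> dict:
    """entries: [{"query": str, "fetched_at": "YYYY-MM-DD"}] (assumed shape)."""
    # Mirrors the documented default: current_since falls back to baseline_until.
    current_since = current_since or baseline_until
    # ISO dates in one consistent format compare correctly as plain strings.
    baseline = {e["query"] for e in entries if e["fetched_at"] <= baseline_until}
    current = {e["query"] for e in entries if e["fetched_at"] >= current_since}
    return {
        "gained": sorted(current - baseline),  # cited now, not in the baseline
        "lost": sorted(baseline - current),    # cited in the baseline, not since
    }
```

Because the diff is computed purely from cached entries, a query only shows up as lost if it was actually re-checked, or at least not re-cited, inside the current window.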

signals.gsc_gap (7 params)

Join Google Search Console performance with am_i_cited per query. Surfaces queries where the domain ranks well in Google but is not cited in AI - the closest editorial wins. Requires GCP service account creds (credentials_path or GOOGLE_APPLICATION_CREDENTIALS env).

Parameters (* = required)

domain (string)

Domain to analyze, e.g. 'automatelab.tech'. Used both for the GSC site URL and the citation check.

engine (string)

AI engine for the citation check. One of perplexity · claude · openai · gemini · bing_serp · brave_serp. Default: auto.

queries (array)

Queries to cross-reference. 1-20 per call.

end_date (string)

ISO date for GSC range end, e.g. '2026-05-01'.

site_url (string)

Override the GSC siteUrl. Defaults to 'sc-domain:<domain>'.

start_date (string)

ISO date for GSC range start, e.g. '2026-04-01'.

credentials_path (string)

Path to GCP service account JSON. Defaults to env GOOGLE_APPLICATION_CREDENTIALS.

audit.schema (1 param)

Deep schema.org validation for a URL. Parses every JSON-LD block and microdata node, checks required fields per @type (Article needs headline+author+datePublished, FAQPage needs mainEntity, HowTo needs step, etc.), and flags missing fields and malformed JSON-LD. Returns issues...

Parameters (* = required)

url (string)

URL whose JSON-LD and microdata to validate against schema.org expected fields.
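The required-field check can be illustrated with the three types the description names. This is a sketch under stated assumptions: the rule table contains only those three examples, the function name and issue strings are invented for illustration, and microdata and '@graph' containers are not handled.

```python
import json

# Required fields named in the tool description; the real rule set is larger.
REQUIRED = {
    "Article": ["headline", "author", "datePublished"],
    "FAQPage": ["mainEntity"],
    "HowTo": ["step"],
}


def check_json_ld(block: str) -> list:
    """Return human-readable issues for one JSON-LD block (hypothetical format)."""
    try:
        data = json.loads(block)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON-LD: {exc.msg}"]

    # A block may hold a single node or a list of nodes.
    nodes = data if isinstance(data, list) else [data]
    issues = []
    for node in nodes:
        if not isinstance(node, dict):
            continue
        types = node.get("@type", [])
        # @type may be a single string or a list of strings.
        for schema_type in [types] if isinstance(types, str) else types:
            for field in REQUIRED.get(schema_type, []):
                if field not in node:
                    issues.append(f"{schema_type}: missing {field}")
    return issues
```

For example, an Article node carrying only a headline would be reported as missing author and datePublished.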

audit.llms_txt (5 params)

Generate an llms.txt file (https://llmstxt.org spec) from a sitemap. Parses sitemap.xml + nested indexes, groups URLs by top-level path, and emits a Markdown document with H1+description+sectioned link lists. Set fetch_titles=true to pull <title> per URL (slower, richer output).

Parameters (* = required)

limit (integer)

Max URLs to include. Truncated after sitemap parse, before title fetch. Default: 100.

site_title (string)

Site title - top H1 in the generated llms.txt file.

sitemap_url (string)

URL of sitemap.xml (or sitemap index). Nested sitemaps are followed.

fetch_titles (boolean)

If true, fetch each URL to extract <title> for richer links. Slower (one HEAD-ish GET per URL). If false, the URL path is used as the link text. Default: false.

site_description (string)

One-paragraph site description placed under the H1. Optional but strongly recommended.
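The grouping and emit steps can be sketched without any network access by starting from an already-parsed URL list. The function name and the exact output layout are assumptions: the description is rendered as a blockquote, which is how the llmstxt.org format presents the summary, and the section heading is simply the first path segment.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_llms_txt(site_title: str, site_description: str, urls: list) -> str:
    """Group URLs by first path segment and emit an llms.txt-style document."""
    sections = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        top = path.split("/")[0] if path else "root"  # 'root' label is made up
        sections[top].append(url)

    lines = [f"# {site_title}", "", f"> {site_description}", ""]
    for name in sorted(sections):
        lines.append(f"## {name}")
        lines.append("")
        for url in sections[name]:
            # With fetch_titles=false, the URL path doubles as the link text.
            text = urlparse(url).path or "/"
            lines.append(f"- [{text}]({url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Sorting sections alphabetically keeps the output stable between runs, which makes the generated file easier to diff.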

signals.answer_box (3 params)

Locate where each cited URL appears in the AI's raw answer text. Calls check_citations, finds the first mention of each citation's URL (or hostname) in raw_answer, and bins by char position into early/middle/late thirds. Surfaces whether your URL is cited up-front or buried ne...

Parameters (* = required)

query (string)

Search query whose AI answer to measure citation positions on.

engine (string)

AI engine to query. web_rank engines (bing_serp, brave_serp) lack raw_answer and will return position 'unknown'. One of perplexity · claude · openai · gemini · bing_serp · brave_serp. Default: auto.

max_results (integer)

Max citations to locate. Default: 10.
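The thirds binning described above is a small calculation on the character offset of the first mention. A minimal sketch, assuming a hypothetical helper name and a plain substring search for the URL, falling back to the hostname:

```python
from urllib.parse import urlparse


def position_bin(raw_answer: str, url: str) -> str:
    """Bin the first mention of url (or its hostname) into thirds of the answer."""
    idx = raw_answer.find(url)
    if idx == -1:
        # Fall back to the bare hostname, as the description suggests.
        host = urlparse(url).hostname or ""
        idx = raw_answer.find(host) if host else -1
    if idx == -1 or not raw_answer:
        return "unknown"

    fraction = idx / len(raw_answer)
    if fraction < 1 / 3:
        return "early"
    if fraction < 2 / 3:
        return "middle"
    return "late"
```

An engine with no raw_answer yields 'unknown' for every URL here, which matches the documented behavior for the web_rank engines.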

citations.provenance (3 params)

Fan a query out across multiple AI engines and report per-URL cross-engine consensus. Returns each unique cited URL with the list of engines that cited it, plus a consensus_urls list (URLs cited by ALL engines). High engine_count = strong cross-engine citation signal; engine_c...

Parameters (* = required)

query (string)

Search query to fan out across multiple engines.

engines (array)

Engines to query. If omitted, uses all LLM engines with a configured API key (perplexity, claude, openai, gemini, google_ai_mode). Include bing_serp/brave_serp only when you explicitly want web_rank comparison.

max_results (integer)

Max citations per engine. Default: 10.
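Once each engine has returned its cited URLs, the consensus step is an inversion of that mapping. The sketch below assumes the per-engine results are already collected into a dictionary; the function name and the output keys other than consensus_urls and engine_count are hypothetical.

```python
def cross_engine_consensus(citations_by_engine: dict) -> dict:
    """Map each cited URL to the engines that cited it; list full-consensus URLs."""
    engines_by_url = {}
    for engine, urls in citations_by_engine.items():
        # dict.fromkeys de-duplicates while keeping the engine's own order.
        for url in dict.fromkeys(urls):
            engines_by_url.setdefault(url, []).append(engine)

    total = len(citations_by_engine)
    rows = [
        {"url": url, "engines": engines, "engine_count": len(engines)}
        for url, engines in engines_by_url.items()
    ]
    # Strongest consensus first; ties broken alphabetically for stable output.
    rows.sort(key=lambda r: (-r["engine_count"], r["url"]))
    consensus = [r["url"] for r in rows if total and r["engine_count"] == total]
    return {"urls": rows, "consensus_urls": consensus}
```

URLs are compared as exact strings in this sketch, so 'https://a.com' and 'https://a.com/' count as different; a real implementation would likely normalize them first.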

citations.evidence (4 params)

Extract the cited snippet from the AI engine's raw answer for each citation. Calls check_citations, then for each returned URL finds the first mention in raw_answer and returns a context window plus the nearest quoted span or containing sentence. Use to see *why* an engine cit...

Parameters (* = required)

query (string)

Search query whose AI answer to extract citation evidence from.

engine (string)

AI engine to query. web_rank engines (bing_serp, brave_serp) lack raw_answer and return no evidence. One of perplexity · claude · openai · gemini · bing_serp · brave_serp. Default: auto.

max_results (integer)

Max citations to extract evidence for. Default: 10.

context_chars (integer)

Half-width of the snippet window around each citation mention (chars). Total snippet is up to 2x this. Default: 240.
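The context window is a clamped slice around the first mention. In this sketch the slice also includes the mention itself, so it can be slightly longer than 2x context_chars; whether the server counts the mention inside the budget is not stated, and the function name is hypothetical. Quoted-span and sentence detection are left out.

```python
from typing import Optional


def evidence_window(raw_answer: str, mention: str,
                    context_chars: int = 240) -> Optional[str]:
    """Return up to context_chars on each side of the first mention, or None."""
    idx = raw_answer.find(mention)
    if idx == -1:
        return None  # e.g. web_rank engines, which have no raw_answer text
    # Clamp both ends so the slice never runs past the answer boundaries.
    start = max(idx - context_chars, 0)
    end = min(idx + len(mention) + context_chars, len(raw_answer))
    return raw_answer[start:end]
```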

audit.crawler_access (3 params)

Verify that major AI crawlers (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, CCBot, Google-Extended, Applebot-Extended, Bytespider, Meta-ExternalAgent, plus real-time fetch UAs) can fetch a URL. Parses robots.txt and does a live GET with each bot's User-Agent. Surfaces robo...

Parameters (* = required)

url (string)

Page URL to test for AI crawler access.

bots (array)

Override the default bot list. Each entry is a User-Agent token (e.g. 'GPTBot', 'ClaudeBot').

fetch_with_ua (boolean)

If true, do a live GET as each bot's User-Agent and report status. Disable to only parse robots.txt (no extra requests). Default: true.
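The robots.txt half of this check (the fetch_with_ua=false path) can be approximated with Python's standard urllib.robotparser. This is a sketch of the idea only: the function name is hypothetical, the robots.txt body is passed in as text rather than fetched, and the server's own parser may resolve edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def robots_verdicts(robots_txt: str, url: str, bots: list) -> dict:
    """Report, per bot User-Agent token, whether robots.txt allows fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}
```

A robots.txt that disallows GPTBot but allows everything else would report GPTBot as blocked and ClaudeBot as allowed. A live GET with each bot's User-Agent can still fail for other reasons (WAF rules, 403s), which is why the tool performs both checks.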

audit.sitemap_map (4 params)

Cross-reference a sitemap with the citation cache. For each sitemap URL, reports whether it appears in cached citations (and how many queries/engines cited it). Inverse of audit_sitemap: not 'how citable is each URL', but 'has each URL actually been cited yet'. Cache must be p...

Parameters (* = required)

limit (integer)

Max sitemap URLs to consider. Default: 500.

since (string)

ISO date floor; only count citations recorded on or after this date.

domain (string)

Domain to look up citations for. If omitted, inferred from the sitemap host.

sitemap_url (string)

URL of sitemap.xml (or a sitemap index). Nested sitemaps are followed.

competitors.canonical_set (5 params)

Fan a query across engines and aggregate citations by registered domain (not URL). Returns top competitor domains ranked by cross-engine consensus, with per-engine breakdown and top URLs per domain. Use to identify the canonical competitor set for a query - the domains every e...

Parameters (* = required)

query (string)

Search query to fan out across engines.

top_n (integer)

Max competitor domains to return. Default: 10.

engines (array)

Engines to query. If omitted, uses all LLM engines with a configured API key (google_ai_mode, perplexity, claude, openai, gemini). Include bing_serp/brave_serp only for web_rank comparison.

max_results (integer)

Max citations per engine. Default: 10.

exclude_domains (array)

Domains to filter out (e.g. your own brand, Wikipedia, Reddit). Suffix-match.
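Aggregating by domain rather than URL, with suffix-matched exclusions, might look like the sketch below. One loud caveat: extracting a true registered domain needs the Public Suffix List, so this example substitutes a naive 'hostname minus www.' rule. The function names and output keys are hypothetical.

```python
from urllib.parse import urlparse


def naive_domain(url: str) -> str:
    """Hostname minus a leading 'www.'; NOT a real registered-domain lookup."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def canonical_set(citations_by_engine: dict, exclude_domains=(), top_n: int = 10) -> list:
    """Rank domains by how many engines cited at least one of their URLs."""
    engines_by_domain = {}
    for engine, urls in citations_by_engine.items():
        for url in urls:
            domain = naive_domain(url)
            if not domain:
                continue
            # Suffix-match so 'wikipedia.org' also filters 'en.wikipedia.org'.
            if any(domain == ex or domain.endswith("." + ex) for ex in exclude_domains):
                continue
            engines_by_domain.setdefault(domain, set()).add(engine)

    ranked = sorted(engines_by_domain.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"domain": d, "engines": sorted(e), "engine_count": len(e)}
        for d, e in ranked[:top_n]
    ]
```

Matching on a dot boundary avoids false positives such as 'notwikipedia.org' being filtered by 'wikipedia.org'.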

audit.structured_data (1 param)

Suggest missing JSON-LD additions for a URL. Fetches the page, detects existing schema types, and returns ready-to-paste templates for types that are missing but signalled by page content (BlogPosting from og:type=article or bylines, FAQPage from Q&A pairs, HowTo from numbered...

Parameters (* = required)

url (string)

URL to inspect for missing JSON-LD. The page is fetched and its content signals are used to suggest schema types.

Citation Intelligence MCP

A free, self-hosted MCP server that tells your agent what LLMs cite - across Perplexity, Google AI Overviews, ChatGPT, Claude, Gemini, and Bing.

What this is

An MCP server for agents and developers who need to know which URLs get cited by AI search engines for any query. Install once, query from any MCP-compatible client (Claude Desktop, Cursor, Claude Code, Continue, Cline, n8n, LangGraph). Self-hosted, no account, no centralized backend. Bring your own API keys; nothing is stored on a remote server.

Who this is for

Install this if you're:

Building an agent that does research and want it to cite sources LLMs already trust
A solo dev or indie hacker checking whether your SaaS is showing up in AI search
A content creator confirming your articles are being cited by ChatGPT, Claude, or Perplexity
An SEO or GEO practitioner who wants programmatic citation data without a $295-$499/mo dashboard
Running an editorial pipeline and want citation-deficit-driven topic selection
Comparing competitor visibility across AI engines for any niche

Do NOT install this if you want:

A polished marketing dashboard with charts and team seats - try Profound, AthenaHQ, or Otterly.AI
A hosted service with SLAs - this is self-hosted by design
Citation tracking for academic papers - try citecheck
350M+ pre-modeled prompts - that's Ahrefs Brand Radar

Why this exists

The AI citation tracking market is dominated by VC-funded dashboards starting at $295/mo. None ships MCP-first. If you're an agent or developer who wants citation data piped directly into your workflow - not into a SaaS login - there isn't a tool for you. This is that tool.

Tools

Tools are grouped into seven namespaces: citations_*, domain_*, signals_*, panel_*, report_*, competitors_*, audit_*. The prefix is the question category; the suffix is the action. Wire names use underscores (not dots) so that Anthropic-API-based MCP clients (Claude Desktop, Claude Code) can forward the tool list without triggering an HTTP 400 error.

Start with citations_provenance or domain_am_i_cited. Single-engine results (citations_check with a pinned engine) are directional; multi-engine consensus is the honest signal. A URL cited by 4 of 5 engines is a very different finding than one cited by 1.

`citations_*` — query-level: who cites what, with what evidence

Tool	Purpose
`citations_provenance`	Recommended first tool. Fan a query across engines; per-URL cross-engine consensus matrix. Returns `interpretation_note` per engine.
`citations_check`	URLs cited by Perplexity / Claude / ChatGPT / Gemini / Google AI Mode for a query; or web rank via bing_serp / brave_serp
`citations_evidence`	Extract the cited snippet from `raw_answer` for each citation (why, not just that)
`citations_predict`	Citation likelihood from public signals - no LLM fired
`citations_trend`	Time-series report of citation rate + per-query gained/lost deltas
`citations_freshness`	Recency score (halflife=365d) for the pages an engine cites

`domain_*` — domain-level: am I cited, what for

Tool	Purpose
`domain_am_i_cited`	Domain citation check. With `engine=auto` (default): fans across all available LLM engines, returns per-engine breakdown + cross-engine consensus. Pin `engine=` to reduce cost.
`domain_cited_for`	Queries the domain has been cited for, from local cache
`domain_cited_for_diff`	Diff of `domain_cited_for` between two time windows for a domain

`signals_*` — external signals: AI Overview, Wikipedia, GSC, answer-box position

Tool	Purpose
`signals_ai_overview`	Google AI Overview presence + cited sources
`signals_wikipedia`	List Wikipedia articles referencing a domain (zero keys)
`signals_gsc_gap`	Join Google Search Console performance with AI citation status
`signals_answer_box`	Bin each citation's first mention in `raw_answer` into early/middle/late thirds

`panel_*` — saved query panels (editorial watchlists)

Tool	Purpose
`panel_track`	Save / load / list named query panels (editorial watchlists)
`panel_run`	Run a panel through `domain_am_i_cited` and snapshot to disk

`report_*` — turnkey reporting artifacts

Tool	Purpose
`report_visibility`	One-call AI visibility report over a query set (or panel): citation rate (mention frequency), share of voice vs competitors, average rank, and brand sentiment. Returns structured data + a Markdown artifact for a public page.

`competitors_*` — competitive landscape per query

Tool	Purpose
`competitors_canonical_set`	Top cited domains per query, aggregated across engines
`competitors_compete`	End-to-end competitive snapshot: your URL vs top cited competitors
`competitors_compare`	Side-by-side `citations_predict` across 2-10 URLs

`audit_*` — fixable on-page / on-site checks

Tool	Purpose
`audit_schema`	Deep schema.org validation - required fields per `@type`, malformed JSON-LD
`audit_structured_data`	Repair-oriented schema.org diagnostics + suggested patches
`audit_crawler_access`	Verify GPTBot / ClaudeBot / PerplexityBot / CCBot / Google-Extended etc. can fetch a URL
`audit_sitemap`	Bulk `citations_predict` across every URL in a sitemap, worst-first
`audit_sitemap_map`	Cross-reference sitemap URLs with cached citations (inverse of `audit_sitemap`)
`audit_llms_txt`	Generate an `llms.txt` (https://llmstxt.org) from a sitemap

Prompts

Server-side prompt templates the client can offer end users (call via the MCP prompt list):

audit_citation_readiness(url) - chains citations_predict + audit_schema
audit_competitor_snapshot(query, your_url?) - chains competitors_canonical_set + competitors_compete
audit_crawler_checkup(url) - runs audit_crawler_access and writes a remediation list
audit_gap_analysis(domain, days?) - drives signals_gsc_gap and suggests next moves
audit_sitemap_coverage(sitemap_url) - runs audit_sitemap_map and recommends priorities

Resources

Cache views the client can read or subscribe to (no tool call required):

citation://cache/summary - entry counts by type/engine, unique queries/URLs, oldest/newest
citation://panels - saved panels + per-panel snapshot counts
citation://docs/llms-txt - llms.txt primer (markdown)
citation://docs/ai-crawlers - AI crawlers cheatsheet (markdown)
citation://domain/{domain}/cited-for - dynamic template: citations for {domain}

What this actually measures

Every response includes a surface field that tells you exactly how the data was collected. Understanding this is important before drawing conclusions.

Surface	Engines	What it means
`consumer_scrape`	`perplexity`, `google_ai_mode`	Proxied through a real consumer-facing AI search product. Closest to what your users see.
`api_proxy`	`claude`, `openai`, `gemini`	API call to a search-enabled LLM. May differ from consumer product behavior — different model versions, no UI-level ranking logic, no personalization. Use as a directional proxy, not as ground truth.
`web_rank`	`bing_serp`, `brave_serp`	Traditional web search rank (not LLM citation). Measures whether a URL appears in SERP results, not whether an LLM cites it.
`static_signal`	`citations_predict`, `signals_wikipedia`	Offline signal computed from public data. No live LLM query.

Per-engine notes

perplexity (consumer_scrape) — Sonar Pro via the Perplexity API with a consumer-equivalent system prompt. Reasonably close to Perplexity.ai. Citations come from search_results in the response; the citations fallback contains URL-only entries without title.

claude (api_proxy) — Claude Sonnet via the Anthropic Messages API with web_search tool enabled. The consumer Claude.ai product uses different routing and ranking logic. Citation behavior can differ, especially for recent/time-sensitive queries.

openai (api_proxy): gpt-4o plus the web_search_preview tool via the OpenAI Responses API. This replaces the gpt-4o-search-preview alias, which OpenAI has retired; base gpt-4o plus the tool is the supported path.

gemini (api_proxy) — Gemini 2.5 Pro via the Generative Language API with google_search grounding. Consumer Gemini uses the same grounding index but different re-ranking. Results are directional.

google_ai_mode (consumer_scrape) — Google AI Mode results via SerpAPI. Closest to what users see in Google Search. Requires SERPAPI_KEY.

bing_serp / brave_serp (web_rank) — Traditional SERP rank. Does NOT measure LLM citations. Use citations_check with these engines to compare organic web rank against LLM citation rank. domain_am_i_cited refuses these engines — it only measures LLM behavior.

The proxy nature of api_proxy engines is a feature, not a bug: it lets you run citation checks without consuming expensive consumer-product quota. Just don't report API-proxy numbers as "ChatGPT cites you" without the caveat.

Every tool response includes an interpretation_note field that summarizes the fidelity in one sentence. Full per-engine fidelity ratings: docs/surface-fidelity.md.

Quick start

npx -y @automatelab/citation-intelligence

Requires Node 20 or later.

Claude Desktop

Add to %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "citation-intelligence": {
      "command": "npx",
      "args": ["-y", "@automatelab/citation-intelligence"],
      "env": {
        "PERPLEXITY_API_KEY": "pplx-...",
        "SERPAPI_KEY": "...",
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENAI_API_KEY": "sk-...",
        "GEMINI_API_KEY": "..."
      }
    }
  }
}

Set only the keys you have. Any MCP client that supports the stdio transport works with the same command / args pattern.
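One way to follow the "set only the keys you have" advice is to generate the config fragment from whatever is already in your shell environment. The sketch below is an illustration under that assumption. `build_server_entry` is a hypothetical helper, and it only prints a fragment rather than editing claude_desktop_config.json.

```python
import json
import os

# Key names taken from the Claude Desktop example in this README.
KEY_NAMES = [
    "PERPLEXITY_API_KEY",
    "SERPAPI_KEY",
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
]


def build_server_entry(environ: dict) -> dict:
    """Build the mcpServers entry, passing through only keys that are set."""
    env = {name: environ[name] for name in KEY_NAMES if environ.get(name)}
    return {
        "mcpServers": {
            "citation-intelligence": {
                "command": "npx",
                "args": ["-y", "@automatelab/citation-intelligence"],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    # Prints a fragment; merging it into an existing config file is up to you.
    print(json.dumps(build_server_entry(dict(os.environ)), indent=2))
```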

How it stays free

No central backend. The server runs on your machine. Nothing is uploaded.
Free tier first. SerpAPI gives 100 free Google AI Overview lookups/month. Bing Web Search has a free tier. Perplexity offers free Sonar access on signup.
Bring your own paid keys if you want the premium engines (Claude, ChatGPT, Gemini). Keys pass through to the vendor and never touch any third party.
Local cache at ~/.config/citation-intelligence/cache.json. Repeated queries hit cache, not API. Default TTL: 7 days.
citations_predict runs with zero keys - it scores citation likelihood from public signals (Wikipedia, schema.org, llms.txt, GitHub) without firing any LLM.
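The cache rule above (repeated queries hit the cache until the TTL expires) comes down to a timestamp comparison. This sketch shows the idea only. The on-disk format of cache.json is not documented here, so `is_fresh`, its arguments, and the choice to treat an entry of exactly TTL age as expired are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(fetched_at: datetime, now: datetime, ttl_days: float = 7) -> bool:
    """True if an entry fetched at `fetched_at` is still inside its TTL.

    Assumption: an entry exactly `ttl_days` old counts as expired.
    """
    return now - fetched_at < timedelta(days=ttl_days)


if __name__ == "__main__":
    now = datetime(2026, 5, 8, tzinfo=timezone.utc)
    six_days_old = datetime(2026, 5, 2, tzinfo=timezone.utc)
    seven_days_old = datetime(2026, 5, 1, tzinfo=timezone.utc)
    print(is_fresh(six_days_old, now))    # inside the default 7-day TTL
    print(is_fresh(seven_days_old, now))  # at the boundary, treated as expired
```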

Privacy

All API calls go from your machine directly to the vendor (Anthropic, OpenAI, Google, Perplexity, Bing, SerpAPI).
No proxy. No analytics. No telemetry by default.
API keys are read from environment variables on the MCP process - never logged, never persisted.
Cache file lives at ~/.config/citation-intelligence/cache.json. Delete it any time.

Environment variables

Var	Purpose	Free tier?
`PERPLEXITY_API_KEY`	`citations_check` (perplexity — consumer_scrape)	Yes
`SERPAPI_KEY`	`signals_ai_overview` + `citations_check` (google_ai_mode — consumer_scrape)	100/month free
`ANTHROPIC_API_KEY`	`citations_check` (claude — api_proxy)	Paid only
`OPENAI_API_KEY`	`citations_check` (openai — api_proxy)	Paid only
`GEMINI_API_KEY`	`citations_check` (gemini — api_proxy)	Yes
`BING_API_KEY`	`citations_check` (bing_serp — web_rank)	Yes
`BRAVE_API_KEY`	`citations_check` (brave_serp — web_rank)	Yes (2000/month)
`CITATION_CACHE_TTL_DAYS`	Cache TTL for `citations_check` entries (default 7)	n/a
`CITATION_AI_OVERVIEW_TTL_DAYS`	Cache TTL for `signals_ai_overview` entries (default 1)	n/a
`CITATION_CONFIG_DIR`	Override config dir (default `~/.config/citation-intelligence`)	n/a
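The table above implies a simple lookup: each key that is present unlocks one engine. The sketch below mirrors that mapping so a script can check what a given environment would enable. `available_engines` and `cache_ttl_days` are hypothetical helpers, and the server's own resolution logic may differ.

```python
# Key -> engine, following the environment variable table in this README.
ENGINE_BY_KEY = {
    "PERPLEXITY_API_KEY": "perplexity",
    "SERPAPI_KEY": "google_ai_mode",
    "ANTHROPIC_API_KEY": "claude",
    "OPENAI_API_KEY": "openai",
    "GEMINI_API_KEY": "gemini",
    "BING_API_KEY": "bing_serp",
    "BRAVE_API_KEY": "brave_serp",
}


def available_engines(environ: dict) -> list:
    """List engines whose API key is set to a non-empty value."""
    return [engine for key, engine in ENGINE_BY_KEY.items() if environ.get(key)]


def cache_ttl_days(environ: dict) -> float:
    """Read CITATION_CACHE_TTL_DAYS, falling back to the documented default of 7."""
    return float(environ.get("CITATION_CACHE_TTL_DAYS", "7"))


if __name__ == "__main__":
    import os

    print(available_engines(dict(os.environ)))
    print(cache_ttl_days(dict(os.environ)))
```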

Example: am I cited?

You: For the queries "best AI citation tracker", "MCP for AI search", "self-hosted GEO tool",
     is automatelab.tech cited?

(agent invokes `domain_am_i_cited`)

Result:
{
  "domain": "automatelab.tech",
  "engine": "perplexity",
  "results": [
    { "query": "best AI citation tracker",   "cited": true,  "rank": 4 },
    { "query": "MCP for AI search",          "cited": true,  "rank": 1 },
    { "query": "self-hosted GEO tool",       "cited": false, "matching_urls": [] }
  ],
  "summary": {
    "queries_total": 3,
    "queries_cited": 2,
    "citation_rate": 0.67,
    "average_rank": 2.5
  }
}
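The summary block in this result can be reproduced from the per-query rows, which helps when aggregating several runs yourself. This is a sketch of the arithmetic only. `summarize` is a hypothetical helper, and rounding the rate to two decimals and averaging rank over cited queries only are assumptions that happen to match the example.

```python
def summarize(results: list) -> dict:
    """Aggregate per-query rows into the summary fields shown in the example."""
    cited = [row for row in results if row["cited"]]
    ranks = [row["rank"] for row in cited if "rank" in row]
    return {
        "queries_total": len(results),
        "queries_cited": len(cited),
        "citation_rate": round(len(cited) / len(results), 2) if results else 0.0,
        # Average rank is taken over cited queries only (assumption).
        "average_rank": sum(ranks) / len(ranks) if ranks else None,
    }


if __name__ == "__main__":
    rows = [
        {"query": "best AI citation tracker", "cited": True, "rank": 4},
        {"query": "MCP for AI search", "cited": True, "rank": 1},
        {"query": "self-hosted GEO tool", "cited": False, "matching_urls": []},
    ]
    print(summarize(rows))
```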

Example: predict citation likelihood (no key required)

You: How likely is https://example.com/blog/post to be cited by AI?

(agent invokes `citations_predict`)

Result:
{
  "url": "https://example.com/blog/post",
  "score": 62,
  "grade": "C",
  "signals": {
    "wikipedia_linked": false,
    "github_referenced": false,
    "reddit_referenced": true,
    "llms_txt_present": true,
    "https": true,
    "has_article_schema": true,
    "has_faq_schema": false,
    "has_breadcrumb_schema": true,
    "canonical_clean": true,
    "word_count": 1850,
    "reading_time_minutes": 8,
    "h2_count": 7,
    "h2_question_count": 1,
    "authority_link_count": 2,
    "external_link_count": 6,
    "internal_link_count": 11,
    "last_modified_days_ago": 42,
    "has_open_graph": true
  },
  "fixes": [
    { "signal": "has_faq_schema", "suggestion": "Page already has question-style H2s. Wrap them in FAQPage JSON-LD - high-leverage win.", "estimated_lift": "high" },
    { "signal": "h2_question_count", "suggestion": "Reframe at least 2 H2s as questions users actually ask...", "estimated_lift": "medium" }
  ]
}

The Wikipedia signal is measured (it correlates with citation) but no "go get a Wikipedia article" suggestion is emitted - the advice would be non-actionable. Scoring is split across six buckets - domain authority, structured data, content depth, link graph, freshness, metadata - so a thin page and a deep page on the same domain get meaningfully different scores.
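The has_faq_schema fix in the example output asks for FAQPage JSON-LD. As a rough sketch of what that markup looks like, the helper below renders question and answer pairs using the schema.org FAQPage, Question, and Answer types. `faq_jsonld` and the sample pair are hypothetical; the tool's actual suggested patch may be shaped differently.

```python
import json


def faq_jsonld(pairs: list) -> str:
    """Render (question, answer) pairs as a schema.org FAQPage JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Placeholder content; use the page's real question-style H2s and answers.
    print(faq_jsonld([("How do I set up X?", "Install it, then follow the guide.")]))
```

The output belongs inside a `<script type="application/ld+json">` element on the page.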

Workflow recipes

Concrete patterns that compose the 26 tools into something useful. Costs assume ChatGPT or Perplexity at ~$0.01-0.03/query.
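The per-query cost above makes it easy to budget a run before scheduling it. The sketch below assumes one billed call per query per engine and ignores cache hits, so it is an upper-bound estimate rather than the server's accounting. `estimate_run_cost` is a hypothetical helper.

```python
def estimate_run_cost(n_queries: int, n_engines: int = 1,
                      low: float = 0.01, high: float = 0.03) -> tuple:
    """Return (min, max) USD for one run, rounded to cents.

    Assumes one billed call per query per engine and no cache hits.
    """
    calls = n_queries * n_engines
    return (round(calls * low, 2), round(calls * high, 2))


if __name__ == "__main__":
    print(estimate_run_cost(20))               # single engine, 20 queries
    print(estimate_run_cost(30))               # single engine, 30 queries
    print(estimate_run_cost(30, n_engines=3))  # fan-out across three engines
```

Fanning a panel across several engines multiplies the figure, which is worth checking before pinning a weekly schedule.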

1. Weekly citation tracker

The single highest-ROI pattern. Pick 20-30 queries from your editorial backlog, snapshot weekly, watch the rate trend.

# One-time setup
panel_track name="editorial-watchlist" domain="example.com" action="save"
            queries=["best widget tutorial", "how to set up X", ...]

# Weekly cron (5 min, ~$0.20-0.90 per run for 20-30 queries)
panel_run name="editorial-watchlist"

# Anytime
citations_trend panel="editorial-watchlist"

citations_trend returns per-query deltas: which queries flipped from cited: false to cited: true since the first snapshot. That's your real editorial-impact metric.
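The delta described above is a comparison between two snapshots. As an illustration, the sketch below takes two hypothetical snapshots shaped as query to cited flag and lists which queries were gained or lost. The real snapshot format on disk is not documented here, so `citation_deltas` and its input shape are assumptions.

```python
def citation_deltas(first: dict, latest: dict) -> dict:
    """Compare two snapshots that map query -> cited (bool).

    Only queries present in both snapshots are compared.
    """
    shared = [query for query in first if query in latest]
    return {
        "gained": [q for q in shared if not first[q] and latest[q]],
        "lost": [q for q in shared if first[q] and not latest[q]],
    }


if __name__ == "__main__":
    first = {"best widget tutorial": False, "how to set up X": True}
    latest = {"best widget tutorial": True, "how to set up X": True}
    print(citation_deltas(first, latest))
```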

2. Pre-publish gate

Before publishing a post, find out who owns the citation slot and whether the slot is worth competing for.

# 1. Is there an AI Overview to compete for?
signals_ai_overview query="<target query>"

# 2. Who is cited today?
citations_check query="<target query>"

# 3. After publish + 14 days: did the post break in?
domain_am_i_cited domain="example.com" queries=["<target query>"]

If citations_check returns 5+ strong incumbents on a low-volume query, pick a different angle. If ai_overview_present: false, the query has no AI surface - reconsider.
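The two rules above can be written as a small decision function. This is only a sketch: what counts as a "strong" incumbent or a "low-volume" query is left to the caller, and `pre_publish_decision` with its return strings is hypothetical. Only the threshold of 5 and the ai_overview_present check come from the text.

```python
def pre_publish_decision(ai_overview_present: bool,
                         strong_incumbents: int,
                         low_volume: bool) -> str:
    """Apply the pre-publish gate rules from this recipe.

    How incumbents are judged "strong" and queries "low-volume" is up to
    the caller (assumption; the README does not define either).
    """
    if not ai_overview_present:
        return "reconsider"  # no AI surface to compete for
    if strong_incumbents >= 5 and low_volume:
        return "pick a different angle"
    return "proceed"


if __name__ == "__main__":
    print(pre_publish_decision(True, 6, True))
```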

3. Bulk site audit

Catch site-wide structural issues across every page in one pass. Zero API spend.

audit_sitemap sitemap_url="https://example.com/sitemap.xml" limit=200

Returns worst_first, sorted by citation-likelihood score with the lowest-scoring pages first. Surfaces missing schema, conflicting canonicals, missing /llms.txt, and broken HTTPS.
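Once the audit output is in hand, two quick aggregations tend to be useful: the weakest pages first, and a count of how many pages fail each structural signal. The sketch below assumes each audited page carries a `score` and a `signals` object shaped like the citations_predict example. That shape and both helper names are assumptions, not the documented response.

```python
# Boolean signal names borrowed from the citations_predict example in this README.
STRUCTURAL_SIGNALS = ("has_article_schema", "canonical_clean", "llms_txt_present", "https")


def worst_first(pages: list) -> list:
    """Sort audited pages by ascending score, so the weakest pages come first."""
    return sorted(pages, key=lambda page: page["score"])


def failing_counts(pages: list) -> dict:
    """Count, per structural signal, how many pages report it as False."""
    return {
        name: sum(1 for page in pages if page["signals"].get(name) is False)
        for name in STRUCTURAL_SIGNALS
    }


if __name__ == "__main__":
    pages = [
        {"url": "https://example.com/a", "score": 80,
         "signals": {"https": True, "llms_txt_present": False}},
        {"url": "https://example.com/b", "score": 35,
         "signals": {"https": False, "llms_txt_present": False}},
    ]
    print([page["url"] for page in worst_first(pages)])
    print(failing_counts(pages))
```

A signal that fails on every page usually points to a template or site-level fix rather than per-page work.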

4. Competitor signal gap

You're not cited; they are. Why?

# 1. Find the top-cited URLs for your target query
citations_check query="<query>"

# 2. Compare your URL to theirs signal-by-signal
competitors_compare urls=[
  "https://example.com/your-post",
  "https://competitor-1.com/their-post",
  "https://competitor-2.com/their-post"
]

diverging_signals lists the signals where you trail the competitors. The gap is usually obvious once you see it: they have FAQ schema, GitHub references, or Wikipedia links, and you don't.
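To make the idea of a diverging signal concrete, the sketch below compares boolean signals and keeps those you lack but every competitor has. This strict "all competitors" rule is an assumption chosen for illustration, and the tool may define divergence differently. `diverging_signals` here is a hypothetical local function, not the tool's implementation.

```python
def diverging_signals(yours: dict, competitors: list) -> list:
    """Return boolean signals that are False for you and True for every competitor."""
    if not competitors:
        return []
    return sorted(
        name
        for name, value in yours.items()
        if value is False and all(c.get(name) is True for c in competitors)
    )


if __name__ == "__main__":
    yours = {"has_faq_schema": False, "github_referenced": False, "https": True}
    competitors = [
        {"has_faq_schema": True, "github_referenced": True, "https": True},
        {"has_faq_schema": True, "github_referenced": False, "https": True},
    ]
    print(diverging_signals(yours, competitors))
```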

5. Google-rank vs AI-citation gap

The closest editorial wins are queries where you already rank in Google's top 10 but are invisible to AI. Requires a GCP service account with webmasters.readonly scope.

signals_gsc_gap
  domain="example.com"
  queries=["...editorial watchlist..."]
  start_date="2026-04-01"
  end_date="2026-05-01"

closest_wins returns queries with position <= 10 and ai_cited: false, sorted by impressions desc. Push citation signals on those specific URLs first.
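The closest_wins rule stated above is a filter plus a sort, which is easy to re-apply to exported rows. In this sketch the row fields `position`, `ai_cited`, and `impressions` follow the names used in the text, but the exact response shape is an assumption and `closest_wins` is a hypothetical local function.

```python
def closest_wins(rows: list) -> list:
    """Keep rows ranking in the top 10 without an AI citation, by impressions desc."""
    hits = [row for row in rows if row["position"] <= 10 and not row["ai_cited"]]
    return sorted(hits, key=lambda row: row["impressions"], reverse=True)


if __name__ == "__main__":
    rows = [
        {"query": "q1", "position": 3, "ai_cited": False, "impressions": 120},
        {"query": "q2", "position": 8, "ai_cited": False, "impressions": 900},
        {"query": "q3", "position": 2, "ai_cited": True, "impressions": 5000},
        {"query": "q4", "position": 14, "ai_cited": False, "impressions": 3000},
    ]
    print([row["query"] for row in closest_wins(rows)])
```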

6. Wikipedia mention monitor

Wikipedia is the top-correlation signal but the advice "get on Wikipedia" is useless. So instead: watch when it happens organically.

signals_wikipedia domain="example.com" limit=50

Returns Wikipedia article URLs that already link to the domain. Re-run quarterly; the diff is your "we got a Wikipedia citation" alert.
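The quarterly diff can be a plain set difference over the article URLs from two runs. The sketch below assumes you saved each run's URL list yourself. `new_wikipedia_links` and the sample URLs are hypothetical.

```python
def new_wikipedia_links(previous: list, current: list) -> list:
    """Return article URLs present in the current run but not the previous one."""
    return sorted(set(current) - set(previous))


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    last_quarter = ["https://en.wikipedia.org/wiki/Example_A"]
    this_quarter = [
        "https://en.wikipedia.org/wiki/Example_A",
        "https://en.wikipedia.org/wiki/Example_B",
    ]
    print(new_wikipedia_links(last_quarter, this_quarter))
```

Swapping the arguments gives the links that disappeared since the previous run.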

Schema.org

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Citation Intelligence MCP",
  "applicationCategory": "DeveloperApplication",
  "operatingSystem": "Cross-platform",
  "description": "Self-hosted MCP server for querying AI citation data from Perplexity, Claude, ChatGPT, Gemini, Bing, and Google AI Overviews.",
  "offers": { "@type": "Offer", "price": "0" },
  "url": "https://github.com/AutomateLab-tech/citation-intelligence"
}

Contributing

Bug reports, feature ideas, and PRs welcome. See CONTRIBUTING.md.

Security

Report a vulnerability via SECURITY.md.

License

MIT - see LICENSE.

Built by automatelab.tech

