Gives Claude access to multiple LLM providers through a single MCP interface. Auto-detects what you have installed (Gemini CLI, Codex CLI, or local Ollama) and registers the appropriate tools. You can ask Gemini for 1M+ token context reviews, hit Codex for GPT-5.5 responses, or keep everything local with Ollama. Ships with ask-gemini, ask-codex, ask-ollama, and ask-llm tools that handle prompts and structured edits. The multi-llm tool fans out to all providers in parallel for side-by-side comparisons. Supports session continuity with sessionId parameters for multi-turn conversations. Useful when you want a second AI opinion on architecture, code reviews, or need to leverage different model strengths without switching contexts.
| Package | Type | Version | Downloads |
|---|---|---|---|
| ask-gemini-mcp | MCP Server | | |
| ask-codex-mcp | MCP Server | | |
| ask-ollama-mcp | MCP Server | | |
| ask-antigravity-mcp | MCP Server | | |
| ask-llm-mcp | MCP Server | | |
| @ask-llm/plugin | Claude Code Plugin | /plugin install | |
MCP servers + Claude Code plugin for AI-to-AI collaboration
Get a second opinion before you ship. Ask LLM lets your AI assistant — Claude Code, Cursor, Claude Desktop, or any of 40+ MCP clients — consult a second model to review your code, debate a plan, or catch a bug it might have missed. Pick the reviewer that fits: OpenAI Codex (GPT-5.5), Google Antigravity (agy), a local Ollama model, or Gemini (1M+ token context). Standard MCP, no prompt hacks.
⚠️ Gemini CLI goes enterprise-only on 2026-06-18: From that date Google restricts Gemini CLI to Gemini Code Assist Standard/Enterprise seats, and free, Google AI Pro, and Ultra accounts lose access.
ask-gemini-mcp still installs, but a non-enterprise account then surfaces actionable guidance instead of output. Free/Pro users: switch to ask-antigravity (the Google-sanctioned successor, subscription-backed via Google AI Pro/Ultra), ask-codex, or ask-ollama. See the announcement for details.
Your primary AI is confident — but confidence isn't correctness. A second model, with no stake in the first one's answer, catches what it missed.
```
You:    ask codex to review src/auth.ts for security issues
Codex:  ⚠ verifyToken() compares tokens with === — not timing-safe (line 42)
        ⚠ the session cookie is missing a SameSite attribute
Claude: Good catches — applying both fixes to src/auth.ts.
```
One prompt. A second model reviews independently; your assistant applies the fix — no copy-paste between tools.
```bash
# All-in-one: auto-detects installed providers
claude mcp add --scope user ask-llm -- npx -y ask-llm-mcp

# Or register one server per provider
claude mcp add --scope user gemini -- npx -y ask-gemini-mcp
claude mcp add --scope user codex -- npx -y ask-codex-mcp
claude mcp add --scope user ollama -- npx -y ask-ollama-mcp
claude mcp add --scope user antigravity -- npx -y ask-antigravity-mcp
```
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "ask-llm": {
      "command": "npx",
      "args": ["-y", "ask-llm-mcp"]
    }
  }
}
```
Or register providers individually:
```json
{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "ask-gemini-mcp"]
    },
    "codex": {
      "command": "npx",
      "args": ["-y", "ask-codex-mcp"]
    },
    "ollama": {
      "command": "npx",
      "args": ["-y", "ask-ollama-mcp"]
    }
  }
}
```
Cursor (.cursor/mcp.json):
```json
{
  "mcpServers": {
    "ask-llm": { "command": "npx", "args": ["-y", "ask-llm-mcp"] }
  }
}
```
Codex CLI (~/.codex/config.toml):
```toml
[mcp_servers.ask-llm]
command = "npx"
args = ["-y", "ask-llm-mcp"]
```
Any MCP Client (STDIO transport):
```json
{ "command": "npx", "args": ["-y", "ask-llm-mcp"] }
```
Replace ask-llm-mcp with ask-codex-mcp, ask-antigravity-mcp, ask-ollama-mcp, or ask-gemini-mcp for a single provider.
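The substitution described above can be sketched as a small helper that builds the mcpServers object for whichever providers you want. This is only an illustration: the package names come from this README, while `buildMcpConfig`, the `Provider` type, and the server key names are hypothetical and not part of any ask-llm package.

```typescript
// Package names are taken from this README; everything else is illustrative.
type Provider = "ask-llm" | "codex" | "antigravity" | "ollama" | "gemini";

const PACKAGES: Record<Provider, string> = {
  "ask-llm": "ask-llm-mcp",
  codex: "ask-codex-mcp",
  antigravity: "ask-antigravity-mcp",
  ollama: "ask-ollama-mcp",
  gemini: "ask-gemini-mcp",
};

interface ServerEntry {
  command: string;
  args: string[];
}

// Hypothetical helper: build the "mcpServers" object for the chosen providers.
function buildMcpConfig(providers: Provider[]): { mcpServers: Record<string, ServerEntry> } {
  const mcpServers: Record<string, ServerEntry> = {};
  for (const name of providers) {
    mcpServers[name] = { command: "npx", args: ["-y", PACKAGES[name]] };
  }
  return { mcpServers };
}

// Print a config that registers only Codex and Ollama.
console.log(JSON.stringify(buildMcpConfig(["codex", "ollama"]), null, 2));
```

The printed JSON has the same shape as the client configs shown earlier, so it can be pasted into a client's MCP config file.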
| Provider | Best for | Model (default → fallback) | Notes |
|---|---|---|---|
| Codex | Code reasoning, targeted reviews, architecture critique | gpt-5.5 → gpt-5.4-mini | Requires an OpenAI/Codex account |
| Antigravity | A subscription-backed second opinion; larger-context reads | Gemini 3.1 Pro (High) → Gemini 3.5 Flash (High) | Google AI Pro/Ultra plan; one-shot, experimental |
| Ollama | Private/local review, zero cost, offline | qwen3.6:27b (no auto-fallback) | Runs entirely on your machine |
| Gemini | Whole-codebase reads (1M+ tokens) | gemini-3.1-pro-preview → gemini-3.5-flash | ⚠️ Enterprise-gated from 2026-06-18 |
| Unified (ask-llm) | One install for all of the above; fan out in parallel | routes per call | Recommended |
The Ask LLM plugin adds multi-provider code review, brainstorming, and automated hooks directly into Claude Code:
```
/plugin marketplace add Lykhoyda/ask-llm
/plugin install ask-llm@ask-llm-plugins
```
| Feature | Description |
|---|---|
| /multi-review | Parallel Antigravity + Codex review with 4-phase validation pipeline and consensus highlighting (Gemini via /gemini-review) |
| /gemini-review | Gemini-only review with confidence filtering |
| /codex-review | Codex-only review with confidence filtering |
| /ollama-review | Local review; no data leaves your machine |
| /antigravity-review | Subscription-backed review via Google Antigravity (agy); experimental |
| /brainstorm | Multi-LLM brainstorm: Claude Opus researches the topic against real files in parallel with external providers (Gemini/Codex/Ollama), then synthesizes all findings with verified findings weighted higher |
| /compare | Side-by-side raw responses from multiple providers, no synthesis; for when you want to see how each provider phrases the same answer |
| codex-pair hook | Opt-in continuous review: runs Codex against every Edit/Write/MultiEdit when a .codex-pair/context.md marker is present in the project |
The review agents use a 4-phase pipeline inspired by Anthropic's code-review plugin: context gathering, prompt construction with explicit false-positive exclusions, synthesis, and source-level validation of each finding.
See the plugin docs for details.
- Antigravity (agy): installed and logged in once (Google AI Pro/Ultra)
- Ollama (ollama pull qwen3.6:27b)
- Gemini CLI: npm install -g @google/gemini-cli && gemini login (enterprise-gated from 2026-06-18)

| Tool | Package | Purpose |
|---|---|---|
| ask-gemini | ask-gemini-mcp | Send prompts to Gemini CLI with @ file syntax. 1M+ token context. Live progressive output via stream-json |
| ask-gemini-edit | ask-gemini-mcp | Get structured OLD/NEW code edit blocks from Gemini |
| fetch-chunk | ask-gemini-mcp | Retrieve chunks from cached large responses |
| ask-codex | ask-codex-mcp | Send prompts to Codex CLI. GPT-5.5 with mini fallback. Native session resume via sessionId |
| ask-ollama | ask-ollama-mcp | Send prompts to local Ollama. Fully private, zero cost. Server-side conversation replay via sessionId |
| ask-antigravity | ask-antigravity-mcp | Send a prompt to Google Antigravity (agy) for a subscription-backed second opinion. Experimental; one-shot |
| ask-llm | ask-llm-mcp | Unified orchestrator: pick provider per call. Fan out to all installed providers |
| multi-llm | ask-llm-mcp | Dispatch the same prompt to multiple providers in parallel; returns per-provider responses + usage in one call |
| get-usage-stats | all | Per-session token totals, fallback counts, breakdowns by provider/model. All in-memory, no persistence |
| diagnose | ask-llm-mcp | Self-diagnosis: Node version, PATH resolution, provider CLI presence + versions. Read-only |
| ping | all | Connection test: verify MCP setup |
All ask-* tools accept an optional sessionId parameter for multi-turn conversations and now return a structured AskResponse (provider, response, model, sessionId, usage) via MCP outputSchema alongside the human-readable text. The orchestrator (ask-llm-mcp) also exposes usage://current-session as an MCP Resource for live JSON snapshots.
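To make the structured result concrete, here is a rough TypeScript sketch of what a client might do with it. Only the field names (provider, response, model, sessionId, usage) come from this README. The field types, which fields are optional, the `prompt` argument name, and the helpers `isAskResponse` and `followUpArgs` are assumptions for illustration, not the package's published types.

```typescript
// Field names are from the README; types and optionality are assumptions.
interface AskResponse {
  provider: string;
  response: string;
  model: string;
  sessionId?: string;
  usage?: Record<string, number>;
}

// Hypothetical runtime check for a value parsed from a tool result.
function isAskResponse(value: unknown): value is AskResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["provider"] === "string" &&
    typeof v["response"] === "string" &&
    typeof v["model"] === "string"
  );
}

// Hypothetical helper: build arguments for a follow-up call that continues
// the same conversation by passing the previous sessionId back.
function followUpArgs(previous: AskResponse, prompt: string): { prompt: string; sessionId?: string } {
  return previous.sessionId ? { prompt, sessionId: previous.sessionId } : { prompt };
}

// Illustrative values only; not real tool output.
const firstTurn: unknown = {
  provider: "codex",
  response: "Looks fine, but check the token comparison.",
  model: "gpt-5.5",
  sessionId: "session-1",
};

if (isAskResponse(firstTurn)) {
  console.log(followUpArgs(firstTurn, "Show me a safer comparison."));
}
```

Validating the shape before reading sessionId keeps a multi-turn flow from silently starting a fresh session when a result is malformed.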
```
ask codex to review the changes in src/auth.ts for security issues
ask antigravity to debate this architecture plan in docs/design.md
ask ollama to explain src/config.ts (runs locally, no data sent anywhere)
ask gemini to summarize @. the current directory (1M+ context, @ is Gemini-only)
use multi-llm to compare what codex and gemini think about this approach
```
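The multi-llm tool is described as dispatching one prompt to several providers in parallel and returning per-provider responses. A minimal sketch of that fan-out pattern follows, assuming each provider is reachable through an async function. The `Ask` type, `fanOut`, and the stub providers are hypothetical and say nothing about how ask-llm-mcp implements it internally.

```typescript
// Hypothetical provider signature: takes a prompt, resolves to the reply text.
type Ask = (prompt: string) => Promise<string>;

interface FanOutResult {
  provider: string;
  ok: boolean;
  text: string; // reply on success, error message on failure
}

// Send the same prompt to every provider at once. One provider failing
// does not discard the others, because allSettled never rejects.
async function fanOut(prompt: string, providers: Record<string, Ask>): Promise<FanOutResult[]> {
  const names = Object.keys(providers);
  const calls = names.map(async (name) => {
    const ask = providers[name];
    if (!ask) throw new Error(`unknown provider: ${name}`);
    return ask(prompt);
  });
  const settled = await Promise.allSettled(calls);
  return settled.map((result, i) => {
    const provider = names[i] ?? "unknown";
    return result.status === "fulfilled"
      ? { provider, ok: true, text: result.value }
      : { provider, ok: false, text: String(result.reason) };
  });
}

// Stub providers standing in for real CLI-backed calls.
const stubProviders: Record<string, Ask> = {
  codex: async (p) => `codex reviewed: ${p}`,
  ollama: async () => {
    throw new Error("model not pulled");
  },
};

fanOut("compare these two approaches", stubProviders).then((results) => {
  for (const r of results) console.log(r.provider, r.ok ? "ok" : "failed", r.text);
});
```

Using Promise.allSettled rather than Promise.all is the design choice that matters here: a side-by-side comparison is still useful when one provider is unavailable.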
The orchestrator binary (ask-llm-mcp) supports two CLI modes alongside the default MCP server:
```bash
# Interactive multi-provider REPL: switch providers, persist sessions, see usage live
npx ask-llm-mcp repl

# Diagnose your setup: Node version, PATH, provider CLI versions, env vars
npx ask-llm-mcp doctor          # human-readable
npx ask-llm-mcp doctor --json   # machine-readable, exit 1 on error
```
The REPL keeps a separate session per provider (/provider gemini, /provider codex, /new, /sessions, /usage) and inherits all the executor behavior (quota fallback, stream-json output for Gemini, native session resume).
| Provider | Default | Fallback |
|---|---|---|
| Gemini | gemini-3.1-pro-preview | gemini-3.5-flash (on quota) |
| Codex | gpt-5.5 | gpt-5.4-mini (on quota) |
| Ollama | qwen3.6:27b | — (local; errors if the model isn't pulled) |
Gemini and Codex automatically fall back to a lighter model on quota errors. Ollama runs locally and never substitutes a model — if the requested model isn't pulled, it returns a clear ollama pull error.
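The fallback rule above can be sketched roughly as follows. The model names are the ones in the table; `askWithFallback`, `RunModel`, and especially `isQuotaError` are hypothetical. This README does not say how the servers detect a quota error, so the message check below is a placeholder assumption.

```typescript
// Default -> fallback pairs as listed in the table above.
const FALLBACK_MODEL: Record<string, string | undefined> = {
  "gemini-3.1-pro-preview": "gemini-3.5-flash",
  "gpt-5.5": "gpt-5.4-mini",
  "qwen3.6:27b": undefined, // Ollama: never substitutes a model
};

// Hypothetical executor signature: run a prompt against a named model.
type RunModel = (model: string, prompt: string) => Promise<string>;

interface FallbackResult {
  model: string;
  text: string;
  usedFallback: boolean;
}

// Assumption: a quota error is recognizable from its message.
function isQuotaError(err: unknown): boolean {
  return err instanceof Error && /quota/i.test(err.message);
}

// Try the requested model; on a quota error, retry once with the lighter
// model if one is configured. Any other error is rethrown unchanged.
async function askWithFallback(model: string, prompt: string, runModel: RunModel): Promise<FallbackResult> {
  try {
    return { model, text: await runModel(model, prompt), usedFallback: false };
  } catch (err) {
    const fallback = FALLBACK_MODEL[model];
    if (!isQuotaError(err) || fallback === undefined) throw err;
    return { model: fallback, text: await runModel(fallback, prompt), usedFallback: true };
  }
}

// Stub executor: the default Codex model is "out of quota", the mini model answers.
const stubRun: RunModel = async (model) => {
  if (model === "gpt-5.5") throw new Error("quota exceeded");
  return `answer from ${model}`;
};

askWithFallback("gpt-5.5", "review src/auth.ts", stubRun).then((r) => console.log(r));
```

Because the Ollama entry has no fallback, the same function surfaces the original error for local models instead of quietly switching to something else.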
Contributions are welcome! See open issues for things to work on.
MIT License. See LICENSE for details.
Disclaimer: This is an unofficial, third-party tool and is not affiliated with, endorsed, or sponsored by Google or OpenAI.
| Variable | Default | Description |
|---|---|---|
| ASK_ANTIGRAVITY_TIMEOUT_MS | 300000 | Timeout for Antigravity (agy) execution in milliseconds (300000 = 5 minutes) |
| ASK_ANTIGRAVITY_SANDBOX | 1 | Set to '0' to drop agy's --sandbox flag if it blocks --add-dir context reads (sandbox on by default) |
| GMCPT_LOG_LEVEL | warn | Log verbosity: debug, info, warn, error |
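As an illustration of how these variables and their defaults fit together, the sketch below parses them from an environment map. The variable names and defaults are the ones documented above. `readEnvConfig` is hypothetical, and falling back to the default on an invalid value is an assumption, not documented behavior of the servers.

```typescript
const LOG_LEVELS = ["debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

interface AskEnvConfig {
  antigravityTimeoutMs: number;
  antigravitySandbox: boolean;
  logLevel: LogLevel;
}

function isLogLevel(value: string): value is LogLevel {
  return (LOG_LEVELS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical parser. Invalid values fall back to the documented default,
// which is an assumption about how a server might treat them.
function readEnvConfig(env: Record<string, string | undefined>): AskEnvConfig {
  const timeout = Number(env["ASK_ANTIGRAVITY_TIMEOUT_MS"] ?? "300000");
  const level = env["GMCPT_LOG_LEVEL"] ?? "warn";
  return {
    antigravityTimeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 300000,
    // Sandbox stays on unless the variable is exactly "0".
    antigravitySandbox: env["ASK_ANTIGRAVITY_SANDBOX"] !== "0",
    logLevel: isLogLevel(level) ? level : "warn",
  };
}

// Example: disable the sandbox and raise verbosity, keep the default timeout.
console.log(readEnvConfig({ ASK_ANTIGRAVITY_SANDBOX: "0", GMCPT_LOG_LEVEL: "debug" }));
```

In a Node process you would pass process.env as the argument; the function takes a plain map so it can be exercised without touching the real environment.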