This is a validation layer that sits between your AI coding assistant and the LLM, acting as a gatekeeper for code changes. It exposes tools like judge_coding_plan, judge_code_change, and judge_testing_implementation that use MCP sampling to evaluate whether your research is thorough, your diffs meet engineering standards, and your tests actually work. It also includes raise_missing_requirements and raise_obstacle for interactive decision making via MCP elicitation. You'd reach for this if you're tired of AI assistants hallucinating APIs, skipping proper testing, or making unilateral architectural decisions. Works best with GitHub Copilot in VS Code where sampling is native, though other assistants need an LLM API key configured. Think of it as enforcing a code review process before the AI's changes land.
mcp-name: io.github.OtherVibes/mcp-as-a-judge
MCP as a Judge acts as a validation layer between AI coding assistants and LLMs, helping ensure safer and higher-quality code.
MCP as a Judge is a behavioral MCP that strengthens AI coding assistants by requiring explicit LLM evaluations for:
It enforces evidence-based research, reuse over reinvention, and human-in-the-loop decisions.
If your IDE has rules/agents (Copilot, Cursor, Claude Code), keep using them—this Judge adds enforceable approval gates on plan, code diffs, and tests.
| Tool | What it solves |
|---|---|
| set_coding_task | Creates/updates task metadata; classifies task_size; returns next-step workflow guidance |
| get_current_coding_task | Recovers the latest task_id and metadata to resume work safely |
| judge_coding_plan | Validates plan/design; requires library selection and internal reuse maps; flags risks |
| judge_code_change | Reviews unified Git diffs for correctness, reuse, security, and code quality |
| judge_testing_implementation | Validates tests using real runner output and optional coverage |
| judge_coding_task_completion | Final gate ensuring plan, code, and tests approvals before completion |
| raise_missing_requirements | Elicits missing details and decisions to unblock progress |
| raise_obstacle | Engages the user on trade‑offs, constraints, and enforced changes |
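The final gate in the table above can be pictured as a simple check over per-stage approvals. The following is a minimal sketch only; `TaskApprovals`, `missing`, and `can_complete` are hypothetical names chosen for illustration and are not taken from the server's code.

```python
from dataclasses import dataclass

# Hypothetical model of the staged workflow (plan -> code -> test -> completion).
# Names are illustrative assumptions, not the server's actual data structures.
STAGES = ("plan", "code", "test")


@dataclass
class TaskApprovals:
    plan: bool = False
    code: bool = False
    test: bool = False

    def missing(self) -> list[str]:
        """Return the stages that are not yet approved, in workflow order."""
        return [stage for stage in STAGES if not getattr(self, stage)]

    def can_complete(self) -> bool:
        """The final gate passes only when every earlier stage is approved."""
        return not self.missing()


approvals = TaskApprovals(plan=True, code=True)
print(approvals.missing())       # ['test']
print(approvals.can_complete())  # False
```

In this sketch a task with an approved plan and code change is still blocked until the tests are approved, which mirrors the role the table gives to judge_coding_task_completion.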
MCP as a Judge is heavily dependent on MCP Sampling and MCP Elicitation features for its core functionality:
| AI Assistant | Platform | MCP Support | Status | Notes |
|---|---|---|---|---|
| GitHub Copilot | Visual Studio Code | ✅ Full | Recommended | Complete MCP integration with sampling and elicitation |
| Claude Code | - | ⚠️ Partial | Requires LLM API key | Sampling support and elicitation support are open feature requests |
| Cursor | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| Augment | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| Qodo | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
✅ Recommended setup: GitHub Copilot + VS Code — full MCP sampling; no API key needed.
⚠️ Critical: For assistants without full MCP sampling (Cursor, Claude Code, Augment, Qodo), you MUST set LLM_API_KEY. Without it, the server cannot evaluate plans or code. See LLM API Configuration.
💡 Tip: Prefer large context models (≥ 1M tokens) for better analysis and judgments.
For troubleshooting, visit the FAQs section.
Configure MCP as a Judge in your MCP-enabled client:
Notes:
Configure MCP Settings:
Add this to your MCP client configuration file:
{
"command": "docker",
"args": ["run", "--rm", "-i", "--pull=always", "ghcr.io/othervibes/mcp-as-a-judge:latest"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-4o-mini"
}
}
📝 Configuration Options (All Optional):
The --pull=always flag ensures you always get the latest version automatically.

Then manually update when needed:
# Pull the latest version
docker pull ghcr.io/othervibes/mcp-as-a-judge:latest
Install the package:
uv tool install mcp-as-a-judge
Configure MCP Settings:
The MCP server may be automatically detected by your MCP‑enabled client.
📝 Notes:
To update to the latest version:
# Update MCP as a Judge to the latest version
uv tool upgrade mcp-as-a-judge
For AI assistants without full MCP sampling support you can configure an LLM API key as a fallback. This ensures MCP as a Judge works even when the client doesn't support MCP sampling.
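The choice between sampling and the API-key fallback might look roughly like the sketch below. It is an assumption-laden illustration: `choose_evaluation_mode` and its return values are hypothetical, and only the `LLM_API_KEY` variable name comes from this README.

```python
import os
from typing import Mapping, Optional


def choose_evaluation_mode(
    client_supports_sampling: bool,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick how judgments would be obtained (hypothetical helper)."""
    env = os.environ if env is None else env
    if client_supports_sampling:
        # Primary mode: the client's own model evaluates via MCP sampling.
        return "mcp_sampling"
    if env.get("LLM_API_KEY"):
        # Fallback mode: call the provider behind the configured key.
        return "llm_api_key"
    # Neither path is available, so plans and code cannot be evaluated.
    return "unavailable"


print(choose_evaluation_mode(False, {"LLM_API_KEY": "sk-example"}))  # llm_api_key
```

Passing the environment as a parameter keeps the sketch easy to test without touching real environment variables.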
Set LLM_API_KEY (unified key). The vendor is auto-detected; optionally set LLM_MODEL_NAME to override the default model.

| Rank | Provider | API Key Format | Default Model | Notes |
|---|---|---|---|---|
| 1 | OpenAI | sk-... | gpt-4.1 | Fast and reliable model optimized for speed |
| 2 | Anthropic | sk-ant-... | claude-sonnet-4-20250514 | High-performance with exceptional reasoning |
| 3 | Google | AIza... | gemini-2.5-pro | Most advanced model with built-in thinking |
| 4 | Azure OpenAI | [a-f0-9]{32} | gpt-4.1 | Same as OpenAI but via Azure |
| 5 | AWS Bedrock | AWS credentials | anthropic.claude-sonnet-4-20250514-v1:0 | Aligned with Anthropic |
| 6 | Vertex AI | Service Account JSON | gemini-2.5-pro | Enterprise Gemini via Google Cloud |
| 7 | Groq | gsk_... | deepseek-r1 | Best reasoning model with speed advantage |
| 8 | OpenRouter | sk-or-... | deepseek/deepseek-r1 | Best reasoning model available |
| 9 | xAI | xai-... | grok-code-fast-1 | Latest coding-focused model (Aug 2025) |
| 10 | Mistral | [a-f0-9]{64} | pixtral-large | Most advanced model (124B params) |
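One way vendor auto-detection could work from the key formats in the table is prefix and pattern matching. This is a sketch under that assumption, not the server's actual detection logic; `detect_vendor` and the vendor labels are hypothetical. AWS Bedrock and Vertex AI are left out because the table lists credentials for them rather than a key pattern.

```python
import re
from typing import Optional

# Prefix rules, ordered so the more specific prefixes are tested first:
# "sk-ant-" and "sk-or-" must come before the generic "sk-".
_PREFIX_RULES = [
    ("sk-ant-", "anthropic"),
    ("sk-or-", "openrouter"),
    ("sk-", "openai"),
    ("AIza", "google"),
    ("gsk_", "groq"),
    ("xai-", "xai"),
]

# Keys without a prefix are told apart by length: 32 vs 64 hex characters.
_HEX_RULES = [
    (re.compile(r"[a-f0-9]{32}"), "azure_openai"),
    (re.compile(r"[a-f0-9]{64}"), "mistral"),
]


def detect_vendor(api_key: str) -> Optional[str]:
    """Guess the provider from the key's shape; return None if nothing matches."""
    for prefix, vendor in _PREFIX_RULES:
        if api_key.startswith(prefix):
            return vendor
    for pattern, vendor in _HEX_RULES:
        if pattern.fullmatch(api_key):
            return vendor
    return None


print(detect_vendor("sk-ant-example"))  # anthropic
```

The ordering matters in this sketch: testing the generic `sk-` prefix first would classify Anthropic and OpenRouter keys as OpenAI.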
Open Cursor Settings:
File → Preferences → Cursor Settings
Open the MCP tab
Click + Add to add a new MCP server

Add MCP Server Configuration:
{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-4.1"
}
}
📝 Configuration Options:
Add MCP Server via CLI:
# Set environment variables first (optional model override)
export LLM_API_KEY="your_api_key_here"
export LLM_MODEL_NAME="claude-3-5-haiku" # Optional: faster/cheaper model
# Add MCP server
claude mcp add mcp-as-a-judge -- uv tool run mcp-as-a-judge
Alternative: Manual Configuration:
~/.config/claude-code/mcp_servers.json

{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-anthropic-api-key-here",
"LLM_MODEL_NAME": "claude-3-5-haiku"
}
}
📝 Configuration Options:
For other MCP-compatible clients, use the standard MCP server configuration:
{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-5"
}
}
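If you generate client configuration programmatically, the entry above could be built with a small helper such as the one below. This is a sketch: `build_server_entry` is a hypothetical function, and where the resulting object belongs inside a given client's configuration file depends on that client. It assumes the `env` block can be omitted when the client provides MCP sampling.

```python
import json
from typing import Optional


def build_server_entry(
    api_key: Optional[str] = None,
    model_name: Optional[str] = None,
) -> dict:
    """Build a server entry shaped like the JSON above (hypothetical helper)."""
    entry: dict = {"command": "uv", "args": ["tool", "run", "mcp-as-a-judge"]}
    env = {}
    if api_key:
        env["LLM_API_KEY"] = api_key
    if model_name:
        env["LLM_MODEL_NAME"] = model_name
    if env:
        # Only emit "env" when there is something to set.
        entry["env"] = env
    return entry


print(json.dumps(build_server_entry("your-api-key-here"), indent=2))
```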
📝 Configuration Options:
Primary Mode: MCP Sampling
Fallback Mode: LLM API Key
When you set LLM_API_KEY for fallback, the server calls your chosen LLM provider only to perform judgments (plan/code/test), using the evaluation content you provide.

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Clone the repository
git clone https://github.com/OtherVibes/mcp-as-a-judge.git
cd mcp-as-a-judge
# Install dependencies with uv
uv sync --all-extras --dev
# Install pre-commit hooks
uv run pre-commit install
# Run tests
uv run pytest
# Run all checks
uv run pytest && uv run ruff check && uv run ruff format --check && uv run mypy src
© 2025 OtherVibes and Zvi Fried. The "MCP as a Judge" concept, the "behavioral MCP" approach, the staged workflow (plan → code → test → completion), tool taxonomy/descriptions, and prompt templates are original work developed in this repository.
While “LLM‑as‑a‑judge” is a broadly known idea, this repository defines the original “MCP as a Judge” behavioral MCP pattern by OtherVibes and Zvi Fried. It combines task‑centric workflow enforcement (plan → code → test → completion), explicit LLM‑based validations, and human‑in‑the‑loop elicitation, along with the prompt templates and tool taxonomy provided here. Please attribute as: “OtherVibes – MCP as a Judge (Zvi Fried)”.
| Feature | IDE Rules | Subagents | MCP as a Judge |
|---|---|---|---|
| Static behavior guidance | ✓ | ✓ | ✗ |
| Custom system prompts | ✓ | ✓ | ✓ |
| Project context integration | ✓ | ✓ | ✓ |
| Specialized task handling | ✗ | ✓ | ✓ |
| Active quality gates | ✗ | ✗ | ✓ |
| Evidence-based validation | ✗ | ✗ | ✓ |
| Approve/reject with feedback | ✗ | ✗ | ✓ |
| Workflow enforcement | ✗ | ✗ | ✓ |
| Cross-assistant compatibility | ✗ | ✗ | ✓ |
This project is licensed under the MIT License (see LICENSE).