MCP as a Judge

othervibes/mcp-as-a-judge
17 · STDIO · registry active
Summary

This is a validation layer that sits between your AI coding assistant and the LLM, acting as a gatekeeper for code changes. It exposes tools like judge_coding_plan, judge_code_change, and judge_testing_implementation that use MCP sampling to evaluate whether your research is thorough, your diffs meet engineering standards, and your tests actually work. It also includes raise_missing_requirements and raise_obstacle for interactive decision making via MCP elicitation. You'd reach for this if you're tired of AI assistants hallucinating APIs, skipping proper testing, or making unilateral architectural decisions. Works best with GitHub Copilot in VS Code where sampling is native, though other assistants need an LLM API key configured. Think of it as enforcing a code review process before the AI's changes land.


MCP as a Judge ⚖️

mcp-name: io.github.OtherVibes/mcp-as-a-judge


MCP as a Judge acts as a validation layer between AI coding assistants and LLMs, helping ensure safer and higher-quality code.

License: MIT · Python 3.13+ · MCP Compatible


MCP as a Judge is a behavioral MCP that strengthens AI coding assistants by requiring explicit LLM evaluations for:

  • Research, system design, and planning
  • Code changes, testing, and task-completion verification

It enforces evidence-based research, reuse over reinvention, and human-in-the-loop decisions.

If your IDE has rules/agents (Copilot, Cursor, Claude Code), keep using them—this Judge adds enforceable approval gates on plan, code diffs, and tests.

Key problems with AI coding assistants and LLMs

  • Treat LLM output as ground truth; skip research and use outdated information
  • Reinvent the wheel instead of reusing libraries and existing code
  • Cut corners: code below engineering standards and weak tests
  • Make unilateral decisions when requirements are ambiguous or plans change
  • Security blind spots: missing input validation, injection risks/attack vectors, least‑privilege violations, and weak defensive programming

Vibe coding doesn’t have to be frustrating

What it enforces

  • Evidence‑based research and reuse (best practices, libraries, existing code)
  • Plan‑first delivery aligned to user requirements
  • Human‑in‑the‑loop decisions for ambiguity and blockers
  • Quality gates on code and tests (security, performance, maintainability)

Key capabilities

  • Intelligent code evaluation via MCP sampling; enforces software‑engineering standards and flags security/performance/maintainability risks
  • Comprehensive plan/design review: validates architecture, research depth, requirements fit, and implementation approach
  • User‑driven decisions via MCP elicitation: clarifies requirements, resolves obstacles, and keeps choices transparent
  • Security validation in system design and code changes

Tools and how they help

| Tool | What it solves |
|------|----------------|
| set_coding_task | Creates/updates task metadata; classifies task_size; returns next-step workflow guidance |
| get_current_coding_task | Recovers the latest task_id and metadata to resume work safely |
| judge_coding_plan | Validates plan/design; requires library selection and internal reuse maps; flags risks |
| judge_code_change | Reviews unified Git diffs for correctness, reuse, security, and code quality |
| judge_testing_implementation | Validates tests using real runner output and optional coverage |
| judge_coding_task_completion | Final gate ensuring plan, code, and tests approvals before completion |
| raise_missing_requirements | Elicits missing details and decisions to unblock progress |
| raise_obstacle | Engages the user on trade‑offs, constraints, and enforced changes |
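
The staged gating these tools describe (plan, then code, then tests, then completion) can be pictured with a small sketch. This is only an illustration of the idea of ordered approval gates; `TaskGates`, `STAGES`, and their methods are hypothetical names and are not part of the server's API.

```python
from dataclasses import dataclass, field

# Stage order taken from the workflow described in this README:
# plan -> code -> test -> completion.
STAGES = ("plan", "code", "test", "completion")


@dataclass
class TaskGates:
    """Hypothetical tracker recording which stages have been approved."""

    approved: set = field(default_factory=set)

    def next_stage(self):
        """Return the first stage not yet approved, or None when all are."""
        for stage in STAGES:
            if stage not in self.approved:
                return stage
        return None

    def approve(self, stage):
        """Approve a stage, refusing to skip past an unapproved gate."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"cannot approve {stage!r}; next gate is {expected!r}")
        self.approved.add(stage)


gates = TaskGates()
gates.approve("plan")
gates.approve("code")
print(gates.next_stage())  # test
```

The point of the sketch is that a later gate cannot be approved while an earlier one is still open, which mirrors the role of the final completion check.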

🚀 Quick Start

Requirements & Recommendations

MCP Client Prerequisites

MCP as a Judge depends heavily on the MCP Sampling and MCP Elicitation features for its core functionality:

  • MCP Sampling - Required for AI-powered code evaluation and judgment
  • MCP Elicitation - Required for interactive user decision prompts

System Prerequisites

  • Docker Desktop or Python 3.13+ - Required for running the MCP server

Supported AI Assistants

| AI Assistant | Platform | MCP Support | Status | Notes |
|--------------|----------|-------------|--------|-------|
| GitHub Copilot | Visual Studio Code | ✅ Full | Recommended | Complete MCP integration with sampling and elicitation |
| Claude Code | - | ⚠️ Partial | Requires LLM API key | Sampling Support feature request; Elicitation Support feature request |
| Cursor | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| Augment | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| Qodo | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |

✅ Recommended setup: GitHub Copilot + VS Code — full MCP sampling; no API key needed.

⚠️ Critical: For assistants without full MCP sampling (Cursor, Claude Code, Augment, Qodo), you MUST set LLM_API_KEY. Without it, the server cannot evaluate plans or code. See LLM API Configuration.

💡 Tip: Prefer large context models (≥ 1M tokens) for better analysis and judgments.

If the MCP server isn’t auto‑used

For troubleshooting, visit the FAQs section.

🔧 MCP Configuration

Configure MCP as a Judge in your MCP-enabled client:

Method 1: Using Docker (Recommended)

One‑click install for VS Code (MCP)


Notes:

  • VS Code controls the sampling model; select it via “MCP: List Servers → mcp-as-a-judge → Configure Model Access”.
  1. Configure MCP Settings:

    Add this to your MCP client configuration file:

    {
      "command": "docker",
      "args": ["run", "--rm", "-i", "--pull=always", "ghcr.io/othervibes/mcp-as-a-judge:latest"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key-here",
        "LLM_MODEL_NAME": "gpt-4o-mini"
      }
    }
    

    📝 Configuration Options (All Optional):

    • LLM_API_KEY: Optional for GitHub Copilot + VS Code (has built-in MCP sampling)
    • LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults)
    • The --pull=always flag ensures you always get the latest version automatically

    To pull the latest image manually when needed:

    # Pull the latest version
    docker pull ghcr.io/othervibes/mcp-as-a-judge:latest
    

Method 2: Using uv

  1. Install the package:

    uv tool install mcp-as-a-judge
    
  2. Configure MCP Settings:

    The MCP server may be automatically detected by your MCP‑enabled client.

    📝 Notes:

    • No additional configuration needed for GitHub Copilot + VS Code (has built-in MCP sampling)
    • LLM_API_KEY is optional and can be set via environment variable if needed
  3. To update to the latest version:

    # Update MCP as a Judge to the latest version
    uv tool upgrade mcp-as-a-judge
    

Select a sampling model in VS Code

  • Open Command Palette (Cmd/Ctrl+Shift+P) → “MCP: List Servers”
  • Select the configured server “mcp-as-a-judge”
  • Choose “Configure Model Access”
  • Check your preferred model(s) to enable sampling

🔑 LLM API Configuration (Optional)

For AI assistants without full MCP sampling support, you can configure an LLM API key as a fallback. This lets MCP as a Judge work even when the client doesn't support MCP sampling.

  • Set LLM_API_KEY (unified key). Vendor is auto-detected; optionally set LLM_MODEL_NAME to override the default.

Supported LLM Providers

| Rank | Provider | API Key Format | Default Model | Notes |
|------|----------|----------------|---------------|-------|
| 1 | OpenAI | sk-... | gpt-4.1 | Fast and reliable model optimized for speed |
| 2 | Anthropic | sk-ant-... | claude-sonnet-4-20250514 | High-performance with exceptional reasoning |
| 3 | Google | AIza... | gemini-2.5-pro | Most advanced model with built-in thinking |
| 4 | Azure OpenAI | [a-f0-9]{32} | gpt-4.1 | Same as OpenAI but via Azure |
| 5 | AWS Bedrock | AWS credentials | anthropic.claude-sonnet-4-20250514-v1:0 | Aligned with Anthropic |
| 6 | Vertex AI | Service Account JSON | gemini-2.5-pro | Enterprise Gemini via Google Cloud |
| 7 | Groq | gsk_... | deepseek-r1 | Best reasoning model with speed advantage |
| 8 | OpenRouter | sk-or-... | deepseek/deepseek-r1 | Best reasoning model available |
| 9 | xAI | xai-... | grok-code-fast-1 | Latest coding-focused model (Aug 2025) |
| 10 | Mistral | [a-f0-9]{64} | pixtral-large | Most advanced model (124B params) |
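
To make the idea of vendor auto-detection concrete, here is a minimal sketch that matches a key against the formats in the "API Key Format" column. It is not the server's actual detection code; `KEY_PATTERNS`, `detect_vendor`, and the vendor labels are hypothetical, and credential-based providers (AWS Bedrock, Vertex AI) are left out because they do not use a single key string.

```python
import re

# Shapes copied from the "API Key Format" column of the provider table.
# More specific prefixes come first so "sk-ant-" and "sk-or-" are not
# captured by the generic "sk-" rule.
KEY_PATTERNS = [
    ("anthropic", re.compile(r"sk-ant-.+")),
    ("openrouter", re.compile(r"sk-or-.+")),
    ("openai", re.compile(r"sk-.+")),
    ("google", re.compile(r"AIza.+")),
    ("groq", re.compile(r"gsk_.+")),
    ("xai", re.compile(r"xai-.+")),
    ("mistral", re.compile(r"[a-f0-9]{64}")),
    ("azure", re.compile(r"[a-f0-9]{32}")),
]


def detect_vendor(api_key):
    """Return a hypothetical vendor label for the key, or None if unknown."""
    for vendor, pattern in KEY_PATTERNS:
        # fullmatch keeps the 32- and 64-character hex formats distinct.
        if pattern.fullmatch(api_key):
            return vendor
    return None


print(detect_vendor("sk-ant-example-key"))  # anthropic
print(detect_vendor("a" * 32))              # azure
```

Ordering is the main design choice here: prefix rules that share a common start must be tried from most to least specific.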

Client-Specific Setup

Cursor

  1. Open Cursor Settings:

    • Go to File → Preferences → Cursor Settings
    • Navigate to the MCP tab
    • Click + Add to add a new MCP server
  2. Add MCP Server Configuration:

    {
      "command": "uv",
      "args": ["tool", "run", "mcp-as-a-judge"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key-here",
        "LLM_MODEL_NAME": "gpt-4.1"
      }
    }
    

    📝 Configuration Options:

    • LLM_API_KEY: Required for Cursor (limited MCP sampling)
    • LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults)

Claude Code

  1. Add MCP Server via CLI:

    # Set environment variables first (optional model override)
    export LLM_API_KEY="your_api_key_here"
    export LLM_MODEL_NAME="claude-3-5-haiku"  # Optional: faster/cheaper model
    
    # Add MCP server
    claude mcp add mcp-as-a-judge -- uv tool run mcp-as-a-judge
    
  2. Alternative: Manual Configuration:

    • Create or edit ~/.config/claude-code/mcp_servers.json
    {
      "command": "uv",
      "args": ["tool", "run", "mcp-as-a-judge"],
      "env": {
        "LLM_API_KEY": "your-anthropic-api-key-here",
        "LLM_MODEL_NAME": "claude-3-5-haiku"
      }
    }
    

    📝 Configuration Options:

    • LLM_API_KEY: Required for Claude Code (limited MCP sampling)
    • LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults)

Other MCP Clients

For other MCP-compatible clients, use the standard MCP server configuration:

{
  "command": "uv",
  "args": ["tool", "run", "mcp-as-a-judge"],
  "env": {
    "LLM_API_KEY": "your-openai-api-key-here",
    "LLM_MODEL_NAME": "gpt-5"
  }
}

📝 Configuration Options:

  • LLM_API_KEY: Required for most MCP clients (except GitHub Copilot + VS Code)
  • LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults)
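
The configuration entries above all share one shape, so they can be generated rather than hand-edited. The following sketch builds that JSON; `build_server_config` is a hypothetical helper, and omitting the `env` block when no key is given is an assumption based on the note that GitHub Copilot + VS Code needs no API key.

```python
import json


def build_server_config(api_key=None, model_name=None):
    """Build a config entry shaped like the JSON examples in this README."""
    config = {"command": "uv", "args": ["tool", "run", "mcp-as-a-judge"]}
    env = {}
    if api_key:
        env["LLM_API_KEY"] = api_key
    if model_name:
        env["LLM_MODEL_NAME"] = model_name
    if env:
        # Only attach "env" when there is something to set (assumption).
        config["env"] = env
    return config


# Placeholder values, matching the ones used in the examples above.
print(json.dumps(build_server_config("your-openai-api-key-here", "gpt-5"), indent=2))
```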

🔒 Privacy & Flexible AI Integration

🔑 MCP Sampling (Preferred) + LLM API Key Fallback

Primary Mode: MCP Sampling

  • All judgments are performed using MCP Sampling capability
  • No need to configure or pay for external LLM API services
  • Works directly with your MCP-compatible client's existing AI model
  • Currently supported by: GitHub Copilot + VS Code

Fallback Mode: LLM API Key

  • When MCP sampling is not available, the server can use LLM API keys
  • Supports multiple providers via LiteLLM: OpenAI, Anthropic, Google, Azure, Groq, Mistral, xAI
  • Automatic vendor detection from API key patterns
  • Default model selection per vendor when no model is specified
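
The preference order described above (MCP sampling first, API key second) can be sketched as a small decision function. This is an illustration under assumptions, not the server's implementation: `choose_judge_backend` and the returned labels are hypothetical, and only the `LLM_API_KEY` variable name comes from this README.

```python
import os


def choose_judge_backend(client_supports_sampling, env=None):
    """Pick how judgments would be performed, preferring MCP sampling."""
    env = os.environ if env is None else env
    if client_supports_sampling:
        return "mcp_sampling"
    if env.get("LLM_API_KEY"):
        return "llm_api"
    # Without either option there is nothing to evaluate plans or code with.
    raise RuntimeError("no MCP sampling and no LLM_API_KEY configured")


print(choose_judge_backend(True, env={}))                       # mcp_sampling
print(choose_judge_backend(False, env={"LLM_API_KEY": "key"}))  # llm_api
```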

🛡️ Your Privacy Matters

  • The server runs locally on your machine
  • No data collection - your code and conversations stay private
  • No external API calls when using MCP Sampling. If you set LLM_API_KEY for fallback, the server will call your chosen LLM provider only to perform judgments (plan/code/test) with the evaluation content you provide.
  • Complete control over your development workflow and sensitive information

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/OtherVibes/mcp-as-a-judge.git
cd mcp-as-a-judge

# Install dependencies with uv
uv sync --all-extras --dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

# Run all checks
uv run pytest && uv run ruff check && uv run ruff format --check && uv run mypy src

© Concepts and Methodology

© 2025 OtherVibes and Zvi Fried. The "MCP as a Judge" concept, the "behavioral MCP" approach, the staged workflow (plan → code → test → completion), tool taxonomy/descriptions, and prompt templates are original work developed in this repository.

Prior Art and Attribution

While “LLM‑as‑a‑judge” is a broadly known idea, this repository defines the original “MCP as a Judge” behavioral MCP pattern by OtherVibes and Zvi Fried. It combines task‑centric workflow enforcement (plan → code → test → completion), explicit LLM‑based validations, and human‑in‑the‑loop elicitation, along with the prompt templates and tool taxonomy provided here. Please attribute as: “OtherVibes – MCP as a Judge (Zvi Fried)”.

❓ FAQ

How is “MCP as a Judge” different from rules/subagents in IDE assistants (GitHub Copilot, Cursor, Claude Code)?

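| Feature | IDE Rules | Subagents | MCP as a Judge |
|---------|-----------|-----------|----------------|
| Static behavior guidance | ✓ | ✓ | ✗ |
| Custom system prompts | ✓ | ✓ | ✓ |
| Project context integration | ✓ | ✓ | ✓ |
| Specialized task handling | ✗ | ✓ | ✓ |
| Active quality gates | ✗ | ✗ | ✓ |
| Evidence-based validation | ✗ | ✗ | ✓ |
| Approve/reject with feedback | ✗ | ✗ | ✓ |
| Workflow enforcement | ✗ | ✗ | ✓ |
| Cross-assistant compatibility | ✗ | ✗ | ✓ |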
  • References: GitHub Copilot Custom Instructions, Cursor Rules, Claude Code Subagents

How does the Judge workflow relate to the tasklist? Why do we need both?

  • Tasklist = planning/organization: tracks tasks, priorities, and status. It doesn’t guarantee engineering quality or readiness.
  • Judge workflow = quality gates: enforces approvals for plan/design, code diffs, tests, and final completion. It demands real evidence (e.g., unified Git diffs and raw test output) and returns structured approvals and required improvements.
  • Together: Use the tasklist to organize work; use the Judge to decide when each stage is actually ready to proceed. The server also emits next_tool guidance to keep progress moving through the gates.
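
For readers unfamiliar with the "unified diff" evidence mentioned above, this short sketch produces one with Python's standard `difflib`. It only approximates the format: in practice the diff would typically come from `git diff`, which adds its own header lines. `make_unified_diff` and the sample file name are hypothetical.

```python
import difflib


def make_unified_diff(old_text, new_text, path):
    """Return a unified diff between two versions of one file's text."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff_lines = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


diff = make_unified_diff("x = 1\n", "x = 2\n", "app.py")
print(diff)
```

Removed lines start with `-` and added lines with `+`, which is the kind of concrete change record a reviewer can check line by line.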

If the Judge isn’t used automatically, how do I force it?

  • In your prompt: "use mcp-as-a-judge" or "Evaluate plan/code/test using the MCP server mcp-as-a-judge".
  • VS Code: Command Palette → "MCP: List Servers" → ensure "mcp-as-a-judge" is listed and enabled.
  • Ensure the MCP server is running and, in your client, the judge tools are enabled/approved.

How do I select models for sampling in VS Code?

  • Open Command Palette (Cmd/Ctrl+Shift+P) → "MCP: List Servers"
  • Select "mcp-as-a-judge" → "Configure Model Access"
  • Check your preferred model(s) to enable sampling

📄 License

This project is licensed under the MIT License (see LICENSE).

🙏 Acknowledgments

  • Model Context Protocol by Anthropic
  • LiteLLM for unified LLM API integration

Categories: AI & LLM Tools
Registry: active
Package: mcp-as-a-judge
Transport: STDIO
Updated: Sep 20, 2025
