Before you fire off another expensive Claude API call, this server asks whether a local model could handle it instead. It exposes a single tool, check_local_viability, that analyzes your task and returns LOCAL or CLOUD with reasoning, cost savings estimates, and recommended local models like llama3.2 or mistral-7b. If you mark data as CONFIDENTIAL, it forces local routing regardless of complexity. Works over stdio or HTTP, supports LangChain and OpenAI Agents SDK integrations. Free tier gives you 20 routing decisions per month, paid tiers unlock bundles. Think of it as a gatekeeper that saves you money by catching tasks that don't need Sonnet's firepower.
claude mcp add --transport http local-model-suitability-mcp https://local-model-suitability-mcp-production.up.railway.appRun in your terminal. Add --scope user to make it available in every project.
Review the command, arguments, and environment values before installing — MCP servers run with your local permissions.
Verified live against the running server on Jun 10, 2026.
check_local_viabilityCall this BEFORE your agent routes any task to a cloud model — at the moment a task is ready to be sent and the model has not yet been selected. Returns LOCAL/CLOUD/EITHER verdict with recommended model. If verdict is LOCAL, skip the cloud call entirely — saves cost on every r...3 paramsCall this BEFORE your agent routes any task to a cloud model — at the moment a task is ready to be sent and the model has not yet been selected. Returns LOCAL/CLOUD/EITHER verdict with recommended model. If verdict is LOCAL, skip the cloud call entirely — saves cost on every r...
task*stringdata_sensitivitystringPUBLIC · INTERNAL · CONFIDENTIALquality_thresholdstringPRODUCTION · PROTOTYPE · BEST_EFFORTCloud inference is expensive. Everything that can run locally should.
This MCP server tells your agent — before every cloud API call — whether the task can be handled by a local model instead. Route to Ollama, LM Studio, or llama.cpp when you can. Only pay for cloud when you must.
check_local_viabilityCall this BEFORE every cloud inference call. If verdict is LOCAL, skip the cloud call entirely and route to your local model. Only use cloud when this tool returns CLOUD.
Inputs:
| Field | Required | Description |
|---|---|---|
task | ✅ | The exact task you are about to send to a cloud model |
quality_threshold | Optional | PRODUCTION (default) / PROTOTYPE / BEST_EFFORT |
data_sensitivity | Optional | PUBLIC (default) / INTERNAL / CONFIDENTIAL |
CONFIDENTIAL forces LOCAL regardless of task complexity — data never leaves the machine.
Response:
{
"verdict": "LOCAL",
"confidence": "HIGH",
"reason": "Simple text summarisation — no reasoning depth required. Any 7B+ local model handles this well.",
"estimated_cost_saving": "$0.002-0.008 saved per call at claude-sonnet pricing",
"recommended_local_models": ["llama3.2:8b", "mistral-7b", "phi3:medium"],
"cloud_justified_reason": null,
"analysis_type": "AI-powered cost routing — NOT a simple lookup"
}
| Plan | Calls | Price |
|---|---|---|
| Free | 20/month | $0 |
| Starter | 500-call bundle | $20 |
| Pro | 2,000-call bundle | $70 |
{
"mcpServers": {
"local-model-suitability": {
"command": "npx",
"args": ["-y", "local-model-suitability-mcp"],
"env": {
"ANTHROPIC_API_KEY": "your-key",
"API_KEY": "your-lms-api-key-for-paid-tier"
}
}
}
}
Free tier requires no API key — tracked by IP.
{
"mcpServers": {
"local-model-suitability": {
"type": "http",
"url": "https://local-model-suitability-mcp-production.up.railway.app"
}
}
}
from langchain_mcp_adapters.client import MultiServerMCPClient
client = MultiServerMCPClient({
"local-model-suitability": {
"url": "https://local-model-suitability-mcp-production.up.railway.app",
"transport": "http"
}
})
tools = await client.get_tools()
from agents import Agent, HostedMCPTool
agent = Agent(
name="Assistant",
tools=[HostedMCPTool(tool_config={
"type": "mcp",
"server_label": "local-model-suitability",
"server_url": "https://local-model-suitability-mcp-production.up.railway.app",
"require_approval": "never"
})]
)
Same as LangChain above — langchain-mcp-adapters works with LangGraph natively.
Results are for cost-optimisation guidance only and do not constitute technical advice. Full terms: kordagencies.com/terms.html
ANTHROPIC_API_KEY*secretAnthropic API key for Claude routing analysis
gongrzhe/office-powerpoint-mcp-server
gongrzhe/office-word-mcp-server
io.github.mindstone/mcp-server-office
greirson/mcp-todoist
henilcalagiya/mcp-apple-notes
ankimcp/anki-mcp-server-addon