GammaInfra gives your MCP client four tools to call any major LLM provider through a unified OpenAI-shaped API. The chat_completions tool handles model routing (auto, fast, cheap, or direct pins like openai/gpt-5-mini) and returns structured routing metadata showing which provider served the request, exact cost in USD, and fallback chains. list_models pulls the full catalog with pricing and capability flags, get_balance shows managed and BYOK credits, and get_status surfaces per-provider health and latency. Useful when you want an agent to pick models dynamically based on cost, speed, or quality constraints without hardcoding provider SDKs. Runs via npx with just an API key.
Model Context Protocol (MCP) server for GammaInfra — intelligent LLM routing across every major provider via one OpenAI-shape API.
Drop this server into Claude Code, Claude Desktop, Cursor, Cline, Continue, or any MCP-compatible host, and your agent gets direct tool access to:
- `chat_completions`: call any supported model (or `gammainfra/auto` for smart routing) with cost, latency, and quality controls. Routing metadata (which provider served, exact cost in USD, fallback chain) is returned as a structured `routing_meta` field.
- `list_models`: full model catalog with pricing and capability flags.
- `get_balance`: managed + BYOK balances.
- `get_status`: overall + per-provider health, 24h request count.

The server runs via npx, so no manual install is needed. The first invocation downloads and caches the package.
For Claude Code, register the server from the CLI:

    claude mcp add gammainfra \
      --env GAMMAINFRA_API_KEY=sk-gammainfra-... \
      -- npx -y @gammainfra/mcp-server
Or edit `~/.claude.json` and add to the `mcpServers` block:

    {
      "mcpServers": {
        "gammainfra": {
          "command": "npx",
          "args": ["-y", "@gammainfra/mcp-server"],
          "env": { "GAMMAINFRA_API_KEY": "sk-gammainfra-..." }
        }
      }
    }
Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

    {
      "mcpServers": {
        "gammainfra": {
          "command": "npx",
          "args": ["-y", "@gammainfra/mcp-server"],
          "env": { "GAMMAINFRA_API_KEY": "sk-gammainfra-..." }
        }
      }
    }
Restart Claude Desktop. The "GammaInfra" server should appear in the tools menu.
Edit `~/.cursor/mcp.json`:

    {
      "mcpServers": {
        "gammainfra": {
          "command": "npx",
          "args": ["-y", "@gammainfra/mcp-server"],
          "env": { "GAMMAINFRA_API_KEY": "sk-gammainfra-..." }
        }
      }
    }
Open Cline's settings (gear icon → MCP Servers tab) and add:

    {
      "gammainfra": {
        "command": "npx",
        "args": ["-y", "@gammainfra/mcp-server"],
        "env": { "GAMMAINFRA_API_KEY": "sk-gammainfra-..." },
        "disabled": false
      }
    }
The server is configured through environment variables:

| Var | Required | Default | Description |
|---|---|---|---|
| `GAMMAINFRA_API_KEY` | yes | (none) | Your GammaInfra API key, format `sk-gammainfra-{32_chars}`. |
| `GAMMAINFRA_BASE_URL` | no | `https://api.gammainfra.com/v1` | Override for staging/dev. |
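A host or wrapper script can catch a mistyped key before launching the server. The sketch below checks only what the table states: the `sk-gammainfra-` prefix followed by 32 characters. The allowed character set is not documented here, so the pattern assumes any non-whitespace characters, and `checkApiKey` and `resolveBaseUrl` are hypothetical helper names, not part of the package.

```typescript
// Hypothetical helpers; they are not exported by @gammainfra/mcp-server.
// Assumption: the 32 characters after the prefix may be any non-whitespace characters.
const KEY_PATTERN = /^sk-gammainfra-\S{32}$/;

function checkApiKey(key: string | undefined): string {
  if (key === undefined || key === "") {
    throw new Error("GAMMAINFRA_API_KEY is required");
  }
  if (!KEY_PATTERN.test(key)) {
    throw new Error("GAMMAINFRA_API_KEY should look like sk-gammainfra-{32_chars}");
  }
  return key;
}

// Falls back to the documented default when no override is supplied.
function resolveBaseUrl(override?: string): string {
  return override !== undefined && override.length > 0
    ? override
    : "https://api.gammainfra.com/v1";
}
```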
`chat_completions`

Send a chat completion request and receive the model response plus routing metadata.

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `model` | string | yes | `gammainfra/auto` for smart routing, `gammainfra/fast` or `gammainfra/cheap` for tier shortcuts, or pin a specific model like `openai/gpt-5-mini`. |
| `messages` | array | yes | OpenAI-shape conversation messages. |
| `temperature` | number | no | 0..2. |
| `max_tokens` | int | no | |
| `max_completion_tokens` | int | no | GPT-5 family requires this instead of `max_tokens`. |
| `cost_quality` | float | no | 0.0..1.0 continuous dial. Sent as `X-GammaInfra-Cost-Quality`. |
| `max_latency_ms` | int | no | 60..600000. Caps total wall-clock time, including fallback retries. Also enforced client-side as a hard request abort. |
| `preference` | string | no | `quality`, `cost`, or `latency`. |
| `region` | string | no | `us`, `eu`, `apac`, or a specific AWS region. |
| `tools`, `tool_choice`, `response_format`, `top_p`, `frequency_penalty`, `presence_penalty` | various | no | Standard OpenAI fields, forwarded as-is. |
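The numeric ranges in the table above can be checked before a tool call is issued. The following is a minimal sketch, not the server's own schema: `ChatArgs` covers only a subset of the parameters, its `messages` type is simplified to plain string content, and `validateChatArgs` is a hypothetical helper name.

```typescript
// Hypothetical input type mirroring part of the parameter table.
interface ChatArgs {
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature?: number;
  cost_quality?: number;
  max_latency_ms?: number;
  preference?: "quality" | "cost" | "latency";
}

// Returns a list of problems; an empty list means the documented ranges hold.
function validateChatArgs(args: ChatArgs): string[] {
  const problems: string[] = [];
  const inRange = (v: number, lo: number, hi: number): boolean => v >= lo && v <= hi;

  if (args.model.length === 0) {
    problems.push("model is required");
  }
  if (args.temperature !== undefined && !inRange(args.temperature, 0, 2)) {
    problems.push("temperature must be within 0..2");
  }
  if (args.cost_quality !== undefined && !inRange(args.cost_quality, 0, 1)) {
    problems.push("cost_quality must be within 0.0..1.0");
  }
  if (
    args.max_latency_ms !== undefined &&
    (Math.floor(args.max_latency_ms) !== args.max_latency_ms ||
      !inRange(args.max_latency_ms, 60, 600000))
  ) {
    problems.push("max_latency_ms must be an integer within 60..600000");
  }
  return problems;
}

const exampleArgs: ChatArgs = {
  model: "gammainfra/auto",
  messages: [{ role: "user", content: "Summarize this changelog." }],
  cost_quality: 0.3,
  max_latency_ms: 15000,
};
console.log(validateChatArgs(exampleArgs)); // prints []
```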
Returns: { response: <OpenAI response>, routing_meta: { provider, endpoint, cost_usd, input_cost_usd, output_cost_usd, router_version, logical_model, fallback_chain, attempted_count, request_id, ... } }
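An agent that tracks spend can read the routing metadata directly. In this sketch the field names come from the Returns line above, while the TypeScript types are assumptions (the source does not state them) and the elided `...` fields are left out. `summarizeRouting` and `totalCostUsd` are hypothetical helper names.

```typescript
// Field names are from the documented return shape; the types are assumed.
interface RoutingMeta {
  provider: string;
  endpoint: string;
  cost_usd: number;
  input_cost_usd: number;
  output_cost_usd: number;
  router_version: string;
  logical_model: string;
  fallback_chain: string[];
  attempted_count: number;
  request_id: string;
}

// One-line summary suitable for a log entry after each call.
function summarizeRouting(meta: RoutingMeta): string {
  return (
    meta.logical_model +
    " served by " + meta.provider +
    ", cost $" + meta.cost_usd.toFixed(6) +
    ", attempted_count " + meta.attempted_count +
    ", request " + meta.request_id
  );
}

// Sums cost_usd across calls, for example to enforce a per-session budget.
function totalCostUsd(metas: RoutingMeta[]): number {
  return metas.reduce((sum, m) => sum + m.cost_usd, 0);
}
```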
Timeout note: Every request has a 10-minute client-side hard timeout (via AbortController) so a hung upstream can't wedge the MCP process. For chat_completions, a supplied max_latency_ms replaces that default as the hard abort bound.
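The timeout behavior described above can be sketched with the standard `AbortController`. This is an illustration under stated assumptions rather than the server's actual code: `abortBoundMs` and `withHardTimeout` are hypothetical names, and the task is assumed to honor the signal it receives.

```typescript
const DEFAULT_TIMEOUT_MS = 10 * 60 * 1000; // the documented 10-minute hard timeout

// A supplied max_latency_ms replaces the default as the abort bound.
function abortBoundMs(maxLatencyMs?: number): number {
  return maxLatencyMs ?? DEFAULT_TIMEOUT_MS;
}

// Runs `task` with an AbortSignal that fires once the bound elapses.
async function withHardTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  maxLatencyMs?: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), abortBoundMs(maxLatencyMs));
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer); // do not leave a pending timer after the task settles
  }
}
```

A caller would pass the signal to whatever performs the request, for example `withHardTimeout((signal) => fetch(url, { signal }), 15000)`. The abort only takes effect if the task actually listens to the signal.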
Streaming note: MCP tool responses are non-streaming. The server always sends stream: false to the upstream and does not accept a stream parameter on the tool input (it's rejected by schema validation). For streaming, use the GammaInfra HTTP API directly.
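The streaming rule amounts to two steps: reject a `stream` key on input, and pin `stream: false` on what goes upstream. The sketch below imitates that with a plain function. The real server does the rejection through schema validation, `toUpstreamBody` is a hypothetical name, and fields that are sent as headers (such as `cost_quality`) are not handled here.

```typescript
// Hypothetical sketch of the non-streaming rule; not the server's implementation.
function toUpstreamBody(input: Record<string, unknown>): Record<string, unknown> {
  if ("stream" in input) {
    throw new Error("stream is not accepted: MCP tool responses are non-streaming");
  }
  // Always request a non-streaming response from the upstream API.
  return { ...input, stream: false };
}
```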
`list_models`

No parameters. Returns the full model catalog including direct-pin slugs, per-token pricing, and capability flags (`supports_tools`, `supports_vision`).
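One use of the catalog is picking a direct-pin model by capability and price. Only `supports_tools` and `supports_vision` are named in the description above; the `slug` and `input_usd_per_token` fields in this sketch are hypothetical stand-ins for whatever the catalog actually returns, and `cheapestWith` is an illustrative helper, so the entry type would need adjusting against a real response.

```typescript
// supports_tools and supports_vision are documented flags.
// `slug` and `input_usd_per_token` are hypothetical field names for illustration.
interface CatalogEntry {
  slug: string;
  input_usd_per_token: number;
  supports_tools: boolean;
  supports_vision: boolean;
}

// Returns the cheapest entry that has every requested capability, if any.
function cheapestWith(
  catalog: CatalogEntry[],
  needs: { tools?: boolean; vision?: boolean },
): CatalogEntry | undefined {
  const candidates = catalog.filter(
    (m) => (!needs.tools || m.supports_tools) && (!needs.vision || m.supports_vision),
  );
  // filter() returned a new array, so sorting here leaves the catalog untouched.
  candidates.sort((a, b) => a.input_usd_per_token - b.input_usd_per_token);
  return candidates[0];
}
```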
`get_balance`

| Name | Type | Required | Description |
|---|---|---|---|
| `include_byok` | boolean | no | Default `false`. Also fetch the BYOK balance. Off by default to avoid an extra request (and a guaranteed 404) for customers without BYOK enrollment. |
Returns { managed_balance_usd, byok_balance_usd, currency }. With include_byok omitted/false, byok_balance_usd is null and no BYOK request is made (no byok_error). With include_byok: true, if BYOK isn't enrolled, byok_balance_usd is null and a byok_error field describes the cause.
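The three outcomes described above (BYOK not requested, requested but not enrolled, requested and available) can be told apart from the result alone. This is a small sketch: the field names follow the Returns line, the types are assumptions (in particular that `byok_error` is a string), and `byokState` is a hypothetical helper name.

```typescript
// Field names are documented; the types are assumed.
interface BalanceResult {
  managed_balance_usd: number;
  byok_balance_usd: number | null;
  currency: string;
  byok_error?: string; // assumed to be a string describing the cause
}

type ByokState = "not_requested" | "unavailable" | "available";

function byokState(result: BalanceResult): ByokState {
  if (result.byok_balance_usd !== null) {
    return "available";
  }
  // A null balance with byok_error means the BYOK request was made and failed;
  // a null balance without it means include_byok was omitted or false.
  return result.byok_error !== undefined ? "unavailable" : "not_requested";
}
```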
`get_status`

No parameters. Returns GammaInfra's current overall health, per-provider state and live p50 latency, and the 24h request count.
To work on the server locally:

    git clone https://github.com/yuz0101/gammainfra-mcp-server.git
    cd gammainfra-mcp-server
    npm install
    npm run test       # 30 tests, ~1s
    npm run build      # tsc → dist/
    npm run typecheck
License: MIT. See LICENSE.
A free API key (`sk-gammainfra-...`) is available at https://gammainfra.com/signup.