XFMS — Model Source

3 tools · auth · HTTP · registry active

Summary

Connects your MCP client to the XFMS model recommendation engine at xfms.xpansion.dev. Exposes five tools: rank returns a scored shortlist of LLMs with rationale, pick gives you the single best match, discover infers which quality dimensions matter for your task without ranking, and compare/benchmark run live A/B tests against real models through OpenRouter. The engine aggregates eight benchmark sources, normalizes scores, and weights dimensions based on your stated purpose. The first three tools are free with no key required. Compare and benchmark need your OpenRouter API key passed as an X-OpenRouter-Key header since they actually run test queries against production models. Useful when you're tired of guessing which model to use for a specific task and want evidence instead of vibes.

Tools

Public tool metadata for what this MCP can expose to an agent.

3 tools

rank (4 params)

Rank LLMs for a stated purpose. Returns a shortlist with weights, scores, and plain-English rationale per pick. Use when the user wants to see and compare alternatives, not just one answer.

Parameters

top_n (integer, default: 5)

How many models to return in the ranked list. Use 1 if you only want the single best pick; use 10+ if you want to see deeper alternatives.

primary (array)

Mark dimensions as primary tier. When set, the engine switches from weighted-sum blending to lexicographic ordering: the primary dimension is the sole ranking axis, and other dimensions only break ties. Use when the user says 'cheapest model, period' or similar; their stated preference then becomes sacrosanct.

purpose (string)

One sentence describing what the model will be used for. Be concrete, not vague: 'fixing bugs in a Python codebase' works; 'coding' does not. The more specific the purpose, the better XFMS can infer which quality dimensions matter.

capabilities (array)

Required capabilities the model MUST support. Models missing any listed capability are filtered out before ranking. 'vision' = image input, 'audio_in' = audio input, 'tool_use' = function calling, 'structured_outputs' = JSON schema-constrained output. Omit when the task is plain text with no tool use.
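For hosts that let you inspect the raw call, a `rank` invocation could be assembled roughly as follows. This is a sketch under stated assumptions: the `tools/call` envelope follows the MCP JSON-RPC convention and the argument names come from the parameter list above, but the helper name `build_rank_call` and its validation rules are illustrative and not part of XFMS.

```python
import json

# Capability names taken from the `capabilities` parameter description.
KNOWN_CAPABILITIES = {"vision", "audio_in", "tool_use", "structured_outputs"}


def build_rank_call(purpose, top_n=5, primary=None, capabilities=None, request_id=1):
    """Build an MCP `tools/call` request body for the `rank` tool.

    Hypothetical helper: the checks below are assumptions about sensible
    client-side validation, not rules published by XFMS.
    """
    if not purpose.strip():
        raise ValueError("purpose must be a non-empty sentence")
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    unknown = set(capabilities or []) - KNOWN_CAPABILITIES
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")

    # Only send optional arguments the caller actually set.
    arguments = {"purpose": purpose, "top_n": top_n}
    if primary:
        arguments["primary"] = list(primary)
    if capabilities:
        arguments["capabilities"] = list(capabilities)

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "rank", "arguments": arguments},
    }


request = build_rank_call(
    "fixing bugs in a Python codebase",
    top_n=3,
    capabilities=["tool_use"],
)
print(json.dumps(request, indent=2))
```

In normal use your MCP host builds this request for you; the sketch is only meant to show how the four parameters fit together.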

pick (1 param)

Return the single best LLM for a stated purpose. Concise output, no list. Use when the user has settled on the criteria and just wants one answer.

Parameters

purpose (string)

One sentence describing what the model will be used for. Be concrete, not vague: 'summarizing 50-page commercial leases' works; 'summarization' does not.

discover (1 param)

Show which quality dimensions matter for a stated purpose, WITHOUT ranking any models. Returns the inferred weights and the discovery-walk trace. Useful for understanding how XFMS interprets the purpose before committing to a pick.

Parameters

purpose (string)

One sentence describing the task.

XFMS — Xpansion Framework Model Source

Pick the right LLM for your task — without the Twitter vibes.

State what you're using the model for. XFMS aggregates evidence from eight independent benchmark sources, normalizes it onto a common scale, lets your intent decide which dimensions matter, and returns a ranked shortlist with plain-English rationale for every pick.

XFMS is one module of the Xpansion Framework — a unified architecture for governing AI-assisted work.

What this repository is

A thin Python client and command-line tool for calling the hosted XFMS API at xfms.xpansion.dev. About 250 lines of code. It turns a one-liner into a ranked LLM shortlist.

What this repository is not: the recommender engine, the score catalog, or the ingestion pipeline. Those run on the hosted service. The methodology behind every pick is published in full at docs/methodology.md — every claim there maps to code that runs at request time, you just don't run it locally.

What you say:

"Fixing bugs in our Python codebase."

What you get:

Top picks:
   1. 0.842  GPT-5.5                 (openai/gpt-5.5)         via OpenAI
   2. 0.811  Claude Opus 4.7         (anthropic/claude-opus-4.7) via Anthropic
   3. 0.798  Gemini 3.1 Pro Preview  (google/gemini-3.1-pro-preview) via Google

Inferred quality weights from your purpose:
  • structured_output_reliability  42.0%  ← BigCodeBench, Aider Polyglot
  • instruction_following          28.0%  ← LiveBench, Tau-Bench
  • factuality                     20.0%  ← MMLU, GPQA
  • coherence                      10.0%  ← LongBench

─── Explanation ───
Picked GPT-5.5: strong on structured output and instruction following —
the two dimensions that dominate code-edit work. Beats Claude on Aider
Polyglot and matches it on LiveBench reasoning, at roughly 60% of the
per-token cost.

Want to see how the picks actually behave on your kind of query? Add --ab:

─── A/B probe ───
Ran 5 test queries against the top picks.
  • GPT-4o-mini  avg_latency=5579 ms  total_cost=$0.00156  successes=5
  • GPT-5.5      avg_latency=8190 ms  total_cost=$0.07640  successes=5
  • GPT-5.4      avg_latency=8783 ms  total_cost=$0.03493  successes=5

Commentary:
  Across 5 real test queries, GPT-4o-mini was both cheapest ($0.0016 total)
  and fastest (5579 ms avg). Clear winner — 98% cheaper and 36% faster
  than the slowest pick.
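The percentages in the commentary can be reproduced from the probe figures. The sketch below is one reading of the sample output, not a description of how XFMS computes its commentary: the 98% figure matches a comparison against the most expensive pick (GPT-5.5), while the 36% figure matches a comparison against the slowest pick (GPT-5.4). The helper name `percent_less` is illustrative.

```python
# Figures copied from the sample A/B probe output above.
probe = {
    "GPT-4o-mini": {"avg_latency_ms": 5579, "total_cost": 0.00156},
    "GPT-5.5": {"avg_latency_ms": 8190, "total_cost": 0.07640},
    "GPT-5.4": {"avg_latency_ms": 8783, "total_cost": 0.03493},
}


def percent_less(value, baseline):
    """Return how much smaller `value` is than `baseline`, as a percentage."""
    return (baseline - value) / baseline * 100


winner = probe["GPT-4o-mini"]
priciest = max(probe.values(), key=lambda row: row["total_cost"])
slowest = max(probe.values(), key=lambda row: row["avg_latency_ms"])

cheaper_pct = percent_less(winner["total_cost"], priciest["total_cost"])
faster_pct = percent_less(winner["avg_latency_ms"], slowest["avg_latency_ms"])

print(f"{cheaper_pct:.0f}% cheaper than the most expensive pick")  # 98%
print(f"{faster_pct:.0f}% faster than the slowest pick")  # 36%
```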

What XFMS does for you

Beyond ranking, XFMS gives you these levers to honor what you actually meant:

--primary <branch> — sacrosanct user preference. When you say "cheapest model, period", the engine switches to lexicographic ranking: cost wins, other dimensions only break ties. No more weighted-blend surprises.
--ab — runs the top 3 picks against 5 real test queries (expanding to 10 or 15 if results trade off) and surfaces commentary on who won what. Grounds the recommendation in actual model behavior, not just benchmarks.
--strict-priorities — when you name two co-equal drivers ("cheap but high quality too"), the engine refuses to silently blend; it asks you which way to break the tie.
Latent-requirement suggestions — engine surfaces capabilities you didn't ask for but probably need (streaming for real-time chat, vision for OCR), so you don't get burned by what you didn't know.
Deterministic by design — every internal model call is content-cached; same input always returns the same answer. The "I got different picks for the same question" failure mode is gone.
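To make the difference between weighted blending and `--primary` concrete, here is a small sketch of the two orderings. Everything in it is an assumption for illustration: the candidate names, the scores, the weights, and the function names are hypothetical, and the hosted engine's actual scoring is not reproduced here.

```python
# Hypothetical per-dimension scores on a 0-1 scale; higher is better.
# "cost" is treated as a cost-efficiency score, so cheaper models score higher.
CANDIDATES = {
    "model-a": {"cost": 0.95, "quality": 0.60},
    "model-b": {"cost": 0.70, "quality": 0.95},
    "model-c": {"cost": 0.95, "quality": 0.75},
}
WEIGHTS = {"cost": 0.4, "quality": 0.6}


def weighted_rank(candidates, weights):
    """Order candidates by a single blended score across all dimensions."""
    def blended(name):
        return sum(weights[dim] * candidates[name][dim] for dim in weights)

    return sorted(candidates, key=blended, reverse=True)


def lexicographic_rank(candidates, weights, primary):
    """Order by the primary dimensions first; the rest only break ties."""
    secondary = {dim: w for dim, w in weights.items() if dim not in primary}

    def key(name):
        scores = candidates[name]
        primary_key = tuple(scores[dim] for dim in primary)
        tie_break = sum(w * scores[dim] for dim, w in secondary.items())
        return primary_key + (tie_break,)

    return sorted(candidates, key=key, reverse=True)


blended_order = weighted_rank(CANDIDATES, WEIGHTS)
primary_order = lexicographic_rank(CANDIDATES, WEIGHTS, primary=["cost"])

print(blended_order)  # ['model-b', 'model-c', 'model-a']
print(primary_order)  # ['model-c', 'model-a', 'model-b']
```

With blending, the high-quality but pricier `model-b` comes first. With cost as the primary dimension it drops to last, and quality only decides the order of the two models tied on cost.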

Install — add one URL to your AI client

XFMS is a hosted Model Context Protocol (MCP) server. There is no package to install on your machine. You point your AI assistant — Claude Code, Cursor, Continue, Cline, or any MCP-speaking host — at the URL below and the tools appear inside your chat:

https://xfms.xpansion.dev/mcp/

That's it. The three discovery tools — rank, pick, discover — are free and work with no key. Your AI assistant does the small internal thinking work; we pay for nothing on your behalf, and you pay for nothing either.

The two live-probe tools — compare and benchmark — actually run test queries against the real candidate models on OpenRouter. That inference cost rides with you, so they require your own OpenRouter key in an X-OpenRouter-Key header. You probably already have one — you're an OpenRouter model picker's target audience. If not, grab one at openrouter.ai/keys.

The key travels encrypted to our server, is never logged, never persisted — used once per request and dropped. Same security posture as every other API key in your MCP config.

Concrete install snippets for each AI client are in the next section.

Use it inside Claude Code, Cursor, or any MCP client

XFMS speaks Model Context Protocol (MCP) — the standard your AI assistant uses to call external tools. Once connected, you can ask the assistant "which model should I use for OCR on shipping manifests?" and it calls XFMS for you. No leaving the chat. No copy-pasting between windows.

Hosted install — one line, no install required

The XFMS engine hosts the MCP server itself at https://xfms.xpansion.dev/mcp/. Two install shapes depending on which tools you want.

The free three — `rank`, `pick`, `discover`

Just point your client at the URL. No key, no signup.

Claude Code:

claude mcp add xfms --transport http https://xfms.xpansion.dev/mcp/

Cursor (~/.cursor/mcp.json) — or paste through Settings → MCP:

{
  "mcpServers": {
    "xfms": {
      "url": "https://xfms.xpansion.dev/mcp/"
    }
  }
}

All five tools — adds `compare` and `benchmark`

These two run real test queries against the actual candidate models on OpenRouter, so they require your OpenRouter key in an X-OpenRouter-Key header. Same install, one extra line:

Claude Code:

claude mcp add xfms --transport http https://xfms.xpansion.dev/mcp/ \
  --header "X-OpenRouter-Key: sk-or-v1-your-key-here"

Cursor (~/.cursor/mcp.json):

{
  "mcpServers": {
    "xfms": {
      "url": "https://xfms.xpansion.dev/mcp/",
      "headers": {
        "X-OpenRouter-Key": "sk-or-v1-your-key-here"
      }
    }
  }
}

Continue / Cline / any other MCP host — same URL + headers pattern; check your host's docs for the JSON config shape.
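If you prefer not to paste the key into a config file by hand, the same JSON can be generated from an environment variable. This is a minimal sketch: the config shape is the Cursor one shown above, while the function name `build_mcp_config` and the variable name `OPENROUTER_API_KEY` are assumptions chosen for this example.

```python
import json
import os

XFMS_URL = "https://xfms.xpansion.dev/mcp/"


def build_mcp_config(openrouter_key=None):
    """Return an mcp.json-style dict for XFMS.

    Without a key, only the free tools (rank, pick, discover) are usable.
    With a key, the X-OpenRouter-Key header needed by compare and
    benchmark is included.
    """
    server = {"url": XFMS_URL}
    if openrouter_key:
        server["headers"] = {"X-OpenRouter-Key": openrouter_key}
    return {"mcpServers": {"xfms": server}}


# OPENROUTER_API_KEY is an assumed variable name; use whatever your shell exports.
config = build_mcp_config(os.environ.get("OPENROUTER_API_KEY"))

# Printing shows the key in clear text, so avoid running this in shared logs.
print(json.dumps(config, indent=2))
```

The sketch only prints the result. Merge it into an existing `~/.cursor/mcp.json` yourself so that other configured servers are not overwritten.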

Don't have an OpenRouter key yet? Grab one at openrouter.ai/keys. Restart your client, then ask it:

"Use XFMS to pick a model for summarizing long legal contracts."

Five tools are available to the assistant: rank (a ranked shortlist), pick (the single best pick), discover (which quality dimensions matter for your purpose, without ranking), compare (live A/B between models you've already named), and benchmark (live A/B against the engine's top 3 picks). compare and benchmark require the X-OpenRouter-Key header above; the other three don't.

Override the system's inference

If you know which quality dimension matters most for your task, say so — your preference always wins over the LLM's inference:

xfms rank "code refactor" --leaf-priorities "structured_output_reliability=1.0,factuality=0.5"

xfms.rank(
    "code refactor",
    leaf_priorities={"structured_output_reliability": 1.0, "factuality": 0.5},
)
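The CLI string and the Python dict above carry the same information. As a rough illustration of that correspondence, the following sketch turns the `name=value,name=value` form into a dict. The function `parse_leaf_priorities` is hypothetical and is not taken from the XFMS client; the real CLI may parse the flag differently.

```python
def parse_leaf_priorities(spec):
    """Parse 'dim=weight,dim=weight' into a {dim: float} dict.

    Hypothetical helper showing how the --leaf-priorities string maps
    onto the leaf_priorities keyword argument.
    """
    priorities = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        name, sep, value = item.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {item!r}")
        priorities[name.strip()] = float(value)
    return priorities


parsed = parse_leaf_priorities("structured_output_reliability=1.0,factuality=0.5")
print(parsed)  # {'structured_output_reliability': 1.0, 'factuality': 0.5}
```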

How XFMS picks — the four principles

Methodology in full at docs/methodology.md. The short version:

No provider self-reports. Every score comes from a third-party evaluator running the same protocol across every model.
No single-source dependence. Eight independent benchmark sources contribute today; no single leaderboard determines a pick.
User intent beats LLM inference. The system infers weights from your purpose, but your stated leaf_priorities always override the inference.
Honest gaps over invented signal. Missing data is recorded as missing — no interpolation, no synthetic scores. Coverage gaps surface on every pick.
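The last two principles can be sketched in a few lines. The code below is an illustration under loud assumptions, not the engine's method: it assumes stated `leaf_priorities` replace the inferred weight for the same dimension and that weights are then renormalized, and it assumes a model is scored only over dimensions that have data, with the missing ones reported rather than filled in. The inferred weights are copied from the sample output earlier; the model scores and function names are hypothetical.

```python
def resolve_weights(inferred, leaf_priorities=None):
    """Let stated priorities replace inferred weights, then renormalize to sum to 1."""
    merged = dict(inferred)
    merged.update(leaf_priorities or {})
    total = sum(merged.values())
    return {dim: weight / total for dim, weight in merged.items()}


def score_with_gaps(weights, scores):
    """Score over dimensions that have data; list the rest as coverage gaps."""
    covered = {dim: w for dim, w in weights.items() if scores.get(dim) is not None}
    gaps = sorted(dim for dim in weights if dim not in covered)
    if not covered:
        return None, gaps
    covered_weight = sum(covered.values())
    score = sum(w * scores[dim] for dim, w in covered.items()) / covered_weight
    return score, gaps


# Inferred weights from the sample output for "Fixing bugs in our Python codebase."
inferred = {
    "structured_output_reliability": 0.42,
    "instruction_following": 0.28,
    "factuality": 0.20,
    "coherence": 0.10,
}
weights = resolve_weights(
    inferred,
    leaf_priorities={"structured_output_reliability": 1.0, "factuality": 0.5},
)

# Hypothetical scores for one model; None marks a benchmark with no data.
model_scores = {
    "structured_output_reliability": 0.9,
    "instruction_following": 0.8,
    "factuality": None,
    "coherence": 0.7,
}
score, gaps = score_with_gaps(weights, model_scores)
print(f"score={score:.3f} gaps={gaps}")
```

The point of the second function is that a missing factuality score is never guessed; it shows up in `gaps` alongside the score.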

Part of the Xpansion Framework

XFMS doesn't stand alone — it's the model-selection layer of the Xpansion Framework.

The Xpansion thesis

Humans communicate with intent compressed by contextual experience. AI simply predicts patterns in language. Xpansion is the execution layer that bridges them.

Every sentence a human types carries lifetimes of context that the speaker assumes the other side will decompress — what counts as "good enough," which constraints are non-negotiable, what failures last month taught them, what their house style demands. AI doesn't share that context. It predicts patterns in language, filling in the gaps with whatever's plausible to its training data. The result reads as plausible but isn't intent-honoring: sessions lose context, security holes ship silently, contracts break without warning, and there's no way to verify that what was built actually matches what was asked for. They don't know what they don't know, and neither does AI.

Xpansion closes the gap. It decompresses finite intent upfront, enforces it through code-driven AI behavior, and delivers binary-verified results against the intent across persistent memory that survives every session boundary.

Model Source — the model-selection enforcement

When you say "the best model for this task", you're compressing a lot: what counts as best depends on whether you care about factual reasoning or coherent prose, whether cost matters more than latency, whether you actually need vision or just text, whether the call has to stream, whether a particular benchmark dominates your real workload. AI on its own predicts the pattern — what model do most people pick for queries that look like this? — and gives you a plausible-sounding answer that's often wrong for you.

XFMS does the decompression. It takes your stated purpose, infers which benchmarks actually map to it, honors your stated primary preferences without silently overriding them, surfaces the latent requirements you didn't know to ask about (streaming for real-time chat, vision for OCR), and probes the top picks against your real query to verify the recommendation — not predict it. Then it tells you, in plain English, why it picked what it picked.

One module per enforcement

The rest of the Xpansion stack enforces the same decompress-enforce-verify contract for different parts of the work:

Dispatch (Dispatch) — runtime task router. Watches what kind of work you're doing and routes it to the right tool.
Finite Intent (XFFI) — turns "build me a feature" into a finite spec with binary terminals before any code gets written. Stops scope drift at the source.
Boundary Auditor (XFBA) — checks every code edit against contracts. Stops broken function signatures and mismatched types from ever reaching production.
Systemic Impact Analysis (XSIA) — maps the blast radius of a proposed change before it lands. Tells you what else might break.
Token Conservation (XFTC) — manages how much of the conversation has to stay in the assistant's working memory. Prevents context loss in long sessions.
Execution Audit (XFXA) — verifies every promise from the spec was actually met before declaring a task done. The final binary check.
Memory Tree (XFMT) — session snapshots that stay searchable across conversations. Your assistant remembers what you decided last week.
Security Auditor (XFSA) — static + AI security scanning on every code edit. Catches secrets, injection paths, and unsafe patterns before they ship.

The full picture, with the rest of the modules, lives at xpansion.dev.

Xpansion is in pre-signup right now. Early access and founding licenses are open at xpansion.dev. XFMS is the first piece to ship public + free — the rest follow.

Local development

git clone https://github.com/VisionAIrySE/XFMS.git
cd XFMS
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"
.venv/bin/python -m pytest tests/ -v

The tests mock the HTTP layer so they run offline — no API keys needed to develop.

License

This client library is MIT-licensed. The recommender engine, the catalog, and the ingestion pipeline are not open source. See NOTICE for the patent reservation language and the relationship to the broader Xpansion Framework IP.

Contact

Russ Wright — russ@visionairy.biz
Xpansion Framework — xpansion.dev
Security disclosures — see SECURITY.md
