Connects Claude to MLflow tracking servers (OSS or Databricks) with 82 tools spanning experiments, runs, model registry, MLflow 3 logged models, traces, assessments, webhooks, and prompt optimization jobs. The big win is aggregation tools like summarize-experiment that fold metric stats and top runs into one response instead of three to five round trips, plus extractFields projection to cut token overhead. Ships with four workflow prompts (debug failed traces, promote best run, compare top runs, annotate trace quality) and six resource URIs for inline fetching. Use MLFLOW_TOOLS to scope down to just the categories you need. Defaults to read-only mode until you set MLFLOW_ALLOW_WRITE=true. If you're doing MLflow 3 workflows or need Databricks trace attachments, this is the only third-party MCP that ships them.
The widest-coverage MLflow MCP — including MLflow 3 traces, prompt-optimization, webhooks, and Databricks trace attachments that no other MCP exposes.
- 82 tools across experiments, runs, registry, logged models, traces, assessments, webhooks, prompt-optimization. Aggregation tools (summarize-experiment, summarize-run) fold 3–5 round-trips into one structured response with already-fetched metric stats.
- Trace attachments (list-trace-attachments, get-trace-attachment): Databricks MLflow only; OSS returns 404.
- Aggregation: summarize-experiment returns experiment + topN runs + metric stats (min/max/mean) in one call from already-fetched data, zero extra round-trips. summarize-run dedups metricHistory.history.*.key (~100KB savings on 4k-point series).
- Prompts: debug-failed-traces, promote-best-run, compare-top-runs, annotate-trace-quality. Workflow templates the model invokes directly.
- Resources: mlflow://run/{runId}, mlflow://experiment/{expId}, mlflow://run/{runId}/artifacts, mlflow://experiment/{expId}/runs, mlflow://registered-model/{name}/versions, mlflow://trace/{traceId}.
- Token efficiency: extractFields projection on get-run / search-runs / search-traces / get-trace / fat reads, MLFLOW_TOOLS / MLFLOW_DISABLE across 8 categories, and the search-tools meta-tool.
- compare-runs renders as a side-by-side card on ChatGPT clients (run summary + metric/param tables with diff highlight) via _meta["openai/outputTemplate"]. Claude clients receive the same JSON content.
- MCP_TRANSPORT=http for ChatGPT Apps SDK or remote clients (Bearer auth via MCP_HTTP_TOKEN).

Connect the server to Claude Desktop or Claude Code, then paste any of these:
- "… customer-churn-v3 experiment, find the run with the highest val_accuracy. Show its hyperparameters and metric history."
- "… status=ERROR from the last 24h in experiment 12. Group the failures by exception type and surface the 3 most common."
- "… validation_loss. Show differing hyperparameters in a table."
- "… recommendation_v2 registered model with the champion alias. Show its training metrics + lineage to the source run."
- "… tr-abc123. Highlight slow spans and any failed feedback annotations." (Add list-trace-attachments on Databricks workspaces.)

| | Official mlflow[mcp] | kkruglik/mlflow-mcp | @us-all/mlflow-mcp (this) |
|---|---|---|---|
| Tool count | ~9 (trace-only) | ~25 | 82 |
| MLflow 3 LoggedModel | ❌ | ✅ | ✅ |
| Trace attachments | ❌ | ❌ | ✅ Databricks only |
| Prompt-optimization-jobs | ❌ | ❌ | ✅ |
| Webhooks | ❌ | ❌ | ✅ |
| Aggregation tools | ❌ | ❌ | ✅ summarize-experiment, summarize-run |
| MCP Prompts | ❌ | ✅ | ✅ |
| MCP Resources | ❌ | ❌ | ✅ 6 URIs |
| Auth | Databricks SDK | Bearer / basic | Bearer / basic |
| Transport | stdio | stdio | stdio |
The official mlflow[mcp] is bundled inside MLflow itself and intentionally trace-narrow. Use it for quick managed-MLflow trace inspection. Use this server for end-to-end coverage, especially MLflow 3 entities, prompt-optimization workflows, and aggregation-driven AI debugging.
{
  "mcpServers": {
    "mlflow": {
      "command": "npx",
      "args": ["-y", "@us-all/mlflow-mcp"],
      "env": {
        "MLFLOW_TRACKING_URI": "http://localhost:5000"
      }
    }
  }
}
# Claude Code: register the server at user scope
claude mcp add mlflow -s user \
  -e MLFLOW_TRACKING_URI=http://localhost:5000 \
  -- npx -y @us-all/mlflow-mcp

# Docker
docker run --rm -i \
  -e MLFLOW_TRACKING_URI=http://your-host:5000 \
  ghcr.io/us-all/mlflow-mcp-server

# Build from source
git clone https://github.com/us-all/mlflow-mcp-server.git
cd mlflow-mcp-server && pnpm install && pnpm build
node dist/index.js
| Variable | Required | Default | Description |
|---|---|---|---|
| MLFLOW_TRACKING_URI | ✅ | — | MLflow tracking URL (http://localhost:5000, Databricks workspace URL, etc.) |
| MLFLOW_TRACKING_TOKEN | ❌ | — | Bearer token. Use for Databricks PAT (dapi…) |
| MLFLOW_TRACKING_USERNAME | ❌ | — | Basic-auth username (alternative to token) |
| MLFLOW_TRACKING_PASSWORD | ❌ | — | Basic-auth password |
| MLFLOW_EXPERIMENT_ID | ❌ | — | Default experiment ID for tools that accept it implicitly |
| MLFLOW_ALLOW_WRITE | ❌ | false | Set true to enable mutations (create/update/delete) |
| MLFLOW_TOOLS | ❌ | — | Comma-separated allowlist of categories. Biggest token saver. |
| MLFLOW_DISABLE | ❌ | — | Comma-separated denylist. Ignored when MLFLOW_TOOLS is set. |
| MCP_TRANSPORT | ❌ | stdio | Set to http to enable Streamable HTTP transport |
| MCP_HTTP_TOKEN | conditional | — | Bearer token. Required when MCP_TRANSPORT=http |
| MCP_HTTP_PORT | ❌ | 3000 | HTTP listen port |
| MCP_HTTP_HOST | ❌ | 127.0.0.1 | HTTP bind host (DNS rebinding protection auto-enabled for localhost) |
| MCP_HTTP_SKIP_AUTH | ❌ | false | Skip Bearer auth, e.g. behind a reverse proxy that handles it |
Categories (8): experiments, runs, registry, logged-models, traces, assessments, webhooks, prompts.
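The interaction between `MLFLOW_TOOLS` and `MLFLOW_DISABLE` described in the table can be sketched as below. This is a minimal illustration, not the server's actual code: `resolveCategories`, `parseList`, and `CATEGORIES` are hypothetical names, and the handling of whitespace, case, and unknown category names is an assumption. Only the variable names, the eight categories, and the rule that the denylist is ignored when an allowlist is set come from the text above.

```typescript
// Category names as listed above.
const CATEGORIES = [
  "experiments", "runs", "registry", "logged-models",
  "traces", "assessments", "webhooks", "prompts",
] as const;
type Category = (typeof CATEGORIES)[number];

// Split a comma-separated value; trimming, lowercasing, and dropping
// unknown names are assumptions made for this sketch.
function parseList(value: string | undefined): Category[] {
  if (!value) return [];
  const names = value.split(",").map((s) => s.trim().toLowerCase());
  return CATEGORIES.filter((c) => names.includes(c));
}

// Hypothetical helper: the allowlist wins; the denylist only applies
// when no (recognized) allowlist entry is present.
function resolveCategories(env: Record<string, string | undefined>): Category[] {
  const allow = parseList(env["MLFLOW_TOOLS"]);
  if (allow.length > 0) return allow;
  const deny = parseList(env["MLFLOW_DISABLE"]);
  return CATEGORIES.filter((c) => !deny.includes(c));
}

// Usage: the denylist entry is ignored because an allowlist is set.
const enabled = resolveCategories({ MLFLOW_TOOLS: "experiments, runs", MLFLOW_DISABLE: "runs" });
```

Here `enabled` contains `experiments` and `runs`; with only `MLFLOW_DISABLE=webhooks,prompts` set, the same helper would return the other six categories.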
When MCP_TRANSPORT=http: POST /mcp (Bearer-auth JSON-RPC) + GET /health (public liveness).
For Databricks-hosted MLflow:
MLFLOW_TRACKING_URI=https://<workspace>.cloud.databricks.com
MLFLOW_TRACKING_TOKEN=dapi... # PAT or service-principal token
The MLflow REST API path (/api/2.0/mlflow/...) is identical between OSS and Databricks. Bearer auth handles both PAT and service-principal flows.
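As a rough sketch of what the shared REST path and the two auth modes imply for a client, the helpers below build a request URL and an `Authorization` header from the environment variables in the table. `buildAuthHeaders` and `mlflowUrl` are hypothetical names, and giving the token precedence over basic auth is an assumption; the server's own `MlflowClient` may differ.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: Bearer auth when a token is set, otherwise basic auth,
// otherwise no Authorization header. Token precedence is an assumption.
function buildAuthHeaders(env: Env): Record<string, string> {
  const token = env["MLFLOW_TRACKING_TOKEN"];
  if (token) return { Authorization: `Bearer ${token}` };
  const user = env["MLFLOW_TRACKING_USERNAME"];
  const pass = env["MLFLOW_TRACKING_PASSWORD"];
  if (user && pass) {
    // btoa only handles Latin-1; non-ASCII credentials would need UTF-8 encoding first.
    return { Authorization: `Basic ${btoa(`${user}:${pass}`)}` };
  }
  return {};
}

// Join the tracking URI with the /api/2.0/mlflow/ prefix used by both OSS and Databricks.
function mlflowUrl(env: Env, endpoint: string): string {
  const base = (env["MLFLOW_TRACKING_URI"] ?? "").replace(/\/+$/, "");
  return `${base}/api/2.0/mlflow/${endpoint.replace(/^\/+/, "")}`;
}

// Usage with a local OSS server and basic auth (placeholder credentials).
const localEnv: Env = {
  MLFLOW_TRACKING_URI: "http://localhost:5000/",
  MLFLOW_TRACKING_USERNAME: "alice",
  MLFLOW_TRACKING_PASSWORD: "secret",
};
const url = mlflowUrl(localEnv, "experiments/search");
const headers = buildAuthHeaders(localEnv);
```

The same two functions would serve a Databricks workspace by swapping in the workspace URL and a `dapi...` token, which is the point of the identical path.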
| Scenario | Tools | Schema tokens | vs default |
|---|---|---|---|
| default (all categories) | 78 | 9,200 | — |
| typical (MLFLOW_TOOLS=experiments,runs,registry,traces) | 54 | 5,900 | −36% |
| narrow (MLFLOW_TOOLS=experiments,runs) | 27 | 3,200 | −66% |
Plus extractFields on get-run / search-runs / search-traces / get-trace / summarize-experiment — caller can scope response fields per call.
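To make the idea of response projection concrete, here is a minimal sketch of a dotted-path field filter. The README does not document the `extractFields` syntax, so the dotted-path format, the `project` function, and the sample run object are all assumptions for illustration; arrays are treated as leaves and not traversed.

```typescript
type Json = null | boolean | number | string | Json[] | JsonObject;
type JsonObject = { [key: string]: Json };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Read the value at a dotted path, or undefined if any segment is missing.
function getPath(source: Json, keys: string[]): Json | undefined {
  let cur: Json | undefined = source;
  for (const key of keys) {
    if (!isObject(cur)) return undefined;
    cur = cur[key];
  }
  return cur;
}

// Write a leaf into the output, recreating intermediate objects as needed.
function setPath(target: JsonObject, keys: string[], leaf: Json): void {
  const [head, ...rest] = keys;
  if (head === undefined) return;
  if (rest.length === 0) {
    target[head] = leaf;
    return;
  }
  const child = target[head];
  const next: JsonObject = isObject(child) ? child : {};
  target[head] = next;
  setPath(next, rest, leaf);
}

// Hypothetical projection: keep only the listed paths; missing paths are skipped.
function project(source: Json, fields: string[]): JsonObject {
  const out: JsonObject = {};
  for (const field of fields) {
    const keys = field.split(".");
    const leaf = getPath(source, keys);
    if (leaf !== undefined) setPath(out, keys, leaf);
  }
  return out;
}

// Usage on a made-up run payload.
const run: Json = {
  info: { run_id: "r1", status: "FINISHED" },
  data: { metrics: { val_accuracy: 0.93 }, params: { lr: "0.01" } },
};
const slim = project(run, ["info.run_id", "data.metrics.val_accuracy"]);
```

In this example `slim` keeps only `info.run_id` and `data.metrics.val_accuracy`, which is the kind of trimming that reduces tokens on large run or trace payloads.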
By default, all writes are blocked. The following require MLFLOW_ALLOW_WRITE=true:
create-experiment, update-experiment, delete-experiment, restore-experiment, set-experiment-tag, delete-experiment-tag, create-run, update-run, delete-run, restore-run, log-metric, log-param, log-batch, log-inputs, set-run-tag, delete-run-tag, create-registered-model, rename-registered-model, update-registered-model, delete-registered-model, plus all model-version, logged-model, trace, assessment, webhook, and prompt-optimization writes.
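The gating rule above amounts to a check before each mutating tool runs. The following is a minimal sketch under stated assumptions: `assertWriteAllowed` is a hypothetical name, `WRITE_TOOLS` holds only an illustrative subset of the tools listed, and matching the exact string `true` is an assumption about how the flag is parsed.

```typescript
// Illustrative subset of the mutating tools named above.
const WRITE_TOOLS = new Set<string>([
  "create-experiment", "delete-experiment", "create-run", "delete-run",
  "log-metric", "log-param", "log-batch",
]);

// Hypothetical guard: read tools always pass; write tools pass only
// when MLFLOW_ALLOW_WRITE is exactly "true".
function assertWriteAllowed(tool: string, env: Record<string, string | undefined>): void {
  if (!WRITE_TOOLS.has(tool)) return;
  if (env["MLFLOW_ALLOW_WRITE"] !== "true") {
    throw new Error(`${tool} is a write tool; set MLFLOW_ALLOW_WRITE=true to enable it`);
  }
}

// Usage: a read passes in the default configuration.
assertWriteAllowed("get-run", {});
```

Calling `assertWriteAllowed("log-metric", {})` would throw, while the same call with `MLFLOW_ALLOW_WRITE` set to `true` returns normally.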
- search-traces.maxResults is clamped to 500. MLflow 3.12+ rejects per-page max_results > 500 with INVALID_PARAMETER_VALUE. For larger result sets, loop on nextPageToken; the total trace count is unbounded.
- list-trace-attachments / get-trace-attachment call routes for which OSS MLflow (verified through 3.12.0) returns 404. Tool descriptions surface this; calls against OSS return a structured MlflowError.
- The search-traces.maxResults cap applies per page, not per call; pagination still gets you the full set.

Workflow templates available via MCP prompts/list:

- debug-failed-traces: find failed traces, group failure modes
- promote-best-run: find best run, register, set champion alias
- compare-top-runs: top-N comparison by metric
- annotate-trace-quality: guided feedback annotation loop

URI-based read-only access:
mlflow://run/{runId}, mlflow://experiment/{expId}, mlflow://experiment-by-name/{name}, mlflow://registered-model/{name}, mlflow://model-version/{name}/{version}, mlflow://trace/{traceId}, mlflow://run/{runId}/artifacts, mlflow://experiment/{expId}/runs, mlflow://registered-model/{name}/versions.
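Returning to the search-traces limit noted earlier: because the 500 cap applies per page, a caller that wants more results follows `nextPageToken` until it runs out. The loop can be sketched as below. `collectAll`, the `Page` shape (including the `items` field), and the `FetchPage` signature are assumptions for illustration; only the 500 cap and the `nextPageToken` name come from the text.

```typescript
const MAX_PAGE_SIZE = 500; // per-page cap stated in the limitations

interface Page<T> {
  items: T[];
  nextPageToken?: string | undefined;
}
type FetchPage<T> = (pageSize: number, pageToken?: string) => Promise<Page<T>>;

// Hypothetical helper: request pages of at most 500 items and follow
// nextPageToken until it is absent or `limit` items have been collected.
async function collectAll<T>(fetchPage: FetchPage<T>, limit = Infinity): Promise<T[]> {
  const all: T[] = [];
  if (limit <= 0) return all;
  let token: string | undefined;
  do {
    const pageSize = Math.min(MAX_PAGE_SIZE, limit - all.length);
    const page = await fetchPage(pageSize, token);
    all.push(...page.items);
    token = page.nextPageToken;
  } while (token && all.length < limit);
  return all.slice(0, limit);
}
```

In practice `fetchPage` would wrap one search-traces call per page; keeping it as a parameter lets the loop be exercised against an in-memory fake.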
8 categories. Use search-tools to discover tools at runtime; the full list follows.
get-run, search-runs, search-traces, get-trace, and summarize-experiment accept extractFields for response slicing.
- Experiments: create-experiment, search-experiments, get-experiment, get-experiment-by-name, update-experiment, delete-experiment, restore-experiment, set-experiment-tag, delete-experiment-tag
- Runs: create-run, get-run, search-runs, update-run, delete-run, restore-run, log-metric, log-param, log-batch, log-inputs, get-metric-history, set-run-tag, delete-run-tag, list-artifacts, get-best-run, compare-runs, search-runs-by-tags, summarize-run (aggregation)
- Registry (registered models): create-registered-model, get-registered-model, search-registered-models, rename-registered-model, update-registered-model, delete-registered-model, get-latest-model-versions, set-registered-model-tag, delete-registered-model-tag, set-registered-model-alias, delete-registered-model-alias, get-model-version-by-alias
- Registry (model versions): create-model-version, get-model-version, search-model-versions, update-model-version, delete-model-version, transition-model-version-stage, get-model-version-download-uri, set-model-version-tag, delete-model-version-tag
- Logged models: create-logged-model, search-logged-models, get-logged-model, finalize-logged-model, delete-logged-model, set-logged-model-tags, delete-logged-model-tag, log-logged-model-params
- Traces: search-traces, get-trace, get-trace-info, delete-traces, set-trace-tag, delete-trace-tag, list-trace-attachments, get-trace-attachment
- Assessments: log-feedback, log-expectation, get-assessment, update-assessment, delete-assessment
- Webhooks: create-webhook, list-webhooks, get-webhook, update-webhook, delete-webhook, test-webhook
- Prompt optimization: create-prompt-optimization-job, get-prompt-optimization-job, search-prompt-optimization-jobs, cancel-prompt-optimization-job, delete-prompt-optimization-job
- Aggregation: summarize-experiment, summarize-run. Each folds 3–5 round-trips into one structured response with a caveats array.
- Meta: search-tools. Queries other tools by keyword; always enabled.
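The metric stats and top-N selection that summarize-experiment is described as returning can be computed from runs that have already been fetched, with no further requests. A minimal sketch follows; `RunSummary`, `metricStats`, and `topRuns` are hypothetical names and shapes, not the server's types.

```typescript
interface RunSummary {
  runId: string;
  metrics: Record<string, number>;
}
interface MetricStats {
  min: number;
  max: number;
  mean: number;
  count: number;
}

// Min/max/mean of one metric across the runs that report it.
function metricStats(runs: RunSummary[], metric: string): MetricStats | undefined {
  const values = runs
    .map((r) => r.metrics[metric])
    .filter((v): v is number => typeof v === "number");
  if (values.length === 0) return undefined;
  const sum = values.reduce((a, b) => a + b, 0);
  return {
    min: Math.min(...values),
    max: Math.max(...values),
    mean: sum / values.length,
    count: values.length,
  };
}

// Top N runs by a metric; higher is better unless `ascending` is set.
function topRuns(runs: RunSummary[], metric: string, n: number, ascending = false): RunSummary[] {
  return runs
    .filter((r) => typeof r.metrics[metric] === "number")
    .sort((a, b) => {
      const diff = (a.metrics[metric] as number) - (b.metrics[metric] as number);
      return ascending ? diff : -diff;
    })
    .slice(0, n);
}

// Usage on made-up runs; the third run lacks the metric and is ignored.
const sampleRuns: RunSummary[] = [
  { runId: "a", metrics: { val_accuracy: 0.5 } },
  { runId: "b", metrics: { val_accuracy: 0.25 } },
  { runId: "c", metrics: { val_accuracy: 0.75 } },
  { runId: "d", metrics: {} },
];
const stats = metricStats(sampleRuns, "val_accuracy");
const best = topRuns(sampleRuns, "val_accuracy", 2);
```

For a loss-style metric where lower is better, passing `ascending = true` to `topRuns` reverses the ranking.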
# 1. start MLflow (UI at http://localhost:5050)
docker compose up -d mlflow
# 2. seed demo experiment, runs, registered model, traces
docker compose run --rm seed
# 3a. probe the MCP server locally against the compose'd MLflow
MLFLOW_TRACKING_URI=http://localhost:5050 \
MLFLOW_EXPERIMENT_ID=1 \
MLFLOW_ALLOW_WRITE=true \
node dist/index.js
# 3b. or run inside compose (stdio)
docker compose run --rm mcp
# tear down
docker compose down -v
./dev/seed.py is idempotent — skips if demo experiment already has runs.
Claude → MCP stdio → src/index.ts → src/tools/*.ts → MlflowClient (fetch) → MLflow REST API
Built on @us-all/mcp-toolkit:
- extractFields: token-efficient response projections
- aggregate(fetchers, caveats): fan-out helper for summarize-experiment
- createWrapToolHandler: Bearer/basic credential redaction + MlflowError extraction
- search-tools meta-tool

Targets MLflow 3.5.1+ (uses v3 traces/assessments REST). Dev compose pinned to MLflow 3.12.0 (multimodal trace attachments + paginated trace search).
Node.js 22+ • TypeScript strict ESM • pnpm • @modelcontextprotocol/sdk • zod • dotenv • vitest.
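The fan-out pattern behind the aggregation tools, where several reads run concurrently and a failed read is reported as a caveat rather than failing the whole call, can be sketched as follows. This is not the toolkit's `aggregate` implementation: `fanOut`, the `Fetchers` type, and the caveat message format are assumptions for illustration.

```typescript
type Fetchers = Record<string, () => Promise<unknown>>;

// Hypothetical fan-out: run every fetcher concurrently; collect successes
// under their key and turn each failure into a caveat string.
async function fanOut(fetchers: Fetchers, caveats: string[]): Promise<Record<string, unknown>> {
  const result: Record<string, unknown> = {};
  await Promise.all(
    Object.keys(fetchers).map(async (name) => {
      const fetcher = fetchers[name];
      if (!fetcher) return;
      try {
        result[name] = await fetcher();
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        caveats.push(`${name}: ${message}`);
      }
    }),
  );
  return result;
}
```

A caller would pass one fetcher per sub-request (experiment, runs, metric history) and return both the merged result and the caveats array, so a partial answer stays usable.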