If you want Claude to actually run ML experiments instead of just browsing past results, this is the missing piece. It exposes five tools that let agents profile CSVs, define classification tasks, tune XGBoost and LightGBM models with Optuna, and generate markdown reports with feature importance. Everything persists to SQLite, so the agent can compare runs across sessions. You tell Claude "train a model on titanic.csv to predict survival" and it handles the full loop: preprocessing, stratified cross-validation, hyperparameter search, and evaluation. Regression and time-series forecasting are on the roadmap. Useful when you want to prototype models conversationally without writing boilerplate.
Let AI agents run real ML experiments end-to-end.
An MCP server that gives Claude (or any MCP-aware AI agent) the ability to profile a CSV, define an ML task, tune XGBoost and LightGBM with Optuna, and produce a markdown report with feature importance — all from natural language.
The existing ML-related MCP servers wrap MLflow, ZenML, or Weights & Biases
and expose them as read-only — agents can browse experiment history but
can't actually run anything. mcp-ml-lab fills the gap: it lets agents
execute the full experimentation loop from a user's natural-language request.
A user typing "train a model on titanic.csv to predict survival" should not
need to know what XGBoost is, what cross-validation is, or how to write a
hyperparameter search. The agent handles all of that — mcp-ml-lab is the
tools layer that makes it possible.
Install with pip:

pip install mcp-ml-lab
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "ml-lab": {
      "command": "mcp-ml-lab"
    }
  }
}
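If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge using only the standard library; the helper name `add_ml_lab_server` is hypothetical and not part of mcp-ml-lab, and the sketch assumes the file is either absent, empty, or valid JSON.

```python
import json
from pathlib import Path


def add_ml_lab_server(config_path: Path) -> dict:
    """Add the "ml-lab" entry to a Claude Desktop config, keeping other servers.

    Hypothetical helper for illustration; not shipped with mcp-ml-lab.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # setdefault preserves any servers that are already registered.
    servers = config.setdefault("mcpServers", {})
    servers["ml-lab"] = {"command": "mcp-ml-lab"}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS this could be called with `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"`, the path given above.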
Restart Claude Desktop. The five tools below are now available.
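Try a prompt like this in Claude Desktop with mcp-ml-lab connected: "train a model on titanic.csv to predict survival". The agent picks from the following tools: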
| Tool | What it does |
|---|---|
| `inspect_data` | Profile a CSV: shape, dtypes, nulls, summary stats, class balance |
| `define_task` | Register an ML task (CSV + target + classification/regression) |
| `run_experiment` | Train one or more models, optionally tuning with Optuna |
| `get_results` | Markdown report with metrics, hyperparameters, feature importance |
| `compare_runs` | Side-by-side comparison of multiple experiments |
Each tool's full signature lives in its docstring, so the tools document themselves to the LLM.
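To give a feel for the kind of profile a tool like `inspect_data` reports, here is a rough standard-library sketch that computes shape, per-column null counts, and class balance for a CSV. It is an illustration only: the function name `profile_csv`, its parameters, and the shape of the returned dict are assumptions, not the server's actual signature or output.

```python
import csv
from collections import Counter
from typing import Optional


def profile_csv(path: str, target: Optional[str] = None) -> dict:
    """Return shape, per-column null counts, and (optionally) class balance.

    Hypothetical sketch; empty strings are treated as nulls.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        nulls = {col: 0 for col in columns}
        classes = Counter()
        n_rows = 0
        for row in reader:
            n_rows += 1
            for col in columns:
                value = row.get(col)
                if value is None or value.strip() == "":
                    nulls[col] += 1
            if target is not None and row.get(target):
                classes[row[target]] += 1

    profile = {"shape": (n_rows, len(columns)), "nulls": nulls}
    if target is not None:
        total = sum(classes.values())
        # Fraction of labelled rows that fall into each class.
        profile["class_balance"] = {k: v / total for k, v in classes.items()}
    return profile
```

Class balance matters here because a heavily skewed target is what makes stratified cross-validation worthwhile.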
Claude Desktop ───MCP/stdio─── mcp-ml-lab server
                                 │
                                 ├── data.py        CSV loading, schema inference, preprocessor
                                 ├── trainers/      Pluggable XGBoost + LightGBM adapters
                                 ├── search.py      Stratified CV + Optuna TPE tuning
                                 ├── metrics.py     Accuracy, F1, AUC, log loss
                                 ├── storage.py     SQLite via SQLAlchemy 2.0
                                 └── reporting.py   Markdown report generation
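The "stratified CV" step listed for `search.py` means each fold keeps roughly the same class proportions as the full dataset. The sketch below shows the idea in pure Python; it is not the project's code (which may well rely on a library implementation), and the names `stratified_folds` and `train_test_splits` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each row index to a fold so class proportions stay roughly equal."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    offset = 0  # carries over between classes so fold sizes stay balanced
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a share of each class.
        for position, idx in enumerate(indices):
            folds[(offset + position) % n_splits].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(labels, n_splits=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, n_splits, seed)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), sorted(test)
```

A tuner would typically score each candidate hyperparameter set by averaging a metric over these splits.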
All experiments and trials are persisted to ~/.mcp-ml-lab/store.db so an
agent can refer back to runs across sessions.
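To illustrate why persistence makes cross-session comparison possible, here is a small sketch that stores runs in SQLite and ranks them by a metric. It swaps in the standard-library `sqlite3` module instead of SQLAlchemy to stay dependency-free, and the table layout and the function names `open_store`, `save_run`, and `compare` are assumptions rather than the project's actual schema.

```python
import json
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) a store with a single hypothetical experiments table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS experiments (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            task TEXT NOT NULL,
            model TEXT NOT NULL,
            params TEXT NOT NULL,
            metrics TEXT NOT NULL
        )
        """
    )
    return conn


def save_run(conn, task, model, params, metrics):
    """Persist one run; params and metrics are stored as JSON text."""
    cur = conn.execute(
        "INSERT INTO experiments (task, model, params, metrics) VALUES (?, ?, ?, ?)",
        (task, model, json.dumps(params), json.dumps(metrics)),
    )
    conn.commit()
    return cur.lastrowid


def compare(conn, task, metric):
    """Return (run_id, model, metric_value) tuples for a task, best first."""
    rows = conn.execute(
        "SELECT id, model, metrics FROM experiments WHERE task = ?", (task,)
    ).fetchall()
    scored = [(run_id, model, json.loads(m)[metric]) for run_id, model, m in rows]
    return sorted(scored, key=lambda r: r[2], reverse=True)
```

Because the database lives on disk in the real server, a later session can query the same rows without retraining anything.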
Full design notes in ARCHITECTURE.md.
v0.1.0 ships classification with XGBoost and LightGBM. Regression and time-series forecasting are planned for v0.2.0+.
Issues and PRs welcome. To set up a development checkout and run the tests:
git clone https://github.com/rohithraju-ops/mcp-ml-lab.git
cd mcp-ml-lab
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest -v
Local debugging is easiest with the MCP Inspector:
npx @modelcontextprotocol/inspector mcp-ml-lab
License: MIT.