If you need Claude to reason through complex problems with multiple agents and want a full audit trail, this is your server. It exposes two MCP tools using the Code Mode pattern: thoughtbox_search to query available operations and thoughtbox_execute to run JavaScript against a unified SDK. Inside that SDK you get 28 Hub operations for multi-agent workflows (claiming problems, proposing solutions, peer review, reaching consensus), structured thinking patterns (forward, backward, branching, revision), and a persistent knowledge graph. Every thought is numbered, timestamped, and stored in a reasoning ledger that you can visualize in the built-in Observatory UI at localhost:1729. By default it runs locally in Docker and stores data in ~/.thoughtbox. It is built for scenarios where you need to see how conclusions were reached.
Multi-agent collaborative reasoning that's auditable. Thoughtbox is an MCP server for structured, multi-agent reasoning, with a companion web app for workspace and inspection flows. Every step is recorded as a structured thought in a persistent reasoning ledger that can be visualized, exported, and analyzed.
Runtime modes: Local development can use filesystem or in-memory storage. Deployed mode uses Supabase-backed storage, and the current production MCP server runs on Cloud Run.
Observatory UI showing a reasoning session with 14 thoughts and a branch exploration (purple nodes 13-14) forking from thought 5.
Thoughtbox exposes exactly two MCP tools using the Code Mode pattern:
- thoughtbox_search: Write JavaScript to query the operation/prompt/resource catalog. The LLM has full programmatic filtering power over the catalog.
- thoughtbox_execute: Write JavaScript using the tb SDK to chain operations. Access thoughts, sessions, knowledge, notebooks, hub, observability, and protocol tools through a unified namespace.

Workflow: search to discover available operations, then execute code against them. Use console.log() for debugging; output is captured in response logs.
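The search step works because the catalog is plain data that arbitrary code can filter. The sketch below models that idea in TypeScript; the CatalogEntry shape, the module grouping, the placeholder descriptions, and the searchCatalog helper are all hypothetical and are not the real search-index.ts catalog or the tb SDK.

```typescript
// Hypothetical catalog entry; the real Code Mode catalog may use other fields.
interface CatalogEntry {
  kind: "operation" | "prompt" | "resource";
  module: string; // assumed grouping, e.g. "hub" or "observability"
  name: string;
  description: string;
}

// Illustrative entries only; descriptions are placeholders.
const catalog: CatalogEntry[] = [
  { kind: "operation", module: "hub", name: "register", description: "Register as an agent" },
  { kind: "operation", module: "hub", name: "whoami", description: "Get current agent identity" },
  { kind: "operation", module: "hub", name: "create_workspace", description: "Create a shared workspace" },
  { kind: "operation", module: "observability", name: "health", description: "System and service health check" },
  { kind: "operation", module: "observability", name: "metrics", description: "Instant Prometheus query (PromQL)" },
];

// The "search" step: any filtering logic runs against plain data,
// so the model can slice the catalog however it likes.
function searchCatalog(
  entries: CatalogEntry[],
  predicate: (entry: CatalogEntry) => boolean,
): string[] {
  return entries.filter(predicate).map((e) => `${e.module}.${e.name}`);
}

// Two example queries: by module, and by free-text match.
const hubOps = searchCatalog(catalog, (e) => e.kind === "operation" && e.module === "hub");
const workspaceOps = searchCatalog(catalog, (e) =>
  /workspace/i.test(`${e.name} ${e.description}`),
);
```

The design point is that only two tools are ever registered; adding an operation grows the catalog data, not the list of tool schemas the client loads into context.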
This replaces per-operation tool registration with a two-tool surface that scales without context window bloat.
The Hub is the coordination layer. Agents register with role-specific profiles, join shared workspaces, and work through a structured problem-solving workflow — all via thoughtbox_execute.
The workflow: register → create workspace → create problem → claim → work → propose solution → peer review → merge → consensus
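The workflow above is an ordered pipeline, which can be sketched as a small state model. The snake_case stage identifiers, nextStage, and isOrderedTrace are hypothetical names invented for illustration. A strictly linear model is also a simplification: a real session presumably handles several problems and repeated review rounds, which this sketch does not capture.

```typescript
// Hub workflow stages in the order the README lists them (identifiers assumed).
const STAGES = [
  "register",
  "create_workspace",
  "create_problem",
  "claim",
  "work",
  "propose_solution",
  "peer_review",
  "merge",
  "consensus",
] as const;

type Stage = (typeof STAGES)[number];

// Returns the stage that follows `current`, or null after consensus.
function nextStage(current: Stage): Stage | null {
  const i = STAGES.indexOf(current);
  return i + 1 < STAGES.length ? STAGES[i + 1] : null;
}

// Checks that a recorded trace moves strictly forward through the workflow:
// stages may be skipped, but never repeated or visited out of order.
function isOrderedTrace(trace: Stage[]): boolean {
  let last = -1;
  for (const stage of trace) {
    const i = STAGES.indexOf(stage);
    if (i <= last) return false;
    last = i;
  }
  return true;
}
```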
Workspace primitives:
Agent Profiles: MANAGER, ARCHITECT, DEBUGGER, SECURITY, RESEARCHER, REVIEWER — each provides domain-specific mental models and behavioral priming.
28 operations across identity, workspace management, problems, proposals, consensus, channels, and status reporting.
Every thought is a node in a graph — numbered, timestamped, linked to its predecessors, and persisted across sessions. This creates an auditable trail of how conclusions were reached.
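A minimal sketch of such a ledger, assuming a Map keyed by thought number for constant-time lookup and a `prev` pointer per node. The ThoughtNode fields and the ThoughtLedger class are hypothetical; this is not the project's LinkedThoughtStore, only an illustration of how numbered, timestamped, linked thoughts yield an audit trail.

```typescript
// Hypothetical thought record; field names are illustrative.
interface ThoughtNode {
  thoughtNumber: number;
  thought: string;
  timestamp: string; // ISO-8601
  prev: number | null; // predecessor thought, null for the first
}

class ThoughtLedger {
  private nodes = new Map<number, ThoughtNode>();
  private last: number | null = null;

  // Appends a thought, numbering it and linking it to the previous one.
  append(thought: string, now: Date = new Date()): ThoughtNode {
    const thoughtNumber = this.nodes.size + 1;
    const node: ThoughtNode = {
      thoughtNumber,
      thought,
      timestamp: now.toISOString(),
      prev: this.last,
    };
    this.nodes.set(thoughtNumber, node);
    this.last = thoughtNumber;
    return node;
  }

  // Constant-time lookup by thought number.
  get(n: number): ThoughtNode | undefined {
    return this.nodes.get(n);
  }

  // Audit trail: follow predecessor links from thought n back to the start,
  // then return the path in chronological order.
  trail(n: number): number[] {
    const out: number[] = [];
    let cur = this.nodes.get(n);
    while (cur) {
      out.push(cur.thoughtNumber);
      cur = cur.prev === null ? undefined : this.nodes.get(cur.prev);
    }
    return out.reverse();
  }
}
```

For example, appending three thoughts and calling `trail(3)` walks 3 → 2 → 1 and returns them in order, which is the "how was this conclusion reached" question in data-structure form.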
Agents can think forward, plan backward, branch into parallel explorations, revise earlier conclusions, and request autonomous critique via MCP sampling. Each pattern is a first-class operation:
| Pattern | Description | Use Case |
|---|---|---|
| Forward | Sequential 1→2→3→N progression | Exploration, discovery, open-ended analysis |
| Backward | Start at goal (N), work back to start (1) | Planning, system design, working from known goals |
| Branching | Fork into parallel explorations (A, B, C...) | Comparing alternatives, A/B scenarios |
| Revision | Update earlier thoughts with new information | Error correction, refined understanding |
| Critique | Autonomous LLM review via MCP sampling | Self-checking, quality gates |
Each thought carries a semantic thoughtType (reasoning, decision_frame, action_report, belief_snapshot, assumption_update, context_snapshot, progress) that classifies what kind of thought it is, orthogonal to the process pattern used.
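The separation between what a thought is and how it was produced can be expressed in types. Only the seven thoughtType values below come from the README; the revisesThought, branchFromThought, and branchId fields, the ThoughtInput type, and the helpers are assumptions for illustration.

```typescript
// The seven semantic thought types listed in the README.
const THOUGHT_TYPES = [
  "reasoning",
  "decision_frame",
  "action_report",
  "belief_snapshot",
  "assumption_update",
  "context_snapshot",
  "progress",
] as const;
type ThoughtType = (typeof THOUGHT_TYPES)[number];

// Hypothetical input shape: thoughtType is semantic, the optional
// fields describe the process pattern.
interface ThoughtInput {
  thoughtNumber: number;
  thought: string;
  thoughtType: ThoughtType;
  revisesThought?: number; // assumed marker for revisions
  branchFromThought?: number; // assumed marker for branches
  branchId?: string;
}

// Runtime guard for untrusted strings (e.g. model output).
function isThoughtType(value: string): value is ThoughtType {
  return (THOUGHT_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Derives the process pattern without looking at thoughtType,
// showing that the two dimensions vary independently.
function patternOf(t: ThoughtInput): "revision" | "branch" | "linear" {
  if (t.revisesThought !== undefined) return "revision";
  if (t.branchFromThought !== undefined) return "branch";
  return "linear";
}
```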
See the Patterns Cookbook for comprehensive examples.
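The Critique pattern relies on MCP sampling, where the server asks the client's LLM to generate a completion. The request shape below (method sampling/createMessage with messages, an optional systemPrompt, and a required maxTokens) follows the MCP specification; the buildCritiqueRequest helper and its prompt wording are hypothetical and are not Thoughtbox's SamplingHandler.

```typescript
// Shape of an MCP sampling/createMessage request (subset of the spec's fields).
interface SamplingRequest {
  jsonrpc: "2.0";
  id: number;
  method: "sampling/createMessage";
  params: {
    messages: { role: "user" | "assistant"; content: { type: "text"; text: string } }[];
    systemPrompt?: string;
    maxTokens: number;
  };
}

// Hypothetical helper: asks the client's model to critique one reasoning step.
function buildCritiqueRequest(id: number, thought: string, maxTokens = 500): SamplingRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      systemPrompt: "You are a rigorous reviewer. Point out flaws, gaps, and unstated assumptions.",
      messages: [
        { role: "user", content: { type: "text", text: `Critique this reasoning step:\n\n${thought}` } },
      ],
      maxTokens,
    },
  };
}
```

Because the client performs the generation, critique only works with clients that support sampling, and the client may ask the user to approve each request.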
The Observatory is a built-in web UI at http://localhost:1729 for watching reasoning unfold live.
The full observability stack includes OpenTelemetry tracing, Prometheus metrics, and Grafana dashboards.
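To pull raw metrics out of the stack yourself, you can call Prometheus's standard HTTP API (/api/v1/query for instant queries, /api/v1/query_range for ranges) on the port Docker Compose publishes (9090). This sketch only builds the URLs and does not go through Thoughtbox's own observability operations. The helper names are hypothetical, and because the README does not list Thoughtbox's metric names, the example uses Prometheus's built-in `up` series.

```typescript
// Converts a Date to the Unix-seconds string Prometheus accepts.
function unixSeconds(d: Date): string {
  return (d.getTime() / 1000).toString();
}

// Instant query: evaluates a PromQL expression at one point in time
// (Prometheus uses "now" when time is omitted).
function instantQueryUrl(base: string, promql: string, time?: Date): string {
  const params = new URLSearchParams({ query: promql });
  if (time) params.set("time", unixSeconds(time));
  return `${base.replace(/\/+$/, "")}/api/v1/query?${params.toString()}`;
}

// Range query: evaluates the expression at every step between start and end.
function rangeQueryUrl(base: string, promql: string, start: Date, end: Date, stepSeconds: number): string {
  const params = new URLSearchParams({
    query: promql,
    start: unixSeconds(start),
    end: unixSeconds(end),
    step: `${stepSeconds}s`,
  });
  return `${base.replace(/\/+$/, "")}/api/v1/query_range?${params.toString()}`;
}
```

From the host, use http://localhost:9090 as the base; inside the Compose network the README's default PROMETHEUS_URL (http://prometheus:9090) applies instead.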
Knowledge Graph — Persistent memory across sessions. Capture insights, concepts, workflows, and decisions as typed entities with typed relations (BUILDS_ON, CONTRADICTS, SUPERSEDES, etc.) and visibility controls (public, agent-private, team-private).
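A small model of typed entities, typed relations, and visibility helps make the description concrete. Only the entity categories, the three relation names shown, and the three visibility levels come from the README. The field names, the access rule for team-private entities, and the assumption that "A SUPERSEDES B" marks B as outdated are all hypothetical.

```typescript
type Visibility = "public" | "agent-private" | "team-private";
type RelationType = "BUILDS_ON" | "CONTRADICTS" | "SUPERSEDES"; // the README lists more ("etc.")

// Hypothetical entity and relation shapes.
interface Entity {
  id: string;
  kind: "insight" | "concept" | "workflow" | "decision";
  owner: string; // agent that created the entity
  visibility: Visibility;
}

interface Relation {
  from: string;
  to: string;
  type: RelationType;
}

// Assumed access rule: public is readable by everyone, agent-private only by
// its owner, team-private by agents whose team (which includes the reader) contains the owner.
function visibleTo(entities: Entity[], agent: string, team: Set<string>): Entity[] {
  return entities.filter(
    (e) =>
      e.visibility === "public" ||
      e.owner === agent ||
      (e.visibility === "team-private" && team.has(e.owner)),
  );
}

// "Current" knowledge: entities that no SUPERSEDES relation points at.
function currentEntities(entities: Entity[], relations: Relation[]): Entity[] {
  const superseded = new Set(
    relations.filter((r) => r.type === "SUPERSEDES").map((r) => r.to),
  );
  return entities.filter((e) => !superseded.has(e.id));
}
```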
Notebooks — Interactive literate programming combining documentation with executable JavaScript/TypeScript in isolated environments.
Thoughtbox is currently optimized for Claude Code. We are actively working on supporting additional MCP clients. Because capability support varies across the MCP ecosystem (server features such as prompts, resources, and tools; client features such as roots, sampling, and elicitation; and behaviors like listChanged notifications), we implement custom adaptations for many clients.
If you're using a client other than Claude Code and encounter issues, please open an issue describing your client and the problem.
Thoughtbox runs as a Docker-based MCP server. It requires Docker and Docker Compose.
```bash
git clone https://github.com/Kastalien-Research/thoughtbox.git
cd thoughtbox
docker compose up --build
```
This starts Thoughtbox and the full observability stack. The MCP server listens on port 1731 and the Observatory UI is available at http://localhost:1729.
Since Thoughtbox uses HTTP transport, configure your MCP client to connect via URL.
For Claude Code, add the server to your ~/.claude/settings.json or your project's .claude/settings.json:
```json
{
  "mcpServers": {
    "thoughtbox": {
      "url": "http://localhost:1731/mcp"
    }
  }
}
```
To connect through the observability sidecar (adds OpenTelemetry tracing):
```json
{
  "mcpServers": {
    "thoughtbox": {
      "url": "http://localhost:4000/mcp"
    }
  }
}
```
For VS Code, add the server to your MCP settings or .vscode/mcp.json:
```json
{
  "servers": {
    "thoughtbox": {
      "url": "http://localhost:1731/mcp"
    }
  }
}
```
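Under the hood, an MCP client on Streamable HTTP opens the connection by POSTing a JSON-RPC initialize request to the configured URL. The sketch builds that request without sending it, which is handy for a smoke test of the endpoint. The message shape and the Accept header follow the MCP specification; the protocolVersion default ("2025-03-26") is an assumption you may need to change to one the server supports, and buildInitializeRequest and the client name are hypothetical.

```typescript
// Minimal HTTP request description, independent of any fetch implementation.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the JSON-RPC initialize request an MCP client sends first.
function buildInitializeRequest(
  baseUrl: string,
  clientName: string,
  protocolVersion = "2025-03-26", // assumed; use a version the server supports
): HttpRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP servers may answer with plain JSON or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion,
        capabilities: {},
        clientInfo: { name: clientName, version: "0.0.1" },
      },
    }),
  };
}
```

Against a running stack, passing the result's method, headers, and body to fetch should return the server's initialize result, possibly as an SSE stream and possibly with an Mcp-Session-Id header. Per the MCP spec, a real client then sends a notifications/initialized notification. Swap the base URL to http://localhost:4000 to go through the observability sidecar.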
Forward thinking (sequential analysis):

```text
Thought 1: "Users report slow checkout. Let's analyze..."
Thought 2: "Data shows 45s average, target is 10s..."
Thought 3: "Root causes: 3 API calls, no caching..."
Thought 4: "Options: Redis cache, query optimization, parallel calls..."
Thought 5: "Recommendation: Implement Redis cache for product data"
```

Backward thinking (working back from a goal):

```text
Thought 8: [GOAL] "System handles 10k req/s with <100ms latency"
Thought 7: "Before that: monitoring and alerting operational"
Thought 6: "Before that: resilience patterns implemented"
Thought 5: "Before that: caching layer with invalidation"
...
Thought 1: [START] "Current state: 1k req/s, 500ms latency"
```

Branching (comparing alternatives):

```text
Thought 4: "Need to choose database architecture..."
  Branch A (thought 5): branchId="sql-path"
    "PostgreSQL: ACID compliance, mature tooling, relational integrity"
  Branch B (thought 5): branchId="nosql-path"
    "MongoDB: Flexible schema, horizontal scaling, document model"
Thought 6: [SYNTHESIS] "Use PostgreSQL for transactions, MongoDB for analytics"
```
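The branching example maps naturally onto plain records. This sketch assumes each branch thought carries a branchId (as in the example) plus a branchFromThought pointer to its fork point; branchFromThought, the BranchThought type, and alternativesFrom are hypothetical names, and treating thought 4 as the fork point is an inference from the example.

```typescript
// Hypothetical branch-aware thought record.
interface BranchThought {
  thoughtNumber: number;
  thought: string;
  branchId?: string; // undefined on the main line
  branchFromThought?: number; // assumed pointer to the fork point
}

// Groups the alternatives that fork from a given thought by branchId.
function alternativesFrom(thoughts: BranchThought[], forkPoint: number): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const t of thoughts) {
    if (t.branchId === undefined || t.branchFromThought !== forkPoint) continue;
    const list = out[t.branchId] ?? [];
    list.push(t.thought);
    out[t.branchId] = list;
  }
  return out;
}

// The database decision from the example above.
const dbDecision: BranchThought[] = [
  { thoughtNumber: 4, thought: "Need to choose database architecture..." },
  { thoughtNumber: 5, branchId: "sql-path", branchFromThought: 4,
    thought: "PostgreSQL: ACID compliance, mature tooling, relational integrity" },
  { thoughtNumber: 5, branchId: "nosql-path", branchFromThought: 4,
    thought: "MongoDB: Flexible schema, horizontal scaling, document model" },
  { thoughtNumber: 6, thought: "Use PostgreSQL for transactions, MongoDB for analytics" },
];

const alternatives = alternativesFrom(dbDecision, 4);
```

Both branches reuse thought number 5; the branchId is what keeps them distinct, so a synthesis step can compare the groups side by side.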
| Variable | Description | Default |
|---|---|---|
| DISABLE_THOUGHT_LOGGING | Suppress thought logging to stderr | false |
| THOUGHTBOX_DATA_DIR | Base directory for persistent storage | ~/.thoughtbox |
| THOUGHTBOX_PROJECT | Project scope for session isolation | _default |
| THOUGHTBOX_TRANSPORT | Transport type (stdio or http) | http |
| THOUGHTBOX_STORAGE | Storage backend (fs, memory, or supabase) | fs |
| THOUGHTBOX_OBSERVATORY_ENABLED | Enable Observatory web UI | false |
| THOUGHTBOX_OBSERVATORY_PORT | Observatory UI port | 1729 |
| THOUGHTBOX_OBSERVATORY_CORS | CORS origins for Observatory (comma-separated) | (none) |
| THOUGHTBOX_AGENT_ID | Pre-assigned Hub agent ID | (none) |
| THOUGHTBOX_AGENT_NAME | Pre-assigned Hub agent name | (none) |
| THOUGHTBOX_EVENTS_ENABLED | Enable event emission | false |
| THOUGHTBOX_EVENTS_DEST | Event destination | stderr |
| SUPABASE_URL | Supabase project URL (required for supabase storage) | (none) |
| SUPABASE_SERVICE_ROLE_KEY | Supabase service role key (required for supabase storage) | (none) |
| PORT | HTTP server port | 1731 |
| HOST | HTTP server bind address | 0.0.0.0 |
| NODE_ENV | Node environment | (none) |
| PROMETHEUS_URL | Prometheus endpoint (Docker) | http://prometheus:9090 |
| GRAFANA_URL | Grafana endpoint (Docker) | http://grafana:3000 |
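The defaults in this table can be read as a config loader. The sketch covers a subset of the variables, with names and defaults copied from the table; the ThoughtboxConfig shape, the loadConfig function, and the rule that only the string "true" enables a boolean flag are assumptions, not Thoughtbox's actual parsing code. The environment and home directory are passed in so the sketch needs no Node-specific imports.

```typescript
type StorageBackend = "fs" | "memory" | "supabase";

// Hypothetical config shape covering part of the table.
interface ThoughtboxConfig {
  dataDir: string;
  project: string;
  transport: "stdio" | "http";
  storage: StorageBackend;
  observatoryEnabled: boolean;
  observatoryPort: number;
  port: number;
  host: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env, home: string): ThoughtboxConfig {
  const storage = env.THOUGHTBOX_STORAGE ?? "fs";
  if (storage !== "fs" && storage !== "memory" && storage !== "supabase") {
    throw new Error(`Unknown THOUGHTBOX_STORAGE: ${storage}`);
  }
  // The table marks both Supabase variables as required for supabase storage.
  if (storage === "supabase" && (!env.SUPABASE_URL || !env.SUPABASE_SERVICE_ROLE_KEY)) {
    throw new Error("supabase storage needs SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY");
  }
  const transport = env.THOUGHTBOX_TRANSPORT ?? "http";
  if (transport !== "stdio" && transport !== "http") {
    throw new Error(`Unknown THOUGHTBOX_TRANSPORT: ${transport}`);
  }
  return {
    dataDir: env.THOUGHTBOX_DATA_DIR ?? `${home}/.thoughtbox`,
    project: env.THOUGHTBOX_PROJECT ?? "_default",
    transport,
    storage,
    observatoryEnabled: env.THOUGHTBOX_OBSERVATORY_ENABLED === "true", // assumed parsing
    observatoryPort: Number(env.THOUGHTBOX_OBSERVATORY_PORT ?? "1729"),
    port: Number(env.PORT ?? "1731"),
    host: env.HOST ?? "0.0.0.0",
  };
}
```

Failing fast on an unknown backend or missing Supabase credentials surfaces misconfiguration at startup rather than on the first write.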
For local development (requires Node.js 22+):
```bash
pnpm install
pnpm build
pnpm dev # Development with hot reload
npx vitest run # Unit tests
pnpm test # Full suite (build + vitest)
pnpm test:agentic # Agentic tests — full suite (build + run)
pnpm test:agentic:tool # Agentic tests — tool-level only
pnpm test:agentic:quick # Agentic tests — quick (no build)
pnpm test:behavioral # Behavioral contract tests
```
docker compose up --build starts the full stack:
| Service | Port | Description |
|---|---|---|
| thoughtbox | 1731 (MCP), 1729 (Observatory) | Core MCP server + Observatory UI |
| mcp-sidecar | 4000 | Observability proxy with OpenTelemetry |
| otel-collector | 4318 (HTTP), 8889 (metrics) | OpenTelemetry Collector |
| prometheus | 9090 | Metrics storage + alerting |
| grafana | 3001 | Dashboards and visualization |
Persistent data is stored in named volumes: thoughtbox-data, prometheus-data, grafana-data.
```text
src/
├── index.ts # Entry point (Streamable HTTP transport)
├── server-factory.ts # MCP server factory with tool registration
├── thought-handler.ts # Core thought recording logic
├── types.ts # Shared type definitions
├── database.types.ts # Supabase generated types
├── code-mode/ # Code Mode tool surface
│ ├── search-tool.ts # thoughtbox_search — catalog query via JS
│ ├── execute-tool.ts # thoughtbox_execute — operation chaining via tb SDK
│ ├── search-index.ts # Frozen catalog of operations/prompts/resources
│ └── sdk-types.ts # TypeScript definitions for the tb SDK
├── thought/ # Thought operations and tool definitions
├── init/ # Init workflow and state management
│ ├── tool-handler.ts # Init tool operations
│ └── state-manager.ts # Session state persistence
├── sessions/ # Session management
├── sampling/ # Autonomous critique via MCP sampling
│ └── handler.ts # SamplingHandler for LLM critique requests
├── persistence/ # Storage layer
│ ├── storage.ts # InMemoryStorage with LinkedThoughtStore
│ ├── filesystem-storage.ts # FileSystemStorage with atomic writes
│ └── supabase-storage.ts # SupabaseStorage for deployed/cloud usage
├── observatory/ # Real-time visualization
│ ├── ui/ # Self-contained HTML/CSS/JS
│ └── ws-server.ts # WebSocket server for live updates
├── hub/ # Multi-agent collaboration
│ ├── identity.ts # Agent registration
│ ├── workspace.ts # Workspace management
│ ├── problems.ts # Problem tracking with dependencies
│ ├── proposals.ts # Solution proposals with reviews
│ ├── consensus.ts # Decision recording
│ ├── channels.ts # Problem-scoped messaging
│ ├── hub-handler.ts # Hub operation dispatcher
│ └── operations.ts # 28-operation catalog
├── channel/ # Hub event channels and SSE streaming
├── multi-agent/ # Agent attribution, content hashing, conflict detection
├── protocol/ # Ulysses and Theseus protocol tools
├── knowledge/ # Knowledge graph memory
├── auth/ # API key authentication
├── audit/ # Audit manifest generation
├── evaluation/ # LangSmith evaluation and online monitoring
├── notebook/ # Literate programming engine
├── events/ # Event emission system
├── observability/ # Prometheus/Grafana integration
├── prompts/ # MCP prompt definitions
├── references/ # Anchor parsing and resolution
├── revision/ # Revision indexing
├── operations-tool/ # Operations tool handler
└── resources/ # Documentation and patterns cookbook
```
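Thoughtbox supports three storage backends, selected with THOUGHTBOX_STORAGE:

- FileSystemStorage (fs, the default): persists thoughts to disk with atomic writes
- InMemoryStorage (memory): uses a LinkedThoughtStore for O(1) thought lookups
- SupabaseStorage (supabase): for deployed/cloud usage; requires SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY

Data is stored at ~/.thoughtbox/ by default (FileSystemStorage):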
```text
~/.thoughtbox/
├── config.json          # Global configuration
└── projects/
    └── {project}/
        └── sessions/
            └── {date}/
                └── {session-id}/
                    ├── manifest.json
                    └── {thought-number}.json
```
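A sketch of how paths in this layout could be computed and read back. The README does not specify the {date} format or whether thought numbers are zero-padded, so this assumes a UTC YYYY-MM-DD date and unpadded numbers; thoughtPath and sortThoughtFiles are hypothetical helpers, not FileSystemStorage's API.

```typescript
// Builds the on-disk location of one thought under the layout shown above.
// Assumes {date} is UTC YYYY-MM-DD and {thought-number} is not zero-padded.
function thoughtPath(
  dataDir: string,
  project: string,
  sessionDate: Date,
  sessionId: string,
  thoughtNumber: number,
): string {
  const date = sessionDate.toISOString().slice(0, 10);
  return [
    dataDir.replace(/\/+$/, ""),
    "projects",
    project,
    "sessions",
    date,
    sessionId,
    `${thoughtNumber}.json`,
  ].join("/");
}

// Unpadded names sort wrongly as strings ("10.json" < "2.json"), so order
// thought files numerically and skip manifest.json.
function sortThoughtFiles(files: string[]): string[] {
  return files
    .filter((f) => /^\d+\.json$/.test(f))
    .sort((a, b) => parseInt(a, 10) - parseInt(b, 10));
}
```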
We welcome contributions! See CONTRIBUTING.md for guidelines.

MIT License: free to use, modify, and distribute.