Records every action an AI agent takes so you can replay sessions step by step and catch regressions. Provides tools to start and stop recording, log individual actions with inputs, outputs, reasoning and timing, then replay the full sequence. The real utility is in compare_sessions, which diffs two runs and shows you exactly where behavior diverged, and find_divergence_point, which pinpoints the first step that deviated from expected output. Export sessions as JSON for tooling or markdown for human review. Useful when debugging non-deterministic agent behavior, comparing model versions, or proving that a workflow changed between runs. Works over stdio and stores sessions in memory keyed by session ID.
MCP server for agent session recording and replay — debug non-deterministic agent behavior with session comparison and divergence detection.
Record every action an agent takes, replay sessions step by step, diff two runs to find behavioral regressions, and pinpoint exactly where an agent diverged from expected output.
Run directly with npx:
npx agent-replay-mcp
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "agent-replay": {
      "command": "npx",
      "args": ["agent-replay-mcp"]
    }
  }
}
Or run from source:
git clone https://github.com/mdfifty50-boop/agent-replay-mcp.git
cd agent-replay-mcp
npm install
node src/index.js
record_session: Start recording all actions for an agent session.
| Param | Type | Default | Description |
|---|---|---|---|
| agent_id | string | required | Unique agent identifier |
| metadata | object | {} | Optional metadata (task, model, environment) |
Returns a session_id for use with other tools.
stop_recording: Stop recording and return a session summary.
| Param | Type | Description |
|---|---|---|
| session_id | string | Session ID from record_session |
Returns: action count, total duration, action type breakdown.
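The summary fields can be pictured as a simple fold over the logged actions. The following is a minimal sketch, not the server's actual code: `summarizeSession` and the returned field names are hypothetical, and it assumes total duration is the sum of each action's duration_ms rather than wall-clock time.

```javascript
// Hypothetical helper: illustrates the summary shape, not the real implementation.
// Assumes each action carries the log_action fields action_type and duration_ms.
function summarizeSession(actions) {
  const breakdown = {};
  let totalDurationMs = 0;
  for (const action of actions) {
    // Count how many actions of each type were logged.
    breakdown[action.action_type] = (breakdown[action.action_type] || 0) + 1;
    // duration_ms defaults to 0 when it was not supplied.
    totalDurationMs += action.duration_ms || 0;
  }
  return {
    action_count: actions.length,
    total_duration_ms: totalDurationMs, // assumption: sum of per-action durations
    action_type_breakdown: breakdown,
  };
}
```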
log_action: Log a single action during a recording session.
| Param | Type | Default | Description |
|---|---|---|---|
| session_id | string | required | Active session ID |
| action_type | string | required | Action type (tool_call, llm_response, decision, error) |
| input | any | required | Input to the action |
| output | any | required | Output from the action |
| reasoning | string | "" | Agent reasoning for this step |
| duration_ms | number | 0 | Action duration in milliseconds |
replay_session: Replay a recorded session step by step with full action detail.
| Param | Type | Description |
|---|---|---|
| session_id | string | Session ID to replay |
Returns: complete action sequence with timing, reasoning, inputs, and outputs.
compare_sessions: Behavioral diff between two sessions. Aligns actions by step index and highlights differences.
| Param | Type | Description |
|---|---|---|
| session_id_1 | string | First session |
| session_id_2 | string | Second session |
Returns: similarity ratio, identical/divergent step counts, first divergence step, and per-step diffs.
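To make the index-aligned diff concrete, here is a minimal sketch of how such a comparison could work. It is an illustration under stated assumptions, not the server's implementation: `compareSessions` and the result field names are hypothetical, two steps are treated as equal when their action_type and JSON-serialized output agree, and similarity is taken as identical steps divided by the longer session's length.

```javascript
// Hypothetical sketch of an index-aligned diff; not the server's actual code.
// Each session is assumed to be an array of { action_type, output } objects.
function compareSessions(actionsA, actionsB) {
  const steps = Math.max(actionsA.length, actionsB.length);
  const diffs = [];
  let identical = 0;
  let firstDivergence = null;

  for (let i = 0; i < steps; i++) {
    const a = actionsA[i];
    const b = actionsB[i];
    // Assumption: steps match when type and serialized output are equal.
    // A step missing from the shorter session always counts as divergent.
    const same =
      a !== undefined &&
      b !== undefined &&
      a.action_type === b.action_type &&
      JSON.stringify(a.output) === JSON.stringify(b.output);
    if (same) {
      identical++;
    } else {
      if (firstDivergence === null) firstDivergence = i;
      diffs.push({ step: i, a: a ?? null, b: b ?? null });
    }
  }

  return {
    similarity: steps === 0 ? 1 : identical / steps,
    identical_steps: identical,
    divergent_steps: steps - identical,
    first_divergence_step: firstDivergence, // 0-based in this sketch
    diffs,
  };
}
```

Comparing via JSON.stringify is sensitive to object key order, so a real implementation may prefer a structural deep-equality check.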
find_divergence_point: Find where an agent first deviated from expected output.
| Param | Type | Description |
|---|---|---|
| session_id | string | Session to analyze |
| expected_output | any | Expected final output, or array of per-step expected outputs |
If expected_output is an array, it is compared against each step's output in order. If it is a single value, the tool finds the last step whose output matches it and flags the following step as the divergence point.
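The two modes can be sketched roughly as follows. This is a hypothetical illustration rather than the server's code: `findDivergencePoint` is an invented name, step indices are 0-based, outputs are compared by JSON serialization, and the handling of edge cases (length mismatch, no matching step) is an assumption.

```javascript
// Hypothetical sketch of the two modes; not the server's actual code.
const sameValue = (x, y) => JSON.stringify(x) === JSON.stringify(y);

function findDivergencePoint(actions, expectedOutput) {
  if (Array.isArray(expectedOutput)) {
    // Per-step mode: return the first index where the recorded output differs.
    const steps = Math.min(actions.length, expectedOutput.length);
    for (let i = 0; i < steps; i++) {
      if (!sameValue(actions[i].output, expectedOutput[i])) return i;
    }
    // Assumption: a length mismatch diverges at the first unmatched index.
    return actions.length === expectedOutput.length ? null : steps;
  }

  // Single-value mode: find the last step whose output matches,
  // then flag the step that follows it.
  let lastMatch = -1;
  actions.forEach((action, i) => {
    if (sameValue(action.output, expectedOutput)) lastMatch = i;
  });
  const next = lastMatch + 1;
  // Assumption: null means the final step matched, so nothing diverged.
  // If no step matched at all, step 0 is flagged.
  return next < actions.length ? next : null;
}
```

In this sketch an array is always read as per-step expectations, so a final output that is itself an array would need to be wrapped or handled separately.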
export_session: Export a session for sharing and offline analysis.
| Param | Type | Default | Description |
|---|---|---|---|
| session_id | string | required | Session to export |
| format | string | "json" | "json" or "markdown" |
Markdown format produces a readable transcript with step headers, reasoning, and code blocks.
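As a rough picture of what such a transcript might look like, the sketch below renders a session object into markdown. The exact layout of the real export is not specified here, so `toMarkdown`, the heading text, and the session shape (`session_id`, `actions`) are all assumptions made for illustration.

```javascript
// Hypothetical renderer; the real export's exact layout may differ.
function toMarkdown(session) {
  const fence = "`".repeat(3); // markdown code fence, built from single backticks
  const lines = [`# Session ${session.session_id}`, ""];
  session.actions.forEach((action, i) => {
    // One header per step, numbered from 1 for readability.
    lines.push(`## Step ${i + 1}: ${action.action_type}`, "");
    if (action.reasoning) lines.push(`Reasoning: ${action.reasoning}`, "");
    // Inputs and outputs go into fenced JSON blocks.
    lines.push("Input:", fence + "json", JSON.stringify(action.input, null, 2), fence, "");
    lines.push("Output:", fence + "json", JSON.stringify(action.output, null, 2), fence, "");
  });
  return lines.join("\n");
}
```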
Resources:
| URI | Description |
|---|---|
| agent-replay://sessions | All recorded sessions with status and action counts |
Typical workflow:
1. record_session: start recording at agent launch
2. For each agent action:
   - log_action: capture input, output, reasoning, timing
3. stop_recording: finalize the session
4. Debug:
   - replay_session: review what happened step by step
   - compare_sessions: diff today's run vs yesterday's
   - find_divergence_point: pinpoint where it went wrong
5. Share:
   - export_session: JSON for tooling, markdown for humans
Run the tests:
npm test
License: MIT