ProofFlow turns AI coding sessions into auditable ledgers. It exposes MCP tools to create work contracts, record algorithm decisions, capture git snapshots, bind evidence to agent claims, and export proof packets before merging. You'd reach for this when you want reviewable trails for autonomous agent changes instead of blind trust. The server handles the full chain from contract declaration through cost budgets, diff snapshots, evidence collection, and deterministic evaluation of done criteria. It includes AgentGuard for PR reviews and can gate risky actions before execution. Install via PyPI's proofflow-mcp package and connect over stdio. The project dogfoods itself with CI workflows that post stable review comments and upload proof packets as artifacts.
Agent Work Ledger for AI coding.
Vibe coding is fast. Blind trust is not enough.
ProofFlow makes AI-generated work reviewable, traceable, and reversible by recording the full chain from work contract to proof packet: contract first, record the algorithm decision, declare the cost budget, snapshot the code state, bind claims to evidence, evaluate done criteria, then export an auditable packet.
Latest release: v0.1.8 - Agent Work Ledger for AI coding
▶ Watch the 72s demo: From AI agent claims to verifiable Proof Packets
Demo asset (deferred): The end-to-end dogfood Demo_Asset GIF and the VSCode_Channel inline audit / Approve Gate screenshots for the v0.1.x dogfood-and-channel-polish milestone are deferred to the next dogfood cycle (no capturable VS Code window in this milestone). Tracked in
PLANS.md#vscode-channel-screenshots-deferred-from-v0-1-x-dogfood.
📦 Example Proof Packets: code review · issue triage · agent work ledger · ledger dogfood
Maintainer workflow: docs/maintainer_evidence_workflow.md
Agent Work Ledger guide: docs/agent_work_ledger.md
Ledger Risk Hints: docs/ledger_risk_hints.md
5-minute MCP quickstart: docs/ledger_quickstart_mcp.md
Ledger PR comment template: docs/examples/pr_comment_agent_work_ledger.md
AgentGuard semantic rules: docs/agentguard_semantic_rules.md
ProofFlow is not only a PR review helper. It is a local-first ledger for AI coding work. A Ledger Case captures the workflow before, during, and after an agent changes code:
Main chain: Work Contract -> Algorithm Decision -> Cost Budget -> Snapshot -> Evidence -> Claim -> Evaluation -> Proof Packet. This keeps the core product invariant sharp: no Case, no workflow; no Evidence, no trusted Claim; no done criteria evaluation, no quiet success.
Risk Hints extend the evidence flow without turning ProofFlow into an automatic algorithm judge. They tell the maintainer when the recorded route may be wrong or too expensive, such as regeneration where mapping was required, budget overrun metadata, or tests that prove output but not method.
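As a rough illustration of the budget-overrun case, the sketch below compares recorded usage against a declared budget and returns advisory messages. The `CostBudget` fields and the `budget_risk_hints` function are hypothetical names chosen for this example; ProofFlow's real budget metadata may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostBudget:
    """Hypothetical budget shape; the field names are assumptions."""
    max_files_changed: int
    max_minutes: int


def budget_risk_hints(budget: CostBudget, files_changed: int, minutes_spent: int) -> list[str]:
    """Return advisory hints; never raises and never blocks the workflow."""
    hints: list[str] = []
    if files_changed > budget.max_files_changed:
        hints.append(
            f"files changed ({files_changed}) exceeds budget ({budget.max_files_changed})"
        )
    if minutes_spent > budget.max_minutes:
        hints.append(
            f"time spent ({minutes_spent} min) exceeds budget ({budget.max_minutes} min)"
        )
    return hints
```

Returning plain messages rather than raising keeps the hint advisory: the maintainer decides what to do with it.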
See docs/agent_work_ledger.md for the full
architecture and evaluation model, or
docs/ledger_quickstart_mcp.md to run the
full MCP flow.
ProofFlow v0.1.6 was dogfooded on a real repository PR. The GitHub Actions
workflow ran AgentGuard, posted a stable PR summary comment, uploaded
summary.json, and exported a downloadable Proof Packet.
Results were reported through the PR comment and summary.json, with no merge blocking.
AI coding agents (Claude Code, Codex, Copilot Workspace) can modify files, run commands, and make decisions autonomously. But there's no standard way to:
ProofFlow solves this by sitting between the agent and the filesystem, creating an evidence graph that links every action to its justification.
Run from the parent directory of the freshly cloned repo. The
Push-Location / Pop-Location pair keeps the working directory at the
repository root for the docker compose up command and restores it after the
block, so this snippet is copy-paste safe in a single PowerShell session.
git clone https://github.com/Hyperion-GPU/ProofFlow-v0.1.git
Push-Location ProofFlow-v0.1
docker compose up
Pop-Location
Backend: http://localhost:8787 | Frontend: http://localhost:5173
Docker publishes both ports on 127.0.0.1 by default to preserve ProofFlow's
localhost trust boundary. For stronger local protection, set an API key before
starting:
PROOFFLOW_API_KEY=change-me docker compose up
If you enable backend auth for the Docker frontend, use the same
PROOFFLOW_API_KEY value at build time so Vite can embed
VITE_PROOFFLOW_API_KEY in the static frontend bundle. AgentGuard
test_command execution is disabled by default; set
PROOFFLOW_ENABLE_TEST_COMMANDS=true only when you intentionally want the
backend to run local test commands during review.
Start each component from the repository root in a single PowerShell session.
Push-Location / Pop-Location keeps the working directory predictable across
the backend and frontend blocks; the backend port is fixed to 8787 to match
the make dev-backend baseline. npm run dev is a long-running process - run
the frontend block in a second PowerShell session if you want to keep the
backend uvicorn process visible in the first.
# Backend
Push-Location backend
pip install -r requirements.txt
python -m uvicorn proofflow.main:app --port 8787
Pop-Location
# Frontend (long-running; recommended in a second PowerShell session)
Push-Location frontend
npm ci
npm run dev
Pop-Location
pip install proofflow-mcp
Add to your project's .mcp.json:
{
  "mcpServers": {
    "proofflow": {
      "command": "proofflow-mcp",
      "env": { "PROOFFLOW_BASE_URL": "http://127.0.0.1:8787" }
    }
  }
}
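If a project already has an .mcp.json with other servers, the proofflow entry can be merged in rather than pasted over the file. The helper below is a sketch with hypothetical function names; it builds the same structure shown above and leaves existing entries untouched.

```python
import copy
import json


def proofflow_entry(base_url: str = "http://127.0.0.1:8787") -> dict:
    """Build the server entry from the configuration shown above."""
    return {"command": "proofflow-mcp", "env": {"PROOFFLOW_BASE_URL": base_url}}


def merge_proofflow(existing: dict, base_url: str = "http://127.0.0.1:8787") -> dict:
    """Return a copy of an .mcp.json document with the proofflow server added."""
    merged = copy.deepcopy(existing)
    merged.setdefault("mcpServers", {})["proofflow"] = proofflow_entry(base_url)
    return merged


if __name__ == "__main__":
    # Print a config for a project with no other MCP servers.
    print(json.dumps(merge_proofflow({}), indent=2))
```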
Now your AI agent can keep an Agent Work Ledger, scan files, review code, triage issues, suggest actions, and export audit reports - all with enforced safety gates.
ProofFlow also includes a repo-local Codex plugin at
plugins/proofflow-maintainer. It provides
starter prompts and a maintainer-focused skill for:
The plugin uses the same local proofflow-mcp server and keeps the backend
trust boundary at http://127.0.0.1:8787.
See the public-safe
Agent Work Ledger example
for the expected handoff shape.
AI Agent (Claude Code / Codex / Custom)
|
| MCP Protocol (stdio)
v
ProofFlow MCP Server (23 tools)
|
| HTTP REST API
v
ProofFlow Backend (FastAPI + SQLite)
|
|--- Agent Work Ledger: Contract > Algorithm > Budget > Snapshot > Evidence > Claim > Evaluation > Packet
|--- Evidence Graph: Cases > Artifacts > Claims > Evidence
|--- Action Pipeline: Preview > Approve > Execute > Undo
|--- Policy Gates: Risk classification > Owner decision
|--- Proof Packets: Exportable markdown audit reports
v
Local Filesystem (scanned files, git repos)
Records complex AI coding work as a first-class Case. The main flow is Work Contract -> Algorithm Decision -> Cost Budget -> Snapshot -> Evidence -> Claim -> Evaluation -> Proof Packet, so maintainers can see what the agent promised, what approach it chose, what cost limits it accepted, what changed, what evidence backs its claims, whether the done criteria were satisfied, and which Risk Hints deserve human review.
Analyzes git diffs, generates risk-scored claims, and links each claim to specific evidence (changed lines, test results). No claim exists without supporting evidence.
Scans directories, indexes files with SHA-256 hashes, extracts text for full-text search, and suggests organization actions — all tracked in an auditable Case.
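The hashing step of such a scan can be sketched with the standard library alone. The function names here are hypothetical and this is not LocalProof's implementation; it only shows how a directory can be indexed by SHA-256 digest so that later changes to a file are detectable.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_directory(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Comparing two indexes taken at different times shows which files were added, removed, or modified.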
Captures issue text as a first-class Case with source Artifact, deterministic triage Claims, component inference, label suggestions, and Proof Packet export.
High-risk filesystem actions (moves to system paths, bulk operations) are automatically paused at pending_decision status. Requires explicit owner approval before execution.
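A gate of this kind can be sketched as a classification step followed by a guarded execute. Only the `pending_decision` status comes from the description above; the system path prefixes, the bulk threshold, the other status names, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

SYSTEM_PREFIXES = ("/etc", "/usr", "/bin")  # hypothetical list of system paths
BULK_THRESHOLD = 50  # hypothetical cutoff for a "bulk" operation


@dataclass
class Action:
    kind: str
    targets: list[str]
    status: str = "previewed"  # hypothetical initial status


def is_high_risk(action: Action) -> bool:
    """Flag moves into system paths and operations over many targets."""
    touches_system = any(
        PurePosixPath(target).is_relative_to(prefix)
        for target in action.targets
        for prefix in SYSTEM_PREFIXES
    )
    return touches_system or len(action.targets) >= BULK_THRESHOLD


def gate(action: Action) -> Action:
    """Pause high-risk actions; mark the rest as ready (hypothetical status)."""
    action.status = "pending_decision" if is_high_risk(action) else "ready"
    return action


def execute(action: Action, owner_approved: bool = False) -> str:
    """Refuse to run a paused action without an explicit owner decision."""
    if action.status == "pending_decision" and not owner_approved:
        raise PermissionError("owner approval required before execution")
    action.status = "executed"
    return action.status
```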
finished_with_risks
health · scan · suggest · review · triage_issue · start_work_contract · record_algorithm_decision · record_cost_budget · capture_snapshot · record_evidence · record_claim · evaluate_contract · finish_work_ledger · status · approve_execute · export_packet · search · list_cases · list_actions · undo · decide
explain_risk_hint records an evidence-backed Decision for a Ledger Risk Hint
without suppressing the hint.
| Layer | Technology | Tests |
|---|---|---|
| Backend | Python 3.12, FastAPI, SQLite | 311 |
| Frontend | React 19, TypeScript, Vite | 25 |
| MCP Server | Python, MCP SDK, httpx | 44 |
| CI | GitHub Actions (PR review + release gates) | Audit artifact + PR comment |
Configuration environment variables: PROOFFLOW_API_KEY, PROOFFLOW_RATE_LIMIT, PROOFFLOW_ENABLE_TEST_COMMANDS, PROOFFLOW_MCP_MAX_CONCURRENT
v0.1.0: Stable release. All core workflows functional, tested, and documented.
| Milestone | Status |
|---|---|
| Core evidence graph (Case/Artifact/Claim/Evidence) | Done |
| LocalProof file audit workflow | Done |
| AgentGuard code review workflow | Done |
| Issue triage workflow | Done |
| Policy gate enforcement | Done |
| MCP server for Claude Code/Codex | Done |
| Backup/restore with safety preview | Done |
| Docker deployment | Done |
| PyPI package (proofflow-mcp) | Done |
Run from the repository root in a single PowerShell session. Each
Push-Location / Pop-Location block restores the working directory back to
the repository root, so the python scripts/... smoke tests and
scripts/demo_workflow.py below can be pasted in the same session.
# Run all tests
Push-Location backend
python -m pytest # 311 tests
Pop-Location
Push-Location frontend
npm run test # 29 tests
Pop-Location
Push-Location mcp-server
pip install -e ".[dev]"
python -m pytest # 44 tests
Pop-Location
# End-to-end smoke test (cwd: repository root)
python scripts/mcp_smoke.py --cleanup
python scripts/ledger_mcp_smoke.py --cleanup
python scripts/ledger_risk_hints_smoke.py --cleanup
python scripts/ledger_risk_hints_dogfood_matrix.py --cleanup
# Demo workflow (cwd: repository root)
python scripts/demo_workflow.py
Local backend data defaults to backend/data/. For dogfood runs that should not
touch repository-local state, set PROOFFLOW_DB_PATH and PROOFFLOW_DATA_DIR
to a temporary directory before starting the backend.
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
MIT
Built by Hyperion-GPU — making AI agent workflows auditable, safe, and provable.
PROOFFLOW_BASE_URL: ProofFlow backend URL (default: http://127.0.0.1:8787)
PROOFFLOW_API_KEY (secret): API key for authenticated access to the ProofFlow backend