FiGuard MCP

auth · STDIO · registry active

Summary

Exposes authorize, confirm, void, and fail operations so Claude can ask permission before spending money or consuming resources. Works like a two-phase commit: the agent requests capacity, FiGuard reserves it, the agent executes the real action (Stripe charge, OpenAI call, whatever), then reports back what actually moved. Every decision lands in an append-only ledger. You set a budget in dollars or tokens, hand the session token to your agent, and watch the spend tree fill up in real time. Useful when you're running autonomous flows that hit paid APIs and need to prevent runaway costs without killing the process after the damage is done.

A travel-booking agent hit a Stripe timeout and retried twice. The customer's card was charged three times for the same flight before anyone noticed — 40 minutes later.

No alert fired. No limit existed. The agent had a valid API key and no concept of "I already did this."

FiGuard gives agents bounded resources — money, tokens, API calls, GPU hours, any unit you define — and they ask permission before consuming them. You set the ceiling, the retry rules, and the idempotency policy once. Every attempt, authorized or denied, lands in an append-only audit log.

That exact failure is in the stress harness: a retried charge produces 0 double-charges, and 100 agents racing one budget produce 0 overspends — verified against the ledger, reproducibly (make bench).

Your framework decides what to do next. FiGuard decides whether the resource-consuming action is allowed.

  Your agent code  (LangChain · LangGraph · CrewAI · any runtime)
  orchestrates — decides what to do next
          ↓  agent wants to spend / call / execute
  figuard.authorize()
  checks: limit · category · velocity · dedup
          ↓  AUTHORIZED — action proceeds
  Stripe · OpenAI · any API or service
  executes — real money or resource consumed
          ↓  action completes
  figuard.confirm()
  settles reservation — ledger updated
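The flow above can be sketched as a small wrapper: authorize first, run the external action only if allowed, then confirm on success or fail on error. This is a minimal sketch, not FiGuard's SDK. `RecordingGuard`, `guarded_call`, and the per-call `max_amount` check are hypothetical stand-ins chosen so the example runs without any dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RecordingGuard:
    """Hypothetical stand-in for an authorization client; it only records calls."""
    max_amount: float
    calls: List[Tuple[str, float]] = field(default_factory=list)

    def authorize(self, amount: float) -> bool:
        allowed = amount <= self.max_amount
        self.calls.append(("AUTHORIZED" if allowed else "DENIED", amount))
        return allowed

    def confirm(self, actual: float) -> None:
        self.calls.append(("CONFIRMED", actual))

    def fail(self, amount: float) -> None:
        self.calls.append(("FAILED", amount))


def guarded_call(
    guard: RecordingGuard,
    amount: float,
    action: Callable[[], float],
) -> Optional[float]:
    """Authorize, execute the external action, then confirm or fail."""
    if not guard.authorize(amount):
        return None              # denied: the action never runs
    try:
        actual = action()        # the real Stripe / OpenAI / API call goes here
    except Exception:
        guard.fail(amount)       # record the failure so the hold is released
        raise
    guard.confirm(actual)        # report what actually moved
    return actual
```

The point of the shape is that the external call sits between two guard calls, so a denial or an exception can never leave an untracked side effect.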

LangChain / LangGraph — FiGuard authorizes each tool call before it executes. A budget-exhausted agent stops cleanly instead of running up cost — even across parallel nodes in a LangGraph.

CrewAI — Each crew member gets a delegation token with its own cap. A runaway specialist is stopped at its limit without affecting the rest of the crew.

OpenAI Agents SDK / MCP — Wrap tools with @guarded_function_tool or add the FiGuard MCP server — every tool call is pre-flight authorized before it reaches the API.

Not using a framework? — The raw SDK works anywhere — a Python script, a background job, a serverless function. If it calls an API that costs money or consumes a bounded resource, FiGuard fits.

Try it now — no setup, no signup:
→ pip install figuard — runs locally on your machine, nothing hosted (Quickstart)
→ Run in Colab — or try it in the browser
→ Live dashboard

FiGuard is the authorization and ledger layer — not a payment processor, not a policy DSL, not an adversarial-agent firewall. Full scope →

Quickstart

Tested with:

Integration	Versions / runtime / clients	Python
LangChain	≥ 0.3.0	3.9 – 3.12
LangGraph	≥ 0.2.0	3.10 – 3.12
CrewAI	≥ 0.102	3.10 – 3.12
OpenAI Agents SDK	≥ 0.0.5	3.10 – 3.12
TypeScript SDK	Node ≥ 18	—
MCP server	Claude Code, Cursor, Claude Desktop	—

pip install figuard

from figuard import FiGuardClient

# Zero-config, zero-infra — runs enforcement locally (embedded SQLite, no server).
# To share one budget across agents/processes, point at a server:
#   FiGuardClient(api_key="fg_live_...", base_url="https://figuard.mycompany.internal")
client = FiGuardClient()

budget = client.create_budget(
    user_id="agent_001",
    total_limit=500.00,
    currency="USD",
    intent_context="travel booking session",
)

auth = client.authorize(budget=budget, amount=270.00)
print(auth.decision)          # AUTHORIZED
print(auth.approved_quantity) # 270.0

# Confirm with actual charged amount — may differ from requested (taxes, FX, discounts)
client.confirm(auth, 267.00)

# Second spend — exceeds what's left ($500 - $267 = $233 remaining)
auth2 = client.authorize(budget=budget, amount=350.00)
print(auth2.decision)       # DENIED
print(auth2.denial_reason)  # INSUFFICIENT_FUNDS

Same calls run against a self-hosted server — and there, every authorization, denial, and confirmation shows up in the live spend-tree dashboard. (Embedded keeps the same ledger in your local SQLite; the live dashboard is a server feature.)

Not sure what limits to set? Add trust_mode="SHADOW" to create_budget — all checks run, nothing is blocked, and auth.would_have_been tells you what would have happened. When the limits look right, switch to enforcement without recreating the budget: client.update_budget(budget.id, trust_mode="FULL_ENFORCEMENT").
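To make the shadow-mode idea concrete, here is a toy model of "all checks run, nothing is blocked". It is a sketch under assumptions, not FiGuard's engine: the `evaluate` function and `Decision` class are hypothetical, only a single remaining-balance check is modeled, and returning AUTHORIZED as the shadow-mode decision is an assumption about how "nothing is blocked" surfaces to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    decision: str                   # what the caller must obey
    would_have_been: Optional[str]  # real verdict, populated only in shadow mode


def evaluate(remaining: float, amount: float, trust_mode: str) -> Decision:
    """Run the same limit check in both modes; only enforcement differs."""
    verdict = "AUTHORIZED" if amount <= remaining else "DENIED"
    if trust_mode == "SHADOW":
        # Nothing is blocked, but the real verdict is kept for review.
        return Decision("AUTHORIZED", would_have_been=verdict)
    return Decision(verdict, would_have_been=None)
```

Collecting `would_have_been` values over a few real sessions shows how often a proposed limit would have fired before it can block anything.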

How It Works

Embedded and server run the same engine. These four operations — and all 29 structured denial codes — behave identically whether you pip install figuard and run embedded (in-process, against a local SQLite file) or point the client at a self-hosted server. Session tokens, delegation tokens, and the live spend-tree dashboard shown below are server-mode features; embedded keeps the same append-only ledger in your local file and exposes the tree programmatically via get_spend_tree().

Four operations. Everything else is detail.

Operation	What it does
`authorize()`	Agent asks permission — capacity reserved, nothing moved yet
`confirm()`	Report what actually moved — releases the reservation
`void()`	Cancel a pending authorization — reservation released
`fail()`	Record a failed action — reservation released

authorize() reserves capacity; confirm(), fail(), or void() then settles or releases it. Every transition lands in the append-only ledger, and execution happens externally (FiGuard never sees the data or proxies the call). In embedded mode you call authorize/confirm directly on the budget; in server mode the budget issues session tokens, plus delegation tokens for fleets, but the four operations are otherwise identical. The full lifecycle (budget → tokens → fleet delegation → ledger) is diagrammed in the API Reference.

The spend tree shows the full causal chain across an orchestrator and its sub-agents:

FiGuard Spend Tree — orchestrator with confirmed and denied sub-agent events

How the Hard Parts Are Solved

The authorize endpoint looks simple — check the balance, write a record. The parts that matter aren't obvious until you've hit them in production:

Concurrent authorization — two agents sharing a budget can both read the same available balance, both see enough funds, and both get approved. By the time the second write lands, you're over limit. The fix is a pessimistic write lock on the budget row during authorization. Easy to know, easy to forget.
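A minimal in-process sketch of the same idea: the balance check and the write happen under one lock, so no two callers can act on the same stale read. The `threading.Lock` here stands in for a database row lock; `SharedBudget`, `race`, and the $50-per-agent amount are hypothetical choices for illustration, not FiGuard's code.

```python
import threading
from typing import List


class SharedBudget:
    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.reserved_cents = 0
        self._lock = threading.Lock()  # stands in for a row-level write lock

    def authorize(self, amount_cents: int) -> bool:
        # Check and write are one critical section: callers queue, they do not race.
        with self._lock:
            if self.reserved_cents + amount_cents > self.limit_cents:
                return False
            self.reserved_cents += amount_cents
            return True


def race(budget: SharedBudget, agents: int, amount_cents: int) -> int:
    """Fire `agents` concurrent authorizations and return how many were approved."""
    results: List[bool] = []

    def worker() -> None:
        results.append(budget.authorize(amount_cents))

    threads = [threading.Thread(target=worker) for _ in range(agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With a $1,000 budget and 100 agents each asking for $50, exactly 20 calls can succeed, and the reserved total stops at the limit regardless of thread scheduling.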

Dangling reservations — a network timeout between the authorization write and the HTTP response leaves the agent with no event ID and the budget with a reserved amount it can't release. You need idempotency keyed to the request, not the response, so a retry finds the original authorization instead of creating a second one.
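Keying idempotency to the request can be sketched as a lookup that runs before any reservation is created. This is an illustrative in-memory model, assuming the caller supplies a stable key per logical action; `AuthorizationStore` and the key format are hypothetical.

```python
import itertools
import threading
from typing import Dict


class AuthorizationStore:
    def __init__(self) -> None:
        self._event_by_key: Dict[str, int] = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self.reservations = 0  # how many reservations were actually created

    def authorize(self, idempotency_key: str) -> int:
        """Return the event ID for this request, creating it at most once."""
        with self._lock:
            existing = self._event_by_key.get(idempotency_key)
            if existing is not None:
                return existing  # retry: hand back the original authorization
            event_id = next(self._ids)
            self._event_by_key[idempotency_key] = event_id
            self.reservations += 1
            return event_id
```

Because the key is recorded in the same critical section as the reservation, a client that never saw the first response can retry and recover the original event ID instead of reserving twice.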

The reservation/confirmation split: with a single amountSpent field that is deducted at authorization time, a pending hold is indistinguishable from settled spend, so a voided, failed, or smaller-than-requested charge leaves nothing clean to release. The correct model is two fields: amountReserved (incremented at authorization) and amountSpent (moved out of reserved at confirmation). This is the two-phase reserve-then-capture pattern that payment processors use. It's not novel; it's just usually hidden inside Stripe.
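The two-field model fits in a few lines. This sketch uses integer cents and the Quickstart's numbers ($500 limit, $270 requested, $267 actually charged); the `Budget` class and its method signatures are hypothetical and locking is left out to keep the accounting visible.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    total_limit: int          # all amounts in cents
    amount_reserved: int = 0  # held by pending authorizations
    amount_spent: int = 0     # settled by confirmations

    @property
    def available(self) -> int:
        return self.total_limit - self.amount_reserved - self.amount_spent

    def authorize(self, amount: int) -> bool:
        """Reserve capacity; nothing is spent yet."""
        if amount > self.available:
            return False
        self.amount_reserved += amount
        return True

    def confirm(self, reserved: int, actual: int) -> None:
        """Release the hold and record what actually moved."""
        self.amount_reserved -= reserved
        self.amount_spent += actual

    def void(self, reserved: int) -> None:
        """Release the hold without spending anything."""
        self.amount_reserved -= reserved
```

Confirming $267 against a $270 hold returns the $3 difference to the available balance, which a single spent-at-authorization field cannot express without rewriting history.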

Session token security — you need a token that scopes to exactly one budget, is returned exactly once, and is never stored in plaintext. If you store the raw token and your database is breached, every active agent session is compromised. Hash at write time, never store the raw value.
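Hash-at-write can be sketched with the standard library alone. This is an assumption-laden illustration, not FiGuard's token format: `TokenStore` is hypothetical, and plain SHA-256 is used on the assumption that tokens are long random values (a low-entropy secret such as a password would need a slow, salted hash instead).

```python
import hashlib
import secrets
from typing import Dict, Optional


class TokenStore:
    def __init__(self) -> None:
        # Maps SHA-256 digest -> budget ID; the raw token is never kept.
        self._budget_by_digest: Dict[str, str] = {}

    @staticmethod
    def _digest(raw_token: str) -> str:
        return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()

    def issue(self, budget_id: str) -> str:
        """Create a token scoped to one budget and return it exactly once."""
        raw = secrets.token_urlsafe(32)
        self._budget_by_digest[self._digest(raw)] = budget_id
        return raw

    def resolve(self, raw_token: str) -> Optional[str]:
        """Return the budget this token is scoped to, or None if unknown."""
        return self._budget_by_digest.get(self._digest(raw_token))
```

A dump of this store yields only digests, which cannot be presented as tokens, so a database breach does not hand over live sessions.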

Append-only ledger — a mutable status field on an authorization record loses history. When you need to reconstruct what happened and why a budget hit its limit — or when a finance team asks why $40K of agent spend happened last Tuesday — you want every state transition as a separate row, not an update to the previous one.
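One way to picture an append-only ledger is a SQLite table where every state transition is a new row and triggers reject UPDATE and DELETE. This is a sketch, not FiGuard's schema: the table name, columns, and helper functions are hypothetical.

```python
import sqlite3
from typing import List, Optional


def open_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ledger ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " event_id TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"
    )
    # Triggers abort any attempt to rewrite or erase history.
    for op in ("UPDATE", "DELETE"):
        conn.execute(
            f"CREATE TRIGGER ledger_no_{op.lower()} BEFORE {op} ON ledger "
            "BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END"
        )
    return conn


def record(conn: sqlite3.Connection, event_id: str, status: str, amount_cents: int) -> None:
    """Append one state transition as its own row."""
    conn.execute(
        "INSERT INTO ledger (event_id, status, amount_cents) VALUES (?, ?, ?)",
        (event_id, status, amount_cents),
    )


def current_status(conn: sqlite3.Connection, event_id: str) -> Optional[str]:
    """The current state is simply the latest row for the event."""
    row = conn.execute(
        "SELECT status FROM ledger WHERE event_id = ? ORDER BY seq DESC LIMIT 1",
        (event_id,),
    ).fetchone()
    return row[0] if row else None


def history(conn: sqlite3.Connection, event_id: str) -> List[str]:
    rows = conn.execute(
        "SELECT status FROM ledger WHERE event_id = ? ORDER BY seq", (event_id,)
    )
    return [r[0] for r in rows]
```

The current state is derived from the last row, while the full sequence stays available for reconstruction and audit.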

These are the same problems payment infrastructure teams solved 20 years ago. The reserve-then-confirm pattern, idempotency keyed to the request, append-only ledger — none of it is novel. FiGuard is that infrastructure applied to agent systems.

Failure Scenarios

These are failure modes that logging and observability tools can't catch: they require enforcement at authorization time. Notebooks live in figuard-notebooks; each runs in Colab with no API keys required.

Scenario	Framework	Failure mode	FiGuard stops it at
Payment retry storm	LangChain	Tool times out after Stripe charges. Retry = double charge.	Idempotency key — retry returns the same event, Stripe never called twice
Research cost spiral	LangGraph	Loop runs 30 iterations on an ambiguous query. LLM controls the exit.	Budget ceiling at $0.20 — loop exits at iteration 20
Fleet attribution loss	LangGraph	Supervisor routes through 3 sub-agents. No per-agent cost caps.	Delegation token per agent — researcher capped, others unaffected
Parallel crew blowout	CrewAI	Parallel crew — one member makes 25 API calls on a 5-call task	Delegation cap stops the runaway member, rest of crew completes
Concurrent overspend	Any	10 agents share one budget. All read the same balance simultaneously.	Pessimistic lock — 5 authorized, 5 denied, $1k ceiling never exceeded
Category violation	Any	Hotel charged to flight budget. Found at month-end.	`DENIED — NO_MATCHING_ALLOCATION` at authorization time
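The category-violation row can be illustrated with a toy allocation check. This is a sketch under assumptions, not the engine's logic: the `authorize` function and the allocation dictionary are hypothetical, and using INSUFFICIENT_FUNDS for an exhausted allocation is an assumption. Only the two denial codes named in this document are used.

```python
from typing import Dict, Tuple


def authorize(allocations: Dict[str, int], category: str, amount_cents: int) -> Tuple[str, str]:
    """Match the declared category against configured allocations (amounts in cents)."""
    if category not in allocations:
        # The agent declared a category that has no allocation on this budget.
        return ("DENIED", "NO_MATCHING_ALLOCATION")
    if amount_cents > allocations[category]:
        return ("DENIED", "INSUFFICIENT_FUNDS")
    allocations[category] -= amount_cents
    return ("AUTHORIZED", "")
```

A hotel charge against a budget that only allocates for flights is rejected before any payment call is made, rather than surfacing at month-end.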

Source: examples/framework_scenarios/ · examples/rogue_agent_scenarios/

pip install figuard
python examples/framework_scenarios/langchain_payment_retry.py      # no API keys needed
python examples/framework_scenarios/langgraph_research_loop.py
python examples/framework_scenarios/langgraph_supervisor_fleet.py
python examples/framework_scenarios/crewai_parallel_crew.py

What FiGuard Is Not

Not a payment processor. FiGuard never touches money. It authorizes the intent to spend and records the decision. The actual payment goes through your existing processor as before.

Not a policy language. Budget limits and allocation caps are structured data, not a DSL. FiGuard matches the category an agent declares against the categories you defined — nothing more.

Not a firewall for human users. FiGuard is purpose-built for agent-to-service authorization. The session token model assumes agents are ephemeral and untrusted by default.

Not a replacement for Stripe spending controls. Use both if you want defense in depth. FiGuard blocks at agent decision time; Stripe blocks at payment time. Different layers.

Not a security boundary against adversarial agents. FiGuard enforces what the agent declares. An agent that lies about its category or amount bypasses category enforcement. FiGuard is designed for honest agents with bounded resources — the same threat model as a database connection pool or a rate limiter. It prevents accidental overspend and enforces organizational policies on well-behaved agents. For adversarial agent containment, pair FiGuard with a security layer like Microsoft AGT.

Observability tools record what happened after execution. LLM gateways manage model routing and token spend. FiGuard is the enforcement layer — it authorizes before any action executes, across the full resource spectrum. They complement each other.

Self-Hosting

Most teams start with embedded mode above — pip install figuard, zero infra. Self-host when you need a budget shared across multiple processes or machines, delegation tokens for a fleet, or the live spend-tree dashboard. It's the graduation tier, not a prerequisite.

Self-hosting is then a single Docker container alongside your existing infrastructure — same as adding Postgres or Redis. Your spend data never leaves your environment.

git clone https://github.com/figuard/figuard-core
cd figuard-core
docker compose -f docker-compose.prod.yml up -d   # pulls the released image
# Ready at http://localhost:8080

docker-compose.prod.yml pulls the published ghcr.io/figuard/figuard-core:latest (the last released version). The default docker-compose.yml builds from source — for contributors.

Point your client at it:

client = FiGuardClient(
    api_key="your_api_key",
    base_url="http://localhost:8080",
)

Full setup guide, environment variables, Postgres configuration, and production checklist: Self-Hosting.

Performance

The headline isn't speed — it's correctness under concurrency. The stress harness (bench/stress.py) verifies the invariants directly against the Postgres ledger, not the HTTP responses:

0 overspends across concurrent authorizations on a shared budget — 100 agents race for a $1,000 budget, exactly 20 win, the budget lands at exactly $1,000.00, never over.
0 double-charges across retried requests — the same idempotency key fired 50× in parallel produces exactly one event.

Typical authorize latency against the server (each call on its own budget, M1 / Docker): p50 17ms, p99 74ms. Under deliberate single-budget contention the pessimistic lock serializes requests — they queue rather than race, which is the price of never overspending.

In embedded mode there's no network hop and no Postgres — an authorize is an in-process SQLite transaction — so latency is lower still, and there's nothing to deploy.

Full methodology, numbers, and reproduction in BENCHMARKS.md — or run it yourself: make bench.

Docs

Start here:

API Reference — full endpoint reference with payloads
Pick Your Pattern — decision tree: find your scenario, get exact code
Framework Integrations — LangChain, CrewAI, OpenAI Agents SDK, Anthropic
Self-Hosting — Docker, Postgres, production checklist

Reference:

Budget Configuration — full parameter reference for all configuration layers
Enforcement Features — denial codes, anomaly detection, allocation modes
Fleet Agents & Delegation Tokens
Handling Denials — per-code recovery strategies, LLM prompt instructions
Audit & Replay — ledger, point-in-time snapshots, timeline, what-if analysis
Webhooks — event types, registration, signature verification
Observability — FiGuard spans in Langfuse, Jaeger, Honeycomb, Datadog
TypeScript SDK
MCP Server
Cookbook — short recipes: authorize/confirm/void, parallel calls, causal chains, testing
Known Limitations

Interactive API docs: localhost:8080/swagger-ui · sandbox

SDKs

SDK	Install
Python	`pip install figuard`
TypeScript / Node.js	`npm install figuard`
MCP Server	`npx figuard-mcp`

Roadmap

Recently shipped (v1.2.0): embedded mode — pip install figuard runs enforcement in-process against a local SQLite file, zero infra, same engine as the server; plus update_budget(), get_spend_tree() in embedded, and persistent local budgets.

Next:

Java SDK — JVM client (today it's available from source under sdk/java; a published Maven Central artifact is planned)
Scoped tokens — derived session tokens with hard restrictions on action types, categories, and max transaction amount; for untrusted sub-agent delegation
Overdraft policies — per-budget REJECT / ALLOW_IF_AVAILABLE / ALLOW_WITH_OVERDRAFT modes

See ROADMAP.md for the full list.

Versioning

FiGuard follows Semantic Versioning. v1.0.0 is the first stable release — the API and SDK interfaces are stable from this version forward.

Contributing

Issues, PRs, and integration requests welcome.

Looking for contributors on: Go SDK · LlamaIndex integration · DSPy integration · Helm chart

License

Apache 2.0 — see LICENSE.

Configuration

FIGUARD_API_KEY (secret)

Your FiGuard API key. Leave unset to use the shared public sandbox — no account needed to try it.

FIGUARD_BASE_URL

Your FiGuard server URL. Leave unset to use the shared public sandbox.
