Lets you estimate LLM costs before running agent workflows, then learns from actual usage to improve accuracy over time. Exposes five MCP tools. Of these, estimate_cost takes size, file count, and complexity and predicts token spend across research, architecture, and code steps, with optimistic, expected, and pessimistic bands. After work completes, report_session feeds actual costs back into a calibration system that adjusts future estimates. Ships with a Claude Code plugin that auto-triggers estimation after planning agents and records session data at shutdown. Also works standalone in Cursor, VS Code, and Windsurf. Useful when you run multi-step agentic workflows and want to avoid surprise API bills, or need budget guardrails before kicking off expensive tasks.
Pre-execution cost estimation for LLM agent workflows. Get a cost estimate before running any agent task, then let tokencast learn from actuals to improve accuracy over time.
Available as a Claude Code plugin (recommended; a single plugin install delivers everything) or as an MCP server for Cursor, VS Code + Copilot, and Windsurf.
Install tokencast as a Claude Code plugin. Two commands deliver the MCP server, calibration hooks, and estimation skill:
/plugin marketplace add krulewis/tokencast
/plugin install tokencast@tokencast
The first command registers the tokencast marketplace. The second installs the plugin from it.
Prerequisites:
uv must be installed for the MCP server to function. Install with: curl -LsSf https://astral.sh/uv/install.sh | sh
This delivers:
- MCP server (estimate_cost, get_calibration_status, get_cost_history, report_session, report_step_cost)
- Calibration hooks
- Estimation skill

Calibration data is stored in ~/.tokencast/calibration/ (global across projects, preserved on uninstall).
Scope options:
--scope user (recommended; installs globally for all projects) or --scope project (per-project only).
Install the package:
pip install tokencast
Or with uvx (no install required — runs directly from PyPI):
uvx tokencast
Configure your IDE — replace /path/to/your/project with your actual project path in the config snippets below.
Create or update .cursor/mcp.json in your project root:
{
  "mcpServers": {
    "tokencast": {
      "command": "tokencast-mcp",
      "args": [
        "--calibration-dir", "/path/to/your/project/calibration",
        "--project-dir", "/path/to/your/project"
      ]
    }
  }
}
Create or update .vscode/mcp.json in your project root:
{
  "servers": {
    "tokencast": {
      "type": "stdio",
      "command": "tokencast-mcp",
      "args": [
        "--calibration-dir", "/path/to/your/project/calibration",
        "--project-dir", "/path/to/your/project"
      ]
    }
  }
}
Add to your Windsurf MCP config:
{
  "mcpServers": {
    "tokencast": {
      "command": "tokencast-mcp",
      "args": [
        "--calibration-dir", "/path/to/your/project/calibration",
        "--project-dir", "/path/to/your/project"
      ]
    }
  }
}
Full config examples are in docs/ide-configs/.
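The three snippets above differ only in the top-level key and, for VS Code, the extra "type" field, so the project path can also be filled in programmatically. The following is a minimal sketch, not part of tokencast: build_mcp_config is a hypothetical helper name, and it assumes the calibration directory sits under the project root exactly as in the snippets above.

```python
import json
from pathlib import Path


def build_mcp_config(project_dir: str, ide: str = "cursor") -> dict:
    """Return an MCP config dict mirroring the snippets above.

    Hypothetical helper for illustration; not a tokencast API.
    """
    root = Path(project_dir)
    server = {
        "command": "tokencast-mcp",
        "args": [
            "--calibration-dir", str(root / "calibration"),
            "--project-dir", str(root),
        ],
    }
    if ide == "vscode":
        # VS Code uses a "servers" key and an explicit transport type.
        return {"servers": {"tokencast": {"type": "stdio", **server}}}
    # Cursor and Windsurf share the "mcpServers" layout.
    return {"mcpServers": {"tokencast": server}}


if __name__ == "__main__":
    # Print the Cursor/Windsurf variant for a placeholder project path.
    print(json.dumps(build_mcp_config("/path/to/your/project"), indent=2))
```

The printed JSON can be pasted into the matching config file; pass ide="vscode" for the .vscode/mcp.json layout.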
Once configured, tokencast exposes five MCP tools in your IDE:
| Tool | What it does |
|---|---|
| estimate_cost | Estimate API cost for a planned task before running it |
| get_calibration_status | Check whether your estimates are well-calibrated |
| get_cost_history | Browse past estimates vs actuals |
| report_session | Report actual cost at session end to improve calibration |
| report_step_cost | Record the cost of a single pipeline step during a session |
Example — estimate before starting work:
Estimate the cost for: size=M, files=8, complexity=high
Example — report actuals after finishing:
Report session cost: actual_cost=4.20
tokencast includes opt-out anonymous usage telemetry. It is on by default — data is collected unless you explicitly disable it.
What is collected: session count, mean accuracy ratio, calibrated factor count, client name, framework, tool name, package version. What is NOT collected: project names, file paths, cost amounts, or any personal data.
To disable:
- Call the disable_telemetry MCP tool (permanent opt-out, creates ~/.tokencast/no-telemetry)
- Pass --no-telemetry to the MCP server command
- Set TOKENCAST_TELEMETRY=0 in your environment

Precedence (highest to lowest):
1. TOKENCAST_TELEMETRY=0 → always disables
2. TOKENCAST_TELEMETRY=1 → always enables (overrides --no-telemetry and the no-telemetry file)
3. ~/.tokencast/no-telemetry file exists → disables

To delete your install ID: rm ~/.tokencast/install_id.
Data is sent to PostHog (US region). A random UUID is generated locally as your install ID — it contains no personal information. See the wiki for full details.
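The precedence rules above can be read as a small decision function. The sketch below is an illustration of those rules only, not tokencast's actual implementation: telemetry_enabled and its parameters are hypothetical names, and it assumes that any value of TOKENCAST_TELEMETRY other than "0" or "1" is treated as unset.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def telemetry_enabled(
    env: Optional[Mapping[str, str]] = None,
    no_telemetry_flag: bool = False,
    opt_out_file: Optional[Path] = None,
) -> bool:
    """Resolve the telemetry setting using the documented precedence.

    Hypothetical helper; names and signature are assumptions.
    """
    env = os.environ if env is None else env
    if opt_out_file is None:
        opt_out_file = Path.home() / ".tokencast" / "no-telemetry"

    value = env.get("TOKENCAST_TELEMETRY")
    if value == "0":
        return False  # highest precedence: always disables
    if value == "1":
        return True  # overrides --no-telemetry and the opt-out file
    if no_telemetry_flag or opt_out_file.exists():
        return False  # flag or opt-out file disables
    return True  # telemetry is on by default
```

Passing the environment and file path as arguments keeps the function easy to test without touching the real home directory.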
| Flag | Default | Description |
|---|---|---|
| --calibration-dir PATH | ~/.tokencast/calibration | Where calibration data is stored |
| --project-dir PATH | None | Project root for file measurement |
| --no-telemetry | Off | Disable anonymous usage telemetry (on by default) |
| --version | | Print version and exit |
The Claude Code plugin (recommended) delivers everything in a single plugin install. Use the steps below only if you use Claude Code and prefer the skill-based (SKILL.md) workflow without the plugin system. In that case, install tokencast as a Claude Code skill instead:
# Clone the repo (anywhere — it doesn't need to live inside your project)
git clone https://github.com/krulewis/tokencast.git
# Install into your project (quote paths with spaces)
bash tokencast/scripts/install-hooks.sh "/path/to/your-project"
Paths with spaces: Always wrap the project path in quotes. Without them, the install script will fail on paths like /Volumes/Macintosh HD2/....
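The same quoting concern applies when the install script is launched from Python instead of typed into a shell. A minimal sketch, assuming the repository was cloned into the current directory as in the commands above; install_command and install_command_line are hypothetical helper names, and the project path used in the usage note is purely illustrative.

```python
import shlex
from typing import List


def install_command(project_path: str) -> List[str]:
    """Argument list for install-hooks.sh.

    In list form each element is passed as one argument,
    so a path containing spaces needs no extra quoting.
    """
    return ["bash", "tokencast/scripts/install-hooks.sh", project_path]


def install_command_line(project_path: str) -> str:
    """Single shell string with every argument safely quoted."""
    return shlex.join(install_command(project_path))
```

For example, install_command_line("/Volumes/Macintosh HD2/my-project") yields a string with the path wrapped in single quotes, while subprocess.run(install_command(path), check=True) would run the script without involving a shell at all.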
This does three things:
- Installs the skill files into <project>/.claude/skills/tokencast/
- Adds a Stop hook for auto-learning at session end
- Adds a PostToolUse hook to nudge estimation after planning agents

The SKILL.md workflow is Claude Code-specific. The MCP server works in any MCP-compatible client and is the recommended path for new users.
(K+1)/2, and cache rates
Example output:
## tokencast estimate
Change: size=M, files=5, complexity=medium
Calibration: 1.12x from 8 prior runs
| Step | Model | Optimistic | Expected | Pessimistic |
|-----------------------|--------|------------|----------|-------------|
| Research Agent | Sonnet | $0.60 | $1.17 | $4.47 |
| Architect Agent | Opus | $0.67 | $1.18 | $3.97 |
| ... | ... | ... | ... | ... |
| TOTAL | | $3.37 | $6.26 | $22.64 |
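One way to turn an estimate like this into a budget guardrail is to compare a chosen band's total against a spending limit before starting work. The sketch below is only an illustration: check_budget is a hypothetical helper, and because the shape of estimate_cost's return value is not documented here, it takes the three TOTAL figures as a plain dict.

```python
from typing import Dict


def check_budget(totals: Dict[str, float], budget: float, band: str = "pessimistic") -> bool:
    """Return True if the chosen band's total fits within the budget.

    Hypothetical helper; not part of the tokencast API.
    """
    if band not in totals:
        raise KeyError(f"unknown band: {band!r}")
    return totals[band] <= budget


# TOTAL row from the example output above, in US dollars.
totals = {"optimistic": 3.37, "expected": 6.26, "pessimistic": 22.64}

if __name__ == "__main__":
    for band in totals:
        verdict = "within" if check_budget(totals, 10.0, band) else "over"
        print(f"{band}: ${totals[band]:.2f} is {verdict} a $10.00 budget")
```

Checking against the pessimistic band by default is a conservative design choice; a less cautious workflow might gate on the expected band instead.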
| Band | Cache Hit | Multiplier | Meaning |
|---|---|---|---|
| Optimistic | 60% | 0.6x | Best case — focused agent work |
| Expected | 50% | 1.0x | Typical run |
| Pessimistic | 30% | 3.0x | With rework loops, debugging, retries |
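To see how a cache hit rate and a multiplier might interact, the sketch below applies the three bands to a single input-token budget. It is a simplified illustration under stated assumptions, not tokencast's formula: it ignores output tokens and cache-write costs, the prices passed in are hypothetical, and band_cost is an invented name. The actual budgets and prices live in references/heuristics.md and references/pricing.md.

```python
from typing import Dict, Tuple

# (cache hit rate, multiplier) per band, taken from the table above.
BANDS: Dict[str, Tuple[float, float]] = {
    "optimistic": (0.60, 0.6),
    "expected": (0.50, 1.0),
    "pessimistic": (0.30, 3.0),
}


def band_cost(
    base_input_tokens: float,
    price_per_mtok: float,
    cached_price_per_mtok: float,
    band: str,
) -> float:
    """Cost in dollars for one band (simplified, input tokens only).

    Assumption: the multiplier scales the token count, and cached
    tokens are billed at a separate, cheaper per-million rate.
    """
    hit_rate, multiplier = BANDS[band]
    tokens = base_input_tokens * multiplier
    blended_price = (1 - hit_rate) * price_per_mtok + hit_rate * cached_price_per_mtok
    return tokens * blended_price / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: $3.00 per million input tokens, $0.30 cached.
    for name in BANDS:
        print(name, round(band_cost(1_000_000, 3.00, 0.30, name), 3))
```

Under these assumptions the pessimistic band costs more than three times the expected band, because it combines the 3.0x multiplier with a lower cache hit rate.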
Calibration is fully automatic once you report actuals.
Calibration data lives in ~/.tokencast/calibration/ (gitignored, local to each user).
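The general idea behind a correction factor such as "1.12x from 8 prior runs" can be sketched as a ratio of actual to estimated cost over past sessions. This is an illustration of the concept only: the use of a median, the min_runs threshold, and the function names are all assumptions, and tokencast's real algorithm is described in references/calibration-algorithm.md.

```python
from statistics import median
from typing import Iterable, Tuple


def correction_factor(history: Iterable[Tuple[float, float]], min_runs: int = 3) -> float:
    """Median actual/estimated ratio over (estimated, actual) pairs.

    Assumption: with fewer than `min_runs` usable records the
    estimate is left uncorrected (factor 1.0).
    """
    ratios = [actual / estimated for estimated, actual in history if estimated > 0]
    if len(ratios) < min_runs:
        return 1.0
    return median(ratios)


def calibrated(expected_cost: float, factor: float) -> float:
    """Apply a correction factor to a raw expected cost."""
    return expected_cost * factor


if __name__ == "__main__":
    # Hypothetical (estimated, actual) pairs in dollars.
    runs = [(5.0, 5.5), (4.0, 4.0), (2.0, 2.4)]
    factor = correction_factor(runs)
    print(f"factor: {factor:.2f}x, calibrated $6.00 -> ${calibrated(6.0, factor):.2f}")
```

A median is used here because it is less sensitive to a single runaway session than a mean; that is a design choice of this sketch, not a statement about the package.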
from tokencast import estimate_cost, report_session, report_step_cost
from tokencast import get_calibration_status, get_cost_history

# Estimate before running a task
result = estimate_cost(
    {"size": "M", "files": 5, "complexity": "medium"},
    calibration_dir="./calibration",
)

# Report actuals at session end
report_session({"actual_cost": 4.20}, calibration_dir="./calibration")

# Check calibration health
status = get_calibration_status({}, calibration_dir="./calibration")

# Browse history
history = get_cost_history({"window": "30d"}, calibration_dir="./calibration")

# Report a single step's cost
report_step_cost(
    {"step_name": "Research Agent", "cost": 0.85},
    calibration_dir="./calibration",
)
In Claude Code with SKILL.md installed, you can invoke explicitly:
/tokencast size=L files=12 complexity=high
/tokencast steps=implement,test,qa
/tokencast review_cycles=3
/tokencast review_cycles=0
SKILL.md — Skill definition (auto-trigger, algorithm)
references/pricing.md — Model prices, cache rates, step→model map
references/heuristics.md — Token budgets, pipeline decompositions, multipliers
references/examples.md — Worked examples with arithmetic
references/calibration-algorithm.md — Detailed calibration algorithm reference
docs/ide-configs/ — Per-IDE MCP config examples
src/tokencast/ — Core estimation engine (Python package)
src/tokencast_mcp/ — MCP server (Python package)
scripts/
install-hooks.sh — One-time project setup (skill mode)
disable.sh — Remove from project (skill mode)
tokencast-learn.sh — Stop hook: auto-captures actuals (skill mode)
tokencast-track.sh — PostToolUse hook: nudges estimation after plans
sum-session-tokens.py — Parses session JSONL for actual costs
update-factors.py — Computes calibration factors from history
calibration/ — Per-user local data (gitignored)
history.jsonl — Estimate vs actual records
factors.json — Learned correction factors
active-estimate.json — Transient marker for current estimate
references/heuristics.md)
last_updated in references/pricing.md
License: MIT