Gives Claude vision into terminal UIs by running CLI apps in virtual terminals and capturing PNG screenshots. Uses node-pty and xterm's headless renderer to spin up sessions, then exposes tools to launch processes, send keystrokes, wait for text or idle states, and grab screenshots of the full terminal or specific regions. Includes diffing support to highlight what changed between captures, GIF recording for animated sequences, and scrollback access. Useful when you're debugging TUIs, interactive wizards, or anything with ANSI color and layout that's hard to describe in text. Sessions auto-close after idle timeout, or you can clean them up manually with close or close_all.
MCP server that lets AI agents see and interact with terminal/CLI applications through virtual terminals and PNG screenshots.
Built for Claude Code and any MCP-compatible agent.
Some things are easier to show than describe. When debugging a TUI app, an interactive CLI wizard, or anything with visual terminal output, can-see lets the agent see exactly what you see — colors, layout, cursor position, and all.
npm install -g can-see
can-see depends on node-canvas (Cairo) and node-pty, which require native compilation. Most systems will need:
- Windows: npm install --global windows-build-tools, or install the build tools from the Visual Studio Installer
- macOS: xcode-select --install
- Linux (Debian/Ubuntu): sudo apt install build-essential libcairo2-dev libjpeg-dev libpango1.0-dev libgif-dev librsvg2-dev

Add to your project's .mcp.json:
{
"mcpServers": {
"can-see": {
"command": "npx",
"args": ["-y", "can-see"]
}
}
}
Or if installed globally:
{
"mcpServers": {
"can-see": {
"command": "can-see"
}
}
}
can-see uses stdio transport. Point your MCP client at the can-see binary or npx -y can-see.
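Over stdio, an MCP client talks to can-see by writing JSON-RPC 2.0 messages to the server's stdin, one message per line, and reading responses from stdout. The TypeScript sketch below shows only that framing for a tools/call request. It assumes the MCP initialize handshake has already completed. The launch argument names (command, args) are guesses, so check the input schemas returned by tools/list before relying on them.

```typescript
// Sketch: newline-delimited JSON-RPC 2.0 framing as used by MCP's stdio transport.
// Assumes the MCP initialize handshake has already been completed.
// Tool argument names (command, args) are assumptions; check tools/list.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Encode one tools/call request as a single newline-terminated line for stdin.
function encodeToolCall(id: number, name: string, args: Record<string, unknown>): string {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // JSON.stringify without indentation never emits a raw newline
  // (newlines inside strings are escaped), so each message stays on one line.
  return JSON.stringify(request) + "\n";
}

// Split accumulated stdout into complete messages, keeping any partial last line.
function decodeLines(buffer: string): { messages: unknown[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines
    .filter((line) => line.trim() !== "")
    .map((line): unknown => JSON.parse(line));
  return { messages, rest };
}
```

For example, encodeToolCall(1, "launch", { command: "node", args: ["app.js"] }) builds the line for the first step of the example session further down. Pass each stdout chunk through decodeLines and carry rest over into the next chunk.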
| Tool | Description |
|---|---|
launch | Start a CLI app in a virtual terminal. Returns a sessionId. Accepts optional env to set environment variables. |
screenshot | Capture the terminal as a PNG image. |
screenshot_region | Capture a specific rectangular area of the terminal. |
screenshot_text_region | Find text in the viewport and capture the surrounding area as a PNG. |
capture_baseline | Snapshot terminal state for later diff comparison. |
diff_screenshot | Compare current state against baseline with highlighted changes. |
get_cell_info | Query character, colors, and attributes at specific cell(s). Supports compact mode for reduced output. |
read_text | Read the terminal buffer as plain text. |
read_scrollback | Read text that scrolled above the visible viewport. |
wait_for_text | Wait until specific text appears in the terminal buffer. |
wait_for_idle | Wait until terminal output has been stable for a given duration. Supports stableMs for content-comparison mode (for apps with timers/spinners), excludeRows to ignore specific rows, and excludePattern (regex) for dynamic row exclusion. |
wait_for_color | Wait until a specific color appears at a position. |
wait_for_exit | Wait until the process exits and return its exit code and signal. |
start_recording | Begin capturing frames for an animated GIF. |
stop_recording | Stop recording and return the animated GIF with metadata (frameCount, durationMs). If the GIF exceeds the inline size limit, frames are trimmed automatically or the GIF is saved to a file instead. |
send_keys | Send keystrokes (e.g., Enter, Ctrl+C, ['Down', 'Down', 'Enter']). |
send_text | Type a string of text into the app. |
get_process_status | Get process status — distinguish "app is idle" from "app has exited". Returns PID, running state, exit code. |
list_sessions | List all active terminal sessions. |
close | Kill the app and clean up. Always close when done. |
close_all | Kill all active sessions at once. Useful for cleanup between test runs. |
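capture_baseline and diff_screenshot compare two captures and highlight what changed. The TypeScript sketch below illustrates the idea by diffing two plain-text snapshots cell by cell and computing the bounding box of the changes. It is not can-see's implementation, which works on rendered screenshots and also covers colors. diffGrids and changedBounds are hypothetical names.

```typescript
// Hypothetical sketch: cell-level diff of two text snapshots (one string per row).
// Only characters are compared; colors and attributes are ignored.
// Indexes by UTF-16 code unit, so wide (CJK, emoji) characters are not modeled as cells.

interface CellChange {
  row: number;
  col: number;
  before: string;
  after: string;
}

function diffGrids(baseline: string[], current: string[]): CellChange[] {
  const changes: CellChange[] = [];
  const rows = Math.max(baseline.length, current.length);
  for (let row = 0; row < rows; row++) {
    const a = baseline[row] ?? "";
    const b = current[row] ?? "";
    const cols = Math.max(a.length, b.length);
    for (let col = 0; col < cols; col++) {
      // Positions past the end of a line count as blank cells.
      const before = a[col] ?? " ";
      const after = b[col] ?? " ";
      if (before !== after) changes.push({ row, col, before, after });
    }
  }
  return changes;
}

// Smallest rectangle covering every change, e.g. to pick a region to screenshot.
function changedBounds(changes: CellChange[]) {
  if (changes.length === 0) return null;
  let top = Infinity;
  let left = Infinity;
  let bottom = -Infinity;
  let right = -Infinity;
  for (const c of changes) {
    top = Math.min(top, c.row);
    bottom = Math.max(bottom, c.row);
    left = Math.min(left, c.col);
    right = Math.max(right, c.col);
  }
  return { top, left, bottom, right };
}
```

Treating missing cells as blanks means trailing whitespace never counts as a change, which roughly matches how a terminal grid looks on screen.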
Key names supported by send_keys: Enter, Tab, Escape, Backspace, Space, Up, Down, Left, Right, Home, End, Delete, PageUp, PageDown, and Ctrl+A through Ctrl+Z.
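Each key name has to become the bytes a real terminal would send. can-see's own mapping is not documented here, so the sketch below assumes conventional xterm sequences (normal cursor-key mode, Backspace sent as DEL). Apps that switch to application cursor mode expect ESC O A style arrows instead. keyToSequence and keysToInput are hypothetical names.

```typescript
// Hypothetical key-name table using common xterm sequences (normal cursor-key mode).
// can-see's actual mapping may differ.
const NAMED_KEYS = new Map<string, string>([
  ["Enter", "\r"],
  ["Tab", "\t"],
  ["Escape", "\x1b"],
  ["Backspace", "\x7f"], // DEL, which most modern terminals send for Backspace
  ["Space", " "],
  ["Up", "\x1b[A"],
  ["Down", "\x1b[B"],
  ["Right", "\x1b[C"],
  ["Left", "\x1b[D"],
  ["Home", "\x1b[H"],
  ["End", "\x1b[F"],
  ["Delete", "\x1b[3~"],
  ["PageUp", "\x1b[5~"],
  ["PageDown", "\x1b[6~"],
]);

function keyToSequence(key: string): string {
  const named = NAMED_KEYS.get(key);
  if (named !== undefined) return named;
  // Ctrl+A..Ctrl+Z map to control codes 0x01..0x1A (letter code minus 64).
  const ctrl = /^Ctrl\+([A-Z])$/i.exec(key);
  if (ctrl !== null) {
    return String.fromCharCode(ctrl[1].toUpperCase().charCodeAt(0) - 64);
  }
  throw new Error(`Unknown key name: ${key}`);
}

// Accept a single key or a list such as ["Down", "Down", "Enter"].
function keysToInput(keys: string | string[]): string {
  const list = Array.isArray(keys) ? keys : [keys];
  return list.map((k) => keyToSequence(k)).join("");
}
```

Using a Map rather than a plain object keeps names like "toString" from resolving to inherited prototype properties.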
| Variable | Default | Description |
|---|---|---|
DEFAULT_COLS | 120 | Terminal width in columns |
DEFAULT_ROWS | 30 | Terminal height in rows |
IDLE_TIMEOUT_MS | 300000 | Auto-close idle sessions after this many ms (5 min) |
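These variables are read from the server's environment, so they go wherever your MCP client sets environment variables for can-see. The sketch below resolves them with the documented defaults. Two details are assumptions, not documented behavior: empty or invalid values fall back to the default, and the idle check treats reaching the timeout exactly as expired. resolveConfig and isIdleExpired are hypothetical names.

```typescript
// Sketch: resolve can-see's documented environment variables with their documented
// defaults. Parsing and fallback rules are assumptions.

interface CanSeeConfig {
  cols: number;
  rows: number;
  idleTimeoutMs: number;
}

type Env = Record<string, string | undefined>;

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  // Fall back on non-integer or non-positive values instead of failing.
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

function resolveConfig(env: Env): CanSeeConfig {
  return {
    cols: readPositiveInt(env, "DEFAULT_COLS", 120),
    rows: readPositiveInt(env, "DEFAULT_ROWS", 30),
    idleTimeoutMs: readPositiveInt(env, "IDLE_TIMEOUT_MS", 300000),
  };
}

// A session becomes eligible for auto-close once it has been idle for idleTimeoutMs.
function isIdleExpired(lastActivityMs: number, nowMs: number, cfg: CanSeeConfig): boolean {
  return nowMs - lastActivityMs >= cfg.idleTimeoutMs;
}
```

In a Node process you would call resolveConfig(process.env).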
From an MCP-connected agent:
Agent: I'll launch your app to see what's happening.
→ launch("node", ["app.js"]) → sessionId: "abc-123"
Agent: Let me wait for the app to start.
→ wait_for_text("abc-123", "Ready") → Found "Ready" after 1200ms
Agent: Let me read the current output.
→ read_text("abc-123") → "Welcome to MyApp\nReady\n> "
Agent: I can see the prompt. Let me select option 2.
→ send_keys("abc-123", ["Down", "Enter"])
Agent: Waiting for the screen to settle.
→ wait_for_idle("abc-123") → Terminal idle for 520ms
Agent: Let me check the result.
→ screenshot("abc-123") → [PNG image showing result]
Agent: Done, closing the session.
→ close("abc-123")
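In the example above, wait_for_idle returns once the output settles. Apps with spinners or clocks never stop writing, and that is the case stableMs mode, excludeRows, and excludePattern are for. The sketch below models the content-comparison idea over timestamped snapshots of the viewport rows. Snapshot sampling, a string-typed excludePattern, and the names findStableTime and comparableContent are assumptions; this is not can-see's code.

```typescript
// Hypothetical sketch of content-comparison idle detection (stableMs mode).

interface Snapshot {
  atMs: number;
  rows: string[];
}

interface StableOptions {
  stableMs: number;
  excludeRows?: number[];
  excludePattern?: string; // compiled without the g flag, so test() is stateless
}

// Build the comparison key for one snapshot, dropping excluded rows.
function comparableContent(rows: string[], opts: StableOptions): string {
  const skip = new Set(opts.excludeRows ?? []);
  const pattern = opts.excludePattern !== undefined ? new RegExp(opts.excludePattern) : null;
  return rows
    .filter((line, i) => !skip.has(i) && !(pattern !== null && pattern.test(line)))
    .join("\n");
}

// Time at which the filtered content had been unchanged for stableMs,
// or null if that never happened within the snapshots.
function findStableTime(snapshots: Snapshot[], opts: StableOptions): number | null {
  let lastContent: string | null = null;
  let unchangedSince = 0;
  for (const snap of snapshots) {
    const content = comparableContent(snap.rows, opts);
    if (content !== lastContent) {
      lastContent = content;
      unchangedSince = snap.atMs;
    } else if (snap.atMs - unchangedSince >= opts.stableMs) {
      return snap.atMs;
    }
  }
  return null;
}
```

With a spinner on row 0, the raw content changes on every sample and never settles. Excluding that row by index, or by a pattern such as "loading$", lets the remaining rows settle.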
New tools:
- wait_for_exit: wait for process exit, get exit code and signal
- close_all: kill all active sessions at once
- get_process_status: distinguish "app is idle" from "app has exited"
- screenshot_text_region: find text in viewport, capture surrounding area as PNG

Enhancements:
- launch accepts an env parameter for custom environment variables
- wait_for_idle supports excludePattern (regex) for dynamic row exclusion in stableMs mode
- stop_recording returns frameCount and durationMs metadata alongside the GIF
- get_cell_info supports a compact option for reduced output ({char, fg, bold} only)

Bug fixes:
- wait_for_text and wait_for_color race condition where text/color present in the final buffer was missed when the process exited simultaneously
- stableMs and idleMs are passed to wait_for_idle

License: MIT
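The wait_for_text/wait_for_color race listed under the bug fixes comes down to ordering. Suppose the app prints the awaited text and exits between two polls. A loop that reads the buffer and then checks the exit status will report "exited", even though the text is in the final buffer. One way to close that gap is to read the buffer once more after seeing the exit. The sketch below shows this with a hypothetical TerminalSource interface and leaves out timers; it is not can-see's implementation.

```typescript
// Hypothetical sketch: a text waiter that re-checks the final buffer after exit.
// Timers are omitted; a real waiter would sleep between polls.

interface TerminalSource {
  readBuffer(): string;
  hasExited(): boolean;
}

type WaitResult = "found" | "exited" | "timeout";

function pollForText(src: TerminalSource, text: string, maxPolls: number): WaitResult {
  for (let i = 0; i < maxPolls; i++) {
    if (src.readBuffer().includes(text)) return "found";
    if (src.hasExited()) {
      // Output may have landed between the read above and the exit check,
      // so inspect the final buffer once more before giving up.
      return src.readBuffer().includes(text) ? "found" : "exited";
    }
    // (a real implementation would wait here before the next poll)
  }
  return "timeout";
}
```

The same re-check applies to a color waiter: sample the final cell state once more after exit before reporting failure.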