MCP Screenshot

Transport: STDIO · Registry: active

Summary

Gives Claude the ability to capture screenshots across Linux, macOS, and Windows using native APIs. Exposes tools to grab full screens, specific windows by ID or title pattern, and custom regions with pixel coordinates. The standout feature is OCR-based PII masking that redacts emails, phone numbers, and credit cards before sending images to the AI. Built for visual debugging workflows where you want Claude to analyze actual UI layouts, create documentation with real screenshots, or audit accessibility issues like contrast ratios and spacing. Includes security controls like path validation, rate limiting, and audit logging. Ships with multi-monitor support and configurable image formats with quality settings.

📸 MCP ACS Screenshot Server

Give AI agents visual superpowers to see, analyze, and document your applications like senior UX designers.

This enterprise-grade MCP server transforms AI from code-only assistants into visual experts capable of UI analysis, accessibility auditing, documentation generation, and responsive design testing.

🔗 Repository

This package is now maintained in its own repository: https://github.com/Digital-Defiance/mcp-screenshot

This repository is part of the AI Capabilities Suite on GitHub.

🤔 Why Do AI Agents Need Visual Capabilities?

AI agents today are powerful but visually blind:

❌ Can read HTML/CSS but can't see actual layouts
❌ Can suggest UI improvements but can't see the real user experience
❌ Can't detect accessibility issues like poor contrast or spacing
❌ Can't create visual documentation or bug reports
❌ Can't analyze responsive design across different screen sizes

Result: You're stuck manually creating screenshots, documentation, and visual analysis that AI should handle.

🎯 Revolutionary Use Cases

📚 "AI, create professional documentation"

You: "Document the checkout process with screenshots"
AI: *Captures each step, analyzes UI flow*
AI: "Generated complete user guide with 6 annotated screenshots, highlighting key interactions and error states..."

🔍 "AI, audit this page for accessibility"

You: "Check if this form meets WCAG guidelines"
AI: *Captures form, analyzes visual elements*
AI: "Found 4 accessibility issues: contrast ratio 2.1:1 (needs 4.5:1), missing focus indicators, touch targets too small..."

🐛 "AI, create a detailed bug report"

You: "The mobile navigation is broken"
AI: *Captures different viewport sizes*
AI: "Created bug report with before/after screenshots, exact breakpoint where it fails, and CSS fix suggestions..."

🎨 "AI, compare these design variations"

You: "Which hero section design works better?"
AI: *Captures both versions, analyzes visual hierarchy*
AI: "Version B has 28% better visual flow—CTA more prominent, text hierarchy clearer, better use of whitespace..."

📱 "AI, test responsive design"

You: "How does this look on different screen sizes?"
AI: *Captures multiple viewport sizes*
AI: "Layout breaks at 768px—sidebar overlaps content. Here's the media query fix with visual proof..."

✨ What This Changes

Before: AI worked blind, relying on code descriptions

❌ "The button looks wrong" → AI guesses the issue
❌ "Create documentation" → AI writes generic text
❌ "Check accessibility" → AI only reviews code
❌ "Test responsive design" → AI can't see actual breakpoints

After: AI sees and analyzes your actual user interface

✅ Visual debugging - AI identifies exact pixel-level issues
✅ Smart documentation - AI creates guides with real screenshots and annotations
✅ Accessibility audits - AI measures actual contrast ratios and spacing
✅ Responsive testing - AI captures and compares different screen sizes
✅ Design analysis - AI evaluates visual hierarchy and user experience
✅ Professional reports - AI creates detailed visual evidence for bugs and improvements

🚀 Features

Multi-format Support: PNG, JPEG, WebP, BMP with configurable quality
Flexible Capture: Full screen, specific windows, or custom regions
Privacy Protection: PII masking with OCR-based detection for emails, phone numbers, and credit cards
Security Controls: Path validation, window exclusion, rate limiting, audit logging, and configurable policies
Cross-platform: Linux (X11/Wayland), macOS, Windows with native APIs
Multi-monitor Support: Capture from specific displays in multi-monitor setups
AI-Optimized: Structured responses perfect for AI agent workflows

Installation

NPM Installation

npm install @ai-capabilities-suite/mcp-screenshot

System Requirements

Linux:

X11: imagemagick package (provides import command)
Wayland: grim package

# Ubuntu/Debian
sudo apt-get install imagemagick grim

# Fedora
sudo dnf install ImageMagick grim

# Arch
sudo pacman -S imagemagick grim

macOS:

Built-in screencapture command (no additional dependencies)
Screen Recording permission required (System Settings > Privacy & Security > Screen Recording; on macOS 12 and earlier, System Preferences > Security & Privacy > Privacy > Screen Recording)

Windows:

No additional dependencies required

MCP Configuration

Add to your MCP settings file (e.g., ~/.kiro/settings/mcp.json or .kiro/settings/mcp.json):

{
  "mcpServers": {
    "screenshot": {
      "command": "node",
      "args": ["/path/to/mcp-screenshot/dist/cli.js"],
      "env": {
        "SCREENSHOT_ALLOWED_DIRS": "/home/user/screenshots,/tmp",
        "SCREENSHOT_MAX_CAPTURES_PER_MIN": "60",
        "SCREENSHOT_ENABLE_AUDIT_LOG": "true"
      }
    }
  }
}

🛠️ 5 Professional MCP Tools

The server exposes five MCP tools that let AI agents capture, inspect, and work with visual information:

1. screenshot_capture_full

Capture full screen or specific display.

Parameters:

display (string, optional): Display ID to capture (defaults to primary display)
format (string, optional): Image format - png, jpeg, webp, or bmp (default: png)
quality (number, optional): Compression quality 1-100 for lossy formats (default: 90)
savePath (string, optional): File path to save screenshot (returns base64 if not provided)
enablePIIMasking (boolean, optional): Enable PII detection and masking (default: false)

Example:

{
  "name": "screenshot_capture_full",
  "arguments": {
    "format": "png",
    "savePath": "/home/user/screenshots/desktop.png",
    "enablePIIMasking": true
  }
}

Response:

{
  "status": "success",
  "filePath": "/home/user/screenshots/desktop.png",
  "metadata": {
    "width": 1920,
    "height": 1080,
    "format": "png",
    "fileSize": 245678,
    "timestamp": "2024-12-01T10:30:00.000Z",
    "display": {
      "id": "0",
      "name": "Primary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 0, "y": 0 },
      "isPrimary": true
    },
    "piiMasking": {
      "emailsRedacted": 2,
      "phonesRedacted": 1,
      "creditCardsRedacted": 0,
      "customPatternsRedacted": 0
    }
  }
}
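
The parameter list above can be checked on the client before a call is sent. The following is a minimal sketch, not the server's own validation: `CaptureFullArgs`, `checkCaptureFullArgs`, and `withCaptureDefaults` are hypothetical names, and the only rules encoded are the documented ones (four formats, quality from 1 to 100, and the listed defaults).

```typescript
// Hypothetical client-side helper; these names are not part of the package.
interface CaptureFullArgs {
  display?: string;
  format?: string;
  quality?: number;
  savePath?: string;
  enablePIIMasking?: boolean;
}

const SUPPORTED_FORMATS = ["png", "jpeg", "webp", "bmp"];

// Returns a list of problems; an empty list means the arguments look valid.
function checkCaptureFullArgs(args: CaptureFullArgs): string[] {
  const problems: string[] = [];
  if (args.format !== undefined && SUPPORTED_FORMATS.indexOf(args.format) === -1) {
    problems.push("format must be one of: " + SUPPORTED_FORMATS.join(", "));
  }
  if (args.quality !== undefined) {
    const q = args.quality;
    if (!isFinite(q) || q < 1 || q > 100) {
      problems.push("quality must be between 1 and 100");
    }
  }
  return problems;
}

// Fill in the documented defaults for any field the caller left out.
function withCaptureDefaults(args: CaptureFullArgs): CaptureFullArgs {
  return { format: "png", quality: 90, enablePIIMasking: false, ...args };
}
```

Checking locally avoids spending a rate-limited call on a request that would come back as `UNSUPPORTED_FORMAT`.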

2. screenshot_capture_window

Capture specific application window by ID or title pattern.

Parameters:

windowId (string, optional): Window identifier (use windowId or windowTitle)
windowTitle (string, optional): Window title pattern to match (use windowId or windowTitle)
includeFrame (boolean, optional): Include window frame and title bar (default: false)
format (string, optional): Image format (default: png)
quality (number, optional): Compression quality 1-100 (default: 90)
savePath (string, optional): File path to save screenshot

Example:

{
  "name": "screenshot_capture_window",
  "arguments": {
    "windowTitle": "Chrome",
    "includeFrame": false,
    "format": "jpeg",
    "quality": 85
  }
}

Response:

{
  "status": "success",
  "data": "/9j/4AAQSkZJRgABAQAAAQABAAD...",
  "mimeType": "image/jpeg",
  "metadata": {
    "width": 1280,
    "height": 720,
    "format": "jpeg",
    "fileSize": 89234,
    "timestamp": "2024-12-01T10:31:00.000Z",
    "window": {
      "id": "12345",
      "title": "Google Chrome",
      "processName": "chrome",
      "pid": 5678,
      "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 }
    }
  }
}

3. screenshot_capture_region

Capture specific rectangular region of the screen.

Parameters:

x (number, required): X coordinate of top-left corner
y (number, required): Y coordinate of top-left corner
width (number, required): Width of region in pixels
height (number, required): Height of region in pixels
format (string, optional): Image format (default: png)
quality (number, optional): Compression quality 1-100 (default: 90)
savePath (string, optional): File path to save screenshot

Example:

{
  "name": "screenshot_capture_region",
  "arguments": {
    "x": 100,
    "y": 100,
    "width": 800,
    "height": 600,
    "format": "png"
  }
}

Response:

{
  "status": "success",
  "data": "iVBORw0KGgoAAAANSUhEUgAA...",
  "mimeType": "image/png",
  "metadata": {
    "width": 800,
    "height": 600,
    "format": "png",
    "fileSize": 123456,
    "timestamp": "2024-12-01T10:32:00.000Z",
    "region": {
      "x": 100,
      "y": 100,
      "width": 800,
      "height": 600
    }
  }
}
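
A region can be sanity-checked before the call. The sketch below assumes the rule stated for `INVALID_REGION` (non-negative coordinates, positive dimensions); `CaptureRegion`, `isValidRegion`, and `clipToDisplay` are hypothetical helpers. Whether the server clips or rejects a region that extends past the display edge is not documented here, so the clipping step is purely a client-side convenience.

```typescript
// Hypothetical helper types and functions, not part of the package API.
interface CaptureRegion {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Mirrors the documented rule: non-negative coordinates, positive dimensions.
function isValidRegion(r: CaptureRegion): boolean {
  for (const v of [r.x, r.y, r.width, r.height]) {
    if (!isFinite(v)) return false;
  }
  return r.x >= 0 && r.y >= 0 && r.width > 0 && r.height > 0;
}

// Clip a region to a display of the given size; returns null if nothing is left.
function clipToDisplay(
  r: CaptureRegion,
  displayWidth: number,
  displayHeight: number
): CaptureRegion | null {
  const x = Math.max(0, r.x);
  const y = Math.max(0, r.y);
  const right = Math.min(displayWidth, r.x + r.width);
  const bottom = Math.min(displayHeight, r.y + r.height);
  if (right <= x || bottom <= y) return null;
  return { x: x, y: y, width: right - x, height: bottom - y };
}
```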

4. screenshot_list_displays

List all connected displays with resolution and position information.

Parameters: None

Example:

{
  "name": "screenshot_list_displays",
  "arguments": {}
}

Response:

{
  "status": "success",
  "displays": [
    {
      "id": "0",
      "name": "Primary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 0, "y": 0 },
      "isPrimary": true
    },
    {
      "id": "1",
      "name": "Secondary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 1920, "y": 0 },
      "isPrimary": false
    }
  ]
}
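
The `position` and `resolution` fields are enough to work out which display contains a given point, or how large the combined desktop is. This is a sketch under the assumption that both fields share one coordinate space, as they do in the sample response; `DisplayInfo`, `displayAt`, and `virtualDesktopBounds` are hypothetical names.

```typescript
// Shape copied from the sample response above; helper names are hypothetical.
interface DisplayInfo {
  id: string;
  name: string;
  resolution: { width: number; height: number };
  position: { x: number; y: number };
  isPrimary: boolean;
}

// Find the display whose rectangle contains a point in desktop coordinates.
function displayAt(displays: DisplayInfo[], px: number, py: number): DisplayInfo | undefined {
  for (const d of displays) {
    const insideX = px >= d.position.x && px < d.position.x + d.resolution.width;
    const insideY = py >= d.position.y && py < d.position.y + d.resolution.height;
    if (insideX && insideY) return d;
  }
  return undefined;
}

// Smallest rectangle that covers every display, or null for an empty list.
function virtualDesktopBounds(
  displays: DisplayInfo[]
): { x: number; y: number; width: number; height: number } | null {
  if (displays.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const d of displays) {
    minX = Math.min(minX, d.position.x);
    minY = Math.min(minY, d.position.y);
    maxX = Math.max(maxX, d.position.x + d.resolution.width);
    maxY = Math.max(maxY, d.position.y + d.resolution.height);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}
```

With the two displays from the sample response, the point (2000, 500) falls on display "1" and the combined desktop is 3840 by 1080.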

5. screenshot_list_windows

List all visible windows with title, process, and position information.

Parameters: None

Example:

{
  "name": "screenshot_list_windows",
  "arguments": {}
}

Response:

{
  "status": "success",
  "windows": [
    {
      "id": "12345",
      "title": "Google Chrome",
      "processName": "chrome",
      "pid": 5678,
      "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 },
      "isMinimized": false
    },
    {
      "id": "67890",
      "title": "Terminal",
      "processName": "gnome-terminal",
      "pid": 9012,
      "bounds": { "x": 200, "y": 200, "width": 800, "height": 600 },
      "isMinimized": false
    }
  ]
}
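
A common follow-up is to pick one window from this list and capture it by ID. The sketch below does the selection on the client; `WindowInfo`, `pickWindow`, and `captureCallFor` are hypothetical names, and treating the title pattern as a case-insensitive regular expression is an assumption of this example, not a statement about how the server matches `windowTitle`.

```typescript
// Shape copied from the sample response above; helper names are hypothetical.
interface WindowInfo {
  id: string;
  title: string;
  processName: string;
  pid: number;
  bounds: { x: number; y: number; width: number; height: number };
  isMinimized: boolean;
}

// Pick the first non-minimized window whose title matches the pattern.
function pickWindow(windows: WindowInfo[], titlePattern: string): WindowInfo | undefined {
  const re = new RegExp(titlePattern, "i"); // case-insensitive by assumption
  const matches = windows.filter((w) => !w.isMinimized && re.test(w.title));
  return matches.length > 0 ? matches[0] : undefined;
}

// Build the arguments for a follow-up screenshot_capture_window call.
function captureCallFor(w: WindowInfo) {
  return {
    name: "screenshot_capture_window",
    arguments: { windowId: w.id, includeFrame: false },
  };
}
```

Capturing by `windowId` after an explicit lookup avoids ambiguity when several windows share part of a title.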

Security Configuration

The server enforces security policies to control screenshot operations. Configure via environment variables or security policy file.

Environment Variables

SCREENSHOT_ALLOWED_DIRS: Comma-separated list of allowed directories for saving screenshots
SCREENSHOT_MAX_CAPTURES_PER_MIN: Maximum captures per minute (default: 60)
SCREENSHOT_ENABLE_AUDIT_LOG: Enable audit logging (default: true)
SCREENSHOT_BLOCKED_WINDOWS: Comma-separated list of window title patterns to exclude
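
To make the mapping from these variables to a policy object concrete, here is a minimal sketch. It is not the server's actual parser: `policyFromEnv` and `splitList` are hypothetical, the field names are borrowed from the security policy file format, and the fallback values are the documented defaults (60 captures per minute, audit logging on).

```typescript
// Field names follow the security policy file; the parser itself is hypothetical.
interface SecurityPolicy {
  allowedDirectories: string[];
  maxCapturesPerMinute: number;
  enableAuditLog: boolean;
  blockedWindowPatterns: string[];
}

// Split a comma-separated value, dropping empty entries and stray whitespace.
function splitList(value: string | undefined): string[] {
  if (!value) return [];
  return value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function policyFromEnv(env: Record<string, string | undefined>): SecurityPolicy {
  const rawMax = parseInt(env["SCREENSHOT_MAX_CAPTURES_PER_MIN"] || "", 10);
  const rawAudit = env["SCREENSHOT_ENABLE_AUDIT_LOG"];
  return {
    allowedDirectories: splitList(env["SCREENSHOT_ALLOWED_DIRS"]),
    // NaN > 0 is false, so a missing or non-numeric value falls back to 60.
    maxCapturesPerMinute: rawMax > 0 ? rawMax : 60,
    enableAuditLog: rawAudit === undefined ? true : rawAudit.toLowerCase() === "true",
    blockedWindowPatterns: splitList(env["SCREENSHOT_BLOCKED_WINDOWS"]),
  };
}
```

One consequence of the comma-separated format is that a window pattern containing a comma (for example a `{1,3}` quantifier) would be split in two by this sketch; the policy file is the safer place for such patterns.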

Security Policy File

Create a security-policy.json file:

{
  "allowedDirectories": ["/home/user/screenshots", "/tmp/screenshots"],
  "blockedWindowPatterns": [
    ".*Password.*",
    ".*1Password.*",
    ".*LastPass.*",
    ".*Bitwarden.*",
    ".*Authentication.*"
  ],
  "maxCapturesPerMinute": 60,
  "enableAuditLog": true
}
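
The `blockedWindowPatterns` entries read as regular expressions tested against window titles. A minimal sketch of that check follows; `compileBlockedPatterns` and `isBlockedTitle` are hypothetical names, and case-sensitive matching is an assumption of this example, since the matching rules are not spelled out here.

```typescript
// Hypothetical helpers illustrating how title patterns could be applied.

// Compile once so that an invalid pattern throws early instead of never matching.
function compileBlockedPatterns(patterns: string[]): RegExp[] {
  return patterns.map((p) => new RegExp(p));
}

// True if any pattern matches the title (case-sensitive in this sketch).
function isBlockedTitle(title: string, compiled: RegExp[]): boolean {
  return compiled.some((re) => re.test(title));
}
```

Under these assumptions, `.*1Password.*` is already covered by `.*Password.*`, and the leading and trailing `.*` are redundant for an unanchored test. A lowercase title such as "password manager" would not be blocked, so it is worth confirming how the server treats case before relying on these patterns.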

Load the policy when starting the server:

import { MCPScreenshotServer } from "@ai-capabilities-suite/mcp-screenshot";
import * as fs from "fs";

const policy = JSON.parse(fs.readFileSync("security-policy.json", "utf-8"));
const server = new MCPScreenshotServer(policy);
await server.start();

Error Handling

All tools return structured error responses with error codes and remediation suggestions.

Error Codes

| Code | Description | Remediation |
| --- | --- | --- |
| `PERMISSION_DENIED` | Insufficient permissions to capture | Grant Screen Recording permission (macOS) or check user permissions |
| `INVALID_PATH` | File path outside allowed directories | Use a path within configured allowed directories |
| `WINDOW_NOT_FOUND` | Specified window does not exist | Use `screenshot_list_windows` to find available windows |
| `DISPLAY_NOT_FOUND` | Specified display does not exist | Use `screenshot_list_displays` to find available displays |
| `UNSUPPORTED_FORMAT` | Requested format not supported | Use png, jpeg, webp, or bmp |
| `CAPTURE_FAILED` | Screenshot capture failed | Check permissions and try again |
| `RATE_LIMIT_EXCEEDED` | Too many captures in time window | Wait before making additional requests |
| `INVALID_REGION` | Invalid region coordinates or dimensions | Ensure coordinates are non-negative and dimensions are positive |
| `OUT_OF_MEMORY` | Insufficient memory for operation | Reduce capture size or close other applications |
| `ENCODING_FAILED` | Image encoding failed | Try different format or reduce quality |
| `FILE_SYSTEM_ERROR` | File system operation failed | Check permissions and disk space |

Error Response Format

{
  "status": "error",
  "error": {
    "code": "WINDOW_NOT_FOUND",
    "message": "Window with ID '12345' not found",
    "details": {
      "windowId": "12345"
    },
    "remediation": "Verify the window exists and is visible. Use screenshot_list_windows to see available windows."
  }
}
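
A client can branch on the `code` field of this format. The sketch below shows one way to do that: `ToolError`, `isToolError`, `NextStep`, and `nextStepFor` are hypothetical names, and the mapping from codes to follow-up actions is this example's reading of the remediation column, not behavior the server prescribes.

```typescript
// Shape follows the error response format above; helper names are hypothetical.
interface ToolError {
  status: "error";
  error: {
    code: string;
    message: string;
    details?: Record<string, unknown>;
    remediation?: string;
  };
}

// Runtime check that an unknown value has the documented error shape.
function isToolError(value: unknown): value is ToolError {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { status?: unknown; error?: unknown };
  if (v.status !== "error" || typeof v.error !== "object" || v.error === null) return false;
  const e = v.error as { code?: unknown; message?: unknown };
  return typeof e.code === "string" && typeof e.message === "string";
}

type NextStep = "retry-later" | "list-windows" | "list-displays" | "fix-request" | "give-up";

// Map an error code to a follow-up action, based on the remediation column.
function nextStepFor(code: string): NextStep {
  switch (code) {
    case "RATE_LIMIT_EXCEEDED":
      return "retry-later";
    case "WINDOW_NOT_FOUND":
      return "list-windows";
    case "DISPLAY_NOT_FOUND":
      return "list-displays";
    case "INVALID_PATH":
    case "INVALID_REGION":
    case "UNSUPPORTED_FORMAT":
      return "fix-request";
    default:
      return "give-up";
  }
}
```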

Troubleshooting

Linux Issues

Problem: import: command not found or grim: command not found

Solution: Install required packages:

# X11
sudo apt-get install imagemagick

# Wayland
sudo apt-get install grim

Problem: Black screen or empty captures

Solution: Check display server environment variables:

echo $DISPLAY  # Should show :0 or similar for X11
echo $WAYLAND_DISPLAY  # Should show wayland-0 or similar for Wayland

macOS Issues

Problem: PERMISSION_DENIED error

Solution: Grant Screen Recording permission:

Open System Settings > Privacy & Security (on macOS 12 and earlier: System Preferences > Security & Privacy > Privacy)
Select "Screen Recording" from the list
Add your terminal application or Node.js to the allowed list
Restart the application

Problem: Retina display captures are double resolution

Solution: This is expected behavior. Retina displays have 2x pixel density, so a capture contains twice as many pixels in each dimension as the logical screen size. Use the width and height from metadata to determine the actual pixel dimensions.

Windows Issues

Problem: Capture fails with access denied

Solution: Run the application with administrator privileges or check Windows Defender settings.

Problem: Multi-monitor captures show wrong display

Solution: Use screenshot_list_displays to get correct display IDs and positions.

General Issues

Problem: RATE_LIMIT_EXCEEDED error

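Solution: The server limits captures to prevent abuse. Wait up to 60 seconds for the limit to clear, or raise maxCapturesPerMinute in the security policy (SCREENSHOT_MAX_CAPTURES_PER_MIN when configuring through environment variables).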
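
To show what a per-minute limit means in practice, here is a small sliding-window limiter. It is a sketch only: `SlidingWindowLimiter` is a hypothetical class, and whether the server uses a sliding window, a fixed window, or something else is not stated here. A client can run the same logic locally to pace its own calls.

```typescript
// Hypothetical client-side limiter; the server's own algorithm may differ.
class SlidingWindowLimiter {
  private stamps: number[] = [];
  private readonly maxPerWindow: number;
  private readonly windowMs: number;

  constructor(maxPerWindow: number, windowMs: number = 60000) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // Records the capture and returns true if it fits in the window; otherwise false.
  tryAcquire(nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    this.stamps = this.stamps.filter((t) => t > cutoff);
    if (this.stamps.length >= this.maxPerWindow) return false;
    this.stamps.push(nowMs);
    return true;
  }

  // Milliseconds until a slot frees up (0 if one is free now).
  retryAfterMs(nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    const live = this.stamps.filter((t) => t > cutoff);
    if (live.length < this.maxPerWindow) return 0;
    const oldest = live[0];
    if (oldest === undefined) return this.windowMs; // only when the limit is 0
    return oldest + this.windowMs - nowMs;
  }
}
```

Passing the clock in as `nowMs` (for example `Date.now()`) keeps the class deterministic and easy to test.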

Problem: INVALID_PATH error when saving

Solution: Ensure the save path is within allowed directories configured in security policy.
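
The idea behind the allowed-directory check can be sketched as follows. This is an illustration, not the server's implementation: `normalizePosix` and `isPathAllowed` are hypothetical, the sketch handles absolute POSIX paths only, and it does not resolve symlinks or Windows paths.

```typescript
// Hypothetical helpers; absolute POSIX paths only, no symlink resolution.

// Resolve "." and ".." segments without touching the file system.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      out.pop();
      continue;
    }
    out.push(part);
  }
  return "/" + out.join("/");
}

// True if savePath, once normalized, lies strictly inside an allowed directory.
function isPathAllowed(savePath: string, allowedDirs: string[]): boolean {
  if (savePath.charAt(0) !== "/") return false;
  const target = normalizePosix(savePath);
  return allowedDirs.some((dir) => {
    const base = normalizePosix(dir);
    const prefix = base === "/" ? "/" : base + "/";
    return target.indexOf(prefix) === 0 && target.length > prefix.length;
  });
}
```

Comparing against the directory plus a trailing slash is what stops `/tmp-evil/a.png` from passing when `/tmp` is allowed, and normalizing first is what stops `..` segments from escaping an allowed directory.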

Problem: PII masking not working

Solution:

Ensure tesseract.js is properly installed
Check that eng.traineddata language file is available
PII masking requires OCR which may be slow on large images
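
When debugging masking, it helps to separate the OCR step from the pattern-matching step. The sketch below covers only the second step, applied to text that OCR has already produced. The regular expressions, the Luhn filter, and the names `findPii` and `passesLuhn` are assumptions made for illustration; the package's actual detection rules are not documented here and may differ.

```typescript
// Hypothetical detector for OCR output; patterns are illustrative, not the package's.
type PiiKind = "email" | "phone" | "creditCard";

interface PiiMatch {
  kind: PiiKind;
  text: string;
  index: number;
}

// Luhn checksum, used to drop digit runs that only look like card numbers.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return digits.length > 0 && sum % 10 === 0;
}

function findPii(text: string): PiiMatch[] {
  const found: PiiMatch[] = [];
  const patterns: Array<{ kind: PiiKind; re: RegExp }> = [
    { kind: "email", re: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
    // 13 to 19 digits, optionally separated by single spaces or hyphens.
    { kind: "creditCard", re: /\b\d(?:[ -]?\d){12,18}\b/g },
    // US-style 3-3-4 digit groups only.
    { kind: "phone", re: /\b\d{3}[ .-]\d{3}[ .-]\d{4}\b/g },
  ];
  for (const { kind, re } of patterns) {
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      if (kind === "creditCard" && !passesLuhn(m[0].replace(/[ -]/g, ""))) continue;
      found.push({ kind: kind, text: m[0], index: m.index });
    }
  }
  return found;
}
```

If a detector like this finds matches in text you type by hand but the server reports zero redactions for a screenshot of the same text, the OCR step (tesseract.js and its language data) is the more likely culprit.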

Problem: Large file sizes

Solution:

Use JPEG format with lower quality (60-80) for smaller files
Use WebP format for best compression
Reduce capture region size if possible

Problem: Out of memory errors

Solution:

Capture smaller regions instead of full screen
Reduce quality settings
Close other applications to free memory
Use streaming for very large captures

Programmatic Usage

TypeScript/JavaScript

import { MCPScreenshotServer } from "@ai-capabilities-suite/mcp-screenshot";

// Create server with custom security policy
const server = new MCPScreenshotServer({
  allowedDirectories: ["/home/user/screenshots"],
  maxCapturesPerMinute: 30,
  enableAuditLog: true,
  blockedWindowPatterns: [".*Password.*"],
});

// Start server
await server.start();

// Server will handle MCP protocol requests via stdio
// Keep process running
process.on("SIGINT", async () => {
  await server.stop();
  process.exit(0);
});

Direct Capture Engine Usage

import { createCaptureEngine } from "@ai-capabilities-suite/mcp-screenshot";

// Create platform-specific capture engine
const engine = createCaptureEngine();

// Capture full screen
const fullScreen = await engine.captureScreen();

// List and capture windows
const windows = await engine.getWindows();
const window = windows.find((w) => w.title.includes("Chrome"));
if (window) {
  const buffer = await engine.captureWindow(window.id, false);
}

// Capture region
const region = await engine.captureRegion(100, 100, 800, 600);

// List displays
const displays = await engine.getDisplays();
console.log(`Found ${displays.length} displays`);

Development

This package is part of the AI Capabilities Suite monorepo.

Build

npm run build

Test

# Run all tests
npm test

# Run specific test suites
npm test -- capture
npm test -- security
npm test -- property

# Run with coverage
npm test -- --coverage

Project Structure

packages/mcp-screenshot/
├── src/
│   ├── capture/          # Platform-specific capture engines
│   ├── processing/       # Image processing and encoding
│   ├── privacy/          # PII detection and masking
│   ├── security/         # Security policy enforcement
│   ├── storage/          # File operations
│   ├── tools/            # MCP tool implementations
│   ├── interfaces/       # TypeScript interfaces
│   ├── types/            # Type definitions
│   ├── errors/           # Error classes
│   ├── server.ts         # MCP server implementation
│   └── cli.ts            # CLI entry point
├── README.md
├── TESTING.md
└── package.json

Contributing

Contributions are welcome! Please ensure:

All tests pass (npm test)
Code follows TypeScript best practices
New features include tests and documentation
Security considerations are addressed

License

MIT

Support

For issues and questions:

GitHub Issues: Create an issue
Documentation: See TESTING.md for testing guide
Security: Report security issues privately to security@example.com

Registry: active

Package: @ai-capabilities-suite/mcp-screenshot

Transport: STDIO

Updated: Dec 5, 2025

📸 MCP ACS Screenshot Server

Give AI agents visual superpowers to see, analyze, and document your applications like senior UX designers.

This enterprise-grade MCP server transforms AI from code-only assistants into visual experts capable of UI analysis, accessibility auditing, documentation generation, and responsive design testing.

🔗 Repository

This package is now maintained in its own repository: https://github.com/Digital-Defiance/mcp-screenshot

This repository is part of the AI Capabilitites Suite on GitHub.

🤔 Why Do AI Agents Need Visual Capabilities?

AI agents today are powerful but visually blind:

❌ Can read HTML/CSS but can't see actual layouts
❌ Can suggest UI improvements without seeing the real user experience
❌ Can't detect accessibility issues like poor contrast or spacing
❌ Can't create visual documentation or bug reports
❌ Can't analyze responsive design across different screen sizes

Result: You're stuck manually creating screenshots, documentation, and visual analysis that AI should handle.

🎯 Revolutionary Use Cases

📚 "AI, create professional documentation"

You: "Document the checkout process with screenshots"
AI: *Captures each step, analyzes UI flow*
AI: "Generated complete user guide with 6 annotated screenshots, highlighting key interactions and error states..."

🔍 "AI, audit this page for accessibility"

You: "Check if this form meets WCAG guidelines"
AI: *Captures form, analyzes visual elements*
AI: "Found 4 accessibility issues: contrast ratio 2.1:1 (needs 4.5:1), missing focus indicators, touch targets too small..."

🐛 "AI, create a detailed bug report"

You: "The mobile navigation is broken"
AI: *Captures different viewport sizes*
AI: "Created bug report with before/after screenshots, exact breakpoint where it fails, and CSS fix suggestions..."

🎨 "AI, compare these design variations"

You: "Which hero section design works better?"
AI: *Captures both versions, analyzes visual hierarchy*
AI: "Version B has 28% better visual flow—CTA more prominent, text hierarchy clearer, better use of whitespace..."

📱 "AI, test responsive design"

You: "How does this look on different screen sizes?"
AI: *Captures multiple viewport sizes*
AI: "Layout breaks at 768px—sidebar overlaps content. Here's the media query fix with visual proof..."

✨ What This Changes

Before: AI worked blind, relying on code descriptions

❌ "The button looks wrong" → AI guesses the issue
❌ "Create documentation" → AI writes generic text
❌ "Check accessibility" → AI only reviews code
❌ "Test responsive design" → AI can't see actual breakpoints

After: AI sees and analyzes your actual user interface

✅ Visual debugging - AI identifies exact pixel-level issues
✅ Smart documentation - AI creates guides with real screenshots and annotations
✅ Accessibility audits - AI measures actual contrast ratios and spacing
✅ Responsive testing - AI captures and compares different screen sizes
✅ Design analysis - AI evaluates visual hierarchy and user experience
✅ Professional reports - AI creates detailed visual evidence for bugs and improvements

🚀 Features

Multi-format Support: PNG, JPEG, WebP, BMP with configurable quality
Flexible Capture: Full screen, specific windows, or custom regions
Privacy Protection: PII masking with OCR-based detection for emails, phone numbers, and credit cards
Security Controls: Path validation, rate limiting, audit logging, and configurable policies
Cross-platform: Linux (X11/Wayland), macOS, Windows with native APIs
Multi-monitor Support: Capture from specific displays in multi-monitor setups
Enterprise Security: Window exclusion, audit logging, rate limiting
AI-Optimized: Structured responses perfect for AI agent workflows

Installation

NPM Installation

npm install @ai-capabilities-suite/mcp-screenshot

System Requirements

Linux:

X11: imagemagick package (provides import command)
Wayland: grim package

# Ubuntu/Debian
sudo apt-get install imagemagick grim

# Fedora
sudo dnf install ImageMagick grim

# Arch
sudo pacman -S imagemagick grim

macOS:

Built-in screencapture command (no additional dependencies)
Screen Recording permission required (System Preferences > Security & Privacy > Privacy > Screen Recording)

Windows:

No additional dependencies required

MCP Configuration

Add to your MCP settings file (e.g., ~/.kiro/settings/mcp.json or .kiro/settings/mcp.json):

{
  "mcpServers": {
    "screenshot": {
      "command": "node",
      "args": ["/path/to/mcp-screenshot/dist/cli.js"],
      "env": {
        "SCREENSHOT_ALLOWED_DIRS": "/home/user/screenshots,/tmp",
        "SCREENSHOT_MAX_CAPTURES_PER_MIN": "60",
        "SCREENSHOT_ENABLE_AUDIT_LOG": "true"
      }
    }
  }
}

🛠️ 5 Professional MCP Tools

Purpose-built for AI agents to capture, analyze, and work with visual information:

The server exposes 5 comprehensive MCP tools that enable AI agents to see and understand your applications:

1. screenshot_capture_full

Capture full screen or specific display.

Parameters:

display (string, optional): Display ID to capture (defaults to primary display)
format (string, optional): Image format - png, jpeg, webp, or bmp (default: png)
quality (number, optional): Compression quality 1-100 for lossy formats (default: 90)
savePath (string, optional): File path to save screenshot (returns base64 if not provided)
enablePIIMasking (boolean, optional): Enable PII detection and masking (default: false)

Example:

{
  "name": "screenshot_capture_full",
  "arguments": {
    "format": "png",
    "savePath": "/home/user/screenshots/desktop.png",
    "enablePIIMasking": true
  }
}

Response:

{
  "status": "success",
  "filePath": "/home/user/screenshots/desktop.png",
  "metadata": {
    "width": 1920,
    "height": 1080,
    "format": "png",
    "fileSize": 245678,
    "timestamp": "2024-12-01T10:30:00.000Z",
    "display": {
      "id": "0",
      "name": "Primary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 0, "y": 0 },
      "isPrimary": true
    },
    "piiMasking": {
      "emailsRedacted": 2,
      "phonesRedacted": 1,
      "creditCardsRedacted": 0,
      "customPatternsRedacted": 0
    }
  }
}

2. screenshot_capture_window

Capture specific application window by ID or title pattern.

Parameters:

windowId (string, optional): Window identifier (use windowId or windowTitle)
windowTitle (string, optional): Window title pattern to match (use windowId or windowTitle)
includeFrame (boolean, optional): Include window frame and title bar (default: false)
format (string, optional): Image format (default: png)
quality (number, optional): Compression quality 1-100 (default: 90)
savePath (string, optional): File path to save screenshot

Example:

{
  "name": "screenshot_capture_window",
  "arguments": {
    "windowTitle": "Chrome",
    "includeFrame": false,
    "format": "jpeg",
    "quality": 85
  }
}

Response:

{
  "status": "success",
  "data": "iVBORw0KGgoAAAANSUhEUgAA...",
  "mimeType": "image/jpeg",
  "metadata": {
    "width": 1280,
    "height": 720,
    "format": "jpeg",
    "fileSize": 89234,
    "timestamp": "2024-12-01T10:31:00.000Z",
    "window": {
      "id": "12345",
      "title": "Google Chrome",
      "processName": "chrome",
      "pid": 5678,
      "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 }
    }
  }
}

3. screenshot_capture_region

Capture specific rectangular region of the screen.

Parameters:

x (number, required): X coordinate of top-left corner
y (number, required): Y coordinate of top-left corner
width (number, required): Width of region in pixels
height (number, required): Height of region in pixels
format (string, optional): Image format (default: png)
quality (number, optional): Compression quality 1-100 (default: 90)
savePath (string, optional): File path to save screenshot

Example:

{
  "name": "screenshot_capture_region",
  "arguments": {
    "x": 100,
    "y": 100,
    "width": 800,
    "height": 600,
    "format": "png"
  }
}

Response:

{
  "status": "success",
  "data": "iVBORw0KGgoAAAANSUhEUgAA...",
  "mimeType": "image/png",
  "metadata": {
    "width": 800,
    "height": 600,
    "format": "png",
    "fileSize": 123456,
    "timestamp": "2024-12-01T10:32:00.000Z",
    "region": {
      "x": 100,
      "y": 100,
      "width": 800,
      "height": 600
    }
  }
}

4. screenshot_list_displays

List all connected displays with resolution and position information.

Parameters: None

Example:

{
  "name": "screenshot_list_displays",
  "arguments": {}
}

Response:

{
  "status": "success",
  "displays": [
    {
      "id": "0",
      "name": "Primary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 0, "y": 0 },
      "isPrimary": true
    },
    {
      "id": "1",
      "name": "Secondary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 1920, "y": 0 },
      "isPrimary": false
    }
  ]
}

5. screenshot_list_windows

List all visible windows with title, process, and position information.

Parameters: None

Example:

{
  "name": "screenshot_list_windows",
  "arguments": {}
}

Response:

{
  "status": "success",
  "windows": [
    {
      "id": "12345",
      "title": "Google Chrome",
      "processName": "chrome",
      "pid": 5678,
      "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 },
      "isMinimized": false
    },
    {
      "id": "67890",
      "title": "Terminal",
      "processName": "gnome-terminal",
      "pid": 9012,
      "bounds": { "x": 200, "y": 200, "width": 800, "height": 600 },
      "isMinimized": false
    }
  ]
}

Security Configuration

The server enforces security policies to control screenshot operations. Configure via environment variables or security policy file.

Environment Variables

SCREENSHOT_ALLOWED_DIRS: Comma-separated list of allowed directories for saving screenshots
SCREENSHOT_MAX_CAPTURES_PER_MIN: Maximum captures per minute (default: 60)
SCREENSHOT_ENABLE_AUDIT_LOG: Enable audit logging (default: true)
SCREENSHOT_BLOCKED_WINDOWS: Comma-separated list of window title patterns to exclude

Security Policy File

Create a security-policy.json file:

{
  "allowedDirectories": ["/home/user/screenshots", "/tmp/screenshots"],
  "blockedWindowPatterns": [
    ".*Password.*",
    ".*1Password.*",
    ".*LastPass.*",
    ".*Bitwarden.*",
    ".*Authentication.*"
  ],
  "maxCapturesPerMinute": 60,
  "enableAuditLog": true
}

Load the policy when starting the server:

import { MCPScreenshotServer } from "@ai-capabilities-suite/mcp-screenshot";
import * as fs from "fs";

const policy = JSON.parse(fs.readFileSync("security-policy.json", "utf-8"));
const server = new MCPScreenshotServer(policy);
await server.start();

Error Handling

All tools return structured error responses with error codes and remediation suggestions.

Error Codes

Code	Description	Remediation
`PERMISSION_DENIED`	Insufficient permissions to capture	Grant Screen Recording permission (macOS) or check user permissions
`INVALID_PATH`	File path outside allowed directories	Use a path within configured allowed directories
`WINDOW_NOT_FOUND`	Specified window does not exist	Use `screenshot_list_windows` to find available windows
`DISPLAY_NOT_FOUND`	Specified display does not exist	Use `screenshot_list_displays` to find available displays
`UNSUPPORTED_FORMAT`	Requested format not supported	Use png, jpeg, webp, or bmp
`CAPTURE_FAILED`	Screenshot capture failed	Check permissions and try again
`RATE_LIMIT_EXCEEDED`	Too many captures in time window	Wait before making additional requests
`INVALID_REGION`	Invalid region coordinates or dimensions	Ensure coordinates are non-negative and dimensions are positive
`OUT_OF_MEMORY`	Insufficient memory for operation	Reduce capture size or close other applications
`ENCODING_FAILED`	Image encoding failed	Try different format or reduce quality
`FILE_SYSTEM_ERROR`	File system operation failed	Check permissions and disk space

Error Response Format

{
  "status": "error",
  "error": {
    "code": "WINDOW_NOT_FOUND",
    "message": "Window with ID '12345' not found",
    "details": {
      "windowId": "12345"
    },
    "remediation": "Verify the window exists and is visible. Use screenshot_list_windows to see available windows."
  }
}
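
A client can check for this shape before using a tool result. The sketch below is illustrative only: ToolErrorResponse, isToolError, and formatToolError are hypothetical names, the field names come from the example above, and treating details and remediation as optional is an assumption.

```typescript
// Shape of the error response shown above.
interface ToolErrorResponse {
  status: "error";
  error: {
    code: string;
    message: string;
    details?: Record<string, unknown>; // assumed optional
    remediation?: string;              // assumed optional
  };
}

// Type guard: true when a parsed tool response has the error shape.
function isToolError(response: unknown): response is ToolErrorResponse {
  if (typeof response !== "object" || response === null) return false;
  const candidate = response as { status?: unknown; error?: unknown };
  if (candidate.status !== "error") return false;
  if (typeof candidate.error !== "object" || candidate.error === null) return false;
  const err = candidate.error as { code?: unknown; message?: unknown };
  return typeof err.code === "string" && typeof err.message === "string";
}

// Formats an error for logs, appending the remediation hint when present.
function formatToolError(response: ToolErrorResponse): string {
  const { code, message, remediation } = response.error;
  return remediation
    ? `[${code}] ${message} (${remediation})`
    : `[${code}] ${message}`;
}
```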

Troubleshooting

Linux Issues

Problem: import: command not found or grim: command not found

Solution: Install required packages:

# X11
sudo apt-get install imagemagick

# Wayland
sudo apt-get install grim

Problem: Black screen or empty captures

Solution: Check display server environment variables:

echo $DISPLAY  # Should show :0 or similar for X11
echo $WAYLAND_DISPLAY  # Should show wayland-0 or similar for Wayland
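
The same check can be scripted. This is a small sketch with a hypothetical helper name, detectDisplayServer; it inspects only the two variables mentioned above and is not how the package itself necessarily selects a backend.

```typescript
type DisplayServer = "wayland" | "x11" | "unknown";

// Wayland is checked first because XWayland sessions usually set DISPLAY as well.
function detectDisplayServer(
  env: Record<string, string | undefined>
): DisplayServer {
  if (env.WAYLAND_DISPLAY) return "wayland";
  if (env.DISPLAY) return "x11";
  return "unknown"; // neither variable set: captures are likely to come back empty
}
```

In a Node.js script, pass process.env as the argument. A result of "unknown" usually means the process was started outside a graphical session, for example over SSH or from a service.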

macOS Issues

Problem: PERMISSION_DENIED error

Solution: Grant Screen Recording permission:

Open System Settings > Privacy & Security (on macOS 12 and earlier: System Preferences > Security & Privacy > Privacy)
Select "Screen Recording" from the list
Add your terminal application or Node.js to the allowed list
Restart the application

Problem: Retina display captures are double resolution

Solution: This is expected behavior. Retina displays have 2x pixel density. Use the width and height from metadata to determine actual dimensions.
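
If layout measurements are needed in logical points rather than pixels, the conversion is a division by the display's scale factor. The sketch below assumes the reported width and height are in physical pixels and that the caller knows the scale factor; toLogicalSize and the Size type are hypothetical names, not part of the package.

```typescript
interface Size {
  width: number;
  height: number;
}

// Converts pixel dimensions to logical (point) dimensions.
// scaleFactor is supplied by the caller; 2 matches the 2x density noted above.
function toLogicalSize(pixels: Size, scaleFactor = 2): Size {
  if (scaleFactor <= 0) {
    throw new RangeError("scaleFactor must be positive");
  }
  return {
    width: Math.round(pixels.width / scaleFactor),
    height: Math.round(pixels.height / scaleFactor),
  };
}
```

For example, a 2880 x 1800 pixel capture at a scale factor of 2 corresponds to a 1440 x 900 point layout.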

Windows Issues

Problem: Capture fails with access denied

Solution: Run the application with administrator privileges or check Windows Defender settings.

Problem: Multi-monitor captures show wrong display

Solution: Use screenshot_list_displays to get correct display IDs and positions.

General Issues

Problem: RATE_LIMIT_EXCEEDED error

Solution: The server limits captures per minute to prevent abuse. Wait 60 seconds, or raise maxCapturesPerMinute in the security policy.
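
A client that captures in a loop can pace itself instead of hitting the limit. The following is a sketch of a sliding-window limiter under the assumption that the server counts captures over a rolling 60-second window; the server's actual algorithm is not documented here, and CaptureRateLimiter is a hypothetical name.

```typescript
// Sliding-window limiter: allows at most `limit` captures in any `windowMs` span.
class CaptureRateLimiter {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs = 60000) {
    if (limit <= 0) throw new RangeError("limit must be positive");
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 when a capture may proceed now (and records it),
  // otherwise the number of milliseconds to wait before retrying.
  tryAcquire(now: number = Date.now()): number {
    // Drop captures that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return 0;
    }
    // The oldest remaining capture determines when a slot frees up.
    const oldest = this.timestamps[0] ?? now;
    return oldest + this.windowMs - now;
  }
}
```

Call tryAcquire() before each capture request and sleep for the returned number of milliseconds when it is non-zero.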

Problem: INVALID_PATH error when saving

Solution: Ensure the save path resolves to a location inside one of the allowed directories configured in the security policy.
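
To see why a path is rejected, it helps to resolve it the way a containment check typically would. This sketch uses Node's path module and a hypothetical helper, isPathAllowed; the server's own validation may differ, for instance in how it treats symbolic links, which this sketch does not resolve.

```typescript
import * as path from "node:path";

// True when `target` resolves to a location inside (or equal to) one of the
// allowed directories. Relative segments such as ".." are normalised first.
function isPathAllowed(target: string, allowedDirectories: string[]): boolean {
  const resolved = path.resolve(target);
  return allowedDirectories.some((dir) => {
    const rel = path.relative(path.resolve(dir), resolved);
    // A relative path that climbs upward, or is absolute (different drive on
    // Windows), means the target is outside the directory.
    const escapes =
      rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
}
```

Note that a prefix comparison alone is not enough: "/home/user/screenshots-old/a.png" starts with "/home/user/screenshots" but is outside it, which is why the sketch compares path segments through path.relative.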

Problem: PII masking not working

Solution:

Ensure tesseract.js is properly installed
Check that the eng.traineddata language file is available
Allow extra time for large images: PII masking relies on OCR, which can be slow, so masking may still be in progress rather than failing

Problem: Large file sizes

Solution:

Use JPEG format with lower quality (60-80) for smaller files
Use WebP format for best compression
Reduce capture region size if possible

Problem: Out of memory errors

Solution:

Capture smaller regions instead of full screen
Reduce quality settings
Close other applications to free memory
Use streaming for very large captures
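
A quick estimate shows why region size matters most. The arithmetic below assumes an uncompressed frame at 4 bytes per pixel (RGBA, 8 bits per channel); the library's actual in-memory representation may differ, and both helper names are hypothetical.

```typescript
// Bytes needed for one uncompressed frame.
// Assumption: 4 bytes per pixel (RGBA, 8 bits per channel).
function rawFrameBytes(width: number, height: number, bytesPerPixel = 4): number {
  return width * height * bytesPerPixel;
}

// Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}
```

Under that assumption, a 3840 x 2160 full-screen capture needs 33,177,600 bytes (about 31.6 MiB) before encoding, while an 800 x 600 region needs 1,920,000 bytes (about 1.8 MiB). Encoding typically requires additional working memory on top of the raw frame.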

Programmatic Usage

TypeScript/JavaScript

import { MCPScreenshotServer } from "@ai-capabilities-suite/mcp-screenshot";

// Create server with custom security policy
const server = new MCPScreenshotServer({
  allowedDirectories: ["/home/user/screenshots"],
  maxCapturesPerMinute: 30,
  enableAuditLog: true,
  blockedWindowPatterns: [".*Password.*"],
});

// Start server
await server.start();

// The server now handles MCP protocol requests over stdio.
// Shut down cleanly when the process receives SIGINT (Ctrl+C)
process.on("SIGINT", async () => {
  await server.stop();
  process.exit(0);
});

Direct Capture Engine Usage

import { createCaptureEngine } from "@ai-capabilities-suite/mcp-screenshot";

// Create platform-specific capture engine
const engine = createCaptureEngine();

// Capture full screen
const fullScreen = await engine.captureScreen();

// List windows and capture the first one whose title contains "Chrome"
const windows = await engine.getWindows();
const chromeWindow = windows.find((w) => w.title.includes("Chrome"));
if (chromeWindow) {
  const windowBuffer = await engine.captureWindow(chromeWindow.id, false);
}

// Capture region
const region = await engine.captureRegion(100, 100, 800, 600);

// List displays
const displays = await engine.getDisplays();
console.log(`Found ${displays.length} displays`);

Development

This package is part of the AI Capabilities Suite monorepo.

Build

npm run build

Test

# Run all tests
npm test

# Run specific test suites
npm test -- capture
npm test -- security
npm test -- property

# Run with coverage
npm test -- --coverage

Project Structure

packages/mcp-screenshot/
├── src/
│   ├── capture/          # Platform-specific capture engines
│   ├── processing/       # Image processing and encoding
│   ├── privacy/          # PII detection and masking
│   ├── security/         # Security policy enforcement
│   ├── storage/          # File operations
│   ├── tools/            # MCP tool implementations
│   ├── interfaces/       # TypeScript interfaces
│   ├── types/            # Type definitions
│   ├── errors/           # Error classes
│   ├── server.ts         # MCP server implementation
│   └── cli.ts            # CLI entry point
├── README.md
├── TESTING.md
└── package.json

Contributing

Contributions are welcome! Please ensure:

All tests pass (npm test)
Code follows TypeScript best practices
New features include tests and documentation
Security considerations are addressed

License

MIT

Support

For issues and questions:

GitHub Issues: Create an issue
Documentation: See TESTING.md for testing guide
Security: Report security issues privately to security@example.com