Pixel Surgeon

Summary

Connects Claude to Gemini, OpenAI, and Grok image generation APIs, plus Veo 3 for video. You get standard text-to-image and editing tools, but the interesting part is region repair: it can rerender specific tiles or cropped areas of an image to fix garbled text or corrupted regions, then composite them back with histogram matching. Includes style presets like neo-brutalist and retro-futurist infographic modes. Images save to a local directory with a browser viewer for managing output. Useful when you need to iterate on AI-generated images that have text rendering problems or need surgical fixes without regenerating the whole thing.

pixel-surgeon-mcp

MCP server for AI image & video generation, editing, and transplant-grade region repair
Powered by Gemini 3.1 Flash Image, OpenAI GPT Image 2, Grok Imagine, and Veo 3

An MCP server that gives Claude (or any MCP client) the ability to generate images, edit them, fix garbled text, and create videos — all through natural language.

How it works

pixel-surgeon-mcp is a multi-provider image generation server. You can use any combination of providers and switch between them per-request:

Gemini (Google) — balanced

Google's image generation pipeline uses a two-stage approach: Gemini 3.1 Pro reasons about your prompt, then Gemini 3.1 Flash Image renders the pixels. Supports 9 aspect ratios at 512/1K/2K/4K resolution. Best price/performance ratio, with a free tier available.

OpenAI GPT Image 2 — highest quality

OpenAI's latest image model with dramatically improved text rendering and visual fidelity. Supports flexible resolutions — pixel-surgeon maps your chosen size and aspect ratio to the optimal pixel dimensions automatically. Quality levels: medium (fast) and high (print-ready). Excellent for infographics, diagrams, and text-heavy images where other models struggle. Slower and more expensive.

Grok Imagine (xAI) — fastest

xAI's Aurora-powered image model. Fastest generation speed and lowest cost. Supports 7 aspect ratios at fixed resolutions (~1K). Good for rapid prototyping and iteration.

Veo 3 (Video)

For video, the server calls Veo 3 with async polling — generating both video and ambient audio. Supports 16:9 and 9:16 at 5s or 8s duration.
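
The async polling pattern described above can be sketched roughly as follows. This is an illustration, not the server's actual code: `pollUntilDone`, the `Operation` shape, and the interval and timeout values are all assumptions, since the README does not show the Veo client or its operation type.

```typescript
// Hypothetical shape of a long-running operation; the real Veo type may differ.
interface Operation<T> {
  done: boolean;
  result?: T;
  error?: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `check` repeatedly until the operation reports done or the timeout elapses.
async function pollUntilDone<T>(
  check: () => Promise<Operation<T>>,
  intervalMs = 5000, // assumed polling interval
  timeoutMs = 600000, // assumed overall timeout (10 minutes)
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const op = await check();
    if (op.done) {
      if (op.error !== undefined) throw new Error(op.error);
      if (op.result === undefined) {
        throw new Error("operation finished without a result");
      }
      return op.result;
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error("timed out waiting for video generation");
    }
    await sleep(intervalMs);
  }
}
```

Passing the status check in as a callback keeps the loop independent of any particular SDK, so the same helper could wrap whichever client call reports the operation's state.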

Region repair

AI image models struggle with text-heavy images. The fix tools (`fix_image`, `fix_region`, and `interactive_fix`) work around this by sending smaller regions to the provider for rerendering, then stitching the results back into the original with histogram-matched compositing so the repaired areas blend in.
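
To make the compositing step concrete, here is a minimal sketch of one simple form of per-channel color matching: shifting and scaling each RGB channel of the rerendered patch so its mean and standard deviation match the original region. This is a simplified stand-in, not necessarily what the server does (full histogram matching remaps values through cumulative distributions). The function names and the RGBA `Uint8ClampedArray` layout are assumptions.

```typescript
// Mean and standard deviation of one channel (0 = R, 1 = G, 2 = B) in RGBA data.
function channelStats(
  rgba: Uint8ClampedArray,
  channel: number,
): { mean: number; std: number } {
  const n = rgba.length / 4;
  let sum = 0;
  for (let i = channel; i < rgba.length; i += 4) sum += rgba[i];
  const mean = sum / n;
  let sq = 0;
  for (let i = channel; i < rgba.length; i += 4) sq += (rgba[i] - mean) ** 2;
  return { mean, std: Math.sqrt(sq / n) };
}

// Returns a copy of `patch` whose R, G, B statistics match `reference`.
// The alpha channel is left untouched.
function matchChannels(
  patch: Uint8ClampedArray,
  reference: Uint8ClampedArray,
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(patch);
  for (let c = 0; c < 3; c++) {
    const src = channelStats(patch, c);
    const ref = channelStats(reference, c);
    // A flat channel has no spread to rescale, so only shift its mean.
    const scale = src.std > 0 ? ref.std / src.std : 1;
    for (let i = c; i < out.length; i += 4) {
      // Uint8ClampedArray rounds and clamps the result to 0..255 on assignment.
      out[i] = (patch[i] - src.mean) * scale + ref.mean;
    }
  }
  return out;
}
```

In a repair flow, `reference` would be the pixels of the original crop and `patch` the provider's rerendered version of the same area, so the corrected patch can be pasted back without a visible color seam.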

Tools

| Tool | Description |
| --- | --- |
| `generate_image` | Text-to-image generation (single image) |
| `generate_images` | Parallel batch generation (1-8 images) |
| `generate_video` | Text-to-video via Veo 3 with audio (5s or 8s) |
| `edit_image` | Edit an existing image with natural language instructions |
| `fix_image` | Grid-based tile repair for garbled text (2x2, 3x3, etc.) |
| `fix_region` | Targeted region repair with automatic aspect ratio snapping |
| `interactive_fix` | Browser-based crop UI with multi-shot selection |
| `list_images` | List generated images and videos |
| `save_image` | Import an external image into the workspace |
| `remove_background` | Remove image background (alpha channel transparency) |
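
MCP clients normally issue these calls for you, but it can help to see what a request looks like on the wire. The sketch below builds a JSON-RPC `tools/call` request for `generate_image`. The `prompt` argument name is an assumption (the README does not list each tool's input schema); `model` and `style` are the optional parameters this README describes.

```typescript
// JSON-RPC 2.0 request for the MCP `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// `prompt` is an assumed argument name; check the tool's published schema.
const request = buildToolCall(1, "generate_image", {
  prompt: "A system diagram of a data pipeline with clearly labeled stages",
  model: "gpt-image-2",
  style: "clean-tech-infographic",
});

console.log(JSON.stringify(request, null, 2));
```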

Models

| Model | Provider | Resolution | Best for |
| --- | --- | --- | --- |
| `gemini-3.1-flash-image` | Google | 512 / 1K / 2K / 4K | General image generation, photo-realistic scenes |
| `gemini-2.5-flash-image` | Google | 1K max (free tier) | Quick drafts, prototyping |
| `gpt-image-2` | OpenAI | Flexible (up to 4K) | Text-heavy images, infographics, diagrams, typography |
| `gpt-image-1` | OpenAI | 3 fixed sizes | Legacy support |
| `grok-imagine` | xAI | Fixed (~1K per ratio) | Fast iteration, lowest cost |

Force a specific model per call via the `model` tool parameter, or set the `DEFAULT_IMAGE_MODEL` environment variable to change the default.
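
The precedence implied here (per-call parameter first, then the environment variable) might look something like the sketch below. `resolveModel` and the built-in default of `gemini-3.1-flash-image` are assumptions made for illustration; the README does not say which model is used when neither is set.

```typescript
type ImageModel =
  | "gemini-3.1-flash-image"
  | "gemini-2.5-flash-image"
  | "gpt-image-2"
  | "gpt-image-1"
  | "grok-imagine";

const KNOWN_MODELS: readonly string[] = [
  "gemini-3.1-flash-image",
  "gemini-2.5-flash-image",
  "gpt-image-2",
  "gpt-image-1",
  "grok-imagine",
];

function isImageModel(value: string | undefined): value is ImageModel {
  return value !== undefined && KNOWN_MODELS.indexOf(value) !== -1;
}

// Precedence: per-call `model` argument, then DEFAULT_IMAGE_MODEL, then a
// built-in default. The built-in default here is an assumption.
function resolveModel(
  requested: string | undefined,
  env: Record<string, string | undefined>,
  builtInDefault: ImageModel = "gemini-3.1-flash-image",
): ImageModel {
  if (requested !== undefined) {
    if (!isImageModel(requested)) {
      throw new Error(`Unknown model: ${requested}`);
    }
    return requested;
  }
  const fromEnv = env["DEFAULT_IMAGE_MODEL"];
  return isImageModel(fromEnv) ? fromEnv : builtInDefault;
}
```

In a Node process you would pass `process.env` as the second argument; taking it as a parameter keeps the function easy to test.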

Gemini automatic fallback

If a Gemini generation call fails with a billing / prepay error, the server automatically retries on the free-tier gemini-2.5-flash-image model. The viewer shows a yellow banner when this happens. Free-tier limits: 1K max resolution, 10 RPM, 500 RPD.
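
As a rough illustration of this retry behavior, the sketch below wraps a generation call and falls back to the free-tier model when the error looks billing-related. The `isBillingError` check, the `GenerationResult` shape, and the `generate` callback are hypothetical; the README does not document which error codes or messages the server actually inspects.

```typescript
const PRIMARY_MODEL = "gemini-3.1-flash-image";
const FREE_TIER_MODEL = "gemini-2.5-flash-image";

interface GenerationResult {
  model: string;
  imagePath: string;
  usedFallback: boolean; // a viewer could use this to decide whether to show a banner
}

// Hypothetical check: matches on the error message only.
function isBillingError(err: unknown): boolean {
  return err instanceof Error && /billing|prepay/i.test(err.message);
}

async function generateWithFallback(
  generate: (model: string) => Promise<string>,
): Promise<GenerationResult> {
  try {
    const imagePath = await generate(PRIMARY_MODEL);
    return { model: PRIMARY_MODEL, imagePath, usedFallback: false };
  } catch (err) {
    if (!isBillingError(err)) throw err; // other failures are not retried
    const imagePath = await generate(FREE_TIER_MODEL);
    return { model: FREE_TIER_MODEL, imagePath, usedFallback: true };
  }
}
```

Because the free tier tops out at 1K, a real implementation would presumably also need to cap the requested resolution before retrying.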

Style presets

All generation and edit tools support an optional style parameter:

`neo-brutalist`

Magazine editorial, bold typography, halftone textures. Cream, black, and terracotta palette.

`duval-software-infographic`

Duval Software's signature retro-futurist infographic style. 1960s Space Age meets 1980s arcade. Cathode blue, amber, and salmon palette. Great for diagrams and system overviews.

`fractal-arcade`

Dithered fractals, Sierpinski patterns, low-poly. CRT retro, Amiga/EGA palette.

`clean-tech-infographic`

Technical diagrams, system flows, data pipelines. Dark navy, cyan, and electric blue.

Setup

Get your API key(s)

You need at least one provider API key. You can use any combination for maximum flexibility.

Google (Gemini + Veo 3)

1. Go to Google AI Studio
2. Sign in with your Google account
3. Click Create API Key and copy it

Prepayment required. Gemini 3.1 Flash Image and Veo 3 require billing and prepaid credits. The free-tier fallback (2.5 Flash) has limited resolution and rate limits. See Google AI pricing.

OpenAI (GPT Image 2)

1. Go to OpenAI API
2. Sign in or create an account
3. Click Create new secret key and copy it
4. Ensure you have API credits (image generation is billed per request)

GPT Image 2 excels at text rendering, infographics, and diagrams. If you primarily need text-heavy images, this is the provider to use.

xAI (Grok Imagine)

1. Go to xAI Console
2. Sign in or create an account
3. Create an API key and copy it

Grok Imagine is the fastest and cheapest provider. Great for rapid iteration and prototyping. Fixed output resolutions (~1K) with no size control.

Quick start (npx)

No install needed: run directly with npx. The server reads whichever API keys you have from the `GOOGLE_API_KEY`, `OPENAI_API_KEY`, and `XAI_API_KEY` environment variables:

npx pixel-surgeon-mcp

Claude Code CLI

claude mcp add pixel-surgeon \
  -e GOOGLE_API_KEY=your-google-key \
  -e OPENAI_API_KEY=your-openai-key \
  -e XAI_API_KEY=your-xai-key \
  -- npx pixel-surgeon-mcp

Claude Desktop / MCP client config

{
  "mcpServers": {
    "pixel-surgeon": {
      "command": "npx",
      "args": ["pixel-surgeon-mcp"],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key",
        "OPENAI_API_KEY": "your-openai-api-key",
        "XAI_API_KEY": "your-xai-api-key"
      }
    }
  }
}

Install from source

If you prefer a local clone:

git clone https://github.com/j-east/pixel-surgeon-mcp.git
cd pixel-surgeon-mcp
npm install
npm run build

Image output

Generated images are saved to ~/Pictures/pixel-surgeon/. A local browser viewer auto-launches on first use for full-resolution previews with model selection, respin controls, and search.

Development

npm run dev    # tsx watch mode
npm run build  # compile TypeScript
npm run start  # run compiled server

Key implementation details

Aspect ratio snapping — crops are adjusted to the nearest Gemini-supported ratio while preserving center point
Histogram matching — per-channel RGB normalization ensures composited regions blend seamlessly
Human-in-the-loop — interactive_fix opens a browser crop UI, blocks via Promise until the user submits, fires parallel Gemini calls, and lets the user pick the best result
MCP size limits — full-resolution images are saved to disk; downsampled versions (< 950KB) are returned in MCP responses
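
The aspect ratio snapping step can be sketched as below. This is one plausible approach, not the server's implementation: it picks the nearest ratio in log space, then grows the crop around its center so nothing the user selected is cut off. The `RATIOS` list is illustrative only (the README does not enumerate the supported ratios), and clamping to the image bounds is omitted.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Illustrative ratios only; the server's actual supported list is not given here.
const RATIOS: Array<[number, number]> = [
  [1, 1],
  [4, 3],
  [3, 4],
  [16, 9],
  [9, 16],
];

// Compares ratios in log space so that 2:1 and 1:2 are equally far from 1:1.
function nearestRatio(width: number, height: number): [number, number] {
  const target = Math.log(width / height);
  let best = RATIOS[0];
  let bestDist = Infinity;
  for (const ratio of RATIOS) {
    const dist = Math.abs(Math.log(ratio[0] / ratio[1]) - target);
    if (dist < bestDist) {
      best = ratio;
      bestDist = dist;
    }
  }
  return best;
}

// Expands the crop to the nearest ratio while keeping its center point fixed.
function snapCrop(crop: Rect): Rect {
  const [rw, rh] = nearestRatio(crop.width, crop.height);
  let width = crop.width;
  let height = crop.height;
  if (width * rh < height * rw) {
    width = (height * rw) / rh; // too narrow for the target ratio: widen
  } else {
    height = (width * rh) / rw; // too wide (or already exact): add height
  }
  const cx = crop.x + crop.width / 2;
  const cy = crop.y + crop.height / 2;
  return { x: cx - width / 2, y: cy - height / 2, width, height };
}
```

For example, a 300x200 crop (ratio 1.5) is closest to 4:3 in this list, so it grows to 300x225 around the same center.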

Contributing

PRs are welcome! We're especially looking for:

New style presets

Add entries to the STYLE_PRESETS object in src/index.ts. Your PR should include:

The preset definition (name, prompt prefix, default aspect ratio)
2-3 example images generated with the preset (drop them in your PR description)
A short description of the visual style for the README table
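
For orientation, a preset entry might look roughly like the sketch below. The field names (`promptPrefix`, `defaultAspectRatio`), the `applyPreset` helper, and the `blueprint-sketch` preset are all invented for illustration; check the existing entries in src/index.ts for the real shape before opening a PR.

```typescript
// Hypothetical field names; the real STYLE_PRESETS shape may differ.
interface StylePreset {
  promptPrefix: string;
  defaultAspectRatio: string;
}

const STYLE_PRESETS: Record<string, StylePreset> = {
  // Example preset invented for this sketch.
  "blueprint-sketch": {
    promptPrefix:
      "Hand-drawn engineering blueprint, white ink on deep blue paper, precise line work.",
    defaultAspectRatio: "16:9",
  },
};

// Prepends the preset's prompt prefix when a style is requested.
function applyPreset(prompt: string, style?: string): string {
  if (style === undefined) return prompt;
  const preset = STYLE_PRESETS[style];
  if (preset === undefined) {
    throw new Error(`Unknown style preset: ${style}`);
  }
  return `${preset.promptPrefix} ${prompt}`;
}
```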

Model adapters

The server currently supports Gemini, OpenAI, Grok Imagine, and Veo 3. We'd love adapters for other image/video generation APIs — Stable Diffusion, Flux, etc. If you're interested in adding one, open an issue first so we can align on the interface.

Built by Duval Software

pixel-surgeon-mcp is maintained by John Evans, part of the engineering team at Duval Software — a software engineering firm in Jacksonville Beach, FL building AI-powered tools and custom integrations. If you need MCP servers, AI pipelines, or production tooling built, get in touch.

License

MIT

Configuration

GOOGLE_API_KEY (secret)

Google AI API key (enables Gemini image/video generation)

OPENAI_API_KEY (secret)

OpenAI API key (enables GPT Image 1/2 generation)

XAI_API_KEY (secret)

xAI API key (enables Grok Imagine generation)
