Plugs Gemini Pro into Claude Desktop for AI-powered PDF manipulation. You get three main operations: open a PDF to start a session, edit existing pages with natural language prompts, and generate new pages that match the visual style of your document. Under the hood it uses poppler for rendering, Tesseract OCR to keep text layers searchable, and stores all versions locally in an output directory. The undo system lets you roll back changes without fear. You'll need a Gemini API key and system dependencies installed, but once configured it runs entirely on your machine. Built on the original NanoPDF project, adapted for MCP workflows.
NanoPDF MCP Server is a Model Context Protocol (MCP) implementation that brings AI-powered PDF editing and generation capabilities directly to Claude Desktop. It enables users to modify existing PDF pages or generate new ones using Google's Gemini 3 Pro model, all within a privacy-focused local environment.
The server requires poppler (for PDF rendering) and tesseract (for OCR) installed on your system.
macOS (Homebrew):
brew install poppler tesseract
Debian/Ubuntu:
sudo apt-get install poppler-utils tesseract-ocr
A Google Gemini API key is required to power the AI features.
GEMINI_API_KEY: Your Google AI Studio API key.
This project uses uv for fast Python package management.
# Navigate to the server directory
cd mcp-server
# Install dependencies
uv sync
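Before wiring the server into Claude Desktop, it can help to confirm that the prerequisites above are actually visible from your shell. The following is a minimal sketch, not part of the project: the function name `missing_prerequisites` is hypothetical, and it assumes that `pdftoppm` is the poppler executable the server relies on and that the API key is read from the `GEMINI_API_KEY` environment variable.

```python
import os
import shutil

# Executables assumed to be provided by the dependencies listed above:
# pdftoppm ships with poppler, tesseract with Tesseract OCR, uv with uv.
REQUIRED_TOOLS = ("pdftoppm", "tesseract", "uv")


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = [
        f"'{tool}' not found on PATH"
        for tool in REQUIRED_TOOLS
        if which(tool) is None
    ]
    # Assumption: the key is supplied through this environment variable.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("all prerequisites found")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment. Note that Claude Desktop may launch the server with a different PATH than your terminal, so a clean result here is a useful hint rather than a guarantee.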
Add the NanoPDF server to your Claude Desktop configuration:
Path: ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "nanopdf": {
      "command": "/Users/username/.local/bin/uv",
      "args": [
        "run",
        "--directory",
        "/Users/username/nano-pdf-mcp/mcp-server",
        "server.py"
      ]
    }
  }
}
[!IMPORTANT] JSON does not allow comments, so edit the two placeholder paths directly. Set "command" to the path of your uv executable, and set the --directory argument to the absolute path of your mcp-server folder, changing username to your actual macOS username.
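If you would rather generate the entry than edit paths by hand, the configuration above can be built programmatically. This is a minimal sketch under stated assumptions: the helper names `nanopdf_entry` and `merge_into_config` are hypothetical, the checkout location `~/nano-pdf-mcp/mcp-server` is only the README's example path, and the script prints the result instead of writing to your Claude Desktop config file.

```python
import json
import os
import shutil


def nanopdf_entry(uv_path, server_dir):
    """Build a "nanopdf" entry in the same shape as the example config."""
    # Expand "~" and make the path absolute, as the config requires.
    absolute_dir = os.path.abspath(os.path.expanduser(server_dir))
    return {
        "command": str(uv_path),
        "args": ["run", "--directory", absolute_dir, "server.py"],
    }


def merge_into_config(config, entry):
    """Return a copy of config with the entry added, keeping other servers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["nanopdf"] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Fall back to the README's placeholder if uv is not on PATH.
    uv_path = shutil.which("uv") or "/Users/username/.local/bin/uv"
    # Hypothetical checkout location; replace with your own.
    entry = nanopdf_entry(uv_path, "~/nano-pdf-mcp/mcp-server")
    print(json.dumps(merge_into_config({}, entry), indent=2))
```

Printing the merged JSON lets you review it before pasting it into claude_desktop_config.json. Merging into the existing "mcpServers" object, not overwriting it, keeps any other servers you have already configured.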
For detailed tool descriptions and workflow examples, please refer to the Usage Guide.
open_pdf(pdf_path="/path/to/document.pdf")
edit_pdf_page(session_id="...", page_number=1, prompt="...")
mcp-server/: Core MCP server implementation.
server.py: Main entry point and tool definitions.
mcp_pdf_utils.py: PDF processing, OCR, and rendering logic.
mcp_ai_utils.py: Gemini Pro Vision API integration.
history_manager.py: Local session and version history tracker.
output/: Local storage for sessions and previews.
LICENSE: Project license (MIT).
USAGE.md: Detailed usage instructions and examples.
Based on the original NanoPDF project. Adapted for the Model Context Protocol.
MIT License