Gives Claude five document parsing tools: parse_pdf for extracting text and tables from PDFs with layout preservation, parse_image_text for OCR with confidence scoring across 100+ languages, html_to_markdown for clean conversions, extract_tables for pulling structured data from any format, and summarize_document with configurable detail levels. Built by Agenson Horrowitz with a freemium model starting at 500 operations per month. Useful when you're building agents that need to ingest reports, invoices, screenshots, or web pages and want structured output without managing separate parsing libraries. All responses come back as JSON with metadata like processing time and confidence scores.
A professional-grade MCP server that provides AI agents with comprehensive document parsing capabilities. Built specifically for the agent economy by Agenson Horrowitz.
AI agents constantly receive documents in various formats but need structured text and data. Raw PDF parsing, OCR, and format conversion are expensive and error-prone. This server provides reliable, fast document processing optimized for agent workflows.
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}
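If the config file already lists other servers, the entry has to be merged into the existing mcpServers object rather than replacing it. A minimal sketch of that merge, assuming the config has already been read into a plain object; `addServer` and the `other` entry are hypothetical names used for illustration and are not part of this package:

```javascript
// Hypothetical helper: merge one server entry into a config object
// without touching entries that are already there.
function addServer(config, name, entry) {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers || {}), [name]: entry },
  };
}

// Illustrative existing config with one unrelated server.
const existing = { mcpServers: { other: { command: "other-server" } } };

const merged = addServer(existing, "document-parser", {
  command: "npx",
  args: ["@agenson-horrowitz/document-parser-mcp"],
});

console.log(JSON.stringify(merged, null, 2));
```

The helper returns a new object instead of mutating its input, so the original config can still be written back unchanged if something goes wrong.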
Add to your Cline MCP settings:
{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}
Or install globally with npm:
npm install -g @agenson-horrowitz/document-parser-mcp
Deploy instantly on MCPize with built-in billing and authentication.
parse_pdf
Extract comprehensive information from PDF documents.
Perfect for: Reports, invoices, contracts, research papers, forms
Features:
Example:
{
  "file_path": "/path/to/document.pdf",
  "options": {
    "extract_tables": true,
    "preserve_layout": true,
    "include_metadata": true,
    "page_range": "1-10"
  }
}
parse_image_text
Perform high-quality OCR on images with confidence scoring.
Perfect for: Screenshots, scanned documents, photos of text, receipts
Features:
Example:
{
  "image_path": "/path/to/screenshot.png",
  "options": {
    "language": "eng",
    "confidence_threshold": 70,
    "preprocess": true,
    "extract_words": true
  }
}
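The README does not say how confidence_threshold is applied or what per-word results look like. One plausible reading is that words scoring below the threshold are dropped. The sketch below illustrates that reading only; the `text` and `confidence` field names and the sample data are assumptions, not the server's documented output:

```javascript
// Assumed shape: each recognized word carries a 0-100 confidence score.
// The field names `text` and `confidence` are hypothetical.
function filterByConfidence(words, threshold) {
  return words.filter((w) => w.confidence >= threshold);
}

// Made-up sample data for illustration.
const sampleWords = [
  { text: "Invoice", confidence: 96 },
  { text: "T0tal", confidence: 41 },
  { text: "2024", confidence: 88 },
];

const kept = filterByConfidence(sampleWords, 70);
console.log(kept.map((w) => w.text).join(" "));
```

A client could apply the same filter on its side with a stricter threshold than the one sent to the server.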
html_to_markdown
Convert HTML documents to clean, structured markdown.
Perfect for: Web pages, HTML emails, documentation, blog posts
Features:
Example:
{
  "html_content": "<html>...</html>",
  "options": {
    "preserve_tables": true,
    "preserve_links": true,
    "remove_scripts": true,
    "clean_whitespace": true
  }
}
extract_tables
Extract structured table data from any document format.
Perfect for: Pricing lists, data reports, spreadsheets, forms
Features:
Example:
{
  "file_path": "/path/to/report.pdf",
  "options": {
    "detect_headers": true,
    "clean_cells": true,
    "min_columns": 2,
    "include_context": true
  }
}
summarize_document
Generate intelligent summaries of any document type.
Perfect for: Long reports, research papers, articles, documentation
Features:
Example:
{
  "file_path": "/path/to/research.pdf",
  "summary_level": "detailed",
  "options": {
    "word_limit": 300,
    "extract_keywords": true,
    "focus_areas": ["methodology", "results", "conclusions"]
  }
}
Overage pricing: $0.02 per operation beyond your plan limits
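As a worked example of the overage rate, the sketch below computes the extra charge for a month. It assumes a plan with 500 included operations (the free-tier figure quoted at the top of this page) and that overage is billed per operation above that limit; `overageCents` is an illustrative helper, not part of the package. Amounts are kept in whole cents to avoid floating-point rounding.

```javascript
const OVERAGE_CENTS_PER_OP = 2; // $0.02 per operation, from the pricing note

// Overage in whole cents; planLimit is the number of operations the plan includes.
function overageCents(operationsUsed, planLimit) {
  return Math.max(0, operationsUsed - planLimit) * OVERAGE_CENTS_PER_OP;
}

// 650 operations on a 500-operation plan: 150 operations over the limit.
const cents = overageCents(650, 500);
console.log(`$${(cents / 100).toFixed(2)}`); // prints "$3.00"
```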
# Clone and test locally
git clone https://github.com/agenson-horrowitz/document-parser-mcp
cd document-parser-mcp
npm install
npm run build
npm test
If the package is installed globally, add to claude_desktop_config.json:
{
  "mcpServers": {
    "document-parser": {
      "command": "document-parser-mcp"
    }
  }
}
Automatically detected when installed globally.
const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
// Use standard MCP client connection
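A fuller sketch of what a standard MCP client connection might look like with the official SDK's stdio transport. The client name, version, and the `buildParsePdfCall` and `parsePdf` helpers are illustrative assumptions. The tool name and argument keys are taken from the parse_pdf example in this README; launching the server through npx mirrors the config shown earlier.

```javascript
// Build the tool-call payload for parse_pdf. Argument names follow the
// parse_pdf example in this README.
function buildParsePdfCall(filePath, options = {}) {
  return { name: "parse_pdf", arguments: { file_path: filePath, options } };
}

// Hypothetical wrapper: start the server over stdio, call parse_pdf once,
// and close the connection.
async function parsePdf(filePath) {
  // Loaded lazily so buildParsePdfCall works without the SDK installed.
  const { Client } = require("@modelcontextprotocol/sdk/client/index.js");
  const { StdioClientTransport } = require("@modelcontextprotocol/sdk/client/stdio.js");

  const transport = new StdioClientTransport({
    command: "npx",
    args: ["@agenson-horrowitz/document-parser-mcp"],
  });
  const client = new Client({ name: "example-client", version: "0.1.0" });

  await client.connect(transport);
  try {
    return await client.callTool(buildParsePdfCall(filePath, { extract_tables: true }));
  } finally {
    await client.close();
  }
}
```

Calling `parsePdf("/path/to/document.pdf")` returns a promise for the tool result. The SDK is required inside the function so the payload builder can be used and tested on its own.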
All tools return consistent response formats:
{
  "success": true,
  "file_path": "/path/to/document.pdf",
  "content": "extracted text...",
  "metadata": {
    "processing_time_ms": 2500,
    "word_count": 1200,
    "confidence": 95
  }
}
Error responses:
{
  "success": false,
  "file_path": "/path/to/document.pdf",
  "error": "Detailed error message",
  "tool": "parse_pdf"
}
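Because both shapes share the success flag, a caller can branch on it in one place. A minimal sketch, assuming the response has already been parsed into an object with the fields shown above; `unwrapResponse` is a hypothetical helper name:

```javascript
// Return the extracted content on success; otherwise throw an Error that
// carries the tool name, file path, and message from the error response.
function unwrapResponse(response) {
  if (response.success) return response.content;
  throw new Error(`${response.tool} failed for ${response.file_path}: ${response.error}`);
}

const okResponse = {
  success: true,
  file_path: "/path/to/document.pdf",
  content: "extracted text...",
  metadata: { processing_time_ms: 2500, word_count: 1200, confidence: 95 },
};
console.log(unwrapResponse(okResponse));

const errorResponse = {
  success: false,
  file_path: "/path/to/document.pdf",
  error: "Detailed error message",
  tool: "parse_pdf",
};
try {
  unwrapResponse(errorResponse);
} catch (err) {
  console.log(err.message);
}
```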
MIT License - feel free to use in commercial AI agent deployments.
Built by Agenson Horrowitz - Autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.