Converts Word, PDF, Excel, PowerPoint, and CSV files into clean Markdown, with smart routing that depends on file size. Small documents are converted directly. Large ones return a document map showing sections, token counts, and density metrics, so your agent can pull only what it needs with targeted section extraction or keyword search. The MCP server exposes two tools: shuck, for conversion with options such as section filtering, grep search, and token budgets; and list_formats, for checking supported file types. Useful when you need an agent to work with office documents without dumping entire 50-page reports into context, or when you want structured extraction from spreadsheets and presentations.
Any file in, Markdown out — read only what matters.
shuck-file converts documents to clean Markdown for AI agents and LLMs. Small files output directly; large files return a document map with section summaries, token counts, and actionable next steps — so agents only pull what they need.
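The size-based routing described above can be pictured with a short Python sketch. It is an illustration only: the threshold value, the four-characters-per-token estimate, and the names `estimate_tokens` and `choose_mode` are assumptions made for this example, not shuck-file's actual internals.

```python
# Hypothetical sketch of size-based routing; not shuck-file's real code.

# Assumed cutoff. The real threshold is not stated in this README.
MAP_THRESHOLD_TOKENS = 4000


def estimate_tokens(text: str) -> int:
    """Rough token estimate, assuming about four characters per token."""
    return max(1, len(text) // 4)


def choose_mode(markdown: str, force_all: bool = False) -> str:
    """Return "full" for small documents and "map" for large ones."""
    if force_all:
        # Mirrors the idea of a flag that bypasses map mode.
        return "full"
    if estimate_tokens(markdown) > MAP_THRESHOLD_TOKENS:
        return "map"
    return "full"
```

Under these assumptions, a short note routes to `"full"`, a 100,000-character document routes to `"map"`, and passing `force_all=True` always yields `"full"`.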
AI agents need a bridge that's context-aware:
- Small file: `shuck report.docx` → full Markdown on stdout
- Large file: `shuck report.docx` → document map with sections and extraction options
- `shuck report.docx --sections s1,s3` → only what you need
- `shuck report.docx --grep "revenue"` → find without reading everything

| Format | Extension | Library | What's Preserved |
|---|---|---|---|
| Word | .docx | python-docx | Headings, bold/italic, lists, tables |
| PDF | .pdf | pdfplumber | Text content, page breaks |
| Excel | .xlsx | openpyxl | All sheets as Markdown tables |
| PowerPoint | .pptx | python-pptx | Titles, text, tables, speaker notes |
| CSV | .csv | stdlib | All rows/columns as a table |
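As an illustration of the CSV row in the table above, the following sketch turns CSV text into a Markdown table using only the Python standard library. The function name `csv_to_markdown` and the choice to escape pipe characters are assumptions for this example; shuck-file's own extractor may behave differently.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown table, treating the first row as the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def render(row):
        # Escape pipes so a cell cannot break the table, then pad short rows.
        cells = [cell.replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [render(header), "|" + "---|" * width]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)
```

Calling `csv_to_markdown("a,b\n1,2\n")` produces a three-line table: a header row, a separator row, and one data row.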
pip install shuck-file
This installs the shuck CLI command and the MCP server.
git clone https://github.com/Shan-Zhu/shuck-file.git
cd shuck-file
pip install -e .
# Convert a document
shuck report.docx
# Force full output (bypass map mode)
shuck large-report.pdf --all
# Search within a document
shuck report.pdf --grep "revenue"
Small files output directly; large files return a document map.
# Small file → direct Markdown output
shuck document.pdf
# Large file → document map with sections table + next steps
shuck large-report.pdf
# Force full output (bypass map mode)
shuck report.pdf --all
# Extract specific sections
shuck report.pdf --sections s1,s3
# Tables only
shuck report.pdf --tables-only
# Search within document
shuck report.pdf --grep "revenue"
# Token budget (smart compression)
shuck report.pdf --budget 4000
# Combinations work
shuck report.pdf --sections s2,s3 --budget 2000
# Column headers and types
shuck data.xlsx --schema-only
# Headers + first N rows
shuck data.xlsx --sample 5
# Force map mode (even on small files)
shuck probe document.docx
# Force full extraction (alias for --all)
shuck pull document.docx
# Write to file
shuck document.pdf -o output.md
# Write to directory (auto-named)
shuck document.pdf -d ./converted/
# Skip YAML frontmatter
shuck document.pdf --no-frontmatter
# List supported formats
shuck --formats
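When calling the CLI from a script, it can help to assemble the arguments in one place. The sketch below uses only flags shown in the examples above (`--all`, `--sections`, `--grep`, `--budget`). The helper names `build_shuck_args` and `run_shuck` are hypothetical, and `run_shuck` assumes the `shuck` command is installed and on your PATH.

```python
import subprocess
from typing import List, Optional, Sequence


def build_shuck_args(
    path: str,
    sections: Optional[Sequence[str]] = None,
    grep: Optional[str] = None,
    budget: Optional[int] = None,
    full: bool = False,
) -> List[str]:
    """Assemble an argument list from the flags documented above."""
    args = ["shuck", path]
    if full:
        args.append("--all")
    if sections:
        # Section ids are passed as one comma-separated value, e.g. s2,s3.
        args += ["--sections", ",".join(sections)]
    if grep is not None:
        args += ["--grep", grep]
    if budget is not None:
        args += ["--budget", str(budget)]
    return args


def run_shuck(path: str, **options) -> str:
    """Run the CLI and return its stdout; raises if the command fails."""
    result = subprocess.run(
        build_shuck_args(path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing a list to `subprocess.run` avoids shell quoting problems with search terms that contain spaces.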
When a file is large, shuck returns a document map:
# Document Map: quarterly-report.pdf
**6 pages | ~12,400 tokens | 6 sections**
## Sections
| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
| ...
## Next Steps
- `shuck quarterly-report.pdf --all` -- full document (~12,400 tokens)
- `shuck quarterly-report.pdf --sections s1,s2` -- high-density (~3,250 tokens)
- `shuck quarterly-report.pdf --grep "..."` -- search for keywords
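An agent that receives a map like the one above has to decide which sections to request next. The following sketch shows one possible approach, assuming the sections table keeps the five-column layout shown in the example. The function names and the greedy selection rule are assumptions for illustration, not part of shuck-file.

```python
import re

# Sample rows copied from the example map above.
SAMPLE_MAP = """\
| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
"""


def parse_sections(map_text):
    """Extract section rows (id, title, type, tokens, density) from a map."""
    sections = []
    for line in map_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Keep only rows whose first cell looks like a section id (s1, s2, ...).
        if len(cells) == 5 and re.fullmatch(r"s\d+", cells[0]):
            sections.append({
                "id": cells[0],
                "title": cells[1],
                "type": cells[2],
                "tokens": int(cells[3].replace(",", "")),
                "density": cells[4],
            })
    return sections


def pick_sections(sections, budget):
    """Greedily keep high-density sections, in document order, within budget."""
    chosen, used = [], 0
    for section in sections:
        if section["density"] == "high" and used + section["tokens"] <= budget:
            chosen.append(section["id"])
            used += section["tokens"]
    return chosen, used
```

With a budget of 4,000 tokens, the sample rows yield `s1` and `s2` for a total of 3,250 tokens; joining the ids with commas gives a value suitable for `--sections`.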
shuck-file includes an MCP (Model Context Protocol) server, making it available to any MCP-compatible AI tool.
claude mcp add shuck-file -- shuck-file
Or add to your project's .mcp.json:
{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
Add to ~/.cursor/mcp.json:
{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
Add to your MCP configuration:
{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
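If a configuration file already lists other servers, editing it by hand risks overwriting them. The sketch below merges the same `shuck-file` entry into an existing JSON config while keeping other entries. The helper name `add_shuck_server` is hypothetical, and the sketch assumes the file, if present, contains valid JSON.

```python
import json
from pathlib import Path


def add_shuck_server(config_path: Path) -> dict:
    """Merge the shuck-file entry into an MCP config file, keeping other servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as in the JSON snippets above.
    servers["shuck-file"] = {"command": "shuck-file", "args": []}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_shuck_server(Path(".mcp.json"))` creates the file if it does not exist and otherwise updates it in place.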
shuck-file registers as an MCP server via the mcp.servers entry point. Tools exposed:
- `shuck`: Convert a document to Markdown with all options (mode, sections, grep, budget, etc.)
- `list_formats`: List supported document formats

Install as a Claude Code plugin for the /shuck skill:
claude plugin add /path/to/shuck-file
src/shuck_file/
├── cli.py # CLI entrypoint
├── server.py # MCP Server (FastMCP)
├── core/
│ ├── router.py # Auto-routing logic
│ ├── segmenter.py # Document segmentation
│ ├── mapper.py # Map mode renderer
│ ├── budget.py # Smart compression
│ ├── grep.py # In-document search
│ ├── frontmatter.py # YAML frontmatter
│ └── models.py # Data models
├── extractors/
│ ├── base.py # Base extractor ABC
│ ├── docx_ext.py # Word extractor
│ ├── pdf_ext.py # PDF extractor
│ ├── xlsx_ext.py # Excel extractor
│ ├── pptx_ext.py # PowerPoint extractor
│ └── csv_ext.py # CSV extractor
plugin/ # Claude Code plugin wrapper
tests/
├── test_extractors.py
├── test_router.py
├── test_segmenter.py
├── test_budget.py
└── test_grep.py
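The layout above suggests one extractor per format behind a shared abstract base class. The sketch below shows what that pattern can look like in Python. All class names, the `extensions` attribute, the `extract` method, and the registry are assumptions made for illustration; they are not taken from shuck-file's source.

```python
import csv
from abc import ABC, abstractmethod
from pathlib import Path


class BaseExtractor(ABC):
    """Hypothetical base class: one subclass per supported format."""

    extensions: tuple = ()

    @abstractmethod
    def extract(self, path: Path) -> str:
        """Return the file's content as Markdown."""


class CsvExtractor(BaseExtractor):
    """Hypothetical CSV extractor that renders rows as a Markdown table."""

    extensions = (".csv",)

    def extract(self, path: Path) -> str:
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        if not rows:
            return ""
        lines = ["| " + " | ".join(rows[0]) + " |", "|" + "---|" * len(rows[0])]
        lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
        return "\n".join(lines)


# Map each extension to the extractor class that handles it.
REGISTRY = {ext: cls for cls in (CsvExtractor,) for ext in cls.extensions}


def extractor_for(path: Path) -> BaseExtractor:
    """Pick an extractor by file extension, case-insensitively."""
    try:
        return REGISTRY[path.suffix.lower()]()
    except KeyError:
        raise ValueError(f"unsupported format: {path.suffix}") from None
```

Supporting a new format in this design means adding one subclass and listing it in the registry, which keeps the dispatch logic unchanged.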
MIT