shuck-file

Summary

Converts Word, PDF, Excel, PowerPoint, and CSV files into clean Markdown, with routing that depends on file size. Small documents are converted directly. Large ones return a document map showing sections, token counts, and density metrics, so your agent can pull only what it needs through targeted section extraction or keyword search. The MCP server exposes two tools: shuck, for conversion with options such as section filtering, grep search, and token budgets, and list_formats, for checking supported file types. Useful when you need an agent to work with office documents without dumping entire 50-page reports into context, or when you want structured extraction from spreadsheets and presentations.

shuck-file

Any file in, Markdown out — read only what matters.

shuck-file converts documents to clean Markdown for AI agents and LLMs. Small files output directly; large files return a document map with section summaries, token counts, and actionable next steps — so agents only pull what they need.

Why shuck-file?

AI agents need a document bridge that adapts to their context budget:

Small file → shuck report.docx → full Markdown on stdout
Large file → shuck report.docx → document map with sections and extraction options
Targeted extraction → shuck report.docx --sections s1,s3 → only what you need
Search → shuck report.docx --grep "revenue" → find without reading everything

Supported Formats

| Format | Extension | Library | What's Preserved |
|--------|-----------|---------|------------------|
| Word | `.docx` | python-docx | Headings, bold/italic, lists, tables |
| PDF | `.pdf` | pdfplumber | Text content, page breaks |
| Excel | `.xlsx` | openpyxl | All sheets as Markdown tables |
| PowerPoint | `.pptx` | python-pptx | Titles, text, tables, speaker notes |
| CSV | `.csv` | stdlib | All rows/columns as a table |

Installation

Via pip (recommended)

pip install shuck-file

This installs the shuck CLI command and the MCP server.

From source

git clone https://github.com/Shan-Zhu/shuck-file.git
cd shuck-file
pip install -e .

Quick Start

# Convert a document
shuck report.docx

# Force full output (bypass map mode)
shuck large-report.pdf --all

# Search within a document
shuck report.pdf --grep "revenue"

Usage

Auto-Routing (default)

Small files are output directly; large files return a document map.

# Small file → direct Markdown output
shuck document.pdf

# Large file → document map with sections table + next steps
shuck large-report.pdf
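
The routing decision above can be pictured with a short Python sketch. This is an illustration only, not the code in router.py: the threshold value, the four-characters-per-token estimate, and the names `MAP_THRESHOLD_TOKENS`, `estimate_tokens`, `RouteDecision`, and `route` are all assumptions, since the README does not say where "small" ends or how tokens are counted.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the README does not state the real threshold.
MAP_THRESHOLD_TOKENS = 4000


def estimate_tokens(text: str) -> int:
    """Rough size estimate, assuming about four characters per token."""
    return max(1, len(text) // 4)


@dataclass
class RouteDecision:
    mode: str    # "direct" (full Markdown) or "map" (document map)
    tokens: int  # estimated size of the converted Markdown


def route(markdown: str, force_all: bool = False, force_map: bool = False) -> RouteDecision:
    """Choose an output mode from the estimated size of the converted text."""
    tokens = estimate_tokens(markdown)
    if force_all:   # corresponds to the documented `--all` flag
        return RouteDecision("direct", tokens)
    if force_map:   # corresponds to the documented `shuck probe` subcommand
        return RouteDecision("map", tokens)
    mode = "map" if tokens > MAP_THRESHOLD_TOKENS else "direct"
    return RouteDecision(mode, tokens)
```

In this sketch the explicit overrides are checked before the size test, so a caller can always get the mode it asked for regardless of document length.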

Extraction Options

# Force full output (bypass map mode)
shuck report.pdf --all

# Extract specific sections
shuck report.pdf --sections s1,s3

# Tables only
shuck report.pdf --tables-only

# Search within document
shuck report.pdf --grep "revenue"

# Token budget (smart compression)
shuck report.pdf --budget 4000

# Combinations work
shuck report.pdf --sections s2,s3 --budget 2000
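
When an agent harness drives the CLI from Python, it can help to assemble the argument list in one place. The helper below is a hypothetical wrapper (the name `build_shuck_command` and its parameters are not part of shuck-file); it only emits flags that appear in the examples above.

```python
from typing import List, Optional, Sequence


def build_shuck_command(
    path: str,
    sections: Optional[Sequence[str]] = None,
    grep: Optional[str] = None,
    budget: Optional[int] = None,
    tables_only: bool = False,
    full: bool = False,
) -> List[str]:
    """Assemble an argv list for the `shuck` CLI from the documented flags."""
    cmd = ["shuck", path]
    if full:
        cmd.append("--all")
    if sections:
        # Section ids are passed as one comma-separated value, e.g. "s1,s3".
        cmd += ["--sections", ",".join(sections)]
    if tables_only:
        cmd.append("--tables-only")
    if grep is not None:
        cmd += ["--grep", grep]
    if budget is not None:
        if budget <= 0:
            raise ValueError("budget must be a positive token count")
        cmd += ["--budget", str(budget)]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`; passing a list rather than a shell string avoids quoting problems with search terms that contain spaces.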

Excel/CSV Specific

# Column headers and types
shuck data.xlsx --schema-only

# Headers + first N rows
shuck data.xlsx --sample 5
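
To give a feel for what a schema view and a row sample might contain, here is a minimal sketch for CSV input using only the standard library. The type-inference rules and the function names (`infer_type`, `csv_schema`, `csv_sample`) are assumptions for illustration; the actual output of `--schema-only` and `--sample` may differ.

```python
import csv
import io
from typing import List, Tuple


def infer_type(values: List[str]) -> str:
    """Guess a column type from its string values (hypothetical rules)."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def csv_schema(text: str) -> List[Tuple[str, str]]:
    """Return (column name, inferred type) pairs for CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [
        (name, infer_type([r[i] for r in body if i < len(r)]))
        for i, name in enumerate(header)
    ]


def csv_sample(text: str, n: int) -> str:
    """Render the header plus the first n data rows as a Markdown table."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1 : 1 + n]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)
```

The sketch does not escape `|` characters inside cells, which a production converter would need to handle.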

Power User Subcommands

# Force map mode (even on small files)
shuck probe document.docx

# Force full extraction (alias for --all)
shuck pull document.docx

Output Control

# Write to file
shuck document.pdf -o output.md

# Write to directory (auto-named)
shuck document.pdf -d ./converted/

# Skip YAML frontmatter
shuck document.pdf --no-frontmatter

# List supported formats
shuck --formats

Map Mode Output

When a file is large, shuck returns a document map:

# Document Map: quarterly-report.pdf

**6 pages | ~12,400 tokens | 6 sections**

## Sections

| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
| ...

## Next Steps

- `shuck quarterly-report.pdf --all` -- full document (~12,400 tokens)
- `shuck quarterly-report.pdf --sections s1,s2` -- high-density (~3,250 tokens)
- `shuck quarterly-report.pdf --grep "..."` -- search for keywords
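
An agent can act on a map like this by parsing the sections table and choosing which sections to request. The sketch below assumes the table layout shown in the example above; the helper names `parse_sections` and `pick_sections` and the greedy selection rule are hypothetical, not part of shuck-file.

```python
import re
from typing import Dict, List

# Matches data rows such as "| s2 | Q3 Financial Results | mixed | 2,800 | high |".
ROW = re.compile(
    r"^\|\s*(s\d+)\s*\|\s*(.*?)\s*\|\s*(\w+)\s*\|\s*([\d,]+)\s*\|\s*(\w+)\s*\|\s*$"
)


def parse_sections(document_map: str) -> List[Dict[str, object]]:
    """Extract section rows from a document map; other lines are ignored."""
    sections = []
    for line in document_map.splitlines():
        m = ROW.match(line.strip())
        if m:
            sid, title, kind, tokens, density = m.groups()
            sections.append({
                "id": sid,
                "title": title,
                "type": kind,
                "tokens": int(tokens.replace(",", "")),
                "density": density,
            })
    return sections


def pick_sections(sections: List[Dict[str, object]], budget: int) -> List[str]:
    """Greedily keep high-density sections, in document order, while they fit."""
    chosen, used = [], 0
    for s in sections:
        if s["density"] == "high" and used + s["tokens"] <= budget:
            chosen.append(s["id"])
            used += s["tokens"]
    return chosen
```

Joining the chosen ids with commas gives a value suitable for `--sections`. Greedy selection in document order is only one possible policy; an agent could equally rank sections by relevance to its current task.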

MCP Server

shuck-file includes an MCP (Model Context Protocol) server, making it available to any MCP-compatible AI tool.

Claude Code

claude mcp add shuck-file -- shuck-file

Or add to your project's .mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Windsurf

Add to your MCP configuration:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
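
The three client configurations above share the same `mcpServers` entry. If a config file already lists other servers, the entry can be merged in rather than pasted over the file. The following is a minimal sketch; `add_shuck_server` and `render_config` are hypothetical helper names, and the sketch assumes the config is a JSON object with an optional `mcpServers` key, as in the examples.

```python
import json
from typing import Any, Dict


def add_shuck_server(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an MCP config with the shuck-file entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["shuck-file"] = {"command": "shuck-file", "args": []}
    merged["mcpServers"] = servers
    return merged


def render_config(config: Dict[str, Any]) -> str:
    """Serialize the merged config as indented JSON, ready to write to disk."""
    return json.dumps(add_shuck_server(config), indent=2)
```

Working on a copy keeps the caller's original dictionary untouched, which makes it easy to compare the two before writing anything.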

Any MCP Client

shuck-file registers as an MCP server via the mcp.servers entry point. Tools exposed:

shuck — Convert a document to Markdown with all options (mode, sections, grep, budget, etc.)
list_formats — List supported document formats

Claude Code Plugin

Install as a Claude Code plugin for the /shuck skill:

claude plugin add /path/to/shuck-file

Architecture

src/shuck_file/
├── cli.py                # CLI entrypoint
├── server.py             # MCP Server (FastMCP)
├── core/
│   ├── router.py          # Auto-routing logic
│   ├── segmenter.py       # Document segmentation
│   ├── mapper.py          # Map mode renderer
│   ├── budget.py          # Smart compression
│   ├── grep.py            # In-document search
│   ├── frontmatter.py     # YAML frontmatter
│   └── models.py          # Data models
├── extractors/
│   ├── base.py            # Base extractor ABC
│   ├── docx_ext.py        # Word extractor
│   ├── pdf_ext.py         # PDF extractor
│   ├── xlsx_ext.py        # Excel extractor
│   ├── pptx_ext.py        # PowerPoint extractor
│   └── csv_ext.py         # CSV extractor
plugin/                    # Claude Code plugin wrapper
tests/
├── test_extractors.py
├── test_router.py
├── test_segmenter.py
├── test_budget.py
└── test_grep.py
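
The layout suggests one extractor per format behind a shared base class. A minimal sketch of that pattern follows; it is not the code in extractors/base.py, and the names `BaseExtractor`, `CsvExtractor`, `EXTRACTORS`, and `get_extractor`, along with the method signature, are assumptions made for illustration.

```python
import csv
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Type


class BaseExtractor(ABC):
    """Hypothetical interface: each format implements extract()."""

    @abstractmethod
    def extract(self, path: Path) -> str:
        """Return the file's content as Markdown."""


class CsvExtractor(BaseExtractor):
    def extract(self, path: Path) -> str:
        with path.open(newline="", encoding="utf-8") as fh:
            rows = list(csv.reader(fh))
        if not rows:
            return ""
        lines = [
            "| " + " | ".join(rows[0]) + " |",
            "| " + " | ".join("---" for _ in rows[0]) + " |",
        ]
        lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
        return "\n".join(lines)


# Registry keyed by lowercase file extension; only CSV is sketched here.
EXTRACTORS: Dict[str, Type[BaseExtractor]] = {".csv": CsvExtractor}


def get_extractor(path: Path) -> BaseExtractor:
    """Look up an extractor by file extension, case-insensitively."""
    try:
        return EXTRACTORS[path.suffix.lower()]()
    except KeyError:
        raise ValueError(f"unsupported format: {path.suffix or path.name}") from None
```

With this shape, supporting another format means adding one class and one registry entry, without touching the routing or rendering code.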

License

MIT
