Ocr Super Surya

492 installs, 17 stars

Summary

This wraps Surya, a GPU-accelerated OCR engine that handles 90+ languages and claims 2x better accuracy than Tesseract. You'd reach for it when processing multilingual documents, extracting text from screenshots, or dealing with complex layouts and tables. The skill documentation is thorough about gotchas: it walks through OneDrive path issues on Windows, API breaking changes between versions (the langs argument got removed in 0.17.x), and transformers 5.x compatibility problems. If you're on a machine with a GPU and need reliable text extraction beyond what lightweight OCR provides, this is worth trying. Just note Surya itself requires a commercial license if your company does over $2M in revenue.

Install to Claude Code

npx -y skills add aktsmm/agent-skills --skill ocr-super-surya --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md

OCR Super Surya

GPU-optimized OCR using Surya.

When to Use

OCR, extract text from image, text recognition, 画像から文字 (Japanese: "text from image")
Extracting text from screenshots, photos, or scanned images
Processing PDFs with embedded images
Multi-language document OCR (90+ languages including Japanese)

Features

Feature	Description
Accuracy	2x better than Tesseract (0.97 vs 0.88)
GPU	PyTorch-based, CUDA optimized
Languages	90+ including CJK
Layout	Document layout, table recognition

Quick Start

Installation

# 1. Check GPU
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# 2. Install (with CUDA if GPU available)
pip install surya-ocr

# If CUDA=False but you have GPU, reinstall PyTorch:
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
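
The GPU check in step 1 can also be run from Python before starting a long OCR job. This is a minimal sketch that assumes only that PyTorch may or may not be installed; `pick_device` is a hypothetical helper name, not part of Surya.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Device: {pick_device()}")
```

If this reports `cpu` on a machine that has a GPU, the PyTorch reinstall above is the next thing to try.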

Windows + uv (installing under OneDrive)

uv's hardlinks fail in folders under OneDrive, so use the following steps instead:

# Put the cache outside OneDrive
$env:UV_CACHE_DIR = "C:\Temp\uv_cache"

# Create the virtual environment outside OneDrive
uv venv C:\Users\<USERNAME>\ocr_env --python 3.12

# Install surya-ocr (link-mode=copy avoids hardlinks)
uv pip install surya-ocr --python C:\Users\<USERNAME>\ocr_env\Scripts\python.exe --link-mode=copy

# transformers 5.x is incompatible, so pin to 4.x
uv pip install "transformers<5.0" --python C:\Users\<USERNAME>\ocr_env\Scripts\python.exe --link-mode=copy
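
Before choosing where to create the environment, it can help to check whether a candidate folder sits under OneDrive at all. The sketch below assumes the OneDrive client exposes its sync root through the `OneDrive`, `OneDriveConsumer`, or `OneDriveCommercial` environment variables; the helper names are hypothetical.

```python
import os
from pathlib import PureWindowsPath

# Environment variables assumed to hold OneDrive sync roots on Windows.
ONEDRIVE_ENV_VARS = ("OneDrive", "OneDriveConsumer", "OneDriveCommercial")


def onedrive_roots(env=None):
    """Collect the OneDrive root folders found in the environment."""
    env = os.environ if env is None else env
    return [env[name] for name in ONEDRIVE_ENV_VARS if env.get(name)]


def is_under_onedrive(path, roots):
    """True if path equals, or is nested inside, any of the given roots."""
    candidate = PureWindowsPath(path)
    for root in roots:
        root_path = PureWindowsPath(root)
        # PureWindowsPath compares case-insensitively, like Windows itself.
        if candidate == root_path or root_path in candidate.parents:
            return True
    return False


if __name__ == "__main__":
    here = os.getcwd()
    print(here, "is under OneDrive:", is_under_onedrive(here, onedrive_roots()))
```

If the check returns True for the project folder, the cache and virtual environment locations used above (outside OneDrive) are the safer choice.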

Usage

# CLI
python scripts/ocr_helper.py image.png
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

# Or use surya directly
surya_ocr image.png --output_dir ./results
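
To run the direct `surya_ocr` command over several files, one option is to build the argument lists in Python and hand them to `subprocess`. This is a sketch only: it uses the `--output_dir` flag shown above and nothing else, and `build_surya_commands` is a hypothetical helper.

```python
def build_surya_commands(paths, output_dir="./results"):
    """Return one `surya_ocr <file> --output_dir <dir>` argv list per input path."""
    return [["surya_ocr", str(path), "--output_dir", str(output_dir)] for path in paths]


if __name__ == "__main__":
    for command in build_surya_commands(["image.png", "scan2.png"]):
        # To execute for real: subprocess.run(command, check=True)
        print(" ".join(command))
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.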

Python API

import io
import sys
# Work around CP932 encoding errors on Windows
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor

image = Image.open("document.png").convert("RGB")
found_pred = FoundationPredictor()
rec_pred = RecognitionPredictor(found_pred)  # v0.13+: FoundationPredictor is required
det_pred = DetectionPredictor()

# v0.17.x and later: the langs argument was removed, so do not pass it
for page in rec_pred([image], det_predictor=det_pred):
    for line in page.text_lines:
        if line.text.strip():
            print(line.text)
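
When the text is needed as a string instead of printed output, the same loop can be wrapped in a small function. The sketch below relies only on the two attributes used above (`page.text_lines` and `line.text`); `pages_to_text` is a hypothetical name, and the demo uses stand-in objects so it runs without Surya or a GPU.

```python
from types import SimpleNamespace


def pages_to_text(pages):
    """Join the non-empty lines of each page; pages are separated by a blank line."""
    page_texts = []
    for page in pages:
        lines = [line.text.strip() for line in page.text_lines if line.text.strip()]
        page_texts.append("\n".join(lines))
    return "\n\n".join(page_texts)


# Stand-in objects exposing the same attributes as the prediction results above.
demo_pages = [
    SimpleNamespace(text_lines=[SimpleNamespace(text="Hello"), SimpleNamespace(text="  ")]),
    SimpleNamespace(text_lines=[SimpleNamespace(text="World")]),
]

if __name__ == "__main__":
    print(pages_to_text(demo_pages))
```

With real predictions, the call would be `pages_to_text(rec_pred([image], det_predictor=det_pred))`.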

API changes (v0.17.x):

RecognitionPredictor(foundation_predictor) - FoundationPredictor is now a required argument

__call__() no longer accepts the langs argument (languages are detected automatically)
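
Because these changes depend on the installed versions, a quick check before running can save a confusing traceback. The sketch below reads versions with the standard `importlib.metadata` module; the thresholds (transformers 5.x, surya-ocr 0.17) come from this document, and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Parse the leading "X.Y" of a version string into a tuple of ints."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version: {version_string!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment():
    """Return human-readable notes about version-dependent behavior."""
    notes = []
    transformers_version = installed_version("transformers")
    if transformers_version and major_minor(transformers_version)[0] >= 5:
        notes.append('transformers 5.x detected: install "transformers<5.0".')
    surya_version = installed_version("surya-ocr")
    if surya_version and major_minor(surya_version) >= (0, 17):
        notes.append("surya-ocr 0.17+ detected: do not pass the langs argument.")
    return notes


if __name__ == "__main__":
    for note in check_environment():
        print(note)
```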

GPU Configuration

Variable	Default	Description
`RECOGNITION_BATCH_SIZE`	512	Reduce for lower VRAM
`DETECTOR_BATCH_SIZE`	36	Reduce if OOM

export RECOGNITION_BATCH_SIZE=256
surya_ocr image.png
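
The same variables can be set from Python when Surya is used as a library. This sketch assumes Surya reads its settings from the environment when it is first imported, so the values are set before any `surya` import; `configure_batch_sizes` is a hypothetical helper.

```python
import os


def configure_batch_sizes(recognition=None, detector=None, env=None):
    """Write the batch-size variables into env (defaults to os.environ)."""
    env = os.environ if env is None else env
    if recognition is not None:
        env["RECOGNITION_BATCH_SIZE"] = str(recognition)
    if detector is not None:
        env["DETECTOR_BATCH_SIZE"] = str(detector)
    return env


if __name__ == "__main__":
    # Call this before importing surya (assumption: settings load at import time).
    configure_batch_sizes(recognition=256, detector=18)
    print(os.environ["RECOGNITION_BATCH_SIZE"], os.environ["DETECTOR_BATCH_SIZE"])
```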

Scripts

Script	Description
`scripts/ocr_helper.py`	Helper with OOM auto-retry, batch support
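
The contents of `scripts/ocr_helper.py` are not shown here, so the following is not its implementation. It is one plausible way to do an OOM auto-retry: halve the batch size after each out-of-memory failure. It assumes the failure surfaces as a `RuntimeError` whose message contains "out of memory" (as PyTorch CUDA OOM errors do), and all names are hypothetical.

```python
def halving_schedule(start, minimum=1):
    """Yield start, start // 2, start // 4, ... down to minimum (at least 1)."""
    minimum = max(1, minimum)
    size = start
    while size >= minimum:
        yield size
        size //= 2


def run_with_oom_retry(run, start=512, minimum=16):
    """Call run(batch_size), halving the batch size after each OOM error."""
    last_error = None
    for batch_size in halving_schedule(start, minimum):
        try:
            return run(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated failure: do not hide it
            last_error = err
    if last_error is None:
        raise ValueError("start must be at least minimum")
    raise last_error


if __name__ == "__main__":
    def fake_ocr(batch_size):
        # Stand-in for an OCR call that only fits in memory at 128 or below.
        if batch_size > 128:
            raise RuntimeError("CUDA out of memory")
        return f"ok at batch size {batch_size}"

    print(run_with_oom_retry(fake_ocr))
```

In a real run, `run` would set `RECOGNITION_BATCH_SIZE` to the given size and invoke the OCR step.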

Troubleshooting

Error	Cause	Fix
`RecognitionPredictor.__init__() missing 1 required positional argument: 'foundation_predictor'`	API changed in v0.13+	Create `found_pred = FoundationPredictor()` and pass it as the argument
`TypeError: __call__() got an unexpected keyword argument 'langs'`	`langs` argument removed in v0.17.x	Remove the `langs` argument
`AttributeError: 'SuryaDecoderConfig' object has no attribute 'pad_token_id'`	Incompatibility with `transformers 5.x`	Downgrade with `pip install "transformers<5.0"`
`failed to hardlink file ... OneDrive` (uv, os error 396)	OneDrive hardlink restriction	Install with `--link-mode=copy` and set `UV_CACHE_DIR` to a path outside OneDrive
`UnicodeEncodeError: 'cp932' codec can't encode character`	Windows defaults to CP932 encoding	Add `sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')` at the top of the script
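
The table above can be turned into a small lookup so a wrapper script prints the matching fix when one of these errors occurs. This is an illustrative sketch: the substrings are taken from the error messages listed in the table, and `suggest_fix` is a hypothetical helper.

```python
# (substring of the error message, suggested fix), taken from the table above.
KNOWN_ERRORS = [
    ("missing 1 required positional argument: 'foundation_predictor'",
     "Create FoundationPredictor() and pass it to RecognitionPredictor."),
    ("unexpected keyword argument 'langs'",
     "Remove the langs argument (dropped in v0.17.x)."),
    ("has no attribute 'pad_token_id'",
     'Downgrade with: pip install "transformers<5.0"'),
    ("failed to hardlink file",
     "Install with --link-mode=copy and set UV_CACHE_DIR outside OneDrive."),
    ("'cp932' codec can't encode",
     "Wrap sys.stdout in a UTF-8 TextIOWrapper at the top of the script."),
]


def suggest_fix(error_message):
    """Return the fix for the first known error found in the message, else None."""
    for needle, fix in KNOWN_ERRORS:
        if needle in error_message:
            return fix
    return None


if __name__ == "__main__":
    print(suggest_fix("TypeError: __call__() got an unexpected keyword argument 'langs'"))
```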

License Note

Surya: GPL-3.0 (code); a commercial license is required for companies with more than $2M in revenue

Categories: AI & Agent Building, Office & Documents

First seen: Jun 3, 2026
