Pyspark Mcp

annasmazhar/pyspark_mcp
STDIO · registry active
Summary

This is a PySpark migration and optimization toolkit built on SQLGlot. It converts SQL between dialects (PostgreSQL, Oracle, Redshift, MySQL, Snowflake, Spark SQL) and generates PySpark DataFrame API code from SQL queries. The AWS Glue integration generates complete job templates, handles DynamicFrame conversions, and analyzes S3 partitioning strategies. You also get code review tools that scan existing PySpark for performance issues, suggest join strategies, and detect code duplication, plus batch processing that converts hundreds of SQL files concurrently. Reach for this when migrating legacy SQL workloads to Spark or when you need to generate Glue jobs without writing boilerplate. It does not translate recursive CTEs to the DataFrame API; for those and other edge cases it provides Spark SQL equivalents and guidance.


PySpark MCP Server

SQL migration assistance, AWS Glue job generation, and Spark code optimization — as an MCP server.

CI Pipeline · Python 3.11+ · License: MIT

What It Does

  • SQL Dialect Transpilation — Convert between PostgreSQL, Oracle, Redshift, MySQL, Snowflake, and Spark SQL using SQLGlot
  • PySpark DataFrame API Generation — Generate DataFrame API code from SQL with optimization hints
  • AWS Glue Integration — Job templates, DynamicFrame conversions, Data Catalog definitions, S3 optimization strategies
  • Batch Processing — Process hundreds of SQL files concurrently
  • Code Review & Optimization — Analyze existing PySpark code for performance improvements
  • Pattern Detection — Find code duplication and suggest refactoring

What It Doesn't Do

  • Recursive CTEs → provides Spark SQL equivalent + guidance (PySpark has no native recursive CTE support)
  • MERGE/PIVOT/CONNECT BY → transpiles to Spark SQL, provides DataFrame API guidance
  • Perfect 1:1 DataFrame API transpilation for all SQL — complex queries get Spark SQL + optimization recommendations
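
Because recursive CTEs get guidance rather than a direct translation, it helps to see the usual workaround: an iterative loop that starts from the anchor rows, repeatedly joins the current frontier against the edge table, and stops when no new rows appear. The sketch below shows that fixed-point loop in plain Python on an in-memory edge list so it runs without a Spark session. The function name `descendants` and the data shape are illustrative assumptions, not part of this project; in PySpark the same shape would typically be a loop of DataFrame joins and unions.

```python
from typing import Iterable


def descendants(edges: Iterable[tuple[str, str]], root: str) -> set[str]:
    """Emulate a recursive CTE over (parent, child) rows with a loop."""
    edge_list = list(edges)
    seen: set[str] = set()
    frontier = {root}  # anchor member of the CTE
    while frontier:    # recursive member, repeated until no new rows appear
        next_frontier = {
            child
            for parent, child in edge_list
            if parent in frontier and child not in seen and child != root
        }
        seen |= next_frontier
        frontier = next_frontier
    return seen
```

The `seen` check is what keeps the loop from running forever on cyclic data, which is the same safeguard an iterative DataFrame version would need.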

Quick Start

pip install -e .
pyspark-mcp  # starts the MCP server

MCP Configuration

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "pyspark": {
      "command": "pyspark-mcp",
      "args": []
    }
  }
}
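
If the config file already lists other servers, the entry has to be merged into `mcpServers` rather than pasted over the whole file. A minimal sketch of that merge using only the standard library follows; the helper name `add_pyspark_server` is hypothetical, and the path is passed in so the same code works wherever the config lives.

```python
import json
from pathlib import Path


def add_pyspark_server(config_path: Path) -> dict:
    """Add the pyspark entry to mcpServers, keeping any existing servers."""
    config = {}
    if config_path.exists():
        # Treat an empty file the same as a missing one.
        config = json.loads(config_path.read_text(encoding="utf-8").strip() or "{}")
    servers = config.setdefault("mcpServers", {})
    servers["pyspark"] = {"command": "pyspark-mcp", "args": []}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS this could be called as `add_pyspark_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`.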

Hermes Agent

Add to ~/.hermes/config.yaml:

mcp:
  servers:
    pyspark:
      command: pyspark-mcp
      enabled_tools: all

Docker

docker compose up -d

Tools

SQL Conversion

  • convert_sql_to_pyspark — Convert SQL to PySpark with dialect detection
  • analyze_sql_context — Analyze SQL complexity and suggest approach

AWS Glue

  • generate_aws_glue_job_template — Generate complete Glue job scripts
  • convert_dataframe_to_dynamic_frame — DataFrame ↔ DynamicFrame conversion
  • generate_data_catalog_table_definition — Data Catalog table definitions
  • generate_incremental_processing_job — Incremental/CDC job generation
  • analyze_s3_optimization_opportunities — S3 layout and partitioning analysis
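
For orientation, a Glue job script generally follows a fixed skeleton: resolve the job arguments, create a `GlueContext`, initialise the `Job`, read, write, and commit. The sketch below renders such a skeleton from a string template. It illustrates what a generated template might look like and is not the output of `generate_aws_glue_job_template`; the helper `render_glue_job` and its parameters are hypothetical. The `awsglue` calls inside the template are the standard Glue boilerplate and only run inside a Glue environment.

```python
from string import Template

GLUE_JOB_TEMPLATE = Template('''\
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table from the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="$database", table_name="$table"
)

# Write the result to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "$output_path"},
    format="parquet",
)
job.commit()
''')


def render_glue_job(database: str, table: str, output_path: str) -> str:
    """Fill the skeleton with catalog and S3 details (hypothetical helper)."""
    return GLUE_JOB_TEMPLATE.substitute(
        database=database, table=table, output_path=output_path
    )
```

Rendering to a string keeps the generator testable outside Glue: the result can be syntax-checked with `compile()` before it is uploaded.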

Optimization

  • review_pyspark_code — Code review with performance recommendations
  • optimize_pyspark_code — Suggest optimizations for existing code
  • recommend_join_strategy — Broadcast vs shuffle join recommendations
  • suggest_partitioning_strategy — Partitioning recommendations
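
The broadcast-versus-shuffle decision usually comes down to whether the smaller side of the join is small enough to ship to every executor. A minimal sketch of such a rule follows, assuming size estimates in bytes are available. The function name and return labels are hypothetical, and this is not the logic of `recommend_join_strategy`. The default threshold mirrors Spark's `spark.sql.autoBroadcastJoinThreshold` default of 10 MB.

```python
# Spark's default for spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def pick_join_strategy(
    left_bytes: int,
    right_bytes: int,
    threshold: int = BROADCAST_THRESHOLD_BYTES,
) -> str:
    """Suggest broadcasting the smaller side if it fits under the threshold."""
    smaller = min(left_bytes, right_bytes)
    if smaller <= threshold:
        side = "left" if left_bytes <= right_bytes else "right"
        return f"broadcast_{side}"
    # Neither side is small enough, so fall back to a shuffle-based join.
    return "sort_merge"
```

This deliberately ignores the join type, which restricts which side may be broadcast for outer joins. In PySpark, a broadcast recommendation is applied with the `pyspark.sql.functions.broadcast` hint on the smaller DataFrame.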

Batch Processing

  • batch_process_files — Process multiple SQL files concurrently
  • batch_process_directory — Convert entire directories
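
Concurrent batch conversion can be pictured as a thread pool mapping a per-file function over a directory tree. The sketch below uses `concurrent.futures` from the standard library. `convert_file` is a hypothetical stand-in (it only uppercases the text) for whatever conversion the server performs, and `batch_convert` is not the project's `batch_process_directory` implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def convert_file(path: Path) -> str:
    """Placeholder conversion: stands in for the real SQL-to-PySpark step."""
    return path.read_text(encoding="utf-8").upper()


def batch_convert(directory: Path, max_workers: int = 8) -> dict[str, str]:
    """Convert every .sql file under directory; map relative path to result."""
    results: dict[str, str] = {}
    sql_files = sorted(directory.rglob("*.sql"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert_file, p): p for p in sql_files}
        for future in as_completed(futures):
            key = futures[future].relative_to(directory).as_posix()
            try:
                results[key] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[key] = f"ERROR: {exc}"
    return results
```

Collecting errors per file instead of raising means one malformed query does not abort a run over hundreds of files.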

Development

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Test
pytest tests/ -v --cov=pyspark_tools

# Format
black pyspark_tools tests
isort pyspark_tools tests

# Lint
flake8 pyspark_tools tests

Architecture

pyspark_tools/
├── server.py              # FastMCP server + tool definitions
├── sql_converter.py       # SQLGlot-based transpilation + DataFrame API generation
├── aws_glue_integration.py # Glue job templates, DynamicFrame, Data Catalog
├── advanced_optimizer.py  # Performance analysis + optimization suggestions
├── batch_processor.py     # Concurrent file processing
├── code_reviewer.py       # PySpark code review patterns
├── duplicate_detector.py  # Code deduplication
├── data_source_analyzer.py # Data source analysis
└── file_utils.py          # File I/O utilities

CI/CD

  • ✅ 256 tests passing
  • ✅ 71% code coverage
  • ✅ Code quality checks (black, isort, flake8)
  • ✅ Python 3.11 tested

License

MIT — see LICENSE.


mcp-name: io.github.AnnasMazhar/pyspark-mcp

Categories: Cloud & Infrastructure
Registry: active
Package: pyspark-tools
Transport: STDIO
Updated: May 9, 2026

Related Cloud & Infrastructure MCP Servers

K8s

silenceper/mcp-k8s

Provides Kubernetes resource management and Helm operations via MCP for easy automation and LLM integration.
145
Containerization Assist

azure/containerization-assist

TypeScript MCP server for AI-powered containerization workflows with Docker and Kubernetes support
41
AWS Builder

io.github.evozim/aws-builder

AWS CloudFormation and Terraform infrastructure blueprint builder.
Kubernetes

strowk/mcp-k8s-go

MCP server connecting to Kubernetes
381
Kubernetes

reza-gholizade/k8s-mcp-server

Provides a standardized MCP interface to interact with Kubernetes clusters, enabling resource management, metrics, logs, and events.
156
MCP Server Kubernetes

flux159/mcp-server-kubernetes

Provides unified Kubernetes management via MCP, enabling kubectl-like operations, Helm interactions, and observability.
1.4k