Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

Spark SQL

aidancorrell/spark-sql-mcp-server
Auth: required · Transport: STDIO · Registry: active
Summary

Connects Claude to any HiveServer2-compatible system (Spark, EMR, Hive, Impala) over the Thrift protocol. Exposes four tools: list_databases, list_tables, describe_table, and execute_query, all limited to read-only SQL. Enforces safety by rejecting statements that do not start with SELECT, SHOW, DESCRIBE, DESC, EXPLAIN, or WITH, and by automatically adding a LIMIT to unbounded queries. Supports several auth methods, including LDAP, Kerberos, and NOSASL. Built for AWS EMR workflows with SSH tunnel support, though it works with any Spark cluster that exposes a Thrift server (port 10000 by default). Useful when you need Claude to explore schemas and run analytics queries against production data lakes without write access. Credentials are supplied through local environment variables. Ships with a Docker Compose setup and sample data for local testing.


Spark SQL MCP Server

An MCP server that enables AI assistants to query Spark SQL clusters via the Thrift/HiveServer2 protocol.

Works with any HiveServer2-compatible system: Apache Spark, AWS EMR, Hive, Impala, Presto.

Features

  • Query Spark SQL — Execute read-only SQL queries against your Spark cluster
  • Schema Discovery — List databases, tables, and describe table structures
  • Multiple Auth Methods — NONE, LDAP, NOSASL, CUSTOM, and Kerberos authentication
  • EMR Compatible — Works with AWS EMR clusters out of the box
  • Read-Only Enforcement — Only SELECT, SHOW, DESCRIBE, EXPLAIN, and WITH statements are allowed
  • Safety Defaults — Automatic LIMIT clause on unbounded queries, sanitized error messages
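
The automatic LIMIT behaviour listed above could look roughly like the sketch below. It is not taken from the server's source: the function name `apply_default_limit` and the cap of 1000 rows are assumptions made for illustration, and the real server may detect bounded queries differently.

```python
import re

# Hypothetical default; this README does not state the server's actual row cap.
DEFAULT_LIMIT = 1000

# Matches a trailing "LIMIT <n>" at the end of the statement.
_LIMIT_RE = re.compile(r"\blimit\s+\d+\s*$", re.IGNORECASE)


def apply_default_limit(sql: str, limit: int = DEFAULT_LIMIT) -> str:
    """Append a LIMIT clause to SELECT/WITH queries that do not end with one."""
    stripped = sql.strip().rstrip(";").strip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    if first_word not in {"SELECT", "WITH"}:
        return stripped  # SHOW/DESCRIBE/EXPLAIN are left untouched
    if _LIMIT_RE.search(stripped):
        return stripped  # the caller already bounded the query
    return f"{stripped} LIMIT {limit}"
```

A LIMIT that appears only inside a subquery does not count as bounding the outer query in this sketch, so the outer query still receives the cap.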

Installation

pip install spark-sql-mcp-server

Or run directly with uvx:

uvx spark-sql-mcp-server

Quick Start

1. Set Environment Variables

export SPARK_HOST="your-emr-master-node.amazonaws.com"
export SPARK_PORT="10000"        # default
export SPARK_DATABASE="default"  # default
export SPARK_AUTH="NONE"         # NONE | LDAP | KERBEROS | CUSTOM | NOSASL

2. Add to Claude Code

Global (all projects): add to the top-level mcpServers key in ~/.claude.json:

{
  "mcpServers": {
    "spark-sql": {
      "command": "uvx",
      "args": ["spark-sql-mcp-server"],
      "env": {
        "SPARK_HOST": "your-emr-master-node.amazonaws.com",
        "SPARK_PORT": "10000",
        "SPARK_AUTH": "NONE"
      }
    }
  }
}

Project-level: add to .mcp.json in the root of your repo:

{
  "mcpServers": {
    "spark-sql": {
      "command": "uvx",
      "args": ["spark-sql-mcp-server"],
      "env": {
        "SPARK_HOST": "your-emr-master-node.amazonaws.com",
        "SPARK_PORT": "10000",
        "SPARK_AUTH": "NONE"
      }
    }
  }
}

3. Add to Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "spark-sql": {
      "command": "uvx",
      "args": ["spark-sql-mcp-server"],
      "env": {
        "SPARK_HOST": "your-emr-master-node.amazonaws.com",
        "SPARK_PORT": "10000"
      }
    }
  }
}
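
If the config file already lists other MCP servers, editing it by hand risks overwriting them. The sketch below merges the entry shown above into an existing JSON file. The helper name `add_spark_server` is hypothetical, and the path you pass in depends on your platform and client.

```python
import json
from pathlib import Path


def add_spark_server(config_path: Path, host: str, port: str = "10000") -> dict:
    """Insert or update the "spark-sql" entry, keeping other servers intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["spark-sql"] = {
        "command": "uvx",
        "args": ["spark-sql-mcp-server"],
        "env": {"SPARK_HOST": host, "SPARK_PORT": port},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Restart the client after the file changes so it picks up the new server.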

4. Query

Ask Claude things like:

  • "What databases are available in our Spark cluster?"
  • "Show me the schema of the sales.transactions table"
  • "Query the top 10 customers by revenue from the analytics database"

Available Tools

| Tool | Description |
|------|-------------|
| list_databases | List all available databases |
| list_tables | List tables in a database |
| describe_table | Get table schema (columns, types) |
| execute_query | Run read-only SQL queries with formatted results |

Authentication

No Auth (default)

export SPARK_AUTH="NONE"

LDAP

export SPARK_AUTH="LDAP"
export SPARK_USERNAME="your-username"
export SPARK_PASSWORD="your-password"

Kerberos

export SPARK_AUTH="KERBEROS"
export SPARK_KERBEROS_SERVICE_NAME="hive"  # default
# Ensure you have a valid Kerberos ticket (kinit)
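
Before launching the server, it can help to check that the variables for the chosen mode are present. The following is a minimal sketch based only on the examples above. The helper `missing_auth_vars` is hypothetical, and the requirements for CUSTOM and NOSASL are not documented here, so they are left empty.

```python
from typing import Mapping

# Variables each mode needs, as read from the examples in this section.
# CUSTOM and NOSASL requirements are not documented, so none are assumed.
REQUIRED_BY_AUTH = {
    "NONE": (),
    "NOSASL": (),
    "CUSTOM": (),
    "LDAP": ("SPARK_USERNAME", "SPARK_PASSWORD"),
    "KERBEROS": (),  # relies on a ticket from kinit rather than env vars
}


def missing_auth_vars(env: Mapping[str, str]) -> list:
    """Return the names of variables the chosen auth mode still needs."""
    mode = env.get("SPARK_AUTH", "NONE").upper()
    if mode not in REQUIRED_BY_AUTH:
        raise ValueError(f"Unknown SPARK_AUTH value: {mode}")
    return [name for name in REQUIRED_BY_AUTH[mode] if not env.get(name)]
```

Calling `missing_auth_vars(os.environ)` (after `import os`) checks the current shell environment.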

AWS EMR Setup

  1. Security Group — Allow inbound traffic on port 10000 from your IP
  2. SSH Tunnel (recommended):
    ssh -i your-key.pem -L 10000:localhost:10000 hadoop@your-emr-master
    
  3. Set SPARK_HOST=localhost so the server connects through the tunnel
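
To confirm that the tunnel is forwarding before you start Claude, a quick TCP check is usually enough. This sketch uses only the Python standard library, and the function name `port_is_open` is an assumption for illustration. It verifies that something is listening on the port, not that the listener speaks Thrift.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 10000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If this returns False while the ssh session is running, check that the Thrift server is listening on port 10000 on the master node.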

Development

git clone https://github.com/aidancorrell/spark-sql-mcp-server.git
cd spark-sql-mcp-server
pip install -e ".[dev]"
pytest
ruff check .

Local Testing with Docker

A Docker Compose setup provides a local Spark Thrift Server with sample data for integration testing.

# Start the Spark Thrift Server (run in a subshell so you stay in the repo root)
(cd docker && docker compose up -d)

# Wait for it to be ready (takes ~30s on first start)
docker logs -f spark-thrift-server  # look for "Sample data loaded."

# Run integration tests
pytest -m integration -v

# Tear down
cd docker && docker compose down -v

The local server comes with sample tables: default.employees, default.orders, and test_db.metrics.

Unit tests run by default with pytest (integration tests are skipped unless -m integration is specified).

Using the local server with Claude Code

With the Docker Spark server running, add it to your MCP config to test the server interactively.

Global: add this entry under the top-level mcpServers key in ~/.claude.json:

{
  "spark-sql": {
    "command": "uvx",
    "args": ["spark-sql-mcp-server"],
    "env": {
      "SPARK_HOST": "localhost",
      "SPARK_PORT": "10000",
      "SPARK_AUTH": "NONE"
    }
  }
}

Project-level: add to .mcp.json in the root of your repo:

{
  "mcpServers": {
    "spark-sql": {
      "command": "uvx",
      "args": ["spark-sql-mcp-server"],
      "env": {
        "SPARK_HOST": "localhost",
        "SPARK_PORT": "10000",
        "SPARK_AUTH": "NONE"
      }
    }
  }
}

Then start a new Claude Code session and ask it to query the sample data.

Security

Read-Only Enforcement

The execute_query tool only allows read-only SQL statements. Queries must start with one of: SELECT, SHOW, DESCRIBE, DESC, EXPLAIN, or WITH. All other statement types (DROP, INSERT, DELETE, CREATE, ALTER, SET, ADD JAR, etc.) are rejected before reaching the Spark cluster.
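
The prefix rule described above can be sketched as follows. This is an illustration of the idea, not the server's actual code, and the function name `is_read_only` is hypothetical.

```python
import re

# Keywords the README lists as permitted statement prefixes.
ALLOWED_PREFIXES = frozenset({"SELECT", "SHOW", "DESCRIBE", "DESC", "EXPLAIN", "WITH"})

# Captures the first run of letters, ignoring leading whitespace.
_FIRST_WORD_RE = re.compile(r"\s*([A-Za-z]+)")


def is_read_only(sql: str) -> bool:
    """Return True if the statement starts with an allowed read-only keyword."""
    match = _FIRST_WORD_RE.match(sql)
    if match is None:
        return False  # empty input or a statement that starts with punctuation
    return match.group(1).upper() in ALLOWED_PREFIXES
```

A prefix check on its own does not address several statements separated by semicolons. How the server handles that case is not described here.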

Error Sanitization

Database errors are sanitized before being returned to the MCP client. Internal details such as server hostnames, file paths, and stack traces are not exposed. Connection failures report only the target host/port and error type.
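
One way to approach this kind of sanitization is with a few regular expressions, as in the sketch below. The patterns and the `sanitize_error` name are assumptions made for illustration. The server's real rules are not shown in this README and are likely more thorough.

```python
import re

# Illustrative patterns for the details the README says are withheld.
_TRACE_RE = re.compile(r"^[ \t]+at[ \t]+.*$", re.MULTILINE)  # Java stack frames
_PATH_RE = re.compile(r"(?:/[\w.\-]+){2,}")                   # Unix-style file paths
_HOST_RE = re.compile(
    r"\b[\w\-]+(?:\.[\w\-]+)*\.(?:com|net|org|internal|local)\b"
)


def sanitize_error(message: str) -> str:
    """Strip stack frames, file paths, and hostnames from an error message."""
    cleaned = _TRACE_RE.sub("", message)
    cleaned = _PATH_RE.sub("<path>", cleaned)
    cleaned = _HOST_RE.sub("<host>", cleaned)
    # Drop the blank lines left behind by removed stack frames.
    return "\n".join(line for line in cleaned.splitlines() if line.strip())
```

For example, a message that mentions a warehouse file on an internal node, followed by stack frames, reduces to a single line with `<path>` and `<host>` placeholders.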

Credential Handling

  • Passwords are never included in log output or error messages
  • The SparkConfig object masks passwords in its string representation
  • SPARK_PASSWORD is marked as a secret in the MCP registry schema
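
Masking a password in a config object's string form can be done as in the sketch below. The class reuses the SparkConfig name from the list above, but its fields and formatting are assumptions. The real class may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SparkConfig:
    """Illustrative stand-in; field names are assumptions, not the real class."""

    host: str
    port: int = 10000
    auth: str = "NONE"
    username: Optional[str] = None
    # repr=False keeps the password out of the generated __repr__.
    password: Optional[str] = field(default=None, repr=False)

    def __str__(self) -> str:
        masked = "***" if self.password else None
        return (
            f"SparkConfig(host={self.host!r}, port={self.port}, "
            f"auth={self.auth!r}, username={self.username!r}, "
            f"password={masked!r})"
        )
```

With this shape, logging the object with either `str()` or `repr()` never prints the secret, while `cfg.password` remains available to the connection code.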

Known Limitations

  • No TLS/SSL support — Thrift connections are unencrypted. For production use with LDAP auth, use an SSH tunnel to protect credentials in transit.
  • No query timeout — Long-running queries are not automatically cancelled. Rely on Spark cluster-level timeout configuration.
  • No per-user access control — All queries execute with the privileges of the configured Spark user. Use HiveServer2 authorization (Ranger, Sentry) to restrict access at the database level.
  • Auth mode defaults to NONE — Appropriate for local development but not for production. Set SPARK_AUTH to LDAP or KERBEROS for authenticated environments.

License

MIT


Configuration

  • SPARK_HOST (required): Hostname of the Spark Thrift Server
  • SPARK_PORT: Port of the Spark Thrift Server (default: 10000)
  • SPARK_DATABASE: Default database to use
  • SPARK_AUTH: Authentication method (NONE, LDAP, KERBEROS, CUSTOM, or NOSASL)
  • SPARK_USERNAME: Username for LDAP authentication
  • SPARK_PASSWORD (secret): Password for LDAP authentication
  • SPARK_KERBEROS_SERVICE_NAME: Kerberos service name (default: hive)
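
Reading these variables with their documented defaults might look like the sketch below. The function name `load_settings` and the returned dictionary keys are hypothetical, and the server's own loader may validate values differently.

```python
import os
from typing import Mapping, Optional

VALID_AUTH = {"NONE", "LDAP", "KERBEROS", "CUSTOM", "NOSASL"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the SPARK_* variables, applying the defaults documented above."""
    env = os.environ if env is None else env
    host = env.get("SPARK_HOST")
    if not host:
        raise ValueError("SPARK_HOST is required")
    auth = env.get("SPARK_AUTH", "NONE").upper()
    if auth not in VALID_AUTH:
        raise ValueError(f"Unsupported SPARK_AUTH: {auth}")
    return {
        "host": host,
        "port": int(env.get("SPARK_PORT", "10000")),
        "database": env.get("SPARK_DATABASE", "default"),
        "auth": auth,
        "username": env.get("SPARK_USERNAME"),
        "password": env.get("SPARK_PASSWORD"),
        "kerberos_service_name": env.get("SPARK_KERBEROS_SERVICE_NAME", "hive"),
    }
```

Passing a plain dictionary instead of relying on `os.environ` makes the function easy to exercise in unit tests.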

Categories: Databases
Registry: active
Package: spark-sql-mcp-server
Transport: STDIO
Auth: Required
Updated: Feb 8, 2026
