Connects Claude to any HiveServer2-compatible system (Spark, EMR, Hive, Impala) over the Thrift protocol. Exposes four tools: list_databases, list_tables, describe_table, and execute_query for read-only SQL operations. Enforces safety by blocking non-SELECT statements and automatically limiting unbounded queries. Supports multiple auth methods including LDAP, Kerberos, and NOSASL. Built for AWS EMR workflows with SSH tunnel support, though it works with any Spark cluster exposing port 10000. Useful when you need Claude to explore schemas and run analytics queries against production data lakes without write access. Credentials stay local via environment variables. Ships with Docker Compose setup and sample data for local testing.
An MCP server that enables AI assistants to query Spark SQL clusters via the Thrift/HiveServer2 protocol.
Works with any HiveServer2-compatible system: Apache Spark, AWS EMR, Hive, Impala, Presto.
pip install spark-sql-mcp-server
Or run directly with uvx:
uvx spark-sql-mcp-server
export SPARK_HOST="your-emr-master-node.amazonaws.com"
export SPARK_PORT="10000" # default
export SPARK_DATABASE="default" # default
export SPARK_AUTH="NONE" # NONE | LDAP | KERBEROS | CUSTOM | NOSASL
Global (all projects) — add to ~/.claude.json under your project's mcpServers:
{
"mcpServers": {
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "your-emr-master-node.amazonaws.com",
"SPARK_PORT": "10000",
"SPARK_AUTH": "NONE"
}
}
}
}
Project-level — add to .claude/mcp.json in your repo:
{
"mcpServers": {
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "your-emr-master-node.amazonaws.com",
"SPARK_PORT": "10000",
"SPARK_AUTH": "NONE"
}
}
}
}
Add to your claude_desktop_config.json:
{
"mcpServers": {
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "your-emr-master-node.amazonaws.com",
"SPARK_PORT": "10000"
}
}
}
}
Ask Claude things like:
sales.transactions table"| Tool | Description |
|---|---|
list_databases | List all available databases |
list_tables | List tables in a database |
describe_table | Get table schema (columns, types) |
execute_query | Run read-only SQL queries with formatted results |
export SPARK_AUTH="NONE"
export SPARK_AUTH="LDAP"
export SPARK_USERNAME="your-username"
export SPARK_PASSWORD="your-password"
export SPARK_AUTH="KERBEROS"
export SPARK_KERBEROS_SERVICE_NAME="hive" # default
# Ensure you have a valid Kerberos ticket (kinit)
ssh -i your-key.pem -L 10000:localhost:10000 hadoop@your-emr-master
SPARK_HOST=localhostgit clone https://github.com/aidancorrell/spark-sql-mcp-server.git
cd spark-sql-mcp-server
pip install -e ".[dev]"
pytest
ruff check .
A Docker Compose setup provides a local Spark Thrift Server with sample data for integration testing.
# Start the Spark Thrift Server
cd docker && docker compose up -d
# Wait for it to be ready (takes ~30s on first start)
docker logs -f spark-thrift-server # look for "Sample data loaded."
# Run integration tests
pytest -m integration -v
# Tear down
cd docker && docker compose down -v
The local server comes with sample tables: default.employees, default.orders, and test_db.metrics.
Unit tests run by default with pytest (integration tests are skipped unless -m integration is specified).
With the Docker Spark server running, add it to your MCP config to test the server interactively.
Global — add to ~/.claude.json under your project's mcpServers:
{
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "localhost",
"SPARK_PORT": "10000",
"SPARK_AUTH": "NONE"
}
}
}
Project-level — add to .claude/mcp.json:
{
"mcpServers": {
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "localhost",
"SPARK_PORT": "10000",
"SPARK_AUTH": "NONE"
}
}
}
}
Then start a new Claude Code session and ask it to query the sample data.
The execute_query tool only allows read-only SQL statements. Queries must start with one of: SELECT, SHOW, DESCRIBE, DESC, EXPLAIN, or WITH. All other statement types (DROP, INSERT, DELETE, CREATE, ALTER, SET, ADD JAR, etc.) are rejected before reaching the Spark cluster.
Database errors are sanitized before being returned to the MCP client. Internal details such as server hostnames, file paths, and stack traces are not exposed. Connection failures report only the target host/port and error type.
SparkConfig object masks passwords in its string representationSPARK_PASSWORD is marked as a secret in the MCP registry schemaSPARK_AUTH to LDAP or KERBEROS for authenticated environments.MIT
SPARK_HOST*Hostname of the Spark Thrift Server
SPARK_PORTPort of the Spark Thrift Server (default: 10000)
SPARK_DATABASEDefault database to use
SPARK_AUTHAuthentication method: NONE, LDAP, KERBEROS, CUSTOM, or NOSASL
SPARK_USERNAMEUsername for LDAP authentication
SPARK_PASSWORDsecretPassword for LDAP authentication
SPARK_KERBEROS_SERVICE_NAMEKerberos service name (default: hive)
hovecapital/read-only-local-postgres-mcp-server
cocaxcode/database-mcp
io.github.infoinlet-marketplace/mcp-mysql
io.github.cybeleri/database-admin
io.github.yash-0620/postgres-mcp-secured