Warehouse Mcp

authSTDIOregistry active

Summary

This is a production-grade read-only gateway to seven major data warehouses: Postgres, Oracle, Snowflake, BigQuery, Redshift, DuckDB, and MotherDuck. It exposes 16 tools for querying, catalog browsing, column profiling, and time-series bucketing with dialect-aware SQL validation. The five-tier role system (semantic_only, metadata_only, reader_restricted, reader, admin) maps cleanly to per-key authorization, and you can delegate to native warehouse roles for RLS and column-level security. JSONL audit logs and optional PII masking are built in. Deploy the Docker image or run via npx. Reach for this when you want LLMs querying your warehouse without writing infrastructure or duplicating access policies.

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

warehouse-mcp

Production MCP (Model Context Protocol) server for your data warehouse. Read-only enforcement, five-tier role-based access, optional warehouse-role impersonation (for native RLS / CLS), JSONL audit log, optional output PII masking, optional semantic-metadata layer (glossary + table docs) with per-session toggle. Self-host the Docker image, install via npx, or wait for the upcoming managed cloud variant.

Status: v0.4.x — production-ready for the v1 warehouse list. See CHANGELOG for what shipped in each release.

Demo Videos

Warehouse MCP in action using Cursor

Supported warehouses

Warehouse	Adapter docs
Postgres 12+	docs/adapters/postgres.md
Oracle 12c+ (Thin mode, no Instant Client)	docs/adapters/oracle.md
Amazon Redshift (cluster + Serverless)	docs/adapters/redshift.md
Snowflake (key-pair auth)	docs/adapters/snowflake.md
Google BigQuery	docs/adapters/bigquery.md
DuckDB (local file or `:memory:`)	docs/adapters/duckdb.md
MotherDuck (cloud-hosted DuckDB, `md:` paths)	docs/adapters/duckdb.md

Databricks SQL is a fast-follow.

Tools exposed (16, all read-only)

Tool	Purpose
`query`	Execute a SELECT (validator-enforced read-only, dialect-aware)
`list_schemas`, `list_tables`, `describe_table`	Browse the catalog
`find_columns`	Search column names across the warehouse with a LIKE pattern
`get_foreign_keys`	Discover declared FK relationships for safe joins
`get_view_definition`	Read the SQL body of a view (where business logic lives)
`sample_table`, `count_rows`	Peek at data, check size before scanning
`column_stats`, `top_values`	Profile a single column
`time_series`	Bucket by hour/day/week/month/quarter/year — dialect-correct everywhere
`search_value`	Find a literal across a table's text columns
`glossary_lookup`	Read business-glossary terms from the semantic layer (in-memory, no warehouse I/O)
`schema_lookup`	Read schema-level docs — purpose, owner, refresh, table list
`table_lookup`	Read full semantic doc for one table — description + column metadata

Roles (five tiers)

Role	Tools allowed
`semantic_only`	Only the three semantic-lookup tools — zero warehouse access; pair with `semantic=on` for a docs-viewer persona
`metadata_only`	Catalog discovery + semantic lookups — never reads row data
`reader_restricted`	Aggregates / samples / time series — no arbitrary SELECT
`reader`	Adds `query` and `search_value` (the general analyst tier)
`admin`	Everything; future write tools when `ENABLE_WRITE_TOOLS` ships

Per-key role assigned via MCP_API_KEYS=key:role[:set_role=warehouse_role][:semantic=on|off]. The optional set_role= directive issues SET ROLE on Postgres/Redshift so the warehouse's own RLS / CLS / masking policies enforce per-key access — no policy duplication in MCP. The optional semantic=on|off directive overrides SEMANTIC_DEFAULT for that one key, controlling whether the session sees the warehouse://semantic/* resources (see docs/semantic-metadata.md).

For deployments with multiple existing DB roles (finance, hr, payroll, etc.) and many human users, see docs/multi-role-deployment.md — walks through mapping ~10 DB roles to MCP keys, the recommended <area> / <area>_restricted pattern, and when to graduate from static keys to OIDC.

Sample mcp connection json(change based on your role):

{ "mcpServers": { "warehouse-admin-semanticon": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer admin-key-change-me" ] }, "warehouse-admin-semanticoff": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer admin-key-nosemantic" ] }, "warehouse-reader-semanticon": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer reader-key-change-me" ] }, "warehouse-reader-semanticoff": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer reader-key-nosemantic" ] }, "warehouse-reader-restricted-semanticon": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer restricted-key-change-me" ] }, "warehouse-reader-restricted-semanticoff": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer restricted-key-nosemantic" ] }, "warehouse-metadata-only-semanticon": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer metadata-only-key-change-me" ] }, "warehouse-metadata-only-semanticoff": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer metadata-only-key-nosemantic" ] }, "warehouse-docs-viewer-only": { "command": "npx", "args": [ "-y", "mcp-remote@latest", "http://localhost:3001/mcp", "--header", "Authorization: Bearer docs_viewer_key_change_me" ] } } }

Quick start

Option A — Docker compose (5-minute demo with seeded Postgres)

git clone https://github.com/kalehdoo/warehouse-mcp.git
cd warehouse-mcp
docker compose up
# server on http://localhost:3001, seeded ecommerce data in Postgres

Option B — Docker against your own warehouse

The same image bundles every adapter; pick one with WAREHOUSE_TYPE plus the matching credentials. For credentials, prefer --env-file (or your secrets manager) over inline -e flags so passwords don't end up in shell history.

# Postgres (REDSHIFT_* env vars for Redshift; same driver under the hood)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=postgres \
  -e PG_HOST=db -e PG_DATABASE=analytics -e PG_USER=mcp_reader -e PG_PASSWORD=... \
  -e MCP_API_KEYS="$(openssl rand -hex 24):reader" \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# Oracle (Thin mode, no Instant Client)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=oracle \
  -e ORACLE_USER=MCP_READER -e ORACLE_PASSWORD=... \
  -e ORACLE_CONNECT_STRING="db.host:1521/SERVICE" \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# Snowflake (key-pair, mount the .p8)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=snowflake \
  -e SNOWFLAKE_ACCOUNT=xy12345.us-east-1 -e SNOWFLAKE_USER=MCP_READER \
  -e SNOWFLAKE_PRIVATE_KEY_PATH=/keys/snowflake.p8 \
  -e SNOWFLAKE_WAREHOUSE=COMPUTE_WH -e SNOWFLAKE_DATABASE=ANALYTICS \
  -v /opt/keys:/keys:ro \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# BigQuery (mount the service-account JSON)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=bigquery \
  -e GOOGLE_APPLICATION_CREDENTIALS=/keys/bq-sa.json \
  -e BIGQUERY_PROJECT=my-gcp-project \
  -v /opt/keys:/keys:ro \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# DuckDB (file or in-memory)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=duckdb -e DUCKDB_PATH=:memory: \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# MotherDuck (hosted DuckDB)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=duckdb -e DUCKDB_PATH=md:my_database \
  -e MOTHERDUCK_TOKEN=... \
  ghcr.io/kalehdoo/warehouse-mcp:latest

Option C — npx (no container)

npx warehouse-mcp@latest init     # interactive setup; writes .env, prints Claude Desktop snippet
npx warehouse-mcp doctor          # verify the connection without booting the server
npx warehouse-mcp start           # bind the MCP server to MCP_SERVER_PORT (default 3001)

Then point an AI client at it. Drop-in configs:

For a step-by-step walkthrough from "I have a warehouse" to "Claude is querying it", see the onboarding guide. For common errors, see troubleshooting.

Local development

nvm use            # Node 20
npm install
npm test           # unit tests, sub-second
npm run lint
cp .env.example .env

For testcontainers integration tests against real Postgres: npm run test:integration (Docker required).

env setup

MCP_API_KEYS
WAREHOUSE_TYPE
SEMANTIC_DIR

Optional: Semantic metadata (v0.4+)

Teach the AI agent what your warehouse means, not just what's in it. Drop YAML files describing your business glossary and table semantics into a directory, point SEMANTIC_DIR at it, and the agent gets two access channels into the same in-memory index:

MCP resources at warehouse://semantic/* — what tool-aware clients (Cursor, MCP Inspector) read proactively before issuing queries.
MCP tools glossary_lookup, schema_lookup, table_lookup — same data via the tool channel, for clients (Claude Desktop) whose UI is tool-centric and may suppress resource-only servers. Pure in-memory Map.gets — no warehouse I/O, no audit cost beyond a normal tool log line.

Format follows dbt's schema.yml v2 with one extension (meta.schema: per model). Customers using dbt can point SEMANTIC_DIR at their existing models/ directory and reuse most of what they have. See docs/semantic-metadata.md and the starter docs/semantic-templates/.

Per-session toggle

Whether a session sees the semantic layer is independent of whether the YAMLs are loaded — SEMANTIC_DIR controls loading; SEMANTIC_DEFAULT=on|off and the per-key semantic=on|off option (or include_semantic JWT claim) control per-session visibility. The YAMLs are always loaded and validated at boot so you can flip a kill-switch without restarting. Three precedence layers, highest first:

Per-JWT claim — include_semantic: true|false (OIDC).
Per-API-key option — semantic=on|off in MCP_API_KEYS.
Server default — SEMANTIC_DEFAULT=on|off (default on).

Each tool-call audit row records the resolved include_semantic value so you can correlate query quality with semantic exposure after the fact.

Optional: OpenTelemetry tracing

Off by default. Set OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318/v1/traces in your env and the server will emit per-tool-call spans. Resource attributes: service.name=warehouse-mcp, service.version=<package.json version>. Span attributes: warehouse.tenant, warehouse.role. Works with any OTLP/HTTP backend (Grafana Tempo, Honeycomb, Datadog APM, New Relic, SigNoz).

Optional: Output PII masking

Off by default. Set GUARDRAIL_PII_MASK=on and the server masks emails, SSNs, formatted phones, IPv4 addresses, and Luhn-validated credit cards in result rows. Mask level depends on the caller's role: admin sees raw, reader sees partial (a***@example.com), reader_restricted sees full redaction tags.

How it works

Read the architecture doc for the request flow — what files execute when an MCP client makes a call, how the guardrail pipeline composes around the tool handler, and the recipes for adding a new tool / guardrail / adapter. Single-page orientation for operators, security reviewers, and contributors.

Security

Read the threat model before deploying. It covers the OWASP Top 10 mapping, what the codebase mitigates, and what is left to your deployment (TLS, secrets management, network isolation, cost guardrails). Report vulnerabilities per SECURITY.md.

Contributing

Issues and PRs welcome. Start with CONTRIBUTING.md — it covers the dev workflow, the adapter contract, and how to add a new warehouse. By participating you agree to the Code of Conduct.

License

Apache-2.0

Featured

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

Configuration

MCP_TRANSPORTdefault: stdio

Set to 'stdio' for desktop AI clients (Claude Desktop, Cursor). Defaults to 'http'.

WAREHOUSE_TYPE*

Which warehouse to connect to. One of: postgres, oracle, redshift, snowflake, bigquery, duckdb. (DuckDB also handles MotherDuck via DUCKDB_PATH=md:<db>.)

MCP_API_KEYSsecret

Comma-separated bearer keys with role and optional warehouse-role impersonation: 'key1:reader,key2:admin', 'key3:reader:set_role=alice'. Leave empty for stdio (OS process boundary is the trust boundary). Required for HTTP.

DUCKDB_PATH

DuckDB file path, ':memory:', or 'md:<database>' for MotherDuck. Required when WAREHOUSE_TYPE=duckdb.

MOTHERDUCK_TOKENsecret

MotherDuck service token. Required when DUCKDB_PATH starts with 'md:'.

PG_HOST

Postgres host. Required when WAREHOUSE_TYPE=postgres. (Use REDSHIFT_HOST for Redshift.)

PG_DATABASE

Postgres database name.

PG_USER

Postgres user.

PG_PASSWORDsecret

Postgres password.

ORACLE_USER

Oracle user. Required when WAREHOUSE_TYPE=oracle.

ORACLE_PASSWORDsecret

Oracle password.

ORACLE_CONNECT_STRING

Oracle Easy Connect (e.g. host:1521/SERVICE), TNS descriptor, or alias.

SNOWFLAKE_ACCOUNT

Snowflake account identifier (e.g. xy12345.us-east-1). Required when WAREHOUSE_TYPE=snowflake.

SNOWFLAKE_USER

Snowflake username.

SNOWFLAKE_PRIVATE_KEY_PATH

Path to PKCS8 private key (.p8). Snowflake auth is key-pair only; password auth is not supported.

BIGQUERY_PROJECT

GCP project id. Required when WAREHOUSE_TYPE=bigquery.

GOOGLE_APPLICATION_CREDENTIALS

Path to BigQuery service-account JSON. Omit on GKE / Cloud Run with workload identity.

GUARDRAIL_PII_MASKdefault: off

Set to 'on' to enable role-aware PII masking on result rows (emails, SSNs, phones, IPs, Luhn-validated CCs). Off by default.

MCP_RATE_LIMIT_RPMdefault: 0

Per-principal token-bucket rate limit on tool invocations. 0 = disabled. Recommend 60 for production.

warehouse-mcp

Status: v0.4.x — production-ready for the v1 warehouse list. See CHANGELOG for what shipped in each release.

Demo Videos

Warehouse MCP in action using Cursor

Supported warehouses

Warehouse	Adapter docs
Postgres 12+	docs/adapters/postgres.md
Oracle 12c+ (Thin mode, no Instant Client)	docs/adapters/oracle.md
Amazon Redshift (cluster + Serverless)	docs/adapters/redshift.md
Snowflake (key-pair auth)	docs/adapters/snowflake.md
Google BigQuery	docs/adapters/bigquery.md
DuckDB (local file or `:memory:`)	docs/adapters/duckdb.md
MotherDuck (cloud-hosted DuckDB, `md:` paths)	docs/adapters/duckdb.md

Databricks SQL is a fast-follow.

Tools exposed (16, all read-only)

Tool	Purpose
`query`	Execute a SELECT (validator-enforced read-only, dialect-aware)
`list_schemas`, `list_tables`, `describe_table`	Browse the catalog
`find_columns`	Search column names across the warehouse with a LIKE pattern
`get_foreign_keys`	Discover declared FK relationships for safe joins
`get_view_definition`	Read the SQL body of a view (where business logic lives)
`sample_table`, `count_rows`	Peek at data, check size before scanning
`column_stats`, `top_values`	Profile a single column
`time_series`	Bucket by hour/day/week/month/quarter/year — dialect-correct everywhere
`search_value`	Find a literal across a table's text columns
`glossary_lookup`	Read business-glossary terms from the semantic layer (in-memory, no warehouse I/O)
`schema_lookup`	Read schema-level docs — purpose, owner, refresh, table list
`table_lookup`	Read full semantic doc for one table — description + column metadata

Roles (five tiers)

Role	Tools allowed
`semantic_only`	Only the three semantic-lookup tools — zero warehouse access; pair with `semantic=on` for a docs-viewer persona
`metadata_only`	Catalog discovery + semantic lookups — never reads row data
`reader_restricted`	Aggregates / samples / time series — no arbitrary SELECT
`reader`	Adds `query` and `search_value` (the general analyst tier)
`admin`	Everything; future write tools when `ENABLE_WRITE_TOOLS` ships

Sample mcp connection json(change based on your role):

Quick start

Option A — Docker compose (5-minute demo with seeded Postgres)

git clone https://github.com/kalehdoo/warehouse-mcp.git
cd warehouse-mcp
docker compose up
# server on http://localhost:3001, seeded ecommerce data in Postgres

Option B — Docker against your own warehouse

# Postgres (REDSHIFT_* env vars for Redshift; same driver under the hood)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=postgres \
  -e PG_HOST=db -e PG_DATABASE=analytics -e PG_USER=mcp_reader -e PG_PASSWORD=... \
  -e MCP_API_KEYS="$(openssl rand -hex 24):reader" \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# Oracle (Thin mode, no Instant Client)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=oracle \
  -e ORACLE_USER=MCP_READER -e ORACLE_PASSWORD=... \
  -e ORACLE_CONNECT_STRING="db.host:1521/SERVICE" \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# Snowflake (key-pair, mount the .p8)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=snowflake \
  -e SNOWFLAKE_ACCOUNT=xy12345.us-east-1 -e SNOWFLAKE_USER=MCP_READER \
  -e SNOWFLAKE_PRIVATE_KEY_PATH=/keys/snowflake.p8 \
  -e SNOWFLAKE_WAREHOUSE=COMPUTE_WH -e SNOWFLAKE_DATABASE=ANALYTICS \
  -v /opt/keys:/keys:ro \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# BigQuery (mount the service-account JSON)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=bigquery \
  -e GOOGLE_APPLICATION_CREDENTIALS=/keys/bq-sa.json \
  -e BIGQUERY_PROJECT=my-gcp-project \
  -v /opt/keys:/keys:ro \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# DuckDB (file or in-memory)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=duckdb -e DUCKDB_PATH=:memory: \
  ghcr.io/kalehdoo/warehouse-mcp:latest

# MotherDuck (hosted DuckDB)
docker run -d -p 3001:3001 \
  -e WAREHOUSE_TYPE=duckdb -e DUCKDB_PATH=md:my_database \
  -e MOTHERDUCK_TOKEN=... \
  ghcr.io/kalehdoo/warehouse-mcp:latest

Option C — npx (no container)

npx warehouse-mcp@latest init     # interactive setup; writes .env, prints Claude Desktop snippet
npx warehouse-mcp doctor          # verify the connection without booting the server
npx warehouse-mcp start           # bind the MCP server to MCP_SERVER_PORT (default 3001)

Then point an AI client at it. Drop-in configs:

For a step-by-step walkthrough from "I have a warehouse" to "Claude is querying it", see the onboarding guide. For common errors, see troubleshooting.

Local development

nvm use            # Node 20
npm install
npm test           # unit tests, sub-second
npm run lint
cp .env.example .env

For testcontainers integration tests against real Postgres: npm run test:integration (Docker required).

env setup

MCP_API_KEYS
WAREHOUSE_TYPE
SEMANTIC_DIR

Optional: Semantic metadata (v0.4+)

MCP resources at warehouse://semantic/* — what tool-aware clients (Cursor, MCP Inspector) read proactively before issuing queries.
MCP tools glossary_lookup, schema_lookup, table_lookup — same data via the tool channel, for clients (Claude Desktop) whose UI is tool-centric and may suppress resource-only servers. Pure in-memory Map.gets — no warehouse I/O, no audit cost beyond a normal tool log line.

Per-session toggle

Per-JWT claim — include_semantic: true|false (OIDC).
Per-API-key option — semantic=on|off in MCP_API_KEYS.
Server default — SEMANTIC_DEFAULT=on|off (default on).

Each tool-call audit row records the resolved include_semantic value so you can correlate query quality with semantic exposure after the fact.

Optional: OpenTelemetry tracing

Optional: Output PII masking

How it works

Security

Contributing

Issues and PRs welcome. Start with CONTRIBUTING.md — it covers the dev workflow, the adapter contract, and how to add a new warehouse. By participating you agree to the Code of Conduct.

License

Apache-2.0

Warehouse Mcp

warehouse-mcp

Demo Videos

Supported warehouses

Tools exposed (16, all read-only)

Roles (five tiers)

Sample mcp connection json(change based on your role):

Quick start

Option A — Docker compose (5-minute demo with seeded Postgres)

Option B — Docker against your own warehouse

Option C — npx (no container)

Local development

env setup

Optional: Semantic metadata (v0.4+)

Per-session toggle

Optional: OpenTelemetry tracing

Optional: Output PII masking

How it works

Security

Contributing

License

Configuration

Warehouse Mcp

warehouse-mcp

Demo Videos

Supported warehouses

Tools exposed (16, all read-only)

Roles (five tiers)

Sample mcp connection json(change based on your role):

Quick start

Option A — Docker compose (5-minute demo with seeded Postgres)

Option B — Docker against your own warehouse

Option C — npx (no container)

Local development

env setup

Optional: Semantic metadata (v0.4+)

Per-session toggle

Optional: OpenTelemetry tracing

Optional: Output PII masking

How it works

Security

Contributing

License

Configuration

Related Databases MCP Servers

Related Databases MCP Servers