Exposes the Socrata SODA API across hundreds of government open data portals, from municipal crime stats to federal spending datasets. You get six tools covering the full workflow: list portals, search datasets by keyword or category, fetch typed column schemas, execute SoQL queries with WHERE/GROUP BY/aggregation, and optionally spill large result sets into DuckDB for analytical SQL. The schema inspection tool is critical because Socrata column types determine query syntax (bare numbers vs. quoted strings). When result sets exceed 5,000 rows, the server can register them with DataCanvas and let you run full SQL instead of paginating through SoQL. Reach for this when you need structured access to civic data without manually navigating web portals or writing raw HTTP calls.
Search and query government open-data portals (Socrata SODA API) via MCP. STDIO or Streamable HTTP.
Six tools covering the full Socrata workflow — portal discovery, dataset search, schema inspection, SoQL querying, and DuckDB-powered analytical SQL over large result sets:
| Tool | Description |
|---|---|
socrata_list_portals | List known Socrata-powered government open-data portals with domain, organization name, and dataset count |
socrata_find_datasets | Search for datasets across all Socrata portals or scope to one portal via the Discovery API |
socrata_get_dataset | Fetch full metadata and typed column schema for a dataset by ID — required before writing SoQL queries |
socrata_query_dataset | Execute a SoQL query against any dataset: search, select, where, group, having, order, with DataCanvas spillover |
socrata_dataframe_describe | List registered tables in a DataCanvas session — schema, row count, column names |
socrata_dataframe_query | Run SELECT-only SQL against DataCanvas tables populated by socrata_query_dataset |
socrata_list_portalsList known Socrata-powered government open-data portals.
socrata_find_datasets), organization name, and dataset countsocrata_find_datasetsSearch for datasets across all Socrata portals or scope to a single portal.
domain parameter["Public Safety", "Transportation"]) and tags (e.g. ["covid19"])socrata_get_dataset for typed schema before writing queriessocrata_get_datasetFetch full metadata and column schema for a Socrata dataset by ID.
data_type determines correct WHERE clause syntax: Number → bare literals (year=2023), Text → single-quoted strings (year='2023'):@computed_region_*) to reduce noisesocrata_query_dataset querysocrata_query_datasetExecute a SoQL query against any dataset on any Socrata portal.
search parameter for quick full-text lookup across all text columns ($q)select, where, group, having, order for full analytical control=, !=, >, <, LIKE, IN(...), BETWEEN, IS NULL, starts_with(), contains(), AND, OR, NOTcount(*), sum(), avg(), min(), max() with group and havingtotal_count returned when result is truncatedassembled_query in the response echoes the SoQL string for learning the syntaxCANVAS_PROVIDER_TYPE=duckdb and result hits the limit, rows spill to a DataCanvas table for SQL-based analysissocrata_dataframe_describeList registered tables in a DataCanvas session.
CANVAS_PROVIDER_TYPE=duckdb is setsocrata_query_dataset spills a large result setsocrata_dataframe_querysocrata_dataframe_queryRun SELECT-only SQL against DataCanvas tables populated by socrata_query_dataset.
year > 2020, amount < 500)read_csv, read_parquet) are rejectedCANVAS_PROVIDER_TYPE=duckdb is set| Type | Name | Description |
|---|---|---|
| Resource | socrata://datasets/{domain}/{datasetId} | Fetch full metadata and column schema for a dataset by stable URI — same payload as socrata_get_dataset |
| Resource | socrata://portals | Paginated list of known Socrata portals with organization name and dataset count |
| Prompt | explore_open_data | Structured six-step civic data investigation workflow: find portal → discover datasets → inspect schema → query → aggregate → synthesize |
All resource data is also reachable via tools. Use the corresponding tool for agent workflows — resources are for clients that support URI-addressable data.
Built on @cyanheads/mcp-ts-core:
none, jwt, oauthin-memory, filesystem, Supabase, Cloudflare KV/R2/D1Socrata-specific:
SOCRATA_APP_TOKEN) for higher per-IP rate limitsSOCRATA_DEFAULT_DOMAINAgent-friendly output:
socrata_query_dataset response so agents can learn and refine syntaxinvalid_id, not_found, soql_error, rate_limited) with actionable recovery textAdd the following to your MCP client configuration file.
{
"mcpServers": {
"socrata-mcp-server": {
"type": "stdio",
"command": "bunx",
"args": ["@cyanheads/socrata-mcp-server@latest"],
"env": {
"MCP_TRANSPORT_TYPE": "stdio",
"MCP_LOG_LEVEL": "info"
}
}
}
}
Or with npx (no Bun required):
{
"mcpServers": {
"socrata-mcp-server": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@cyanheads/socrata-mcp-server@latest"],
"env": {
"MCP_TRANSPORT_TYPE": "stdio",
"MCP_LOG_LEVEL": "info"
}
}
}
}
Or with Docker:
{
"mcpServers": {
"socrata-mcp-server": {
"type": "stdio",
"command": "docker",
"args": [
"run", "-i", "--rm",
"-e", "MCP_TRANSPORT_TYPE=stdio",
"ghcr.io/cyanheads/socrata-mcp-server:latest"
]
}
}
}
For Streamable HTTP, set the transport and start the server:
MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp
git clone https://github.com/cyanheads/socrata-mcp-server.git
cd socrata-mcp-server
bun install
cp .env.example .env
# edit .env and set SOCRATA_APP_TOKEN if you have one
All configuration is validated at startup via Zod schemas in src/config/server-config.ts. Key environment variables:
| Variable | Description | Default |
|---|---|---|
SOCRATA_APP_TOKEN | Socrata app token (X-App-Token header). Without a token, requests share a throttled pool per source IP. | — |
SOCRATA_DEFAULT_DOMAIN | Default portal domain when domain is omitted from tool calls. | data.seattle.gov |
MCP_TRANSPORT_TYPE | Transport: stdio or http. | stdio |
MCP_HTTP_PORT | Port for HTTP server. | 3010 |
MCP_AUTH_MODE | Auth mode: none, jwt, or oauth. | none |
MCP_LOG_LEVEL | Log level (RFC 5424): debug, info, notice, warning, error. | info |
CANVAS_PROVIDER_TYPE | Set to duckdb to enable DataCanvas spillover for large result sets. | — |
LOGS_DIR | Directory for log files (Node.js only). | <project-root>/logs |
STORAGE_PROVIDER_TYPE | Storage backend: in-memory, filesystem, supabase, cloudflare-kv/r2/d1. | in-memory |
OTEL_ENABLED | Enable OpenTelemetry instrumentation. | false |
See .env.example for the full list of optional overrides.
Build and run:
# One-time build
bun run rebuild
# Run the built server
bun run start:stdio
# or
bun run start:http
Run checks and tests:
bun run devcheck # Lint, format, typecheck, security audit
bun run test # Vitest test suite
docker build -t socrata-mcp-server .
docker run --rm -e MCP_TRANSPORT_TYPE=http -p 3010:3010 socrata-mcp-server
The Dockerfile defaults to HTTP transport, stateless session mode, and logs to /var/log/socrata-mcp-server. OpenTelemetry peer dependencies are installed by default — build with --build-arg OTEL_ENABLED=false to omit them.
| Directory | Purpose |
|---|---|
src/index.ts | createApp() entry point — registers tools, resources, prompts, and inits the Socrata service. |
src/config | Server-specific environment variable parsing and validation with Zod. |
src/mcp-server/tools | Tool definitions (*.tool.ts). Six tools covering portal listing, dataset search, schema fetch, SoQL query, and DataCanvas SQL. |
src/mcp-server/resources | Resource definitions (*.resource.ts). Dataset metadata and portal catalog resources. |
src/mcp-server/prompts | Prompt definitions (*.prompt.ts). Civic data investigation workflow prompt. |
src/services/socrata | Socrata service layer — SODA 2.1 API client, Discovery API, query builder, type normalization. |
tests/ | Unit and integration tests mirroring src/. |
See CLAUDE.md for development guidelines and architectural rules. The short version:
try/catch in tool logicctx.log for request-scoped logging, ctx.state for tenant-scoped storagesocrata_get_dataset before writing WHERE clauses — column data_type determines quotingIssues and pull requests are welcome. Run checks and tests before submitting:
bun run devcheck
bun run test
Apache-2.0 — see LICENSE for details.
SOCRATA_APP_TOKENSocrata app token (X-App-Token header). Without a token, requests share a throttled pool per source IP. Free to register at any Socrata portal.
SOCRATA_DEFAULT_DOMAINdefault: data.seattle.govDefault portal domain when domain is omitted from tool calls (e.g. data.seattle.gov, data.cityofnewyork.us).
MCP_LOG_LEVELdefault: infoSets the minimum log level for output (e.g., 'debug', 'info', 'warn').
MCP_HTTP_HOSTdefault: 127.0.0.1The hostname for the HTTP server.
MCP_HTTP_PORTdefault: 3010The port to run the HTTP server on.
MCP_HTTP_ENDPOINT_PATHdefault: /mcpThe endpoint path for the MCP server.
MCP_AUTH_MODEdefault: noneAuthentication mode to use: 'none', 'jwt', or 'oauth'.
com.mcparmory/google-search
io.github.pipeworx-io/brave-search
marcopesani/mcp-server-serper
brave/brave-search-mcp-server
com.mcparmory/google-search-console
acamolese/google-search-console-mcp