A comprehensive Datadog integration that exposes 163 tools across metrics, monitors, logs, APM, RUM, incidents, CI/CD, and fleet automation. Unlike read-only alternatives, this supports full SLO lifecycle operations (create, update, delete), fleet deployment scheduling, and status page management. The aggregation tools are the standout feature: analyze-monitor-state and slo-compliance-snapshot collapse 5-7 sequential API calls into single structured responses with partial failure handling. Ships with category toggles and field projection to keep token usage manageable despite the wide API surface. Self-hosted via stdio or HTTP, so you control the deployment and can point it at internal or sovereign Datadog sites. Reach for this when you need write operations or fleet automation that the official Bits AI MCP doesn't cover.
The Datadog MCP that answers "why is this happening?" — not just "what's the value?"
Aggregation tools that fold 5–7 sequential API calls into one structured response. Full SLO CRUD. Fleet automation. The widest Datadog API coverage in any MCP — 163 tools built on the @us-all MCP standard.
- analyze-monitor-state and slo-compliance-snapshot collapse 5–7 sequential API calls into one structured response with a caveats array for partial failures. No other Datadog MCP ships this pattern.
- extractFields projection, DD_TOOLS/DD_DISABLE 16-category toggles, and a search-tools meta-tool keep LLM context low across 163 tools.
- slo-compliance-snapshot renders as a visual card on ChatGPT clients via _meta["openai/outputTemplate"]. Claude clients receive the same JSON content (non-breaking).
- MCP_TRANSPORT=http for ChatGPT Apps SDK or remote clients (Bearer auth via MCP_HTTP_TOKEN).

Connect the server to Claude Desktop or Claude Code, then paste any of these:
- "[...] checkout-service. Pull the linked monitors, the recent error spikes from APM, and which deployments touched the service in the last 24h."
- "[...] datadog-agent 7.55.0 rollout to the staging cluster, weekends only, starting next Saturday."

Datadog's official MCP (Bits AI MCP, GA 2026-03-09) is complementary, not a replacement:
| | Official Datadog MCP | @us-all/datadog-mcp (this) |
|---|---|---|
| Tool count | 16+ core toolsets | 163 tools across full API surface |
| Deployment | Remote (managed by Datadog) | Self-hosted, stdio or HTTP (npx / Docker / npm) |
| Auth | Datadog SSO | API + APP key |
| Sites | Public Datadog sites | Any site, incl. internal/sovereign; US5 default |
| SLO writes | ❌ | ✅ create/update/delete SLOs + corrections |
| Fleet automation | ❌ | ✅ 15 tools |
| Status pages | ❌ | ✅ 21 tools |
| Aggregation tools | ❌ | ✅ analyze-monitor-state, slo-compliance-snapshot |
| MCP Prompts | ❌ | ✅ 4 (triage-incident, audit-monitor-noise, analyze-rum-error-spike, investigate-slow-trace) |
| MCP Resources | ❌ | ✅ dd://service/{serviceName}, dd://team/{teamId}, dd://synthetics/{testId}, etc. |
Use the official Bits AI MCP for fast managed onboarding and SSO. Use this when you need full API coverage, SLO/fleet/status-page write parity, or self-hosting (internal sites, isolated networks, dev/CI sandboxes).
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
```json
{
  "mcpServers": {
    "datadog": {
      "command": "npx",
      "args": ["-y", "@us-all/datadog-mcp"],
      "env": {
        "DD_API_KEY": "<your-api-key>",
        "DD_APP_KEY": "<your-app-key>",
        "DD_SITE": "datadoghq.com"
      }
    }
  }
}
```
For Claude Code:

```bash
claude mcp add datadog -s user \
  -e DD_API_KEY=<your-api-key> -e DD_APP_KEY=<your-app-key> -e DD_SITE=datadoghq.com \
  -- npx -y @us-all/datadog-mcp
```
With Docker:

```bash
# -i keeps stdin open, which the default stdio transport needs
docker run -i --rm -e DD_API_KEY=... -e DD_APP_KEY=... -e DD_SITE=datadoghq.com \
  ghcr.io/us-all/datadog-mcp-server:latest
```
From source:

```bash
git clone https://github.com/us-all/datadog-mcp-server.git
cd datadog-mcp-server && pnpm install && pnpm build
node dist/index.js
```
| Variable | Required | Default | Description |
|---|---|---|---|
| DD_API_KEY | ✅ | none | Datadog API key |
| DD_APP_KEY | ✅ | none | Datadog Application key |
| DD_SITE | ❌ | us5.datadoghq.com | Datadog site (see table below) |
| DD_ALLOW_WRITE | ❌ | false | Set true to enable mutations (create/update/delete) |
| DD_TOOLS | ❌ | none | Comma-separated allowlist of categories. Only these load, which makes it the biggest token saver. |
| DD_DISABLE | ❌ | none | Comma-separated denylist. Ignored when DD_TOOLS is set. |
| MCP_TRANSPORT | ❌ | stdio | Set to http to enable Streamable HTTP transport |
| MCP_HTTP_TOKEN | conditional | none | Bearer token. Required when MCP_TRANSPORT=http |
| MCP_HTTP_PORT | ❌ | 3000 | HTTP listen port |
| MCP_HTTP_HOST | ❌ | 127.0.0.1 | HTTP bind host (DNS rebinding protection auto-enabled for localhost) |
| MCP_HTTP_SKIP_AUTH | ❌ | false | Skip Bearer auth, e.g. behind a reverse proxy that handles it |
Categories (16): metrics, monitors, dashboards, logs, apm, rum, incidents, security, synthetics, ci, infra, fleet, status-pages, oncall, teams, account.
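The precedence between the two toggles can be sketched as follows. This is a minimal illustration of the rule stated in the table (DD_DISABLE is ignored whenever DD_TOOLS is set), not the server's actual code; `resolveCategories` and `parseList` are hypothetical names, and dropping unknown category names silently is an assumption.

```typescript
// The 16 category names listed above.
const CATEGORIES = [
  "metrics", "monitors", "dashboards", "logs", "apm", "rum", "incidents", "security",
  "synthetics", "ci", "infra", "fleet", "status-pages", "oncall", "teams", "account",
] as const;

type Category = (typeof CATEGORIES)[number];
type Env = Record<string, string | undefined>;

// Split a comma-separated value into known category names, dropping blanks and unknowns.
function parseList(value: string | undefined): Category[] {
  const names = (value ?? "").split(",").map((s) => s.trim().toLowerCase());
  return CATEGORIES.filter((c) => names.indexOf(c) !== -1);
}

// Hypothetical resolver: a non-empty DD_TOOLS wins; DD_DISABLE applies only without it.
function resolveCategories(env: Env): Category[] {
  if (env.DD_TOOLS !== undefined && env.DD_TOOLS.trim() !== "") {
    return parseList(env.DD_TOOLS);
  }
  const deny = parseList(env.DD_DISABLE);
  return CATEGORIES.filter((c) => deny.indexOf(c) === -1);
}
```

With `DD_TOOLS=metrics,monitors` and `DD_DISABLE=metrics` both set, this sketch loads metrics and monitors, because the denylist is never consulted.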
When MCP_TRANSPORT=http: POST /mcp (Bearer-auth JSON-RPC) + GET /health (public liveness).
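As a rough sketch of what a client sends to these two endpoints, the helpers below build request descriptions without performing any network I/O. The helper names and the `HttpRequestSpec` shape are hypothetical; the `tools/list` method and the `Accept` header follow the MCP specification, and a real client would normally complete the `initialize` handshake before listing tools.

```typescript
interface HttpRequestSpec {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Build the JSON-RPC call that asks the server for its tool list.
function toolsListRequest(baseUrl: string, token: string, id: number): HttpRequestSpec {
  return {
    url: `${baseUrl}/mcp`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`, // value of MCP_HTTP_TOKEN
      "Content-Type": "application/json",
      // Streamable HTTP clients accept either a JSON body or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({ jsonrpc: "2.0", id, method: "tools/list" }),
  };
}

// The liveness probe is public, so it carries no Authorization header.
function healthRequest(baseUrl: string): HttpRequestSpec {
  return { url: `${baseUrl}/health`, method: "GET", headers: {} };
}
```

Assuming the defaults from the table, `baseUrl` would be `http://127.0.0.1:3000`; each spec can be passed to `fetch(spec.url, spec)` on Node.js 22+.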
Sites:
| Site | Value | Region |
|---|---|---|
| US1 | datadoghq.com | US (Virginia) |
| US3 | us3.datadoghq.com | US (Virginia) |
| US5 | us5.datadoghq.com | US (Oregon) |
| EU1 | datadoghq.eu | EU (Frankfurt) |
| AP1 | ap1.datadoghq.com | Asia-Pacific (Tokyo) |
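For orientation, the sketch below maps a DD_SITE value to the public API base URL, which for the sites in the table is the site prefixed with `api.`. `resolveSite` is a hypothetical helper, and the `api.` prefix is only an assumption for internal or sovereign sites, which is why the result flags whether the site is one of the listed values.

```typescript
// Site values from the table above.
const KNOWN_SITES = [
  "datadoghq.com",
  "us3.datadoghq.com",
  "us5.datadoghq.com",
  "datadoghq.eu",
  "ap1.datadoghq.com",
];

interface SiteInfo {
  apiBaseUrl: string;
  known: boolean;
}

function resolveSite(ddSite: string | undefined): SiteInfo {
  // us5.datadoghq.com is the documented default when DD_SITE is unset.
  const site = (ddSite ?? "").trim().toLowerCase() || "us5.datadoghq.com";
  return {
    apiBaseUrl: `https://api.${site}`,
    known: KNOWN_SITES.indexOf(site) !== -1,
  };
}
```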
Naive setup loads ~25K tokens of tool schema before any conversation. Three knobs mitigate:
| Scenario | Tools | Schema tokens | vs default |
|---|---|---|---|
| default (all categories) | 163 | 25,200 | — |
| typical (DD_TOOLS=metrics,monitors,logs,apm,dashboards) | 55 | 9,300 | −63% |
| narrow (DD_TOOLS=metrics,monitors) | 24 | 3,800 | −85% |
1. DD_TOOLS=metrics,monitors,logs,apm (biggest win).
2. extractFields response projection: get-dashboard { dashboardId: "abc", extractFields: "id,title,widgets.*.definition.type" }.
3. search-tools meta-tool: always enabled; lets the LLM discover tools at runtime instead of preloading all schemas.

By default, all writes are blocked to prevent accidental mutations by AI agents. The following require DD_ALLOW_WRITE=true:
create-monitor, update-monitor, delete-monitor, mute-monitor, create-dashboard, update-dashboard, delete-dashboard, send-logs, post-event, trigger-synthetics, create-synthetics-test, update-synthetics-test, delete-synthetics-test, create-downtime, cancel-downtime, create-case, update-case-status, send-dora-deployment, send-dora-incident, create-slo, update-slo, delete-slo, plus all fleet/status-page/security writes.
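A write gate of this kind can be sketched in a few lines. The function name, the error message, and the abbreviated tool list are illustrative assumptions; only the DD_ALLOW_WRITE variable and its `true` value come from the configuration table.

```typescript
type Env = Record<string, string | undefined>;

// Abbreviated subset of the write tools listed above, for illustration only.
const WRITE_TOOLS = [
  "create-monitor", "update-monitor", "delete-monitor", "mute-monitor",
  "create-slo", "update-slo", "delete-slo",
];

// Hypothetical guard: call it before dispatching any tool handler.
function assertWriteAllowed(toolName: string, env: Env): void {
  if (WRITE_TOOLS.indexOf(toolName) === -1) return; // read-only tools always pass
  if (env.DD_ALLOW_WRITE !== "true") {
    throw new Error(`${toolName} is a write operation; set DD_ALLOW_WRITE=true to enable it`);
  }
}
```

Failing closed (anything other than the literal string `true` blocks the call) keeps a typo such as `DD_ALLOW_WRITE=1` from silently enabling mutations.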
Workflow templates the model can invoke directly:
- triage-incident: given an incident ID, walks linked monitors, recent error spikes, and recent deploys.
- audit-monitor-noise: flag flapping monitors via alert frequency × MTTR.
- analyze-rum-error-spike: diff RUM error rates across two windows, attribute to top error groups.
- investigate-slow-trace: given a slow trace ID, traverse the span tree and surface bottleneck spans.

Read-only entities by URI: dd://monitor/{id}, dd://dashboard/{id}, dd://slo/{id}, dd://incident/{id}, dd://service/{serviceName}, dd://team/{teamId} (team + members), dd://synthetics/{testId}, dd://host/{name}.
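To make the URI scheme concrete, here is a small parser sketch for the resource URIs listed above. `parseResourceUri` and `ResourceRef` are hypothetical names, and URL-decoding the identifier is an assumption about how names with special characters would be encoded.

```typescript
// Resource kinds taken from the URI templates listed above.
const RESOURCE_KINDS = [
  "monitor", "dashboard", "slo", "incident", "service", "team", "synthetics", "host",
] as const;

type ResourceKind = (typeof RESOURCE_KINDS)[number];

interface ResourceRef {
  kind: ResourceKind;
  id: string;
}

// Returns null for anything that is not a dd://<kind>/<id> URI with a known kind.
function parseResourceUri(uri: string): ResourceRef | null {
  const match = /^dd:\/\/([a-z]+)\/(.+)$/.exec(uri);
  if (!match) return null;
  const kind = RESOURCE_KINDS.filter((k) => k === match[1])[0];
  if (kind === undefined) return null;
  return { kind, id: decodeURIComponent(match[2]) };
}
```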
163 tools across 16 categories. Use the search-tools meta-tool to discover at runtime; the full list is collapsed below.
| Domain | Tools |
|---|---|
| Status Pages | 21 |
| RUM (events + apps + metrics + retention) | 27 |
| Metrics, Hosts, SLOs, Downtimes, Containers, Processes | 19 |
| Fleet Automation | 15 |
| Synthetics, Logs/Spans Metrics, SLO Corrections | 16 |
| Monitors, Dashboards, Notebooks, Events | 16 |
| Incidents, Cases, Error Tracking, Audit | 13 |
| OnCall, Teams, Users, Services, Bots | 11 |
| Security signals + rules + suppressions | 9 |
| APM, CI Visibility, DORA, Network Devices | 9 |
| + aggregations | analyze-monitor-state, slo-compliance-snapshot |
| + meta | search-tools |
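The two aggregation tools in the table fan out several API calls and report partial failures instead of failing as a whole. The sketch below shows that pattern under stated assumptions: `aggregateSketch` mirrors the `aggregate(fetchers, caveats)` shape named elsewhere in this README, but its exact behavior is a guess, and the fetcher names in the usage note are hypothetical.

```typescript
type Fetcher = () => Promise<unknown>;

// Run every fetcher concurrently. Successful results land in the returned object;
// failures are recorded in `caveats` instead of rejecting the whole call.
async function aggregateSketch(
  fetchers: Record<string, Fetcher>,
  caveats: string[],
): Promise<Record<string, unknown>> {
  const data: Record<string, unknown> = {};
  await Promise.all(
    Object.keys(fetchers).map(async (key) => {
      try {
        data[key] = await fetchers[key]();
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        caveats.push(`${key}: ${reason}`);
      }
    }),
  );
  return data;
}
```

A monitor analysis might pass fetchers such as `monitor`, `recentEvents`, and `downtimes`; if the events call fails, the response still carries the other two sections plus one caveat naming the failed section.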
- Metrics: query-metrics, get-metrics, get-metric-metadata, list-active-metrics, list-metric-tags
- Monitors: get-monitors, get-monitor, create-monitor, update-monitor, delete-monitor, mute-monitor, validate-monitor, analyze-monitor-state (aggregation)
- Dashboards: get-dashboards, get-dashboard, create-dashboard, update-dashboard, delete-dashboard
- Logs: search-logs, aggregate-logs, send-logs
- Events: get-events, post-event
- Incidents: get-incidents, get-incident, search-incidents, create-incident, update-incident, delete-incident
- APM: search-spans
- RUM: search-rum-events, aggregate-rum, list-rum-applications, get-rum-application, create-rum-application, update-rum-application, delete-rum-application, list-rum-metrics, get-rum-metric, create-rum-metric, update-rum-metric, delete-rum-metric, list-rum-retention-filters, get-rum-retention-filter, create-rum-retention-filter, update-rum-retention-filter, delete-rum-retention-filter
- SLOs: list-slos, get-slo, get-slo-history, create-slo, update-slo, delete-slo, slo-compliance-snapshot (aggregation), plus 5 SLO-correction tools
- Synthetics: list-synthetics, get-synthetics-result, trigger-synthetics, create-synthetics-test, update-synthetics-test, delete-synthetics-test
- Infrastructure: list-hosts, get-host-totals, list-containers, list-processes
- Downtimes: list-downtimes, create-downtime, cancel-downtime
- Security: search-security-signals, get-security-signal, list-security-rules, get-security-rule, delete-security-rule, list-security-suppressions, get-security-suppression, create-security-suppression, delete-security-suppression
- CI Visibility: search-ci-pipelines, aggregate-ci-pipelines, search-ci-tests, aggregate-ci-tests
- Cases: list-cases, get-case, create-case, update-case-status
- Error Tracking: list-error-tracking-issues, get-error-tracking-issue
- DORA: send-dora-deployment, send-dora-incident
- Network Devices: list-network-devices, get-network-device
- Notebooks: list-notebooks, get-notebook
- On-call: get-team-oncall, get-oncall-schedule
- Services: list-services, get-service-definition
- Teams: list-teams, get-team, create-team, update-team, delete-team, get-team-members
- Usage and users: get-usage-summary, list-users
- Logs metrics, spans metrics, APM retention filters: 5 each for logs-metrics, spans-metrics, apm-retention-filters (list/get/create/update/delete)
- Status Pages: full lifecycle (pages, components, degradations, maintenances). See src/tools/status-pages.ts.
- Fleet Automation: agents, deployments, schedules. See src/tools/fleet.ts.
- Audit: search-audit-logs
- Meta: search-tools, which queries other tools by keyword; always enabled regardless of DD_TOOLS.
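A keyword search over a tool registry, in the spirit of the search-tools meta-tool, could look like the sketch below. The `ToolInfo` shape, the scoring rule (number of query terms matched), and the tie-break by name are all assumptions; the README does not describe how the real tool ranks results.

```typescript
interface ToolInfo {
  name: string;
  category: string;
  description: string;
}

// Rank tools by how many query terms appear in their name, category, or description.
function searchTools(registry: ToolInfo[], query: string, limit = 10): ToolInfo[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const scored = registry
    .map((tool) => {
      const haystack = `${tool.name} ${tool.category} ${tool.description}`.toLowerCase();
      const score = terms.filter((t) => haystack.indexOf(t) !== -1).length;
      return { tool, score };
    })
    .filter((entry) => entry.score > 0);
  // Highest score first; ties fall back to alphabetical order for stable output.
  scored.sort((a, b) => b.score - a.score || a.tool.name.localeCompare(b.tool.name));
  return scored.slice(0, limit).map((entry) => entry.tool);
}
```

Because only matching entries are returned, the model can start with the meta-tool alone and pull in a handful of tool names per question instead of loading every schema up front.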
Claude → MCP stdio → index.ts → tools/*.ts → @datadog/datadog-api-client → Datadog API
Built on @us-all/mcp-toolkit:
- extractFields: token-efficient response projections
- aggregate(fetchers, caveats): fan-out helper for aggregation tools
- createWrapToolHandler: domain-specific redaction (DD_API_KEY/DD_APP_KEY) + Datadog ApiException error extraction
- search-tools meta-tool

Node.js 22+ • TypeScript strict ESM • pnpm • @modelcontextprotocol/sdk • @datadog/datadog-api-client (official) • zod • dotenv • vitest + dd-trace.
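The extractFields projection from the toolkit list can be illustrated with a small dotted-path projector. This is a sketch of the idea, not the toolkit's implementation: `projectFields`, `pick`, and `merge` are hypothetical names, and the handling of `*` (fan out over array items only) is an assumption based on the `widgets.*.definition.type` example in this README.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Keep only the part of `value` reachable through `path`; "*" fans out over array items.
function pick(value: Json, path: string[]): Json | undefined {
  if (path.length === 0) return value;
  const head = path[0];
  const rest = path.slice(1);
  if (head === "*") {
    if (!Array.isArray(value)) return undefined;
    return value.map((item) => pick(item, rest) ?? null);
  }
  if (!isObject(value) || !(head in value)) return undefined;
  const child = pick(value[head], rest);
  return child === undefined ? undefined : { [head]: child };
}

// Deep-merge two projections so several paths can share a prefix.
function merge(a: Json | undefined, b: Json | undefined): Json | undefined {
  if (a === undefined || a === null) return b === undefined ? a : b;
  if (b === undefined || b === null) return a;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.map((item, i) => merge(item, b[i]) ?? null);
  }
  if (isObject(a) && isObject(b)) {
    const out: JsonObject = { ...a };
    for (const key of Object.keys(b)) {
      const merged = merge(out[key], b[key]);
      if (merged !== undefined) out[key] = merged;
    }
    return out;
  }
  return b;
}

// `fields` is a comma-separated list of dotted paths, e.g. "id,title,widgets.*.definition.type".
function projectFields(value: Json, fields: string): Json {
  let result: Json | undefined = undefined;
  for (const field of fields.split(",")) {
    const path = field.trim().split(".").filter((p) => p.length > 0);
    if (path.length === 0) continue;
    result = merge(result, pick(value, path));
  }
  return result === undefined ? {} : result;
}
```

Applied to a dashboard with large widget definitions, the path `widgets.*.definition.type` keeps one short string per widget and drops the queries and layout, which is where most of the response tokens would otherwise go.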
See CONTRIBUTING.md. New shared patterns belong in @us-all/mcp-toolkit — single source of truth for the 7-server suite.