Dd Apm

810 installs, 125 stars

Summary

This handles Datadog APM setup and trace analysis, with a strong focus on getting Single Step Instrumentation working correctly in Kubernetes. The skill is opinionated about common failures (like using the wrong Helm chart or accidentally having ddtrace dependencies that silently break SSI) and walks you through operator install, DatadogAgent CR configuration, and pod restarts. It also covers Linux SSI and includes pup commands for searching traces, analyzing service dependencies, and debugging performance issues. The routing logic is detailed, maybe overly so, but if you've ever fought with Kubernetes auto-instrumentation that looks like it worked but didn't, you'll appreciate the paranoia.

Install to Claude Code

npx -y skills add datadog-labs/agent-skills --skill dd-apm --agent claude-code

Installs into .claude/skills of the current project.

SKILL.md

Datadog APM

Distributed tracing, service maps, and performance analysis.

Routing — Read This First

Match the user's request to one of the entries below. Each entry has the same shape: triggers → which sub-skill to load → the anti-pattern to avoid. If a request seems to fit more than one entry, see "Overlap disambiguation". If nothing matches, see "None of the above" at the end.

Kubernetes APM install / instrument / onboard — trigger when the user mentions Kubernetes, K8s, EKS, GKE, AKS, kind, minikube, K3s, helm, DatadogAgent CR, kubectl, SSI on a cluster, pod injection, or init containers.

Immediately read .claude/skills/dd-apm/k8s-ssi/agent-install/SKILL.md now, then .claude/skills/dd-apm/k8s-ssi/enable-ssi/SKILL.md, then .claude/skills/dd-apm/k8s-ssi/verify-ssi/SKILL.md — do not proceed from memory.

Common wrong approaches that LOOK like they work but silently fail:

helm install datadog datadog/datadog — the standard chart does NOT support SSI via DatadogAgent CR.

Adding ddtrace imports or ddtrace-run to the app — SSI auto-instruments WITHOUT any code changes.

admission.datadoghq.com/enabled annotations — that's admission controller config injection, not SSI init container injection.
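The ddtrace anti-pattern is easy to miss in a large repo. A quick scan of the service's source tree before enabling SSI can surface leftover manual instrumentation. `find_ddtrace_refs` is a hypothetical helper, not part of the skill or pup. It only looks for the Python `ddtrace` package name (which also matches `ddtrace-run`) in a few common files. Tracers for other languages use different package names.

```shell
# Hypothetical helper: list files under a project directory that mention ddtrace
# (dependency pins, imports, or a ddtrace-run entrypoint). Under SSI these are
# usually unnecessary, so any hit is worth reviewing before enabling injection.
find_ddtrace_refs() {
  dir="${1:-.}"
  # -r: recurse, -l: print matching file names only, -I: skip binary files.
  # The pattern "ddtrace" also matches "ddtrace-run".
  grep -rlI 'ddtrace' \
    --include='requirements*.txt' --include='pyproject.toml' \
    --include='Pipfile' --include='*.py' --include='Dockerfile' \
    "$dir"
}
```

Usage: `find_ddtrace_refs ./my-service`. Grep exits non-zero when nothing matches, so an empty result with exit status 1 means no references were found in the scanned files.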

Linux APM install / instrument / onboard — trigger when the user mentions a single host, VM, EC2 instance, bare-metal, RHEL/Ubuntu/Debian, systemd, or no orchestrator.

Immediately read .claude/skills/dd-apm/linux-ssi/agent-install/SKILL.md now, then .claude/skills/dd-apm/linux-ssi/enable-ssi/SKILL.md, then .claude/skills/dd-apm/linux-ssi/verify-ssi/SKILL.md — do not proceed from memory.

Do NOT install the agent via plain apt-get install datadog-agent (or yum equivalent) and assume SSI follows — host auto-instrumentation requires the install script with the SSI flags, which the sub-skill walks through.

Service rename / service remapping — trigger when the user mentions renaming a service, collapsing multiple service names, stripping suffixes/prefixes, or cleaning up inferred services.

Immediately read .claude/skills/dd-apm/service-remapping/SKILL.md now — do not proceed from memory.

Do NOT change tags.datadoghq.com/service labels or DD_SERVICE env vars to rename a service in Datadog. That requires a rollout and only affects new data. Use a service remapping rule — it rewrites the name at ingestion time with no deployment change.

Overlap disambiguation

When a request could plausibly fit more than one entry above, use these tiebreakers:

Hint	Route to
Cluster orchestrator mentioned (EKS/GKE/AKS/kind/K3s/minikube) — even if "just one node"	k8s-ssi
Single host, VM, or EC2 with no orchestrator	linux-ssi
"Several services that should be one"	service-remapping — the sub-skill picks the rule type based on whether the duplicates are real instrumented services or inferred entities (DBs, queues, external APIs)
"My service shows under the wrong name"	First check `DD_SERVICE` on the deploy. If correct and the name is still wrong → service-remapping.
"Reduce APM volume / cost / noise"	No sub-skill yet. Ask whether the user means sampling (fewer ingested traces) or retention filters (less indexed data) before suggesting commands.

None of the above

If the request doesn't match any entry above, continue reading the trace-search, service analysis, and metrics content below. If even that doesn't fit, ask the user to clarify — do not invent a workflow.

Requirements

Datadog Labs Pup should be installed. See Setup Pup if not.

Command Execution Order (Token-Efficient)

For scoped commands, use this order:

Check context first (prior outputs, conversation, saved values).
If a required value is missing, run a discovery command first.
If still ambiguous, ask the user to confirm.
Then run the target command.
Avoid speculative commands likely to fail.
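The order above can be sketched as a guard that refuses to build a scoped command until the env tag is known. `services_list_cmd` is a hypothetical helper. It prints the Quick Start services command instead of running it, so it does not require pup and nothing speculative is executed.

```shell
# Hypothetical guard: only produce the scoped command once the env tag has been
# confirmed (from prior context or by asking the user).
services_list_cmd() {
  env="$1"
  if [ -z "$env" ]; then
    # Missing value: stop and ask instead of guessing production/prod/prd.
    echo "env not confirmed: ask the user before running scoped commands" >&2
    return 1
  fi
  printf 'pup apm services list --env %s --from 1h --to now\n' "$env"
}
```

Usage: `services_list_cmd staging` prints the command for the `staging` env, which can then be run as-is.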

Quick Start

pup auth login
# Confirm env tag with the user first (do not assume production/prod/prd).
pup apm services list --env <env> --from 1h --to now
pup traces search --query "service:api-gateway" --from 1h

Services

List Services

pup apm services list --env <env> --from 1h --to now
pup apm services stats --env <env> --from 1h --to now

Service Stats

pup apm services stats --env <env> --from 1h --to now

Service Map

# View dependencies
pup apm flow-map --query "service:api-gateway&from=$(($(date +%s)-3600))000&to=$(date +%s)000" --env <env> --limit 10
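The inline `$(($(date +%s)-3600))000` in the flow-map query builds epoch milliseconds by appending three zeros to epoch seconds. The following POSIX-shell sketch precomputes the same one-hour window so the query is easier to read and adjust. The variable names are illustrative.

```shell
# Length of the look-back window in seconds (1h, as in the command above).
window_s=3600
now_s=$(date +%s)
# Convert to milliseconds by multiplying rather than concatenating "000".
from_ms=$(( (now_s - window_s) * 1000 ))
to_ms=$(( now_s * 1000 ))
echo "from=${from_ms} to=${to_ms}"
# Then, once <env> is confirmed:
# pup apm flow-map --query "service:api-gateway&from=${from_ms}&to=${to_ms}" --env <env> --limit 10
```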

Traces

Search Traces

# By service
pup traces search --query "service:api-gateway" --from 1h

# Errors only
pup traces search --query "service:api-gateway status:error" --from 1h

# Slow traces (>1s)
pup traces search --query "service:api-gateway @duration:>1000ms" --from 1h

# With specific tag
pup traces search --query "service:api-gateway @http.url:/api/users" --from 1h
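The filters above can be combined in one query, because space-separated terms are ANDed in Datadog search syntax. `traces_query` is a hypothetical helper that assembles the same `service:`, `status:error`, and `@duration` terms used in the examples. It does not add any syntax beyond what they already show.

```shell
# Hypothetical query builder.
# $1 = service name, $2 = "error" to keep only errors, $3 = minimum duration in ms.
traces_query() {
  q="service:$1"
  [ "$2" = "error" ] && q="$q status:error"
  [ -n "$3" ] && q="$q @duration:>${3}ms"
  printf '%s\n' "$q"
}
```

Usage: `pup traces search --query "$(traces_query api-gateway error 1000)" --from 1h` searches for slow errors on `api-gateway`.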

Trace Detail

# No direct get command for a single trace ID.
# Use traces search with a narrow query and time window.
pup traces search --query "trace_id:<trace_id>" --from 1h

Key Metrics

Metric	What It Measures
`trace.http.request.hits`	Request count
`trace.http.request.duration`	Latency
`trace.http.request.errors`	Error count
`trace.http.request.apdex`	Apdex score (user satisfaction)

Service Level Objectives

Link APM to SLOs:

pup slos create --file slo.json
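The command above needs an `slo.json`. The sketch below writes a metric-based SLO (good events over total events) using the field names of the public Datadog SLO API (`type`, `query.numerator`, `query.denominator`, `thresholds`). Whether pup expects exactly this body is an assumption. The service name, target, and timeframe are illustrative. In practice you would likely also scope the queries with the confirmed `env:` tag.

```shell
# Write an illustrative metric-based SLO: non-error hits / all hits for api-gateway.
cat > slo.json <<'EOF'
{
  "name": "api-gateway availability",
  "type": "metric",
  "description": "Share of non-error requests (illustrative)",
  "query": {
    "numerator": "sum:trace.http.request.hits{service:api-gateway}.as_count() - sum:trace.http.request.errors{service:api-gateway}.as_count()",
    "denominator": "sum:trace.http.request.hits{service:api-gateway}.as_count()"
  },
  "thresholds": [
    { "timeframe": "30d", "target": 99.9 }
  ],
  "tags": ["service:api-gateway"]
}
EOF
# Then: pup slos create --file slo.json
```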

Common Queries

Goal	Query
Slowest endpoints	`avg:trace.http.request.duration{*} by {resource_name}`
Error rate	`sum:trace.http.request.errors{*} / sum:trace.http.request.hits{*}`
Throughput	`sum:trace.http.request.hits{*}.as_rate()`
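The error-rate query divides the error count by the hit count. The same arithmetic in shell can help sanity-check numbers read off a graph. The helper name is hypothetical and the counts in the usage line are made up. Datadog names trace metrics `trace.<SPAN_NAME>.hits`, `trace.<SPAN_NAME>.errors`, and so on. That means `http.request` in the queries above may need to be replaced by the span name your integration emits.

```shell
# Hypothetical helper: error rate as a percentage, i.e. 100 * errors / hits.
error_rate_pct() {
  # $1 = error count, $2 = hit count
  awk -v e="$1" -v h="$2" 'BEGIN {
    if (h == 0) { print "n/a"; exit }   # no traffic: rate is undefined
    printf "%.2f\n", 100 * e / h
  }'
}
```

Usage: `error_rate_pct 5 200` prints `2.50`, meaning 5 errors out of 200 hits is a 2.50% error rate.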

Troubleshooting

Problem	Fix
No traces	Check that a tracer is present (injected by SSI, or installed manually, e.g. ddtrace) and that DD_TRACE_ENABLED=true
Missing service	Verify DD_SERVICE env var
Traces not linked	Check that trace context headers are propagated between services
High cardinality	Don't tag with user_id/request_id
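The first two rows above can be checked from a shell running in the same environment as the app, such as a container shell. `check_tracer_env` is a hypothetical helper. It only inspects the current shell's variables, so it will miss settings the app receives some other way.

```shell
# Hypothetical check of the two env vars named in the table above.
check_tracer_env() {
  # DD_TRACE_ENABLED defaults to enabled, so only an explicit "false" is flagged.
  if [ "${DD_TRACE_ENABLED:-true}" = "false" ]; then
    echo "DD_TRACE_ENABLED=false: tracing is disabled"
  fi
  if [ -z "${DD_SERVICE:-}" ]; then
    echo "DD_SERVICE is not set: the service name falls back to the tracer's default"
  fi
}
```

No output means neither issue was detected in this environment.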

References/Docs
