This gives Claude direct control over Arize's experiment workflow through the ax CLI. It can create experiments against your datasets, export runs for analysis, and compare model outputs with evaluations such as correctness or relevance. The skill requires a real API call for every dataset example (no fabricated outputs or scores), which matters when you're actually benchmarking models. It handles both REST and Arrow Flight exports, and switches to Flight automatically when a REST export hits the 500-run pagination limit. It's useful for A/B testing different prompts or models, running evals at scale, or pulling experiment data into your own analysis pipeline. It assumes you already have ax installed and an Arize profile configured.
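The REST-to-Flight escalation described above comes down to a threshold check on the number of runs. Here is a minimal sketch, assuming the 500-run page limit is the only trigger and that exactly 500 runs still fit in REST. The function name `choose_export_transport` and its `rest`/`flight` output are hypothetical. They are not ax commands or flags.

```shell
# Hypothetical helper, not part of the ax CLI: pick an export transport
# for a given run count. The 500-run threshold comes from the skill
# description; treating exactly 500 as still REST-safe is an assumption.
choose_export_transport() {
  run_count="$1"
  rest_page_limit=500
  if [ "$run_count" -gt "$rest_page_limit" ]; then
    echo "flight"   # too many runs for one REST page: use Arrow Flight
  else
    echo "rest"     # small experiment: a REST export is enough
  fi
}
```

For example, `choose_export_transport 1200` prints `flight`, and `choose_export_transport 40` prints `rest`.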
npx -y skills add arize-ai/arize-skills --skill arize-experiment --agent claude-code
Installs into .claude/skills of the current project.
juliusbrussee/caveman
mattpocock/skills
shadcn/improve
obra/superpowers
forrestchang/andrej-karpathy-skills
vercel-labs/skills