If you're running human review loops on LLM outputs in Arize, this skill handles the setup work. You define label schemas (categorical labels such as correct/incorrect, continuous scores, or freeform text) and create queues that route spans or dataset examples to reviewers. The CLI covers the full CRUD cycle for both configs and queues, and you can bulk-annotate spans through the Python SDK. It's the bridge between your traces and the people who need to label them. The docs are thorough on the schema options and queue assignment logic, which matters when you're coordinating multiple reviewers across different label types.
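To make the three label types concrete, here is a minimal Python sketch of how a categorical, continuous, and freeform schema might be modeled and checked before a label is submitted. This is not the Arize SDK or CLI: the class names (`Categorical`, `Continuous`, `Freeform`) and the `validate` helper are hypothetical, chosen only to illustrate the distinction the description draws.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical stand-ins for the three label schema types; not Arize classes.
@dataclass(frozen=True)
class Categorical:
    name: str
    labels: Tuple[str, ...]  # allowed values, e.g. ("correct", "incorrect")


@dataclass(frozen=True)
class Continuous:
    name: str
    minimum: float
    maximum: float


@dataclass(frozen=True)
class Freeform:
    name: str


LabelSchema = Union[Categorical, Continuous, Freeform]


def validate(schema: LabelSchema, value: object) -> bool:
    """Return True if a reviewer's value fits the given schema."""
    if isinstance(schema, Categorical):
        return value in schema.labels
    if isinstance(schema, Continuous):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        return schema.minimum <= value <= schema.maximum
    if isinstance(schema, Freeform):
        return isinstance(value, str) and value.strip() != ""
    raise TypeError(f"unknown schema type: {type(schema).__name__}")


correctness = Categorical("correctness", ("correct", "incorrect"))
quality = Continuous("quality", 0.0, 1.0)
notes = Freeform("notes")
```

Checking values locally like this, before any bulk annotation call, keeps a mistyped label or an out-of-range score from reaching the queue. How Arize itself enforces schema constraints is described in its own docs.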
npx -y skills add arize-ai/arize-skills --skill arize-annotation --agent claude-code
Installs into .claude/skills of the current project.
juliusbrussee/caveman
mattpocock/skills
shadcn/improve
obra/superpowers
forrestchang/andrej-karpathy-skills
vercel-labs/skills