A solid pattern for when you need quality gates on implementation work. It spins up three agents: a meta-judge that writes the evaluation criteria, an implementation agent that does the actual work, and a judge that verifies the result against those criteria. The orchestrator (you) only coordinates and never touches code directly, which keeps its context clean. The meta-judge and the implementation agent run in parallel, and the work loops back for revision up to twice if the judge fails it. Generating the rubric before judging is smart, since generic code review often misses task-specific requirements. The main tradeoff is token cost from running multiple agents; in return you get consistent verification and feedback loops without polluting your main context with implementation details.
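The control flow described above can be sketched roughly as follows. This is only an illustration of the coordination pattern, not the skill's actual implementation: the real skill dispatches sub-agents, whereas `meta_judge`, `implement`, `judge`, `Verdict`, and `MAX_RETRIES` here are hypothetical stand-ins, and reading "up to twice" as two revision rounds after the first attempt is an assumption.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    feedback: str


# Hypothetical stubs standing in for sub-agent calls.
async def meta_judge(task: str) -> str:
    # Would produce a task-specific rubric.
    return f"rubric for: {task}"


async def implement(task: str, feedback: str = "") -> str:
    # Would do the actual implementation work, optionally using judge feedback.
    suffix = " (revised)" if feedback else ""
    return f"implementation of {task}{suffix}"


async def judge(rubric: str, work: str) -> Verdict:
    # Toy rule: only revised work passes, so the retry loop is exercised.
    passed = work.endswith("(revised)")
    return Verdict(passed, "" if passed else "address the rubric items")


MAX_RETRIES = 2  # assumption: "loops up to twice" means two revision rounds


async def do_and_judge(task: str) -> tuple[str, Verdict, int]:
    # Rubric generation and the first implementation run concurrently.
    rubric, work = await asyncio.gather(meta_judge(task), implement(task))
    verdict = await judge(rubric, work)
    retries = 0
    # The orchestrator only routes results; it never edits the work itself.
    while not verdict.passed and retries < MAX_RETRIES:
        retries += 1
        work = await implement(task, verdict.feedback)
        verdict = await judge(rubric, work)
    return work, verdict, retries


if __name__ == "__main__":
    work, verdict, retries = asyncio.run(do_and_judge("add input validation"))
    print(work, verdict.passed, retries)
```

Note that the rubric is produced once and reused on every judging pass, so each revision is measured against the same criteria.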
npx -y skills add neolabhq/context-engineering-kit --skill do-and-judge --agent claude-code
Installs into .claude/skills of the current project.
juliusbrussee/caveman
mattpocock/skills
shadcn/improve
obra/superpowers
forrestchang/andrej-karpathy-skills
vercel-labs/skills