A disciplined take on iterating skills and workflow docs with actual benchmarks instead of vibes. You freeze a charter and evaluator first, baseline the current artifact, make one change at a time, and keep only measured improvements with append-only logs. The eligibility gate is smart: it can conclude "no ratchet justified" or "support surfaces drifted but the core is fine" instead of pretending every run needs mutation work. Works best when you have 3-5 representative prompts and deterministic checks. Explicitly scopes out GPU-bound ML experiments and hosted eval platforms like LangSmith or Promptfoo. If you have been rewriting SKILL.md files by feel and want git-friendly proof that edits actually helped, this is the structure.
npx -y skills add akillness/oh-my-skills --skill skill-autoresearch --agent claude-codeInstalls into .claude/skills of the current project.
Select a file.
juliusbrussee/caveman
mattpocock/skills
shadcn/improve
obra/superpowers
forrestchang/andrej-karpathy-skills
vercel-labs/skills