If you're diving into mechanistic interpretability or trying to understand what's actually going on inside neural networks, this skill gets you up and running with SAELens, an open-source library for training sparse autoencoders (SAEs). The core idea is decomposing a model's activations, where individual neurons are often polysemantic (firing for multiple unrelated concepts), into sparse, interpretable features. It's based on research, including Anthropic's monosemanticity work, showing you can extract monosemantic features from superposition, which makes model internals much more readable. The skill wraps the SAELens library (1,100+ GitHub stars), so you're building on a widely used codebase. Honestly, it's most useful if you're doing AI safety research or need to peek under the hood of transformers, and less so for standard ML workflows.
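To make the decomposition idea concrete, here is a minimal sketch of a sparse autoencoder's forward pass and loss in plain NumPy. This is not the SAELens API; the sizes, the parameter names (`W_enc`, `W_dec`, `l1_coeff`), and the random stand-in activations are all hypothetical, chosen only to illustrate the reconstruction-plus-sparsity objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; real SAEs use a much wider feature dictionary.
d_model, d_sae = 8, 32

# Randomly initialised, untrained parameters (illustration only).
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)


def encode(x):
    """Map activations to non-negative feature activations with a ReLU."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def decode(f):
    """Reconstruct activations as a linear combination of feature directions."""
    return f @ W_dec + b_dec


def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that pushes features toward sparsity."""
    f = encode(x)
    x_hat = decode(f)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity, f, x_hat


# Stand-in for a batch of model activations (assumed shape: batch x d_model).
x = rng.normal(size=(4, d_model))
loss, features, recon = sae_loss(x)
print(features.shape, recon.shape)
```

In a real training run the parameters would be optimised by gradient descent on activations captured from a transformer, and the L1 coefficient trades reconstruction quality against how few features fire per input.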
npx -y skills add davila7/claude-code-templates --skill sparse-autoencoder-training --agent claude-code
Installs into .claude/skills of the current project.