If you're working with transformer models and attention is your bottleneck, this skill applies Flash Attention to speed attention up 2-4x while cutting its memory usage by 10-20x. Under the hood, Flash Attention uses IO-aware tiling and recomputation, so the full attention matrix is never materialized in GPU memory. The skill shows you how to use PyTorch's native scaled_dot_product_attention (which auto-detects Flash Attention support) plus, presumably, the standalone flash-attn library for more control. The skill has passed multiple security audits and comes from a repo with solid GitHub traction. The gains are largest on longer sequences, where the memory cost of standard attention grows quadratically with sequence length and becomes prohibitively expensive.
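To make the tiling idea concrete, here is a minimal pure-Python sketch for a single query vector. It is not the skill's code and not the actual GPU kernel: the function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical, chosen only for illustration. It shows how keys and values can be processed one block at a time with a running maximum and normalizer (the "online softmax" trick), so the full row of attention scores never needs to be held at once. The recomputation used in the backward pass is not shown.

```python
import math

def naive_attention(q, keys, values):
    """Reference: softmax(q.k / sqrt(d))-weighted sum of values for one query."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim_v)]

def tiled_attention(q, keys, values, block_size=2):
    """Same output, but visits keys/values block by block (hypothetical sketch)."""
    d = len(q)
    dim_v = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax normalizer, relative to running_max
    acc = [0.0] * dim_v       # unnormalized weighted sum of values

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]

        # Rescale earlier partial results to the new running maximum.
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        # Fold this block into the running totals.
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim_v):
                acc[j] += w * v[j]
        running_max = new_max

    return [a / running_sum for a in acc]

# Toy data: 5 keys/values of dimension 3 (values chosen arbitrarily).
q = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 0.5], [0.2, 0.3, -0.1], [-1.0, 2.0, 1.5],
        [0.0, 0.0, 0.0], [3.0, -2.0, 0.7]]
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0],
          [0.5, 0.5, 0.5], [-1.0, 0.0, 1.0]]

reference = naive_attention(q, keys, values)
tiled = tiled_attention(q, keys, values, block_size=2)
```

Both functions should agree up to floating-point rounding. In practice you would not write this yourself: PyTorch's scaled_dot_product_attention or the flash-attn library runs the fused GPU version of the same idea.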
npx -y skills add orchestra-research/ai-research-skills --skill optimizing-attention-flash --agent claude-code
Installs into .claude/skills of the current project.
sickn33/antigravity-awesome-skills
moizibnyousaf/ai-agent-skills
github/awesome-copilot