This skill pulls together the main techniques for extending transformer context windows beyond their default limits. It includes implementations of RoPE, YaRN, ALiBi, and position interpolation, the go-to methods for pushing models like LLaMA or Mistral from their standard 4k-8k token limits up to 32k, 64k, or beyond. It's aimed at the fine-tuning and deployment stage rather than training from scratch. The approach is practical: minimal compute overhead while maintaining model quality across longer sequences. If you're hitting context limits on a pre-trained model and don't want to retrain, this gives you a cookbook of methods that hold up in practice.
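To make position interpolation concrete, here is a minimal sketch in plain Python. It is not taken from the skill itself. The function names (`rope_angles`, `apply_rope`), the interleaved pairing of dimensions, and the 4096 to 16384 lengths are illustrative assumptions. The idea it shows is that dividing positions by a scale factor maps a longer sequence back into the position range the model saw during training.

```python
import math

def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position (hypothetical helper).

    scale > 1 applies linear position interpolation: the position is
    divided by `scale` before the angles are computed.
    """
    half = head_dim // 2
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(half)]
    return [(position / scale) * f for f in inv_freq]

def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... of `vec`.

    Assumption: interleaved pairing; some implementations instead pair
    the first half of the vector with the second half.
    """
    angles = rope_angles(position, len(vec), base, scale)
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out

# Assumed lengths for illustration: trained on 4096 tokens, target 16384.
trained_len, target_len = 4096, 16384
scale = target_len / trained_len

# With interpolation, position 16380 gets the same angles as position 4095
# did during training, so it stays inside the trained range.
interpolated = rope_angles(16380, head_dim=8, scale=scale)
original = rope_angles(4095, head_dim=8)
```

Linear interpolation compresses every frequency by the same factor. YaRN refines this by treating frequency bands differently, which this sketch does not attempt.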
npx -y skills add davila7/claude-code-templates --skill long-context --agent claude-code
Installs into .claude/skills of the current project.