This is a template for setting up vLLM, the high-performance LLM serving framework that uses PagedAttention and continuous batching to achieve significantly higher throughput than standard transformers. The skill gives you code snippets for both offline inference and OpenAI-compatible server setup. It's straightforward if you need to serve open models like Llama with better resource utilization. The repository it comes from has solid traction with 27.7K stars, though the skill itself shows basic examples rather than production configurations. Worth grabbing if you're moving beyond HuggingFace transformers and want faster inference without diving into vLLM's full documentation.
npx -y skills add davila7/claude-code-templates --skill serving-llms-vllm --agent claude-codeInstalls into .claude/skills of the current project.
Select a file.
juliusbrussee/caveman
mattpocock/skills
shadcn/improve
obra/superpowers
forrestchang/andrej-karpathy-skills
vercel-labs/skills