Serving LLMs with vLLM

330 installs, 9.2k stars

Summary

If you need to serve LLMs at scale without melting your infrastructure, this is the go-to solution. The vLLM project reports up to 24x higher throughput than Hugging Face Transformers. Two techniques drive this. PagedAttention stores the KV cache in fixed-size blocks instead of one contiguous buffer per request, which makes it memory-efficient. Continuous batching adds new requests to the running batch as others finish and mixes prefill and decode work. The skill wraps the vLLM Python library so you can spin up high-performance inference servers or run offline batch jobs. It's built for production workloads where you actually track tokens per second and GPU utilization. Installation is straightforward, and you get the same engine that a number of commercial LLM APIs are built on. Note that the skill comes from orchestra-research's AI research collection, so expect research-grade tooling rather than hand-holding docs.
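The toy Python sketch below makes the PagedAttention idea concrete. It models only the bookkeeping, under a simplified assumption: the KV cache is a shared pool of fixed-size blocks, and each request keeps a block table that maps its logical blocks to physical ones. A new block is taken only when the current one fills up. The names `BlockAllocator`, `Sequence`, `append_token`, and `release` are hypothetical and are not vLLM's internal API. The real engine manages GPU memory and supports features this sketch omits, such as sharing blocks between sequences.

```python
from dataclasses import dataclass, field


class BlockAllocator:
    """Toy pool of fixed-size KV-cache blocks (hypothetical, not vLLM's API)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size                # token slots per block
        self.free_blocks = list(range(num_blocks))  # ids of unused physical blocks

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache exhausted")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


@dataclass
class Sequence:
    """One request: token count plus a block table (logical index -> physical block id)."""
    num_tokens: int = 0
    block_table: list[int] = field(default_factory=list)


def append_token(seq: Sequence, allocator: BlockAllocator) -> None:
    # Grab a new physical block only when the last one is full, so each
    # sequence wastes at most block_size - 1 slots (in its final block).
    if seq.num_tokens % allocator.block_size == 0:
        seq.block_table.append(allocator.allocate())
    seq.num_tokens += 1


def release(seq: Sequence, allocator: BlockAllocator) -> None:
    # A finished sequence returns its blocks to the shared pool right away,
    # so other requests in the batch can use them.
    for block_id in seq.block_table:
        allocator.free(block_id)
    seq.block_table.clear()
    seq.num_tokens = 0


if __name__ == "__main__":
    allocator = BlockAllocator(num_blocks=8, block_size=16)
    a, b = Sequence(), Sequence()
    for _ in range(20):
        append_token(a, allocator)
    for _ in range(5):
        append_token(b, allocator)
    print(len(a.block_table), len(b.block_table), len(allocator.free_blocks))
```

With a block size of 16, the 20-token request holds 2 blocks and the 5-token request holds 1, leaving 5 of 8 blocks free. Reserving a maximum-length buffer for every request up front would instead tie up memory that most requests never use.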

Install to Claude Code

npx -y skills add orchestra-research/ai-research-skills --skill serving-llms-vllm --agent claude-code

Installs into .claude/skills of the current project.
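A common server-side task is querying a running vLLM server. vLLM exposes an OpenAI-compatible HTTP API, by default on port 8000 when launched with `vllm serve <model>`. The standard-library sketch below assumes such a server is already running at `http://localhost:8000`. The helper names `build_completion_request` and `complete` are hypothetical, and the model name you pass must match whatever model the server was started with.

```python
import json
import urllib.request

# Assumption: a server was started with `vllm serve <model>` on the default port.
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(model: str, prompt: str, max_tokens: int = 64,
                             temperature: float = 0.7) -> dict:
    """JSON body for the OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": model,          # must match the model the server is serving
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(payload: dict, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """POST the request and return the text of the first choice."""
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Because the endpoint follows the OpenAI schema, the official `openai` Python client pointed at the same base URL is an alternative to hand-rolled HTTP.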


Files

SKILL.md
