This is a practical skill for running quantized LLMs when you're hitting memory limits. It covers GPTQ, a post-training quantization method that compresses models down to 4-bit precision with minimal accuracy loss. The guidance is useful: it tells you when to use GPTQ versus AWQ, with specific hardware examples like fitting 70B+ models on consumer GPUs like the RTX 4090. You get about 4× memory reduction and 3-4× inference speedup compared to FP16. The skill has solid traction with 282 installs and comes from a well-starred template repository. Good reference if you're optimizing deployment costs or working with limited hardware.
npx -y skills add davila7/claude-code-templates --skill gptq --agent claude-codeInstalls into .claude/skills of the current project.
Select a file.
juliusbrussee/caveman
mattpocock/skills
shadcn/improve
obra/superpowers
forrestchang/andrej-karpathy-skills
vercel-labs/skills