This is a comprehensive PyTorch training reference that covers everything from optimizer selection (Muon vs AdamW for different parameter types) to domain-specific patterns for LLMs, vision, diffusion, and biomedical applications. It includes actual code snippets for training loops, learning rate schedules, and mixed precision setup, plus decision tables for architecture selection based on data scale. The scaling laws section gives you Chinchilla-optimal token counts, and there's practical debugging advice for loss spikes and OOM errors. What makes this useful is the specificity: it tells you to use lr * (d_model / 768)^(-0.5) for dimension scaling and eps=1e-10 for AdamW in bfloat16, not just "tune your hyperparameters." If you're setting up training from scratch or debugging why your loss won't converge, this beats hunting through papers and GitHub issues.
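As a rough illustration of the dimension-scaling rule quoted above, here is a minimal sketch in plain Python. The function name `scale_lr` and the `ref_dim` parameter are hypothetical, chosen for this example; only the formula `lr * (d_model / 768)^(-0.5)` comes from the description.

```python
def scale_lr(base_lr: float, d_model: int, ref_dim: int = 768) -> float:
    """Scale a base learning rate by (d_model / ref_dim) ** -0.5.

    `ref_dim=768` mirrors the constant in the quoted formula; treating it
    as a tunable reference width is an assumption of this sketch.
    """
    if d_model <= 0 or ref_dim <= 0:
        raise ValueError("d_model and ref_dim must be positive")
    return base_lr * (d_model / ref_dim) ** -0.5


if __name__ == "__main__":
    # Wider models get a smaller learning rate under this rule.
    for width in (768, 1536, 3072):
        print(width, scale_lr(3e-4, width))
```

The scaled value would then typically be passed as `lr` to the optimizer, for example `torch.optim.AdamW(params, lr=scale_lr(3e-4, d_model), eps=1e-10)` when training in bfloat16, following the `eps` value the description mentions.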
npx -y skills add orchestra-research/ai-research-skills --skill ml-training-recipes --agent claude-code
Installs into .claude/skills of the current project.
sickn33/antigravity-awesome-skills
moizibnyousaf/ai-agent-skills
github/awesome-copilot