Production fork of slime for enterprise-scale RL training on massive MoE models. The main draw is unified FP8 training and inference with bitwise alignment between SGLang rollouts and Megatron training, which prevents the routing inconsistencies that destabilize MoE training. It also supports INT4 quantization-aware training (fit 1TB models on single H200s) and speculative RL with EAGLE for a 25%+ rollout speedup. Its Rollout Routing Replay records the expert routing decisions made during rollout and replays them during training, eliminating the quantization-induced discrepancies that cause RL collapse. If you're training DeepSeek V3- or Qwen3-MoE-scale models and need production stability over research flexibility, this is the tool.
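The routing-replay idea can be illustrated with a small Python sketch. It is not Miles' implementation: `RoutingRecorder`, `route`, and `gate_weights` are hypothetical names, the logits are made-up numbers, and computing gate weights from the training-side logits over the replayed experts is an assumption of this sketch. It shows how a tiny numerical difference between rollout and training can flip a top-2 expert choice, and how replaying the recorded indices restores the rollout's routing.

```python
import math


def top_k_experts(logits, k):
    """Indices of the k largest router logits; ties go to the lower index."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


class RoutingRecorder:
    """Hypothetical store of the experts chosen per (layer, token) during rollout."""

    def __init__(self):
        self.table = {}

    def record(self, layer, token, experts):
        self.table[(layer, token)] = list(experts)

    def replay(self, layer, token):
        return list(self.table[(layer, token)])


def route(logits, k, recorder=None, layer=0, token=0, mode="compute"):
    """Pick experts: recompute top-k, record it (rollout), or replay it (training)."""
    if mode == "compute":
        return top_k_experts(logits, k)
    if mode == "record":
        experts = top_k_experts(logits, k)
        recorder.record(layer, token, experts)
        return experts
    if mode == "replay":
        # Ignore the locally computed top-k and reuse the rollout's choice.
        return recorder.replay(layer, token)
    raise ValueError(f"unknown mode: {mode}")


def gate_weights(logits, experts):
    """Softmax over the selected experts' logits (max-subtracted for stability)."""
    m = max(logits[e] for e in experts)
    exps = [math.exp(logits[e] - m) for e in experts]
    total = sum(exps)
    return [x / total for x in exps]


if __name__ == "__main__":
    recorder = RoutingRecorder()
    # Made-up logits for one token: the training side differs by 2e-4 on expert 1,
    # standing in for a precision or kernel mismatch between rollout and training.
    rollout_logits = [1.0, 0.5, 0.5001, -1.0]
    training_logits = [1.0, 0.5002, 0.5001, -1.0]

    print(route(rollout_logits, 2, recorder, mode="record"))  # [0, 2]
    print(route(training_logits, 2))  # [0, 1]: routing flipped
    replayed = route(training_logits, 2, recorder, mode="replay")
    print(replayed)  # [0, 2]: matches the rollout
    print(gate_weights(training_logits, replayed))
```

Replaying only the expert indices, while still computing gate weights from the training logits, keeps the router's outputs trainable in this sketch even though the expert selection is pinned to what the rollout engine chose.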
npx -y skills add orchestra-research/ai-research-skills --skill miles-rl-training --agent claude-code
This installs the skill into .claude/skills of the current project.
sickn33/antigravity-awesome-skills
moizibnyousaf/ai-agent-skills
github/awesome-copilot