OpenRLHF is a Ray-based RLHF framework built for training large models (7B-70B+) with vLLM inference acceleration. It supports the PPO, GRPO, RLOO, and DPO algorithms in one package and claims a 2× speedup over DeepSpeedChat from its distributed architecture, ZeRO-3 sharding, and GPU resource sharing. The standout feature is the Hybrid Engine, which lets vLLM and DeepSpeed share the same GPUs through sleep modes; this matters when you are running actor, critic, reward, and reference models simultaneously. Choose GRPO if you want to skip the critic model and save memory, or stick with PPO when you want a learned critic for advantage estimation and can afford the extra model. The setup is Docker-heavy and Ray-centric, so expect more infrastructure overhead than with simpler frameworks like TRL; that is the trade-off for scaling to multi-node clusters.
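To make the GRPO-versus-PPO memory point concrete, here is a minimal sketch of the group-relative advantage idea behind GRPO. It is not this framework's API: the function name `group_relative_advantages` and the `eps` parameter are hypothetical, and the reward values are made up for illustration. The point is that the baseline comes from the other responses sampled for the same prompt, so no separate critic network has to be kept in GPU memory.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled responses.

    Hypothetical helper for illustration only. The group mean plays the
    role that a learned critic's value estimate plays in PPO.
    """
    baseline = mean(rewards)   # group mean replaces the critic's value estimate
    spread = pstdev(rewards)   # population std of the group's rewards
    # eps avoids division by zero when every response gets the same reward
    return [(r - baseline) / (spread + eps) for r in rewards]


# Illustrative rewards for four responses sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses scoring above the group mean get positive advantages and the rest get negative ones. PPO would instead estimate the baseline with a trained critic, which is an additional model to hold and update.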
npx -y skills add orchestra-research/ai-research-skills --skill openrlhf-training --agent claude-code
Installs into .claude/skills of the current project.
sickn33/antigravity-awesome-skills
moizibnyousaf/ai-agent-skills
github/awesome-copilot