OpenRLHF Training

250 installs, 9.2k stars

Summary

OpenRLHF is a Ray-based RLHF framework for training large models (7B to 70B+ parameters) with vLLM-accelerated generation. It supports PPO, GRPO, RLOO, and DPO in one package and claims a 2× speedup over DeepSpeed-Chat, which it attributes to its distributed architecture and to ZeRO-3 sharding of model state across GPUs. The standout feature is the Hybrid Engine, which lets vLLM and DeepSpeed share the same GPUs by putting whichever engine is idle into sleep mode; this matters when the actor, critic, reward, and reference models all run at once. Choose GRPO to skip the critic model and save memory, or stay with PPO for maximum control. The setup leans heavily on Docker and Ray, so expect more infrastructure overhead than with simpler frameworks such as TRL; that is the trade-off for scaling to multi-node clusters.
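To make the GRPO-versus-PPO trade-off concrete, here is a minimal sketch of the group-relative advantage that GRPO uses in place of a learned critic. It is an illustration of the general technique, not OpenRLHF's own code: the function name `group_relative_advantages`, the `eps` parameter, and the choice of population standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of several responses sampled for ONE prompt.

    Each response is scored relative to the group mean, so no critic
    (value model) is needed to provide a baseline.
    Hypothetical helper; population std and eps are assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored by a reward function
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses scored above the group mean receive a positive advantage and those below it a negative one. PPO instead gets its baseline from a separately trained critic, which is the extra model (and memory) that GRPO avoids.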

Install to Claude Code

npx -y skills add orchestra-research/ai-research-skills --skill openrlhf-training --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md
