This is a fully asynchronous RL framework for training personalized AI agents from conversation feedback without blocking inference. It runs four independent loops: serving, rollout collection, judge evaluation, and policy training via GRPO or on-policy distillation (OPD). You get plugin APIs for custom loss functions and reward models, plus ready-to-run scripts for terminal, GUI, SWE, and tool-call agents. The combined method (binary RL plus OPD) is the recommended approach. Deployment works locally or on Tinker cloud via Ray. If you're trying to improve an agent through actual usage rather than static datasets, this gives you the scaffolding for continuous learning in the background.
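To make the GRPO step with binary rewards concrete, here is a minimal sketch of the group-relative advantage computation that GRPO is generally built on: each response's reward is normalized against the other responses sampled for the same prompt. This is not the framework's own API. The function name `group_relative_advantages`, the `eps` parameter, and the sample rewards are assumptions made for illustration only.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    Hypothetical helper for illustration; not part of the framework.
    `rewards` holds one scalar per response sampled for the same prompt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every response got the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Assumed binary judge verdicts (1.0 = good, 0.0 = bad) for four responses.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses judged good get a positive advantage and the rest a negative one. A group where every response receives the same verdict yields all-zero advantages, so it contributes no gradient signal.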
npx -y skills add aradotso/trending-skills --skill openclaw-rl-training --agent claude-code
Installs into .claude/skills of the current project.
juliusbrussee/caveman
mattpocock/skills
shadcn/improve
obra/superpowers
forrestchang/andrej-karpathy-skills
vercel-labs/skills