Group Relative Policy Optimization (GRPO)

GRPO is a reinforcement learning algorithm that improves a policy by comparing several sampled behaviors for the same input.

GRPO is closely related to Proximal Policy Optimization (PPO), but it estimates the quality of an action from its reward relative to a group of sampled alternatives, rather than from a separately learned value function.
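As a rough illustration of the "relative to a group" idea, the sketch below scores each sampled output by how far its reward sits from the group mean, scaled by the group's standard deviation. This is one common formulation, not necessarily the exact one intended here; the function name `group_relative_advantages`, the `eps` stabilizer, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its group (hypothetical helper).

    rewards: rewards for several outputs sampled from the same input.
    eps: small constant (an assumption) that avoids division by zero
         when every output in the group receives the same reward.
    """
    mu = mean(rewards)       # group baseline
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four outputs sampled from one input.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# If all outputs score the same, no output is preferred over another.
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

Outputs that beat the group average get a positive score and the rest get a negative one, so the group itself serves as the baseline.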

Boyang Yan
