It is a Reinforcement Learning algorithm that improves a policy by comparing several sampled behaviors for the same input.
GRPO is closely related to PPO, or Proximal Policy Optimization, but it estimates the quality of an action using rewards relative to a group of sampled alternatives.