[2605.20722] AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

Researchers introduced Adaptive Group Policy Optimization (AGPO) for reinforcement learning in large language models.

Why this matters

The summary reports that AGPO outperforms existing methods, with significant gains on math and STEM benchmarks.

Source check

This record is extracted from a published AI Today issue and tied to the original source URL. Treat the source as the record of evidence for the summary.
