[2510.01833] Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning
Researchers introduce Plan-Then-Action Enhanced Reasoning with Group Relative Policy Optimization, a two-stage framework for large language models. The method aims to improve both high-level planning and fine-grained reasoning so that models produce more accurate and efficient outputs. The authors report significant improvements across ten reasoning benchmarks in mathematics and the natural sciences.