TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
Researchers introduced Trajectory-Search Rollouts (TSR) to enhance multi-turn reinforcement learning for language models.
TSR improves trajectory quality and learning stability, with reported performance gains of up to 15% across various tasks.
This record was extracted from a published AI Today issue and is tied to the original source URL. Treat the source as the evidence of record for this summary.