Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
Runzhe Wu,
Ankur Samanta,
Ayush Jain,
Scott Fujimoto,
Jeongyeol Kwon,
Ben Kretzu,
Youliang Yu,
Kaveh Hassani,
Boris Vidolov,
Yonathan Efroni
EACL 2026
We find that in RL post-training of multi-task LLMs, certain tasks produce much larger gradients, which biases model updates toward those tasks, even though these larger gradients don't translate to greater learning gains.
arXiv
Structure Enables Effective Self-Localization of Errors in LLMs
Ankur Samanta,
Akshayaa Magesh,
Ayush Jain,
Kavosh Asadi,
Youliang Yu,
Daniel Jiang,
Boris Vidolov,
Kaveh Hassani,
Paul Sajda,
Jalaj Bhandari,
Yonathan Efroni,
Preprint
We show that language models can explicitly self-localize errors in incorrect reasoning when it is structured as discrete, semantically coherent thought steps, enabling effective self-correction through targeted backtracking and resampling.
arXiv
Self-Improvement of Language Models by Post-Training on Multi-Agent Debate
Ankur Samanta,
Akshayaa Magesh,
Runzhe Wu,
Ayush Jain,
Youliang Yu,
Daniel Jiang,
Boris Vidolov,
Paul Sajda,
Yonathan Efroni,
Kaveh Hassani
Preprint
Language models learn to maintain consistent answers across diverse reasoning paths and to ground their arguments in peer reasoning by reinforcing their own debate consensus, driving self-improvement in reasoning.
arXiv | Code
Teaching Assistant (USC): Deep Learning and its Applications (CSCI566, CSCI599)
- Fall 2024: Prof. Yan Liu
- Spring 2024: Prof. Yue Zhao
- Spring 2023: Prof. Jesse Thomason
- Fall 2020: Prof. Joseph J Lim
- Fall 2019: Prof. Joseph J Lim
- Spring 2019: Prof. Joseph J Lim
- ICLR: 2023, 2024, 2025, 2026
- NeurIPS: 2023, 2024, 2025
- ICML: 2025
- RLC: 2025
- CoRL: 2021, 2022, 2023, 2024
- AAAI: 2026