Ayush Jain

I am a Research Scientist at Meta on the Applied Reinforcement Learning team. I build RL algorithms for agents that act in complex action spaces across recommender systems, robotics, and LLMs.

I completed my PhD at USC with Prof. Joseph J. Lim and Prof. Erdem Bıyık. I have worked or interned at Meta Reality Labs, Microsoft Research Montreal, Naver AI, and Samsung Research Korea. I received my undergraduate degree from IIT Delhi.

Research

Structure Enables Effective Self-Localization of Errors in LLMs
LLMs ICML 2026

We show that language models can explicitly self-localize errors in incorrect reasoning when it is structured as discrete, semantically coherent thought steps.

When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Robotics ICLR 2026

We propose an inverse RL method for learning from constrained demonstrators and finding shorter trajectories to the goal.

Credit Assignment with Resets in Language Model Reasoning
LLMs Preprint

We propose reset-based policy optimization methods that improve credit assignment in multi-step language model reasoning by resampling continuations from intermediate reasoning states.

Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
LLMs EACL 2026

Certain tasks produce much larger gradients during RL post-training of multi-task LLMs, biasing model updates without translating to greater learning gains.

Self-Improvement of Language Models by Post-Training on Multi-Agent Debate
LLMs Preprint

Language models improve by reinforcing their own debate consensus across diverse reasoning paths.

Actor-Free Continuous Control via Structurally Maximizable Q-Functions
RL Algorithms NeurIPS 2025

We enable actor-free Q-learning in continuous action spaces by learning a wire-fitted Q-function.

Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
RL Algorithms RLC 2025, Reinforcement Learning Conference
Outstanding Paper Award on Empirical Reinforcement Learning Research

We show that TD3 gets stuck in local optima in tasks with complex Q-functions and propose a new actor architecture to find better optima.

QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
Grace Zhang*, Ayush Jain*, Injune Hwang, Shao-Hua Sun, Joseph J. Lim
RL Algorithms Robotics ICLR 2025

We introduce behavior sharing for efficient multi-task reinforcement learning, complementary to parameter sharing and data sharing.

Know Your Action Set: Learning Action Relations for Reinforcement Learning
RL Algorithms ICLR 2022

For optimal decision-making under a varying action space, we learn relations between available actions using a graph-attention policy architecture.

Generalization to New Actions in Reinforcement Learning
Ayush Jain*, Andrew Szot*, Joseph J. Lim
RL Algorithms ICML 2020

Our RL framework enables agents to solve sequential decision-making tasks even when available actions have not been seen before.

Uniform Information Density Effects on Syntactic Choice in Hindi
Ayush Jain*, Vishal Singh*, Sidharth Ranjan*, Rajakrishnan Rajkumar, Sumeet Agarwal
COLING 2018 Workshop

This work investigates the extent to which word order choices in Hindi are influenced by the drive to minimize information variance in a sentence.

Teaching

Teaching Assistant (USC): Deep Learning and its Applications (CSCI566, CSCI599)

  • Fall 2024: Prof. Yan Liu
  • Spring 2024: Prof. Yue Zhao
  • Spring 2023: Prof. Jesse Thomason
  • Fall 2020: Prof. Joseph J. Lim
  • Fall 2019: Prof. Joseph J. Lim
  • Spring 2019: Prof. Joseph J. Lim

Reviewing

  • ICLR: 2023, 2024, 2025, 2026
  • NeurIPS: 2023, 2024, 2025, 2026
  • ICML: 2025, 2026
  • RLC: 2025, 2026
  • CoRL: 2021, 2022, 2023, 2024
  • AAAI: 2026