Ayush Jain

I am a Research Scientist at Meta on the Applied Reinforcement Learning team. I work on reinforcement learning algorithms and architectures that learn in complex action spaces for large-scale recommenders, robotics, and LLMs.

I completed my PhD at the University of Southern California, co-advised by Prof. Joseph J. Lim and Prof. Erdem Bıyık. I was fortunate to intern at Meta Reality Labs, Microsoft Research Montreal, and Naver AI in Seoul. Before joining USC, I spent two years in Seoul working at Samsung Research Korea. Earlier, I graduated from IIT Delhi, where I worked under the guidance of Prof. Sumeet Agarwal and Prof. Rajakrishnan Rajkumar.

Email  |  Twitter  |  CV  |  Google Scholar  |  LinkedIn

Research
Actor-Free Continuous Control via Structurally Maximizable Q-Functions
Yigit Korkmaz*, Urvi Bhuwania*, Ayush Jain, Erdem Bıyık
NeurIPS 2025

We enable actor-free Q-learning in continuous action spaces by learning a structurally maximizable "wire-fitted" Q-function.

arXiv | Code

Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Bıyık, Joseph J. Lim
RLC 2025 (Reinforcement Learning Conference)
Outstanding Paper Award on Empirical Reinforcement Learning Research

We identify that TD3 gets stuck in local optima in tasks with complex Q-functions and propose a new actor architecture to find better optima.

Paper | arXiv

When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Xinhu Li, Ayush Jain, Zhaojing Yang, Yigit Korkmaz, Erdem Bıyık
Preprint

Expert demonstrators are often constrained by indirect control interfaces, setup restrictions, and hardware safety limits. We propose an inverse RL method that learns from such constrained demonstrations and finds shorter trajectories to the goal.

arXiv | Project Page

Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment
Ankur Samanta, Akshayaa Magesh, Youliang Yu, Runzhe Wu, Ayush Jain, Daniel Jiang, Boris Vidolov, Paul Sajda, Yonathan Efroni, Kaveh Hassani
Preprint

By reinforcing their own debate consensus, language models learn to maintain consistent answers across diverse reasoning paths and to ground their arguments in peer reasoning, driving self-improvement in reasoning.

arXiv | Code

QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
Grace Zhang*, Ayush Jain*, Injune Hwang, Shao-Hua Sun, Joseph J. Lim
ICLR 2025

We introduce behavior-sharing for efficient multi-task reinforcement learning, complementary to parameter-sharing and data-sharing.

Paper | arXiv | Project Page | Code

Know Your Action Set: Learning Action Relations for Reinforcement Learning
Ayush Jain*, Norio Kosaka*, Kyung-Min Kim, Joseph J. Lim
ICLR 2022

For optimal decision-making under a varying action space, we learn relations among the available actions with a policy architecture based on a graph attention network.

Paper | Project Page | Code | Talk

Generalization to New Actions in Reinforcement Learning
Ayush Jain*, Andrew Szot*, Joseph J. Lim
ICML 2020

Our proposed RL framework enables agents to solve sequential decision-making tasks even when the available actions (tools or skills) have not been seen before.

Paper | Project Page | arXiv | Code | Talk | Environment

Uniform Information Density Effects on Syntactic Choice in Hindi
Ayush Jain*, Vishal Singh*, Sidharth Ranjan*, Rajakrishnan Rajkumar, Sumeet Agarwal
COLING 2018 Workshop on Linguistic Complexity and Natural Language Processing

This work investigates the extent to which word-order choices in Hindi are influenced by the drive to minimize the variance of information across a sentence.

Teaching

Teaching Assistant (USC): Deep Learning and its Applications (CSCI566, CSCI599)

  • Fall 2024: Prof. Yan Liu
  • Spring 2024: Prof. Yue Zhao
  • Spring 2023: Prof. Jesse Thomason
  • Fall 2020: Prof. Joseph J. Lim
  • Fall 2019: Prof. Joseph J. Lim
  • Spring 2019: Prof. Joseph J. Lim

Reviewing
  • ICLR: 2023, 2024, 2025, 2026
  • NeurIPS: 2023, 2024, 2025
  • ICML: 2025
  • RLC: 2025
  • CoRL: 2021, 2022, 2023, 2024
  • AAAI: 2026

Credits to the Coolest template!