I completed my master's degree at the RLAI Lab, working with Dr. Martha White. As a master's student, I worked on off-policy learning, variance reduction strategies for policy gradients, and, most importantly, credit assignment in neural networks.
I completed my bachelor's degree in Computer Science and Engineering at the Indian Institute of Technology (IIT) Patna, India. As an undergraduate, I did my thesis under the supervision of Dr. Sriparna Saha and Prof. Pushpak Bhattacharyya.
My research has focused on using hierarchical reinforcement learning to develop decision-making policies for chatbots that can function generically across multiple domains, tasks, and languages.
I'm interested in robotics and artificial intelligence, specifically in the use of reinforcement learning algorithms for modelling control and decision-making policies in robots. I aim to work towards self-adaptability of agents in different environments, making them more robust to the noise and stochasticity of their surroundings.
Credit Assignment in Neural Networks using Reinforcement Learning
Accepted at Neural Information Processing Systems (NeurIPS), 2021
Introduces a novel MDP that makes the model resilient to NLU (Natural Language Understanding) failures. It presents a policy for dialogue strategy in a task-oriented setting, particularly for airline-centric databases.
Course project report in which we investigate methods for reusing off-policy data to decrease the sample complexity of PPO algorithms. We used two versions of the PPO algorithm: the first from OpenAI Baselines, and the second our own implementation written from scratch in PyTorch. We tested our methods on the Reacher task in SenseAct on the UR5 robotic arm platform.
I investigate whether we need to develop specialized accelerated approaches specifically for TD learning, or whether we can employ optimizers from supervised learning such as Adam and RMSProp. This question arises because TD learning is not a gradient descent method.
I interned at IBM Research India from May 2018 to August 2018 under the mentorship of Dr. Kedar Kulkarni. I worked in the Operations Research team on multi-asset clustering and on developing an accurate and resource-efficient distance metric for clustering.
Indraprastha Institute of Information Technology, Delhi
I interned at IIIT Delhi from May 2017 to July 2017 under the mentorship of Dr. Saket Anand and Dr. Sanjit Kaul. I worked on extracting reward functions for autonomous car steering using the inverse reinforcement learning framework. Here is a brief
Course project for CS544 Network Science. This project identifies influential people in a Twitter network based on a set of tweets related to a particular event. In our case, the event is the discovery of the Higgs boson.