Dhawal Gupta

I am a PhD student at the Autonomous Learning Lab (ALL) at the University of Massachusetts Amherst, working with Dr. Bruno Castro Da Silva and Dr. Philip Thomas on reinforcement learning problems.

I completed my master's degree at the RLAI Lab, working with Dr. Martha White. As a master's student, I worked on off-policy learning, variance reduction strategies for policy gradients, and, most importantly, credit assignment in neural networks.

I completed my bachelor's degree in Computer Science and Engineering at the Indian Institute of Technology (IIT) Patna, India. As an undergraduate, I did my thesis under the supervision of Dr. Sriparna Saha and Prof. Pushpak Bhattacharyya. My research focused on using hierarchical reinforcement learning to develop decision-making policies for chatbots that can function generically across multiple domains, tasks, and languages.

Email: dhawgupta [at] gmail [dot] com

GitHub  /  Google Scholar  /  LinkedIn  /  CV


I'm interested in robotics and artificial intelligence, specifically in the use of reinforcement learning algorithms for modelling control and decision-making policies in robots. I aim to work towards self-adaptability of agents in different environments, making them more robust to the noise and stochasticity of their surroundings.

Credit Assignment in Neural Networks using Reinforcement Learning

Dhawal Gupta, Gabor Mihucz, Matthew Schlegel, James Kostas, Philip Thomas, Martha White
Accepted at the Conference on Neural Information Processing Systems (NeurIPS), 2021

Credit Assignment in Neural Networks using Reinforcement Learning
Dhawal Gupta
Master's Thesis, 2021. Submitted in partial fulfillment of the master's degree at the University of Alberta.
Thesis / Slides

Gradient Temporal Difference Learning with Regularized Corrections
Sina Ghiassian, Andy Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White
International Conference on Machine Learning (ICML), 2020
Paper / Code

Towards Integrated Dialogue Policy Learning for Multiple Domains and Intents using Hierarchical Deep Reinforcement Learning
Dhawal Gupta*, Tulika Saha*, Sriparna Saha, Pushpak Bhattacharyya
Expert Systems with Applications, 2020

Emotion Aided Dialogue Act Classification for Task-Independent Conversations in a Multi-modal Framework
Tulika Saha*, Dhawal Gupta*, Sriparna Saha, Pushpak Bhattacharyya
Cognitive Computation, 2020

A Generic Dialogue Manager using Reinforcement Learning in a Multilingual Multi-Intent Multi-Domain Setting
Dhawal Gupta
Bachelor's Thesis, 2019. Submitted in partial fulfillment of the B.Tech degree at IIT Patna.
Thesis / Slides / Poster

Reinforcement Learning based Dialogue Management Strategy
Tulika Saha, Dhawal Gupta, Sriparna Saha, Pushpak Bhattacharyya
25th International Conference on Neural Information Processing (ICONIP), 2018

Introduces a novel MDP that makes the model resilient to natural language understanding (NLU) failures. The policy targets dialogue strategy in a task-oriented setting, particularly for airline-centric databases.

Bayesian Optimization based Terrestrial Gait Tuning for a 12-DOF Alligator-Inspired Robot with Active Body Undulation
Krishna Agrawal, Kushagra Jain, Dhawal Gupta, Raunak Srivastav, Abhijeet Agnihotri, Atul Thakur
42nd Mechanisms and Robotics Conference (MR) ASME IDETC/CIE, 2018

Fabrication, control, and gait tuning of an alligator-inspired robot whose design is based on a quadruped robot with an active spine. Achieved an improvement of 1.93


Applicability of Momentum in the Methods of Temporal Learning
Dhawal Gupta

Course project report comparing the applicability of momentum in temporal difference updates alongside eligibility traces.


Investigating the Utility of Off-Policy Data in PPO Algorithm
Yufeng Yuan & Dhawal Gupta

Course project report in which we investigate methods for reusing off-policy data to decrease the sample complexity of PPO algorithms. We used two versions of PPO: one from OpenAI Baselines and our own implementation written from scratch in PyTorch. We tested our methods on the Reacher Task in SenseAct on the UR5 Robotic Arm Platform.


Utility of accelerated temporal difference methods over gradient based optimizers
Dhawal Gupta

I investigate whether we need to develop specialized accelerated approaches specifically for TD learning, or whether we can employ optimizers from supervised learning such as Adam and RMSProp. This question arises because TD learning is not a gradient descent method.


IBM Research

I interned at IBM Research India from May 2018 to August 2018 under the mentorship of Dr. Kedar Kulkarni. I worked in the Operations Research Team on multi-asset clustering and on developing an accurate and resource-efficient distance metric for clustering.


Indraprastha Institute of Information Technology, Delhi

I interned at IIIT Delhi from May 2017 to July 2017 under the mentorship of Dr. Saket Anand and Dr. Sanjit Kaul. I worked on extracting reward functions for autonomous car steering using the inverse reinforcement learning framework. Here is a brief report.


Smart Containers
Zenin Easa, Dhawal Gupta, Jimson Mathew, 2017
video / code / webapp / poster

Prototype of an IoT product named Smart Containers that monitors food consumption using a minimal number of sensors. This project was presented at Intel HEC and ISED 2016.


ABU Asia-Pacific Robot Contest (Robocon)
Ashwin Goyal, Dhawal Gupta, Atul Thakur (Mentor), 2017

Robocon is an annual Asia-Pacific competition. I was the founder and vice-captain of the Robocon team at IIT Patna.

Other Projects

Using GANs to generate photo-realistic fundus images manifesting diabetic retinopathy
Dhawal Gupta, Raghav Jindal, Rushikesh Pedganokar
code / report

Generated fundus images manifesting diabetic retinopathy using DC-GAN. This was part of the CS551 Deep Learning course, and our project was adjudged first among a total of 26 projects.


Fuzzy Controller for Inverted Pendulum
Abhishek Agrawal, Raghav Jindal, Sahil Sharma, Dhawal Gupta
code / report

Built a fuzzy controller for an inverted pendulum that works on different profiles, using the weighted centroid method to calculate the current needed to balance the pendulum.


Simulation of Unix File System
Dhawal Gupta, Sahil Sharma, Tarun Garg, Ashutosh Drolia

Built a codebase that simulates Linux file system commands in memory, using C/C++, as part of the CS341 Operating Systems Lab.


Implementation of Distributed Hash Table (Chord Protocol)
Dhawal Gupta

An implementation of the Chord protocol in Python that can handle random node additions and drops in the P2P network. Deployable across different virtual machines.


Analysis of Twitter Network
Dhawal Gupta

Course project for CS544 Network Science. This project finds the influential people in a Twitter network based on a set of tweets related to a particular event. In our case, the event is the discovery of the Higgs boson.

Interests and Hobbies
  • Running
  • Recently I have started to climb (thanks to Csaba).
  • I used to do some art as a kid; here are some of the good paintings.
  • Reading (See what I am reading).
  • I like to play Video Games.
  • Tinkering with stuff.
  • Watching anime and reading manga.
Website layout from here