Dhawal Gupta
Title
Cited by
Year
Gradient Temporal-Difference Learning with Regularized Corrections
S Ghiassian, A Patterson, S Garg, D Gupta, A White, M White
International Conference on Machine Learning, 3524-3534, 2020
Cited by: 51; Year: 2020
Emotion Aided Dialogue Act Classification for Task-Independent Conversations in a Multi-modal Framework
T Saha, D Gupta, S Saha, P Bhattacharyya
Cognitive Computation, 1-13, 2020
Cited by: 26; Year: 2020
Towards integrated dialogue policy learning for multiple domains and intents using Hierarchical Deep Reinforcement Learning
T Saha, D Gupta, S Saha, P Bhattacharyya
Expert Systems with Applications 162, 113650, 2020
Cited by: 19; Year: 2020
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF
S Sun, D Gupta, M Iyyer
arXiv preprint arXiv:2309.09055, 2023
Cited by: 15; Year: 2023
A Mixture-of-Expert Approach to RL-based Dialogue Management
Y Chow, A Tulepbergenov, O Nachum, MK Ryu, M Ghavamzadeh, ...
arXiv preprint arXiv:2206.00059, 2022
Cited by: 13; Year: 2022
Behavior Alignment via Reward Function Optimization
D Gupta, Y Chandak, SM Jordan, PS Thomas, BC da Silva
arXiv preprint arXiv:2310.19007, 2023
Cited by: 10; Year: 2023
A hierarchical approach for efficient multi-intent dialogue policy learning
T Saha, D Gupta, S Saha, P Bhattacharyya
Multimedia Tools and Applications, 1-26, 2020
Cited by: 10; Year: 2020
Reinforcement Learning Based Dialogue Management Strategy
T Saha, D Gupta, S Saha, P Bhattacharyya
International Conference on Neural Information Processing, 359-372, 2018
Cited by: 9; Year: 2018
Structural Credit Assignment in Neural Networks using Reinforcement Learning
D Gupta, G Mihucz, MK Schlegel, JE Kostas, PS Thomas, M White
Thirty-Fifth Conference on Neural Information Processing Systems, 2021
Cited by: 8; Year: 2021
A unified dialogue management strategy for multi-intent dialogue conversations in multiple languages
T Saha, D Gupta, S Saha, P Bhattacharyya
Transactions on Asian and Low-Resource Language Information Processing 20 (6 …, 2021
Cited by: 4; Year: 2021
Bayesian Optimization Based Terrestrial Gait Tuning for a 12-DOF Alligator-Inspired Robot With Active Body Undulation
K Agrawal, K Jain, D Gupta, R Srivastav, A Agnihotri, A Thakur
ASME 2018 International Design Engineering Technical Conferences and …, 2018
Cited by: 4; Year: 2018
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
D Gupta, Y Chow, M Ghavamzadeh, C Boutilier
arXiv preprint arXiv:2302.10850, 2023
Cited by: 2; Year: 2023
Mitigating the curse of horizon in Monte-Carlo returns
A Ayoub, D Szepesvari, F Zanini, B Chan, D Gupta, BC da Silva, ...
Reinforcement Learning Journal, 2024
Cited by: 1; Year: 2024
Coagent Networks: Generalized and Scaled
JE Kostas, SM Jordan, Y Chandak, G Theocharous, D Gupta, M White, ...
arXiv preprint arXiv:2305.09838, 2023
Cited by: 1; Year: 2023
Applicability of Momentum in the Methods of Temporal Learning
D Gupta
Cited by: 1; Year: 2020
ICU-Sepsis: A Benchmark MDP Built from Real Medical Data
K Choudhary, D Gupta, PS Thomas
arXiv preprint arXiv:2406.05646, 2024
Year: 2024
From Past to Future: Rethinking Eligibility Traces
D Gupta, SM Jordan, S Chaudhari, B Liu, PS Thomas, BC da Silva
arXiv preprint arXiv:2312.12972, 2023
Year: 2023
A Generic Dialogue Manager using Reinforcement Learning in a Multilingual Multi-intent Multi-domain Setting
D Gupta
Year: 2019
Utility of accelerated temporal difference methods over gradient based optimizers
D Gupta
Investigating the Utility of Off-Policy Data in PPO Algorithm
Y Yuan, D Gupta