Title | Authors | Venue | Cited by | Year |
Error bounds of imitating policies and environments | T Xu, Z Li, Y Yu | Advances in Neural Information Processing Systems 33, 15737-15749, 2020 | 105* | 2020 |
Self-Guided Evolution Strategies with Historical Estimated Gradients | FY Liu, ZN Li, C Qian | IJCAI, 1474-1480, 2020 | 18 | 2020 |
Rethinking ValueDice - Does It Really Improve Performance? | Z Li, T Xu, Y Yu, ZQ Luo | ICLR Blog, 2022 | 14 | 2022 |
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning | Z Li, Y Li, Y Zhang, T Zhang, ZQ Luo | International Conference on Learning Representations, 2022 | 13 | 2022 |
Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis | T Xu, Z Li, Y Yu, ZQ Luo | arXiv preprint arXiv:2208.01899, 2022 | 8* | 2022 |
Remax: A simple, effective, and efficient method for aligning large language models | Z Li, T Xu, Y Zhang, Y Yu, R Sun, ZQ Luo | arXiv preprint arXiv:2310.10505, 2023 | 3 | 2023 |
When is RL better than DPO in RLHF? A Representation and Optimization Perspective | Z Li, T Xu, Y Yu | ICLR Tiny Paper, 2024 | 2* | 2024 |
Imitation Learning from Imperfection: Theoretical Justifications and Algorithms | Z Li, T Xu, Z Qin, Y Yu, ZQ Luo | Advances in Neural Information Processing Systems 36, 2024 | 2* | 2024 |
Provably Efficient Adversarial Imitation Learning with Unknown Transitions | T Xu, Z Li, Y Yu, ZQ Luo | UAI, 2367-2378, 2023 | 2 | 2023 |
A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle | Z Li, T Xu, Y Yu | arXiv preprint arXiv:2203.11489, 2022 | 1 | 2022 |
Efficient Exploration by Novelty-Pursuit | Z Li, XH Chen | Distributed Artificial Intelligence: Second International Conference, DAI …, 2020 | 1 | 2020 |
Why Transformers Need Adam: A Hessian Perspective | Y Zhang, C Chen, T Ding, Z Li, R Sun, ZQ Luo | arXiv preprint arXiv:2402.16788, 2024 | | 2024 |
Deploying Offline Reinforcement Learning with Human Feedback | Z Li, K Xu, L Liu, L Li, D Ye, P Zhao | arXiv preprint arXiv:2303.07046, 2023 | | 2023 |