Junlin Wu
Title
Cited by
Year
Robust deep reinforcement learning through bootstrapped opportunistic curriculum
J Wu, Y Vorobeychik
International Conference on Machine Learning, 24177-24211, 2022
Cited by: 25 | Year: 2022
Exact verification of ReLU neural control barrier functions
H Zhang, J Wu, Y Vorobeychik, A Clark
Advances in neural information processing systems 36, 5685-5705, 2023
Cited by: 17 | Year: 2023
Neural Lyapunov control for discrete-time systems
J Wu, A Clark, Y Kantaros, Y Vorobeychik
Advances in neural information processing systems 36, 2939-2955, 2023
Cited by: 16 | Year: 2023
On the exploitability of reinforcement learning with human feedback for large language models
J Wang, J Wu, M Chen, Y Vorobeychik, C Xiao
arXiv preprint arXiv:2311.09641, 2023
Cited by: 13 | Year: 2023
Axioms for AI Alignment from Human Feedback
L Ge, D Halpern, E Micha, AD Procaccia, I Shapira, Y Vorobeychik, J Wu
arXiv preprint arXiv:2405.14758, 2024
Cited by: 7 | Year: 2024
Manipulating Elections by Changing Voter Perceptions
J Wu, A Estornell, L Kong, Y Vorobeychik
Proceedings of the Thirty-First International Joint Conference on Artificial …, 2022
Cited by: 7 | Year: 2022
RLHFPoison: Reward poisoning attack for reinforcement learning with human feedback in large language models
J Wang, J Wu, M Chen, Y Vorobeychik, C Xiao
Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024
Cited by: 3 | Year: 2024
Preference Poisoning Attacks on Reward Model Learning
J Wu, J Wang, C Xiao, C Wang, N Zhang, Y Vorobeychik
arXiv preprint arXiv:2402.01920, 2024
Cited by: 2 | Year: 2024
Learning Generative Deception Strategies in Combinatorial Masking Games
J Wu, C Kamhoua, M Kantarcioglu, Y Vorobeychik
Decision and Game Theory for Security: 12th International Conference …, 2021
Cited by: 2 | Year: 2021
Certifying safety in reinforcement learning under adversarial perturbation attacks
J Wu, H Sibai, Y Vorobeychik
2024 IEEE Security and Privacy Workshops (SPW), 57-67, 2024
Cited by: 1 | Year: 2024
Verified Safe Reinforcement Learning for Neural Network Dynamic Models
J Wu, H Zhang, Y Vorobeychik
arXiv preprint arXiv:2405.15994, 2024
Year: 2024