Csaba Szepesvari
Verified email at cs.ualberta.ca
Title
Cited by
Year
Bandit based monte-carlo planning
L Kocsis, C Szepesvári
Machine Learning: ECML 2006: 17th European Conference on Machine Learning …, 2006
Cited by: 3724; Year: 2006
Bandit algorithms
T Lattimore, C Szepesvári
Cambridge University Press, 2020
Cited by: 1752; Year: 2020
Algorithms for Reinforcement Learning
C Szepesvari
Morgan and Claypool, 2010
Cited by: 1688; Year: 2010
Improved algorithms for linear stochastic bandits
Y Abbasi-Yadkori, C Szepesvári, D Pál
Advances in Neural Information Processing Systems, 2312-2320, 2011
Cited by: 1449; Year: 2011
Convergence results for single-step on-policy reinforcement-learning algorithms
S Singh, T Jaakkola, ML Littman, C Szepesvári
Machine learning 38, 287-308, 2000
Cited by: 900; Year: 2000
Exploration–exploitation tradeoff using variance estimates in multi-armed bandits
JY Audibert, R Munos, C Szepesvári
Theoretical Computer Science 410 (19), 1876-1902, 2009
Cited by: 676; Year: 2009
Fast gradient-descent methods for temporal-difference learning with linear function approximation
RS Sutton, HR Maei, D Precup, S Bhatnagar, D Silver, C Szepesvári, ...
Proceedings of the 26th annual international conference on machine learning …, 2009
Cited by: 640; Year: 2009
Finite-Time Bounds for Fitted Value Iteration.
R Munos, C Szepesvári
Journal of Machine Learning Research 9 (5), 2008
Cited by: 493; Year: 2008
X-Armed Bandits.
S Bubeck, R Munos, G Stoltz, C Szepesvári
Journal of Machine Learning Research 12 (5), 2011
Cited by: 461; Year: 2011
Parametric bandits: The generalized linear case
S Filippi, O Cappe, A Garivier, C Szepesvári
Advances in Neural Information Processing Systems 23, 2010
Cited by: 433; Year: 2010
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
A Antos, C Szepesvári, R Munos
Machine Learning 71, 89-129, 2008
Cited by: 423; Year: 2008
Learning with a strong adversary
R Huang, B Xu, D Schuurmans, C Szepesvári
arXiv preprint arXiv:1511.03034, 2015
Cited by: 364; Year: 2015
Regret bounds for the adaptive control of linear quadratic systems
Y Abbasi-Yadkori, C Szepesvári
Proceedings of the 24th Annual Conference on Learning Theory, 1-26, 2011
Cited by: 346; Year: 2011
A generalized reinforcement-learning model: Convergence and applications
ML Littman, C Szepesvári
ICML 96, 310-318, 1996
Cited by: 313; Year: 1996
Toward off-policy learning control with function approximation.
HR Maei, C Szepesvári, S Bhatnagar, RS Sutton
ICML 10, 719-726, 2010
Cited by: 301; Year: 2010
Apprenticeship learning using inverse reinforcement learning and gradient methods
G Neu, C Szepesvári
arXiv preprint arXiv:1206.5264, 2012
Cited by: 296; Year: 2012
The grand challenge of computer Go: Monte Carlo tree search and extensions
S Gelly, L Kocsis, M Schoenauer, M Sebag, D Silver, C Szepesvári, ...
Communications of the ACM 55 (3), 106-113, 2012
Cited by: 292; Year: 2012
Convergent temporal-difference learning with arbitrary smooth function approximation
H Maei, C Szepesvari, S Bhatnagar, D Precup, D Silver, RS Sutton
Advances in neural information processing systems 22, 2009
Cited by: 292; Year: 2009
Multi-criteria reinforcement learning.
Z Gábor, Z Kalmár, C Szepesvári
ICML 98, 197-205, 1998
Cited by: 284; Year: 1998
Cascading bandits: Learning to rank in the cascade model
B Kveton, C Szepesvari, Z Wen, A Ashkan
International conference on machine learning, 767-776, 2015
Cited by: 269; Year: 2015