| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Attention is all you need | A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... | Advances in Neural Information Processing Systems 30, 2017 | 48960 | 2017 |
| An image is worth 16x16 words: Transformers for image recognition at scale | A Dosovitskiy, L Beyer, A Kolesnikov, D Weissenborn, X Zhai, ... | arXiv preprint arXiv:2010.11929, 2020 | 6017 | 2020 |
| A decomposable attention model for natural language inference | AP Parikh, O Täckström, D Das, J Uszkoreit | arXiv preprint arXiv:1606.01933, 2016 | 1254 | 2016 |
| Self-attention with relative position representations | P Shaw, J Uszkoreit, A Vaswani | arXiv preprint arXiv:1803.02155, 2018 | 1112 | 2018 |
| Image transformer | N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran | International Conference on Machine Learning, 4055-4064, 2018 | 948 | 2018 |
| Natural questions: a benchmark for question answering research | T Kwiatkowski, J Palomaki, O Redfield, M Collins, A Parikh, C Alberti, ... | Transactions of the Association for Computational Linguistics 7, 453-466, 2019 | 939 | 2019 |
| Universal transformers | M Dehghani, S Gouws, O Vinyals, J Uszkoreit, Ł Kaiser | arXiv preprint arXiv:1807.03819, 2018 | 530 | 2018 |
| Tensor2tensor for neural machine translation | A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ... | arXiv preprint arXiv:1803.07416, 2018 | 491 | 2018 |
| Mlp-mixer: An all-mlp architecture for vision | IO Tolstikhin, N Houlsby, A Kolesnikov, L Beyer, X Zhai, T Unterthiner, ... | Advances in Neural Information Processing Systems 34, 24261-24272, 2021 | 464 | 2021 |
| Music transformer | CZA Huang, A Vaswani, J Uszkoreit, N Shazeer, I Simon, C Hawthorne, ... | arXiv preprint arXiv:1809.04281, 2018 | 417 | 2018 |
| Attention is all you need. arXiv 2017 | A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... | arXiv preprint arXiv:1706.03762, 2017 | 345 | 2017 |
| An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020 | A Dosovitskiy, L Beyer, A Kolesnikov, D Weissenborn, X Zhai, ... | arXiv preprint arXiv:2010.11929, 2020 | 334 | 2020 |
| One model to learn them all | L Kaiser, AN Gomez, N Shazeer, A Vaswani, N Parmar, L Jones, ... | arXiv preprint arXiv:1706.05137, 2017 | 301 | 2017 |
| Advances in neural information processing systems | A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... | Red Hook: Curran Associates, Inc., 5998-6008, 2017 | 255 | 2017 |
| Cross-lingual word clusters for direct transfer of linguistic structure | O Täckström, R McDonald, J Uszkoreit | The 2012 Conference of the North American Chapter of the Association for …, 2012 | 246 | 2012 |
| Object-centric learning with slot attention | F Locatello, D Weissenborn, T Unterthiner, A Mahendran, G Heigold, ... | Advances in Neural Information Processing Systems 33, 11525-11538, 2020 | 234 | 2020 |
| Attention is all you need. 2017 | A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... | arXiv preprint arXiv:1706.03762, 2017 | 183 | 2017 |
| Insertion transformer: Flexible sequence generation via insertion operations | M Stern, W Chan, J Kiros, J Uszkoreit | International Conference on Machine Learning, 5976-5985, 2019 | 170 | 2019 |
| Large scale parallel document mining for machine translation | J Uszkoreit, J Ponte, A Popat, M Dubiner | | 159 | 2010 |
| Coarse-to-fine question answering for long documents | E Choi, D Hewlett, J Uszkoreit, I Polosukhin, A Lacoste, J Berant | Proceedings of the 55th Annual Meeting of the Association for Computational …, 2017 | 150 | 2017 |