| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| cosformer: Rethinking softmax in attention | Z Qin, W Sun, H Deng, D Li, Y Wei, B Lv, J Yan, L Kong, Y Zhong | arXiv preprint arXiv:2202.08791, 2022 | 226 | 2022 |
| Hierarchically gated recurrent neural network for sequence modeling | Z Qin, S Yang, Y Zhong | Advances in Neural Information Processing Systems 36, 2024 | 54 | 2024 |
| The devil in linear transformer | Z Qin, X Han, W Sun, D Li, L Kong, N Barnes, Y Zhong | arXiv preprint arXiv:2210.10340, 2022 | 47 | 2022 |
| Toeplitz neural network for sequence modeling | Z Qin, X Han, W Sun, B He, D Li, D Li, Y Dai, L Kong, Y Zhong | arXiv preprint arXiv:2305.04749, 2023 | 27 | 2023 |
| Vicinity vision transformer | W Sun, Z Qin, H Deng, J Wang, Y Zhang, K Zhang, N Barnes, S Birchfield, ... | IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (10 …, 2023 | 26 | 2023 |
| Hgrn2: Gated linear rnns with state expansion | Z Qin, S Yang, W Sun, X Shen, D Li, W Sun, Y Zhong | arXiv preprint arXiv:2404.07904, 2024 | 24 | 2024 |
| Scaling transnormer to 175 billion parameters | Z Qin, D Li, W Sun, W Sun, X Shen, X Han, Y Wei, B Lv, F Yuan, X Luo, ... | arXiv preprint arXiv:2307.14995, 2023 | 20 | 2023 |
| Neural architecture search on efficient transformers and beyond | Z Liu, D Li, K Lu, Z Qin, W Sun, J Xu, Y Zhong | arXiv preprint arXiv:2207.13955, 2022 | 16 | 2022 |
| Fine-grained audible video description | X Shen, D Li, J Zhou, Z Qin, B He, X Han, A Li, Y Dai, L Kong, M Wang, ... | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 14 | 2023 |
| Lightning attention-2: A free lunch for handling unlimited sequence lengths in large language models | Z Qin, W Sun, D Li, X Shen, W Sun, Y Zhong | arXiv preprint arXiv:2401.04658, 2024 | 6 | 2024 |
| Exploring Transformer Extrapolation | Z Qin, Y Zhong, H Deng | Proceedings of the AAAI Conference on Artificial Intelligence 38 (17), 18897 …, 2024 | 5 | 2024 |
| Accelerating toeplitz neural network with constant-time inference complexity | Z Qin, Y Zhong | arXiv preprint arXiv:2311.08756, 2023 | 5 | 2023 |
| Linearized Relative Positional Encoding | Z Qin, W Sun, K Lu, H Deng, D Li, X Han, Y Dai, L Kong, Y Zhong | arXiv preprint arXiv:2307.09270, 2023 | 5 | 2023 |
| CO2: Efficient distributed training with full communication-computation overlap | W Sun, Z Qin, W Sun, S Li, D Li, X Shen, Y Qiao, Y Zhong | arXiv preprint arXiv:2401.16265, 2024 | 4 | 2024 |
| All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation | W Sun, Y Zhang, Z Qin, Z Liu, L Cheng, F Wang, Y Zhong, N Barnes | Proceedings of the IEEE/CVF International Conference on Computer Vision, 826-837, 2023 | 4 | 2023 |
| Linear video transformer with feature fixation | K Lu, Z Liu, J Wang, W Sun, Z Qin, D Li, X Shen, H Deng, X Han, Y Dai, ... | arXiv preprint arXiv:2210.08164, 2022 | 4 | 2022 |
| Transnormerllm: A faster and better large language model with improved transnormer | Z Qin, D Li, W Sun, W Sun, X Shen, X Han, Y Wei, B Lv, X Luo, Y Qiao, ... | | 3 | 2023 |
| TAVGBench: Benchmarking text to audible-video generation | Y Mao, X Shen, J Zhang, Z Qin, J Zhou, M Xiang, Y Zhong, Y Dai | Proceedings of the 32nd ACM International Conference on Multimedia, 6607-6616, 2024 | 2 | 2024 |
| Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention | Z Qin, W Sun, D Li, X Shen, W Sun, Y Zhong | arXiv preprint arXiv:2405.17381, 2024 | 2 | 2024 |
| Unlocking the secrets of linear complexity sequence model from a unified perspective | Z Qin, X Shen, D Li, W Sun, S Birchfield, R Hartley, Y Zhong | arXiv preprint arXiv:2405.17383, 2024 | 2 | 2024 |