Zijia Zhao

Cited by

	All	Since 2019
Citations	155	155
h-index	5	5
i10-index	4	4

Year	2021	2022	2023	2024
Citations	1	10	55	89

Public access (based on funding mandates): 2 articles available, 0 articles not available.


Institute of Automation, Chinese Academy of Sciences (CASIA)

Verified email at ia.ac.cn

Multimodal learning


Title	Cited by	Year
VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset. S Chen, H Li, Q Wang, Z Zhao, M Sun, X Zhu, J Liu. Advances in Neural Information Processing Systems 36, 2024	52	2024
OPT: Omni-perception pre-trainer for cross-modal understanding and generation. J Liu, X Zhu, F Liu, L Guo, Z Zhao, M Sun, W Wang, H Lu, S Zhou, J Zhang, ... arXiv preprint arXiv:2107.00249, 2021	39	2021
ChatBridge: Bridging modalities with large language model as a language catalyst. Z Zhao, L Guo, T Yue, S Chen, S Shao, X Zhu, Z Yuan, J Liu. arXiv preprint arXiv:2305.16103, 2023	30	2023
VL-Mamba: Exploring state space models for multimodal learning. Y Qiao, Z Yu, L Guo, S Chen, Z Zhao, M Sun, Q Wu, J Liu. arXiv preprint arXiv:2403.13600, 2024	17	2024
MM21 pre-training for video understanding challenge: Video captioning with pretraining techniques. S Chen, X Zhu, D Hao, W Liu, J Liu, Z Zhao, L Guo, J Liu. Proceedings of the 29th ACM International Conference on Multimedia, 4853–4857, 2021	6	2021
MAMO: Fine-grained vision-language representations learning with masked multimodal modeling. Z Zhao, L Guo, X He, S Shao, Z Yuan, J Liu. Proceedings of the 46th International ACM SIGIR Conference on Research and …, 2023	5	2023
MAMO: Masked multimodal modeling for fine-grained vision-language representation learning. Z Zhao, L Guo, X He, S Shao, Z Yuan, J Liu. arXiv preprint arXiv:2210.04183, 2022	4	2022
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions. W Wang, Y Zhang, X He, Y Yan, Z Zhao, X Wang, J Liu. arXiv preprint arXiv:2402.11265, 2024	1	2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models. T Yue, J Cheng, L Guo, X Dai, Z Zhao, X He, G Xiong, Y Lv, J Liu. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024	1	2024
OneDiff: A Generalist Model for Image Difference. E Hu, L Guo, T Yue, Z Zhao, S Xue, J Liu. arXiv preprint arXiv:2407.05645, 2024		2024
Towards Event-oriented Long Video Understanding. Y Du, K Zhou, Y Huo, Y Li, WX Zhao, H Lu, Z Zhao, B Wang, W Chen, ... arXiv preprint arXiv:2406.14129, 2024		2024
Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs. Z Zhao, H Lu, Y Huo, Y Du, T Yue, L Guo, B Wang, W Chen, J Liu. arXiv preprint arXiv:2406.09367, 2024		2024
