Yifei Xin
Verified email at stu.pku.edu.cn
Title
Cited by
Year
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Z Cheng, S Leng, H Zhang, Y Xin, X Li, G Chen, Y Zhu, W Zhang, Z Luo, ...
arXiv preprint arXiv:2406.07476, 2024
Cited by: 120; Year: 2024
Improving text-audio retrieval by text-aware attention pooling and prior matrix revised loss
Y Xin, D Yang, Y Zou
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 35; Year: 2023
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification.
Y Xin, D Yang, Y Zou
INTERSPEECH, 1546-1550, 2022
Cited by: 19; Year: 2022
Masked Audio Modeling with CLAP and Multi-Objective Learning
Y Xin, X Peng, Y Lu
Proc. INTERSPEECH 2023, 2763-2767, 2024
Cited by: 10; Year: 2024
Improving audio-text retrieval via hierarchical cross-modal interaction and auxiliary captions
Y Xin, Y Zou
Proc. INTERSPEECH 2023, 341-345, 2023
Cited by: 10; Year: 2023
Cooperative game modeling with weighted token-level alignment for audio-text retrieval
Y Xin, B Wang, L Shang
IEEE Signal Processing Letters, 2023
Cited by: 8; Year: 2023
Improving weakly supervised sound event detection with causal intervention
Y Xin, D Yang, F Cui, Y Wang, Y Zou
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 8; Year: 2023
Low-complexity acoustic scene classification with mismatch-devices using separable convolutions and coordinate attention
Y Xin, Y Zou, F Cui, Y Wang
DCASE2022 Challenge, Tech. Rep, 2022
Cited by: 6; Year: 2022
Improving speech enhancement via event-based query
Y Xin, X Peng, Y Lu
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 5; Year: 2023
Background-aware Modeling for Weakly Supervised Sound Event Detection
Y Xin, D Yang, Y Zou
Proc. INTERSPEECH 2023, 1199-1203, 2023
Cited by: 5; Year: 2023
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Y Zhu, B Li, Y Xin, L Xu
arXiv preprint arXiv:2411.02038, 2024
Cited by: 3; Year: 2024
Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents
L Li, W Xu, J Guo, R Zhao, X Li, Y Yuan, B Zhang, Y Jiang, Y Xin, R Dang, ...
arXiv preprint arXiv:2410.13185, 2024
Cited by: 3; Year: 2024
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
Y Xin, X Cheng, Z Zhu, X Yang, Y Zou
Proc. Interspeech 2024, 1670-1674, 2024
Cited by: 3; Year: 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
H Zhao, Y Xin, Z Yu, B Zhu, L Lu, Z Ma
Proc. Interspeech 2024, 52-56, 2024
Cited by: 2*; Year: 2024
Soul-mix: Enhancing multimodal machine translation with manifold mixup
X Cheng, Z Yao, Y Xin, H An, H Li, Y Li, Y Zou
Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024
Cited by: 2; Year: 2024
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
R Dang, Y Yuan, W Zhang, Y Xin, B Zhang, L Li, L Wang, Q Zeng, X Li, ...
arXiv preprint arXiv:2501.05031, 2025
Year: 2025
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
Y Xin, Z Zhu, X Cheng, X Yang, Y Zou
Proc. Interspeech 2024, 1140-1144, 2024
Year: 2024