Training deeper models by GPU memory optimization on TensorFlow
C Meng, M Sun, J Yang, M Qiu, Y Gu
Proc. of ML Systems Workshop in NIPS, 2017
Ansor: Generating High-Performance Tensor Programs for Deep Learning
L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali, Y Wang, J Yang, D Zhuo, ...
arXiv preprint arXiv:2006.06762, 2020
Characterizing Deep Learning Training Workloads on Alibaba-PAI
M Wang, C Meng, G Long, C Wu, J Yang, W Lin, Y Jia
arXiv preprint arXiv:1910.05930, 2019
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
S Fan, Y Rong, C Meng, Z Cao, S Wang, Z Zheng, C Wu, G Long, J Yang, ...
arXiv preprint arXiv:2007.01045, 2020
CrashTuner: detecting crash-recovery bugs in cloud systems via meta-info analysis
J Lu, C Liu, L Li, X Feng, F Tan, J Yang, L You
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 114-130, 2019
FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads
Z Zheng, P Zhao, G Long, F Zhu, K Zhu, W Zhao, L Diao, J Yang, W Lin
arXiv preprint arXiv:2009.10924, 2020
A Novel Integrated Framework for Learning both Text Detection and Recognition
W Sui, Q Zhang, J Yang, W Chu
2018 24th International Conference on Pattern Recognition (ICPR), 2233-2238, 2018
Pyramid Embedded Generative Adversarial Network for Automated Font Generation
D Sun, Q Zhang, J Yang
2018 24th International Conference on Pattern Recognition (ICPR), 976-981, 2018
FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs
G Long, J Yang, K Zhu, W Lin
arXiv preprint arXiv:1811.05213, 2018
Efficient Deep Learning Inference based on Model Compression
Q Zhang, M Zhang, M Wang, W Sui, C Meng, J Yang, W Kong, X Cui, ...
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2018
Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks
M Wang, Q Zhang, J Yang, X Cui, W Lin
arXiv preprint arXiv:1811.08589, 2018
INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices
Y Yao, Y Li, C Wang, T Yu, H Chen, X Jiang, J Yang, J Huang, W Lin, ...
arXiv preprint arXiv:2010.14841, 2020
FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads
G Long, J Yang, W Lin
arXiv preprint arXiv:1911.11576, 2019
You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient
S Zhang, X Zheng, C Yang, Y Li, Y Wang, F Chao, M Wang, S Li, J Yang, ...
arXiv preprint arXiv:2106.02435, 2021
DISC: A Dynamic Shape Compiler for Machine Learning Workloads
K Zhu, W Zhao, Z Zheng, T Guo, P Zhao, J Bai, J Yang, X Liu, L Diao, ...
arXiv preprint arXiv:2103.05288, 2021
Optimizing Distributed Training Deployment in Heterogeneous GPU Clusters
X Yi, S Zhang, Z Luo, G Long, L Diao, C Wu, Z Zheng, J Yang, W Lin
ACM CoNEXT, 2020
Fast Training of Deep Learning Models over Multiple GPUs
X Yi, Z Luo, C Meng, M Wang, G Long, C Wu, ...
ACM/IFIP Middleware, 2020
