Merak: An efficient distributed dnn training framework with automated 3d parallelism for giant foundation models Z Lai, S Li, X Tang, K Ge, W Liu, Y Duan, L Qiao, D Li IEEE Transactions on Parallel and Distributed Systems 34 (5), 1466-1478, 2023 | 22 | 2023 |
HPDL: towards a general framework for high-performance distributed deep learning D Li, Z Lai, K Ge, Y Zhang, Z Zhang, Q Wang, H Wang 2019 IEEE 39th International Conference on Distributed Computing Systems …, 2019 | 19 | 2019 |
An efficient ADMM-based algorithm to nonconvex penalized support vector machines L Guan, L Qiao, D Li, T Sun, K Ge, X Lu 2018 IEEE International Conference on Data Mining Workshops (ICDMW), 1209-1216, 2018 | 18 | 2018 |
An efficient parallel and distributed solution to nonconvex penalized linear SVMs L Guan, T Sun, L Qiao, Z Yang, D Li, K Ge, X Lu Frontiers of Information Technology & Electronic Engineering 21, 587-603, 2020 | 12 | 2020 |
Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit K Ge, H Su, D Li, X Lu Frontiers of Information Technology & Electronic Engineering 18 (7), 915-927, 2017 | 9 | 2017 |
AutoPipe: A fast pipeline parallelism approach with balanced partitioning and micro-batch slicing W Liu, Z Lai, S Li, Y Duan, K Ge, D Li 2022 IEEE International Conference on Cluster Computing (CLUSTER), 301-312, 2022 | 6 | 2022 |
Deep discriminative clustering network X Shao, K Ge, H Su, L Luo, B Peng, D Li 2018 International Joint Conference on Neural Networks (IJCNN), 1-7, 2018 | 5 | 2018 |
Hph: Hybrid parallelism on heterogeneous clusters for accelerating large-scale dnns training Y Duan, Z Lai, S Li, W Liu, K Ge, P Liang, D Li 2022 IEEE International Conference on Cluster Computing (CLUSTER), 313-323, 2022 | 4 | 2022 |
S2 reducer: High-performance sparse communication to accelerate distributed deep learning K Ge, Y Fu, Y Zhang, Z Lai, X Deng, D Li ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 3 | 2022 |
Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models W Wang, Z Lai, S Li, W Liu, K Ge, Y Liu, A Shen, D Li 2023 IEEE International Conference on Cluster Computing (CLUSTER), 82-94, 2023 | 2 | 2023 |
Auto-Divide GNN: Accelerating GNN Training with Subgraph Division H Chen, Z Ran, K Ge, Z Lai, J Jiang, D Li European Conference on Parallel Processing, 367-382, 2023 | 1 | 2023 |
Accelerate distributed deep learning with cluster-aware sketch quantization K Ge, Y Zhang, Y Fu, Z Lai, X Deng, D Li Science China Information Sciences 66 (6), 162102, 2023 | 1 | 2023 |
Compressed Collective Sparse-Sketch for Distributed Data-Parallel Training of Deep Learning Models K Ge, K Lu, Y Fu, X Deng, Z Lai, D Li IEEE Journal on Selected Areas in Communications 41 (4), 941-963, 2023 | 1 | 2023 |
BRGraph: An efficient graph neural network training system by reusing batch data on GPU K Ge, Z Ran, Z Lai, L Zhang, D Li Concurrency and Computation: Practice and Experience 34 (15), e6961, 2022 | 1 | 2022 |
Casq: Accelerate distributed deep learning with sketch-based gradient quantization K Ge, Y Zhang, Y Fu, Z Lai, X Deng, D Li 2021 IEEE International Conference on Cluster Computing (CLUSTER), 825-826, 2021 | 1 | 2021 |
Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training S Li, Z Lai, Y Hao, W Liu, K Ge, X Deng, D Li, K Lu arXiv preprint arXiv:2305.16121, 2023 | | 2023 |
Tag Pollution Detection in Web Videos via Cross-Modal Relevance Estimation Y Chen, X Lin, K Ge, W He, D Li 2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS), 1-10, 2020 | | 2020 |
An efficient parallel and distributed method for solving nonconvex regularized linear support vector machines (in Chinese) L Guan, T Sun, L Qiao, Z Yang, D Li, K Ge, X Lu Frontiers of Information Technology & Electronic Engineering 21 (4), 587-603, 2020 | | 2020 |
A GPU-based parallel density peaks clustering algorithm (in Chinese) K Ge, H Su, D Li, X Lu | | |