Ammar Ahmad Awan
Microsoft
Verified email at osu.edu
Title
Cited by
Year
S-caffe: Co-designing mpi runtimes and caffe for scalable deep learning on modern gpu clusters
AA Awan, K Hamidouche, JM Hashmi, DK Panda
ACM PPoPP '17 52 (8), 193-205, 2017
Cited by: 94; Year: 2017
Privacy-aware searching with oblivious term matching for cloud storage
Z Pervez, AA Awan, AM Khattak, S Lee, EN Huh
The Journal of Supercomputing 63 (2), 538-560, 2013
Cited by: 40; Year: 2013
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning
AA Awan, K Hamidouche, A Venkatesh, DK Panda
Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016
Cited by: 27; Year: 2016
An in-depth performance characterization of CPU- and GPU-based DNN training on modern architectures
AA Awan, H Subramoni, DK Panda
Proceedings of the Machine Learning on HPC Environments, 1-8, 2017
Cited by: 24; Year: 2017
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL?
AA Awan, CH Chu, H Subramoni, DK Panda
Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018
Cited by: 23; Year: 2018
Scalable distributed dnn training using tensorflow and cuda-aware mpi: Characterization, designs, and performance evaluation
AA Awan, J Bedorf, CH Chu, H Subramoni, DK Panda
arXiv preprint arXiv:1810.11112, 2018
Cited by: 14; Year: 2018
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by: 13; Year: 2016
Intercloud message exchange middleware
MB Amin, WA Khan, AA Awan, S Lee
Proceedings of the 6th International Conference on Ubiquitous Information …, 2012
Cited by: 12; Year: 2012
Efficient and scalable multi-source streaming broadcast on gpu clusters for deep learning
CH Chu, X Lu, AA Awan, H Subramoni, J Hashmi, B Elton, DK Panda
2017 46th International Conference on Parallel Processing (ICPP), 161-170, 2017
Cited by: 11; Year: 2017
OC-DNN: Exploiting advanced unified memory capabilities in CUDA 9 and volta GPUs for out-of-core DNN training
AA Awan, CH Chu, H Subramoni, X Lu, DK Panda
2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018
Cited by: 10; Year: 2018
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
Cited by: 10; Year: 2015
Designing non-blocking personalized collectives with near perfect overlap for rdma-enabled clusters
H Subramoni, AA Awan, K Hamidouche, D Pekurovsky, A Venkatesh, ...
International Conference on High Performance Computing, 434-453, 2015
Cited by: 9; Year: 2015
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
Parallel Computing 58, 27-36, 2016
Cited by: 7; Year: 2016
Exploiting hardware multicast and GPUDirect RDMA for efficient broadcast
CH Chu, X Lu, AA Awan, H Subramoni, B Elton, DK Panda
IEEE Transactions on Parallel and Distributed Systems 30 (3), 575-588, 2018
Cited by: 6; Year: 2018
Performance characterization of dnn training using tensorflow and pytorch on modern clusters
A Jain, AA Awan, Q Anthony, H Subramoni, DK Panda
2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-11, 2019
Cited by: 3; Year: 2019
On-demand connection management for OpenSHMEM and OpenSHMEM+MPI
S Chakraborty, H Subramoni, J Perkins, AA Awan, DK Panda
2015 IEEE International Parallel and Distributed Processing Symposium …, 2015
Cited by: 3; Year: 2015
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera
A Jain, AA Awan, H Subramoni, DK Panda
2019 IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS), 76-83, 2019
Cited by: 2; Year: 2019
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects
AA Awan, A Jain, CH Chu, H Subramoni, DK Panda
IEEE Micro 40 (1), 35-43, 2019
Cited by: 2; Year: 2019
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2?
AA Awan, KV Manian, CH Chu, H Subramoni, DK Panda
Parallel Computing 85, 141-152, 2019
Cited by: 2; Year: 2019
High performance distributed deep learning: a beginner's guide
DK Panda, AA Awan, H Subramoni
Proceedings of the 24th Symposium on Principles and Practice of Parallel …, 2019
Cited by: 2; Year: 2019