Ammar Ahmad Awan
Title · Cited by · Year
S-Caffe: Co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters
AA Awan, K Hamidouche, JM Hashmi, DK Panda
ACM PPoPP '17 52 (8), 193-205, 2017
Cited by: 78; Year: 2017
Privacy-aware searching with oblivious term matching for cloud storage
Z Pervez, AA Awan, AM Khattak, S Lee, EN Huh
The Journal of Supercomputing 63 (2), 538-560, 2013
Cited by: 37; Year: 2013
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL?
AA Awan, CH Chu, H Subramoni, DK Panda
Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018
Cited by: 21; Year: 2018
An in-depth performance characterization of CPU- and GPU-based DNN training on modern architectures
AA Awan, H Subramoni, DK Panda
Proceedings of the Machine Learning on HPC Environments, 1-8, 2017
Cited by: 19; Year: 2017
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning
AA Awan, K Hamidouche, A Venkatesh, DK Panda
Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016
Cited by: 19; Year: 2016
Efficient and scalable multi-source streaming broadcast on GPU clusters for deep learning
CH Chu, X Lu, AA Awan, H Subramoni, J Hashmi, B Elton, DK Panda
2017 46th International Conference on Parallel Processing (ICPP), 161-170, 2017
Cited by: 9; Year: 2017
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by: 9; Year: 2016
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
Cited by: 9; Year: 2015
Intercloud message exchange middleware
MB Amin, WA Khan, AA Awan, S Lee
Proceedings of the 6th International Conference on Ubiquitous Information …, 2012
Cited by: 9; Year: 2012
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training
AA Awan, CH Chu, H Subramoni, X Lu, DK Panda
2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018
Cited by: 7; Year: 2018
Scalable distributed DNN training using TensorFlow and CUDA-aware MPI: characterization, designs, and performance evaluation
AA Awan, J Bedorf, CH Chu, H Subramoni, DK Panda
arXiv preprint arXiv:1810.11112, 2018
Cited by: 7; Year: 2018
Designing non-blocking personalized collectives with near perfect overlap for RDMA-enabled clusters
H Subramoni, AA Awan, K Hamidouche, D Pekurovsky, A Venkatesh, ...
International Conference on High Performance Computing, 434-453, 2015
Cited by: 7; Year: 2015
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
Parallel Computing 58, 27-36, 2016
Cited by: 6; Year: 2016
Exploiting hardware multicast and GPUDirect RDMA for efficient broadcast
CH Chu, X Lu, AA Awan, H Subramoni, B Elton, DK Panda
IEEE Transactions on Parallel and Distributed Systems 30 (3), 575-588, 2018
Cited by: 5; Year: 2018
High performance distributed deep learning: a beginner's guide
DK Panda, AA Awan, H Subramoni
Proceedings of the 24th Symposium on Principles and Practice of Parallel …, 2019
Cited by: 2; Year: 2019
CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC
K Hamidouche, AA Awan, A Venkatesh, DK Panda
2016 IEEE 23rd International Conference on High Performance Computing (HiPC …, 2016
Cited by: 2; Year: 2016
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks
AA Awan, K Hamidouche, A Venkatesh, J Perkins, H Subramoni, ...
Proceedings of the 22nd European MPI Users' Group Meeting, 1-10, 2015
Cited by: 2; Year: 2015
On-demand connection management for OpenSHMEM and OpenSHMEM+MPI
S Chakraborty, H Subramoni, J Perkins, AA Awan, DK Panda
2015 IEEE International Parallel and Distributed Processing Symposium …, 2015
Cited by: 2; Year: 2015
A case for non-blocking collectives in OpenSHMEM: design, implementation, and performance evaluation using MVAPICH2-X
AA Awan, K Hamidouche, CH Chu, D Panda
Workshop on OpenSHMEM and Related Technologies, 69-86, 2014
Cited by: 2; Year: 2014
Towards Efficient Support for Parallel I/O in Java HPC
AA Awan, MS Ayub, A Shafi, S Lee
2012 13th International Conference on Parallel and Distributed Computing …, 2012
Cited by: 2; Year: 2012