Ahmad Abdelfattah
Research Scientist, Innovative Computing Laboratory, University of Tennessee
Verified email at icl.utk.edu
Title · Cited by · Year
Performance, design, and autotuning of batched GEMM for GPUs
A Abdelfattah, A Haidar, S Tomov, J Dongarra
International Conference on High Performance Computing, 21-38, 2016
Cited by 83 · 2016
Parallel programming models for dense linear algebra on heterogeneous systems
J Dongarra, M Abalenkovs, A Abdelfattah, M Gates, A Haidar, J Kurzak, ...
Supercomputing Frontiers and Innovations 2 (4), 67-86, 2016
Cited by 48 · 2016
High-performance matrix-matrix multiplications of very small matrices
I Masliah, A Abdelfattah, A Haidar, S Tomov, M Baboulin, J Falcou, ...
European Conference on Parallel Processing, 659-671, 2016
Cited by 46 · 2016
High-performance tensor contractions for GPUs
A Abdelfattah, M Baboulin, V Dobrev, J Dongarra, C Earl, J Falcou, ...
Procedia Computer Science 80, 108-118, 2016
Cited by 46 · 2016
The design of fast and energy-efficient linear solvers: On the potential of half-precision arithmetic and iterative refinement techniques
A Haidar, A Abdelfattah, M Zounon, P Wu, S Pranesh, S Tomov, ...
International Conference on Computational Science, 586-600, 2018
Cited by 36 · 2018
KBLAS: An optimized library for dense matrix-vector multiplication on GPU accelerators
A Abdelfattah, D Keyes, H Ltaief
ACM Transactions on Mathematical Software (TOMS) 42 (3), 1-31, 2016
Cited by 36 · 2016
With extreme computing, the rules have changed
J Dongarra, S Tomov, P Luszczek, J Kurzak, M Gates, I Yamazaki, H Anzt, ...
Computing in Science & Engineering 19 (3), 52-62, 2017
Cited by 33 · 2017
A novel fast and accurate pseudo-analytical simulation approach for MOAO
E Gendron, A Charara, A Abdelfattah, D Gratadour, D Keyes, H Ltaief, ...
Adaptive Optics Systems IV 9148, 91486L, 2014
Cited by 31 · 2014
C++ API for BLAS and LAPACK
M Gates, P Luszczek, A Abdelfattah, J Kurzak, J Dongarra, K Arturov, ...
SLATE Working Notes, 2017
Cited by 21* · 2017
Fast Cholesky factorization on GPUs for batch and native modes in MAGMA
A Abdelfattah, A Haidar, S Tomov, J Dongarra
Journal of Computational Science 20, 85-93, 2017
Cited by 18 · 2017
Pipelining computational stages of the tomographic reconstructor for multi-object adaptive optics on a multi-GPU system
A Charara, H Ltaief, D Gratadour, D Keyes, A Sevin, A Abdelfattah, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by 18 · 2014
Optimizing memory-bound SYMV kernel on GPU hardware accelerators
A Abdelfattah, J Dongarra, D Keyes, H Ltaief
International Conference on High Performance Computing for Computational …, 2012
Cited by 18 · 2012
Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs
A Abdelfattah, S Tomov, J Dongarra
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2019
Cited by 17 · 2019
A guide for achieving high performance with very small matrices on GPU: A case study of batched LU and Cholesky factorizations
A Haidar, A Abdelfattah, M Zounon, S Tomov, J Dongarra
IEEE Transactions on Parallel and Distributed Systems 29 (5), 973-984, 2017
Cited by 16 · 2017
A survey of numerical methods utilizing mixed precision arithmetic
A Abdelfattah, H Anzt, EG Boman, E Carson, T Cojean, J Dongarra, ...
arXiv preprint arXiv:2007.06674, 2020
Cited by 15 · 2020
Systematic approach in optimizing numerical memory-bound kernels on GPU
A Abdelfattah, D Keyes, H Ltaief
European Conference on Parallel Processing, 207-216, 2012
Cited by 15 · 2012
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs
A Abdelfattah, A Haidar, S Tomov, J Dongarra
Proceedings of the International Conference on Supercomputing, 1-10, 2017
Cited by 14 · 2017
Performance tuning and optimization techniques of fixed and variable size batched Cholesky factorization on GPUs
A Abdelfattah, A Haidar, S Tomov, J Dongarra
Procedia Computer Science 80, 119-130, 2016
Cited by 13 · 2016
Roadmap for the development of a linear algebra library for exascale computing: SLATE: Software for linear algebra targeting exascale
A Abdelfattah, H Anzt, A Bouteiller, A Danalis, J Dongarra, M Gates, ...
SLATE Working Notes 1, 2017
Cited by 12 · 2017
On the development of variable size batched computation for heterogeneous parallel architectures
A Abdelfattah, A Haidar, S Tomov, J Dongarra
2016 IEEE International Parallel and Distributed Processing Symposium …, 2016
Cited by 11 · 2016