Antonio J. Peña
Title
Cited by
Year
rCUDA: Reducing the number of GPU-based accelerators in high performance clusters
J Duato, AJ Pena, F Silla, R Mayo, ES Quintana-Ortí
2010 International Conference on High Performance Computing & Simulation …, 2010
398 2010
A Complete and Efficient CUDA-Sharing Solution for HPC Clusters
AJ Peña, C Reaño, F Silla, R Mayo, ES Quintana-Ortí, J Duato
Parallel Computing 40 (10), 574-588, 2014
134 2014
Chai: collaborative heterogeneous applications for integrated-architectures
J Gómez-Luna, I El Hajj, LW Chang, V García-Flores, SG de Gonzalo, ...
2017 IEEE International Symposium on Performance Analysis of Systems and …, 2017
111 2017
Enabling CUDA acceleration within virtual machines using rCUDA
J Duato, AJ Pena, F Silla, JC Fernandez, R Mayo, ES Quintana-Orti
High Performance Computing (HiPC), 2011 18th International Conference on, 1-10, 2011
108 2011
An efficient implementation of GPU virtualization in high performance clusters
J Duato, FD Igual, R Mayo, AJ Peña, ES Quintana-Ortí, F Silla
Euro-Par 2009–Parallel Processing Workshops, 385-394, 2010
90 2010
MPICH User’s Guide
A Amer, P Balaji, W Bland, W Gropp, R Latham, H Lu, L Oden, AJ Pena, ...
Version, 2015
72* 2015
Performance of CUDA virtualized remote GPUs in high performance clusters
J Duato, AJ Pena, F Silla, R Mayo, ES Quintana-Orti
Parallel Processing (ICPP), 2011 International Conference on, 365-374, 2011
70 2011
MT-MPI: multithreaded MPI for many-core environments
M Si, AJ Peña, P Balaji, M Takagi, Y Ishikawa
Proceedings of the 28th ACM international conference on Supercomputing, 125-134, 2014
69 2014
Performance evaluation of cuDNN convolution algorithms on NVIDIA Volta GPUs
M Jorda, P Valero-Lara, AJ Pena
IEEE Access 7, 70461-70473, 2019
68 2019
Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures
M Si, AJ Pena, J Hammond, P Balaji, M Takagi, Y Ishikawa
29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2015
59 2015
Automating the Application Data Placement in Hybrid Memory Systems
H Servat, AJ Pena, G Llort, E Mercadal, HC Hoppe, J Labarta
Cluster Computing (CLUSTER), 2017 IEEE International Conference on, 126-136, 2017
57 2017
Toward the Efficient Use of Multiple Explicitly Managed Memory Subsystems
AJ Pena, P Balaji
IEEE Cluster 2014, 2014
53 2014
CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution
C Reaño, AJ Peña, F Silla, J Duato, R Mayo, ES Quintana-Orti
High Performance Computing (HiPC), 2012 19th International Conference on, 1-10, 2012
50 2012
Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization
C Reano, R Mayo, ES Quintana-Ortí, F Silla, J Duato, AJ Pena
IEEE Cluster 2013, 2013
49 2013
Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications
V García, J Gomez-Luna, T Grass, A Rico, E Ayguade, AJ Pena
2016 IEEE International Symposium on Workload Characterization (IISWC), 1-10, 2016
43 2016
Integrating blocking and non-blocking MPI primitives with task-based programming models
K Sala, X Teruel, JM Perez, AJ Peña, V Beltran, J Labarta
Parallel Computing 85, 153-166, 2019
42 2019
Exploring the Vision Processing Unit as Co-Processor for Inference
S Rivas-Gomez, AJ Pena, D Moloney, E Laure, S Markidis
2018 IEEE International Parallel and Distributed Processing Symposium …, 2018
41 2018
MultiCL: Enabling Automatic Scheduling for Task-Parallel Workloads in OpenCL
AM Aji, AJ Peña, P Balaji, W Feng
Parallel Computing 58, 37-55, 2016
34 2016
cuThomasBatch and cuThomasVBatch, CUDA routines to compute batch of tridiagonal systems on NVIDIA GPUs
P Valero‐Lara, I Martínez‐Pérez, R Sirvent, X Martorell, AJ Peña
Concurrency and Computation: Practice and Experience 30 (24), e4909, 2018
30 2018
Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL
AM Aji, AJ Pena, P Balaji, W Feng
IEEE Cluster 2015, 2015
30 2015