Jiajia Li
SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication
J Li, G Tan, M Chen, N Sun
Proceedings of the 34th ACM SIGPLAN Conference on Programming Language …, 2013
FROSTT: The formidable repository of open sparse tensors and tools
S Smith, JW Choi, J Li, R Vuduc, J Park, X Liu, G Karypis
An input-adaptive and in-place approach to dense tensor-times-matrix multiply
J Li, C Battaglino, I Perros, J Sun, R Vuduc
SC'15: Proceedings of the International Conference for High Performance …, 2015
Model-driven sparse CP decomposition for higher-order tensors
J Li, J Choi, I Perros, J Sun, R Vuduc
2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
A Li, SL Song, J Chen, J Li, X Liu, N Tallent, K Barker
https://ieeexplore.ieee.org/document/8763922, 2019
HiCOO: Hierarchical storage of sparse tensors
J Li, J Sun, R Vuduc
SC18: International Conference for High Performance Computing, Networking …, 2018
Bridging the gap between deep learning and sparse matrix format selection
Y Zhao, J Li, C Liao, X Shen
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of …, 2018
Understanding the GPU microarchitecture to achieve bare-metal performance tuning
X Zhang, G Tan, S Xue, J Li, K Zhou, M Chen
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of …, 2017
Optimizing sparse tensor times matrix on multi-core and many-core architectures
J Li, Y Ma, C Yan, R Vuduc
2016 6th Workshop on Irregular Applications: Architecture and Algorithms …, 2016
Optimizing sparse tensor times matrix on GPUs
Y Ma, J Li, X Wu, C Yan, J Sun, R Vuduc
Journal of Parallel and Distributed Computing 129, 99-109, 2019
An initial characterization of the Emu Chick
E Hein, T Conte, J Young, S Eswar, J Li, P Lavin, R Vuduc, J Riedy
2018 IEEE International Parallel and Distributed Processing Symposium …, 2018
An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs
J Li, X Li, G Tan, M Chen, N Sun
Proceedings of the 26th ACM international conference on Supercomputing, 377-386, 2012
A pattern based algorithmic autotuner for graph processing on GPUs
K Meng, J Li, G Tan, N Sun
Proceedings of the 24th Symposium on Principles and Practice of Parallel …, 2019
Load-Balanced Sparse MTTKRP on GPUs
I Nisa, J Li, A Sukumaran-Rajam, R Vuduc, P Sadayappan
https://ieeexplore.ieee.org/document/8821030, 2019
Design and implementation of adaptive SpMV library for multicore and many-core architecture
G Tan, J Liu, J Li
ACM Transactions on Mathematical Software (TOMS) 44 (4), 1-25, 2018
A microbenchmark characterization of the Emu Chick
JS Young, E Hein, S Eswar, P Lavin, J Li, J Riedy, R Vuduc, T Conte
Parallel Computing 87, 60-69, 2019
Efficient and effective sparse tensor reordering
J Li, B Uçar, ÜV Çatalyürek, J Sun, K Barker, R Vuduc
Proceedings of the ACM International Conference on Supercomputing, 227-237, 2019
Introducing high performance computing concepts into engineering undergraduate curriculum: a success story
B Neelima, J Li
Proceedings of the Workshop on Education for High-Performance Computing, 1-8, 2015
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite
J Li, Y Ma, X Wu, A Li, K Barker
https://link.springer.com/article/10.1007/s42514-019-00012-w, 1 (2), 111–130, 2019
Programming strategies for irregular algorithms on the Emu Chick
ER Hein, S Eswar, A Yaşar, J Li, JS Young, TM Conte, ÜV Çatalyürek, ...
ACM Transactions on Parallel Computing (TOPC) 7 (4), 1-25, 2020
