Title, authors, venue | Cited by | Year |
Complexity Analysis and Algorithm Design for Reorganizing Data to Minimize Non-Coalesced GPU Memory Accesses B Wu, Z Zhao, E Zhang, Y Jiang, X Shen ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013 | 127* | 2013 |
Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations B Wu, G Chen, D Li, X Shen, J Vetter The 29th International Conference on Supercomputing, 2015 | 116 | 2015 |
AutoMine: harmonizing high-level abstraction and high performance for graph mining D Mawhirter, B Wu Proceedings of the 27th ACM Symposium on Operating Systems Principles, 509-523, 2019 | 100 | 2019 |
FLEP: Enabling flexible and efficient preemption on GPUs B Wu, X Liu, X Zhou, C Jiang ACM SIGPLAN Notices 52 (4), 483-496, 2017 | 86 | 2017 |
Can PCM Benefit GPU? Reconciling Hybrid Memory Design with GPU Massive Parallelism for Energy Efficiency B Wang, B Wu, D Li, X Shen, W Yu, Y Jiao, J Vetter The 22nd International Conference on Parallel Architectures and Compilation …, 2013 | 83* | 2013 |
PORPLE: An Extensible Optimizer for Portable Data Placement on GPU G Chen, B Wu, D Li, X Shen The 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014 | 82 | 2014 |
Graphie: Large-scale asynchronous graph traversals on just a GPU W Han, D Mawhirter, B Wu, M Buland 2017 26th International Conference on Parallel Architectures and Compilation …, 2017 | 76 | 2017 |
GRNN: Low-latency and scalable RNN inference on GPUs C Holmes, D Mawhirter, Y He, F Yan, B Wu Proceedings of the Fourteenth EuroSys Conference 2019, 1-16, 2019 | 63 | 2019 |
FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures F Zhang, B Wu, J Zhai, B He, W Chen 2017 IEEE/ACM International Symposium on Code Generation and Optimization …, 2017 | 58 | 2017 |
ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs X Liu, B Wu The International Conference for High Performance Computing, Networking …, 2015 | 56 | 2015 |
Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation Z Zhao, B Wu, X Shen ACM SIGARCH Computer Architecture News 42 (1), 543-558, 2014 | 53 | 2014 |
Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters W Zhang, W Cui, K Fu, Q Chen, DE Mawhirter, B Wu, C Li, M Guo Proceedings of the ACM international conference on supercomputing, 58-68, 2019 | 45 | 2019 |
GraphZero: A high-performance subgraph matching system D Mawhirter, S Reinehr, C Holmes, T Liu, B Wu ACM SIGOPS Operating Systems Review 55 (1), 21-37, 2021 | 39 | 2021 |
Co-run scheduling with power cap on integrated CPU-GPU systems Q Zhu, B Wu, X Shen, L Shen, Z Wang 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017 | 37 | 2017 |
GraphZero: Breaking symmetry for efficient graph mining D Mawhirter, S Reinehr, C Holmes, T Liu, B Wu arXiv preprint arXiv:1911.12877, 2019 | 30 | 2019 |
Automatic irregularity-aware fine-grained workload partitioning on integrated architectures F Zhang, J Zhai, B Wu, B He, W Chen, X Du IEEE Transactions on Knowledge and Data Engineering 33 (3), 867-881, 2019 | 27 | 2019 |
GraphPhi: efficient parallel graph processing on emerging throughput-oriented architectures Z Peng, A Powell, B Wu, T Bicer, B Ren Proceedings of the 27th International Conference on Parallel Architectures …, 2018 | 25 | 2018 |
Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control B Wu, EZ Zhang, X Shen The Twentieth International Conference on Parallel Architectures and …, 2011 | 25 | 2011 |
Evaluating large language models on graphs: Performance insights and comparative analysis C Liu, B Wu arXiv preprint arXiv:2308.11224, 2023 | 23 | 2023 |
Enabling scalability-sensitive speculative parallelization for FSM computations J Qiu, Z Zhao, B Wu, A Vishnu, SL Song Proceedings of the International Conference on Supercomputing, 1-10, 2017 | 22 | 2017 |