PREMA: A predictive multi-task scheduling algorithm for preemptible neural processing units Y Choi, M Rhu 2020 IEEE International Symposium on High Performance Computer Architecture …, 2020 | 150 | 2020 |
Lazy batching: An SLA-aware batching system for cloud machine learning inference Y Choi, Y Kim, M Rhu 2021 IEEE International Symposium on High-Performance Computer Architecture …, 2021 | 69 | 2021 |
NeuMMU: Architectural support for efficient address translations in neural processing units B Hyun, Y Kwon, Y Choi, J Kim, M Rhu Proceedings of the Twenty-Fifth International Conference on Architectural …, 2020 | 34 | 2020 |
PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers Y Kim, Y Choi, M Rhu Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022 | 17 | 2022 |
vTrain: A simulation framework for evaluating cost-effective and compute-optimal large language model training J Bang, Y Choi, M Kim, Y Kim, M Rhu 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 153-167, 2024 | 7 | 2024 |
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations Y Choi, J Kim, M Rhu arXiv preprint arXiv:2302.11750, 2023 | 1 | 2023 |
PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers G Yeo, J Kim, Y Choi, M Rhu arXiv preprint arXiv:2411.19114, 2024 | | 2024 |
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models Y Choi, J Kim, M Rhu arXiv preprint arXiv:2406.06955, 2024 | | 2024 |