Boyang Yan

Notes on Fu, Xue & Huang et al., ServerlessLLM: Low-Latency Serverless Inference for Large Language Models

Oct 31, 2025 · 1 min read

Large Language Models (LLMs)

vLLM serverless

https://www.youtube.com/watch?v=beBTI7oyleg
