Boyang Yan

KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows

Jun 19, 2026 · 1 min read

prefix caching
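Prefix caching reuses the KV cache already computed for a shared prompt prefix, such as an agent's fixed system prompt, so that only the new suffix tokens need a prefill pass. This is valid because, with causal attention, the K/V vectors at position i depend only on tokens 0..i. The following is a minimal sketch that assumes prompts are already tokenized into integer IDs and that the cache is a plain list of previously served token sequences. `common_prefix_len` and `reusable_prefix` are hypothetical helpers; a real serving engine indexes cached prefixes in a radix tree rather than scanning a list.

```python
from typing import Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens shared by two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reusable_prefix(prompt: Sequence[int], cached: list[Sequence[int]]) -> int:
    """Longest prefix of `prompt` whose KV entries are already cached.

    Hypothetical helper: a linear scan over cached sequences, used only
    to illustrate prefix reuse.
    """
    return max((common_prefix_len(prompt, c) for c in cached), default=0)


# Tokens 1..4 stand for an agent's system prompt that is shared across calls.
cached = [[1, 2, 3, 4, 10, 11], [7, 8]]
prompt = [1, 2, 3, 4, 20, 21, 22]
hit = reusable_prefix(prompt, cached)
print(f"reuse {hit} tokens, prefill {len(prompt) - hit} new tokens")
```

Here 4 of the 7 prompt tokens hit the cache, so only the 3 agent-specific tokens need to be prefilled.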

hierarchical radix cache
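A hierarchical radix cache adds a second memory tier below the GPU. KV for tree nodes pushed out of GPU memory is kept in CPU (host) memory instead of being discarded, and a later prefix hit on those nodes costs a CPU-to-GPU copy instead of a recompute. The following is a simplified sketch that assumes exactly two tiers, `"gpu"` and `"cpu"`, and stores one token per node in an uncompressed trie rather than a compressed radix tree. `TwoTierPrefixTree`, `offload`, and `match` are illustrative names, not a real library API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One token's KV entry in a simplified (uncompressed) prefix tree."""
    token: int | None = None
    tier: str = "gpu"  # assumption: "gpu" = device memory, "cpu" = host memory
    children: dict[int, Node] = field(default_factory=dict)


class TwoTierPrefixTree:
    def __init__(self) -> None:
        self.root = Node()

    def insert(self, tokens: list[int]) -> None:
        """Record that KV for `tokens` was computed; it lands on the GPU."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, Node(token=t))
            node.tier = "gpu"

    def offload(self, tokens: list[int], keep: int) -> None:
        """Spill KV for tokens[keep:] to host memory instead of discarding it.

        Only the tail of the path moves, so the GPU-resident part of this
        path stays a prefix.
        """
        node = self.root
        for depth, t in enumerate(tokens):
            node = node.children[t]
            if depth >= keep:
                node.tier = "cpu"

    def match(self, tokens: list[int]) -> tuple[int, list[Node]]:
        """Return (matched length, nodes that must be copied CPU -> GPU first)."""
        node, matched, to_load = self.root, 0, []
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                break
            if child.tier == "cpu":
                to_load.append(child)
            node, matched = child, matched + 1
        return matched, to_load


tree = TwoTierPrefixTree()
tree.insert([1, 2, 3, 4, 5, 6])             # shared prompt 1..4, agent-specific 5..6
tree.offload([1, 2, 3, 4, 5, 6], keep=4)    # spill the agent-specific suffix to CPU
matched, to_load = tree.match([1, 2, 3, 4, 5, 6, 9])
print(matched, [n.token for n in to_load])  # 6 [5, 6]
```

The match still covers all 6 cached tokens. The two spilled nodes are reported so they can be copied back to the GPU before the next prefill runs.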

workflow-aware eviction policy

overlapped KV prefetching mechanism
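When a multi-agent workflow's execution order is known in advance, eviction can rank cached agent prefixes by how soon each agent will run again rather than by recency alone. The next agent's KV can also be loaded from CPU while the current step is still running, which hides the copy latency. The following is a minimal sketch that assumes a fixed, known agent order (a hypothetical `schedule` list) and uses `time.sleep` as a stand-in for both GPU compute and CPU-to-GPU copies. The ranking resembles Belady's farthest-in-future rule. `steps_until_next_use`, `eviction_order`, `load_kv_to_gpu`, and `run_step` are illustrative names, not KVFlow's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def steps_until_next_use(agent: str, schedule: list[str], current: int) -> float:
    """Steps after `current` until `agent` runs again (inf if it never does)."""
    for offset, name in enumerate(schedule[current + 1:], start=1):
        if name == agent:
            return offset
    return float("inf")


def eviction_order(cached: list[str], schedule: list[str], current: int) -> list[str]:
    """Evict agents needed furthest in the future first (ties keep input order)."""
    return sorted(cached, key=lambda a: -steps_until_next_use(a, schedule, current))


def load_kv_to_gpu(agent: str) -> str:
    time.sleep(0.05)  # stand-in for copying this agent's KV from CPU to GPU
    return agent


def run_step(agent: str) -> None:
    time.sleep(0.05)  # stand-in for the current agent's prefill/decode


schedule = ["planner", "coder", "reviewer", "coder"]
print(eviction_order(["planner", "coder", "reviewer"], schedule, current=0))

with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.perf_counter()
    for i, agent in enumerate(schedule):
        # Start loading the next agent's KV before running the current step.
        nxt = pool.submit(load_kv_to_gpu, schedule[i + 1]) if i + 1 < len(schedule) else None
        run_step(agent)
        if nxt is not None:
            nxt.result()  # the load ran concurrently with run_step
    elapsed = time.perf_counter() - start

serial = 0.05 * (2 * len(schedule) - 1)  # 4 steps + 3 loads back to back
print(f"overlapped: {elapsed:.2f}s, serial would be ~{serial:.2f}s")
```

While `planner` runs (step 0), the ranking evicts `planner` first because it never runs again, then `reviewer` (2 steps away), and keeps `coder` (next step). The sleeping worker thread releases the GIL, so each load overlaps with the step that precedes it.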

KV cache characteristics

  • KV cache size as a function of context length
  • Prefill latency
  • KV cache transmission (CPU to GPU)
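The three characteristics can be compared with back-of-the-envelope arithmetic. KV size per token is 2 (K and V) × layers × KV heads × head dimension × bytes per element, so total size grows linearly with context length. Reloading that KV costs size ÷ host-to-device bandwidth. Recomputing it by prefill costs roughly 2 FLOPs per parameter per token. The following sketch assumes a Llama-3-8B-style configuration (32 layers, 8 KV heads, head dimension 128, fp16), about 25 GB/s of effective PCIe bandwidth, and about 100 TFLOP/s of sustained prefill throughput. These are illustrative inputs, not measurements. It also ignores the quadratic attention term in prefill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    n_layers: int
    n_kv_heads: int
    head_dim: int
    n_params: float
    dtype_bytes: int = 2  # fp16 / bf16


def kv_bytes_per_token(cfg: ModelConfig) -> int:
    # K and V (factor 2): one head_dim vector per KV head per layer.
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.dtype_bytes


def kv_bytes(cfg: ModelConfig, context_len: int) -> int:
    # Grows linearly with context length.
    return kv_bytes_per_token(cfg) * context_len


def transfer_seconds(num_bytes: int, bandwidth_bytes_per_s: float) -> float:
    # CPU -> GPU copy time at an effective host-to-device bandwidth.
    return num_bytes / bandwidth_bytes_per_s


def prefill_seconds(cfg: ModelConfig, context_len: int, flops_per_s: float) -> float:
    # ~2 FLOPs per parameter per token; ignores the quadratic attention term.
    return 2 * cfg.n_params * context_len / flops_per_s


# Assumed inputs (illustrative only).
cfg = ModelConfig(n_layers=32, n_kv_heads=8, head_dim=128, n_params=8e9)
PCIE_BW = 25e9      # bytes/s
GPU_FLOPS = 100e12  # FLOP/s

print(f"KV per token: {kv_bytes_per_token(cfg)} bytes")  # 131072 bytes = 128 KiB
for n in (1024, 8192, 32768):
    size = kv_bytes(cfg, n)
    print(
        f"{n:>6} tokens: {size / 2**30:6.3f} GiB, "
        f"load {transfer_seconds(size, PCIE_BW) * 1e3:7.1f} ms, "
        f"recompute {prefill_seconds(cfg, n, GPU_FLOPS) * 1e3:8.1f} ms"
    )
```

Under these assumptions, 8192 tokens of KV occupy exactly 1 GiB. Reloading that KV over PCIe is about 30× cheaper than recomputing it, which is why spilling KV to CPU and prefetching it back can pay off. Prefill only becomes relatively more expensive as attention cost dominates at long contexts.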

Reference List

  1. https://arxiv.org/pdf/2507.07400
  2. https://github.com/PanZaifeng/KVFlow
