Boyang Yan

ForesightKV

Jun 19, 2026 · 1 min read

Reasoning model, Group Relative Policy Optimization (GRPO), eviction algorithm, pairwise ranking loss
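Two of the terms listed here have well-known generic forms. The block below is a minimal sketch under the standard textbook definitions, not necessarily the exact formulation used in the ForesightKV paper: GRPO normalizes each sampled response's reward against the mean and standard deviation of its group, and a pairwise ranking loss penalizes a scorer whenever an item that should rank higher receives a lower score. The function names, the `eps` parameter, and the logistic form of the loss are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(scores, targets):
    """Logistic pairwise loss, averaged over pairs (i, j) with targets[i] > targets[j].

    Each such pair contributes -log(sigmoid(scores[i] - scores[j])).
    """
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if targets[i] > targets[j]:
                losses.append(_softplus(-(scores[i] - scores[j])))
    return sum(losses) / len(losses) if losses else 0.0
```

With rewards `[1.0, 0.0, 1.0, 0.0]`, the advantages come out close to `+1` and `-1`; the ranking loss is small when the predicted scores agree with the target order and grows when they disagree.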

AIME2024 and AIME2025 benchmarks

Markov Decision Process (MDP), KV cache eviction methods
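For orientation, KV cache eviction methods generally keep a fixed budget of cached key/value entries and drop the rest according to some importance score. The sketch below shows only that generic pattern, assuming per-token scores are already given; how ForesightKV computes or learns its scores is not covered in this note. `evict_to_budget` and its parameters are hypothetical names.

```python
def evict_to_budget(scores, budget):
    """Return the indices of cache entries to keep, in original token order.

    scores: one importance score per cached token (higher = more important).
    budget: maximum number of entries to retain.
    """
    if budget <= 0:
        return []
    if budget >= len(scores):
        return list(range(len(scores)))
    # Rank positions by score, highest first; ties keep the earlier token.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore positional order so the surviving entries stay in sequence.
    return sorted(ranked[:budget])


# Usage: keep the 2 highest-scoring of 4 cached tokens.
kept = evict_to_budget([0.1, 0.9, 0.3, 0.7], budget=2)
```

Returning indices rather than a filtered cache keeps the sketch independent of any particular tensor layout; the caller would use `kept` to gather the matching keys and values.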

Reference List

  1. https://arxiv.org/pdf/2602.03203
