prefix caching

Prefix caching in one sentence

Prefix caching stores the attention key/value (KV) tensors computed for the beginning of a prompt so that, when another request begins with the same tokens, the model can reuse the stored tensors instead of recomputing them and only has to process the tokens that follow the shared prefix.
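The idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `ToyPrefixCache`, `prefill`, and `longest_cached_prefix` are hypothetical names, and the "KV tensors" are stand-in strings rather than real attention state. It only shows the bookkeeping: look up the longest previously seen prefix, reuse its entries, and compute entries only for the remaining tokens.

```python
from typing import Dict, List, Tuple


class ToyPrefixCache:
    """Toy model of prefix caching: maps a token prefix to its KV entries.

    The KV entries are placeholder strings, not real tensors (assumption
    for illustration only).
    """

    def __init__(self) -> None:
        # Key: a tuple of token ids forming a prefix. Value: one KV entry per token.
        self._store: Dict[Tuple[int, ...], List[str]] = {}
        # Counts how many tokens actually had to be processed.
        self.computed_tokens = 0

    def _compute_kv(self, token: int, position: int) -> str:
        # Stand-in for the expensive forward pass over one token.
        self.computed_tokens += 1
        return f"kv(tok={token},pos={position})"

    def longest_cached_prefix(self, tokens: List[int]) -> Tuple[int, List[str]]:
        # Try the longest candidate prefix first, then shorter ones.
        for end in range(len(tokens), 0, -1):
            kv = self._store.get(tuple(tokens[:end]))
            if kv is not None:
                return end, list(kv)
        return 0, []

    def prefill(self, tokens: List[int]) -> Tuple[int, List[str]]:
        """Return (number of reused tokens, KV entries for the whole prompt)."""
        reused, kv = self.longest_cached_prefix(tokens)
        for pos in range(reused, len(tokens)):
            kv.append(self._compute_kv(tokens[pos], pos))
            # Remember every new prefix so later requests can match it.
            self._store[tuple(tokens[: pos + 1])] = list(kv)
        return reused, kv


cache = ToyPrefixCache()
shared_prompt = [1, 2, 3, 4]  # e.g. a system prompt common to many requests

first_reused, _ = cache.prefill(shared_prompt + [10, 11])
second_reused, _ = cache.prefill(shared_prompt + [20])
print(first_reused, second_reused, cache.computed_tokens)
```

The cache is keyed by the entire prefix rather than by individual tokens because a token's key/value tensors depend on every token before it, so only an exact match from the start of the prompt can be reused. Storing every prefix as a separate entry keeps the sketch short but wastes memory; a real inference engine would share storage between prefixes and evict old entries.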

Boyang Yan
