Prefix caching in one sentence
Prefix caching stores the attention Key/Value tensors computed for the beginning of a prompt so that, when another request begins with the same tokens, the model does not have to process those tokens again.
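
The idea can be sketched with a toy cache keyed by token prefixes. Everything below is illustrative and not taken from any particular inference engine: `ToyPrefixCache`, `fake_kv`, and the `tokens_computed` counter are hypothetical names, and `fake_kv` is only a stand-in for the real per-token Key/Value computation. It is written so that each entry depends on all earlier tokens, which is why only a matching prefix (and not an arbitrary matching substring) can be reused.

```python
from typing import Dict, List, Tuple

KV = Tuple[int, int]  # stand-in for one position's (key, value) tensors


def fake_kv(prev: KV, token: int) -> KV:
    """Hypothetical per-token KV computation that depends on the whole prefix."""
    prev_k, prev_v = prev
    return ((prev_k * 31 + token) % 1_000_003, (prev_v * 17 + token) % 1_000_003)


class ToyPrefixCache:
    """Toy model of prefix caching: reuse KV entries for a shared prompt prefix."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], List[KV]] = {}
        self.tokens_computed = 0  # how many tokens actually had to be processed

    def _longest_cached_prefix(self, tokens: List[int]) -> int:
        # Try the longest candidate prefix first and shrink until one is cached.
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._store:
                return end
        return 0

    def prefill(self, tokens: List[int]) -> List[KV]:
        hit = self._longest_cached_prefix(tokens)
        kv: List[KV] = list(self._store[tuple(tokens[:hit])]) if hit else []
        for pos in range(hit, len(tokens)):
            prev = kv[-1] if kv else (0, 0)
            kv.append(fake_kv(prev, tokens[pos]))
            self.tokens_computed += 1
            # Store every prefix; simple but memory-hungry, fine for a sketch.
            self._store[tuple(tokens[:pos + 1])] = list(kv)
        return kv


cache = ToyPrefixCache()
first = cache.prefill([1, 2, 3, 4])      # nothing cached: 4 tokens processed
second = cache.prefill([1, 2, 3, 9, 9])  # shares [1, 2, 3]: only 2 more processed
print(cache.tokens_computed)             # 6 rather than 9
```

The second request produces the same entries it would have produced without a cache; the only difference is that the three shared positions are read from the store rather than recomputed. Storing a copy for every prefix length is a simplification made for readability, not a recommendation.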