attention head

An attention head is the core functional unit of a Transformer’s multi-head attention layer. It works like a soft lookup: for each token in a sequence (e.g., a word), it computes how much weight that token should place on every other token, then uses those weights to combine information from them. This lets the head capture a specific syntactic or semantic relationship.
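To make the weighting concrete, here is a minimal NumPy sketch of a single head using scaled dot-product attention, the standard formulation in Transformers. The function name `attention_head`, the projection matrices `w_q`, `w_k`, `w_v`, and the toy dimensions are all illustrative assumptions rather than anything taken from the referenced article. Masking, batching, and the output projection that merges multiple heads are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(x, w_q, w_k, w_v):
    """Single attention head (hypothetical helper, no masking or batching).

    x:   (seq_len, d_model) token embeddings
    w_*: (d_model, d_head) projection matrices (learned in a real model)
    """
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers for matching
    v = x @ w_v  # values: the content that gets mixed together
    # Similarity of every query with every key, scaled by sqrt(d_head).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row i holds the weights token i places on all tokens in the sequence.
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Toy input: 4 tokens, model width 8, head width 2 (assumed sizes).
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

output, weights = attention_head(x, w_q, w_k, w_v)
print(weights.shape)  # one row of attention weights per token
print(output.shape)   # one d_head-sized vector per token
```

Each row of `weights` is non-negative and sums to 1, so it can be read as the share of focus one token gives to every token in the sequence, including itself.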

Reference List

https://towardsdatascience.com/transformers-explained-visually-part-3-multi-head-attention-deep-dive-1c1ff1024853/

Boyang Yan
