Boyang Yan

attention head

Jun 20, 2026

An attention head is the core functional unit of a Transformer’s multi-head attention layer. For each token in a sequence (e.g., a word), it computes how much weight, or focus, that token should place on every other token, then uses those weights to retrieve a weighted mix of the other tokens’ information. Because each head learns its own projections, different heads can capture different syntactic or semantic relationships.
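As a rough illustration of the idea, here is a minimal sketch of a single head using standard scaled dot-product attention in plain Python. The function names (`attention_head`, `softmax`, `matmul`), the toy input, and the identity projection matrices are assumptions chosen for readability; they are not taken from the referenced article or from any particular library.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def attention_head(x, w_q, w_k, w_v):
    """One attention head (hypothetical helper, for illustration only).

    x: token embeddings, shape (n_tokens, d_model)
    w_q, w_k, w_v: projection matrices, shape (d_model, d_head)
    Returns (outputs, weights).
    """
    q = matmul(x, w_q)  # what each token is looking for
    k = matmul(x, w_k)  # what each token offers for matching
    v = matmul(x, w_v)  # the information each token carries
    scale = math.sqrt(len(q[0]))
    # scores[i][j]: how well token i's query matches token j's key
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    # Each row becomes a distribution: the focus token i places on every token
    weights = [softmax(row) for row in scores]
    # Each output is a weighted mix of the value vectors
    outputs = matmul(weights, v)
    return outputs, weights

# Toy example: 3 tokens, d_model = d_head = 2, identity projections (assumed)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = attention_head(x, identity, identity, identity)
```

Each row of `weights` sums to 1 and can be read as the focus one token places on all tokens in the sequence. In a trained model the three projection matrices are learned, which is what lets separate heads specialize in different relationships.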

Reference List

  1. https://towardsdatascience.com/transformers-explained-visually-part-3-multi-head-attention-deep-dive-1c1ff1024853/
