Self-attention Overview
Self-attention is a mechanism that lets a model weigh each position in an input sequence against every other position, so that the representation of a token reflects the tokens most relevant to it. It is the core building block of Transformer models and is what allows them to capture dependencies between words regardless of how far apart they are in the sequence.
Key Concepts
- Attention Scores: Measure how relevant each token is to every other token in the sequence, computed by comparing queries against keys.
- Scaled Dot-Product Attention: The standard way of computing attention, softmax(QKᵀ / √d_k) V, where dividing by √d_k keeps the dot products in a numerically stable range (see the sketch below).
- Multi-head Attention: Runs several attention functions in parallel on different learned projections of the input and concatenates their outputs, letting the model capture different kinds of relationships at once.
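As a concrete illustration of the two mechanisms above, here is a minimal NumPy sketch of scaled dot-product attention followed by a simple multi-head combination. The projection matrices, head count, and dimensions are illustrative assumptions and do not correspond to any particular model described on this page.

```python
# Minimal sketch of scaled dot-product and multi-head self-attention.
# All shapes, the head count, and the random weights are illustrative assumptions.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return the attention-weighted sum of the value vectors.

    Q, K: (seq_len, d_k) query and key matrices; V: (seq_len, d_v) value matrix.
    """
    d_k = Q.shape[-1]
    # Attention scores: similarity of each query with every key,
    # scaled by sqrt(d_k) to keep the softmax numerically stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V

# Illustrative usage: 4 tokens, model dimension 8, 2 attention heads.
seq_len, d_model, num_heads = 4, 8, 2
d_head = d_model // num_heads
x = np.random.randn(seq_len, d_model)

# Random projection matrices stand in for learned weights.
W_q = np.random.randn(d_model, d_model)
W_k = np.random.randn(d_model, d_model)
W_v = np.random.randn(d_model, d_model)
W_o = np.random.randn(d_model, d_model)

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Multi-head attention: split Q, K, V into heads, attend within each head
# independently, then concatenate the results and apply an output projection.
heads = []
for h in range(num_heads):
    cols = slice(h * d_head, (h + 1) * d_head)
    heads.append(scaled_dot_product_attention(Q[:, cols], K[:, cols], V[:, cols]))
output = np.concatenate(heads, axis=-1) @ W_o
print(output.shape)  # (4, 8)
```

In practice the projections are learned parameters and frameworks fuse the per-head computation into batched matrix multiplications, but the per-head logic is the same as in this sketch.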