Transformer
Masked Self-attention
Overview
Masked self-attention (also called causal attention) is a variant of self-attention that prevents each position from attending to later positions in the sequence. In practice, the attention scores for future positions are set to negative infinity before the softmax, so their attention weights become zero. It is commonly used in autoregressive language models to ensure that each prediction depends only on past and present tokens.
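The sketch below shows one common way to implement this for a single attention head in NumPy; the function name masked_self_attention and the projection matrices Wq, Wk, Wv are illustrative, not part of any specific library.

```python
import numpy as np

def masked_self_attention(X, Wq, Wk, Wv):
    """Single-head masked (causal) self-attention.

    X:          (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d_k)            # (seq_len, seq_len)

    # Causal mask: position i may attend only to positions j <= i.
    seq_len = X.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)   # block future positions

    # Softmax over keys; the -inf entries receive zero attention weight.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Multi-head attention applies the same mask to every head; in practice the mask is often expressed as an additive bias so it can be fused with the score computation.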
Applications
- Language Modeling: Predicting the next token in a sequence, where attending to future tokens would leak the target (see the check after this list).
- Sequence-to-Sequence Models: Masking the decoder's self-attention in tasks like translation so that generation proceeds left to right without access to future target tokens.
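As a quick illustration of the language-modeling property, the hypothetical check below (reusing masked_self_attention and NumPy from the sketch above) perturbs only the last token and confirms that the outputs at earlier positions do not change.

```python
rng = np.random.default_rng(0)
d_model, d_k, seq_len = 8, 4, 5
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
X = rng.normal(size=(seq_len, d_model))

X_perturbed = X.copy()
X_perturbed[-1] += 1.0                    # change only the last token

out_a = masked_self_attention(X, Wq, Wk, Wv)
out_b = masked_self_attention(X_perturbed, Wq, Wk, Wv)

# Every position except the last is unaffected by the future change.
assert np.allclose(out_a[:-1], out_b[:-1])
```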