Domain: Transformers & LLMs
Attention: mechanism that computes context-aware mixtures of token representations; parallelizes well on modern hardware and captures long-range dependencies.
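A minimal NumPy sketch of this mixture computation, in the common scaled dot-product form (the function name and shapes here are illustrative assumptions, not a specific library API):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a context-aware mixture of the rows of V,
    weighted by the similarity between queries and keys."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarity
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)         # softmax over keys
    return weights @ V                                # weighted mixture of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one mixed representation per query
```

With all-zero scores the softmax is uniform, so the output is just the mean of the value rows; nonuniform scores shift that mixture toward the most similar keys.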
Context window: the maximum number of tokens the model can attend to in one forward pass; constrains long-document reasoning.
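One common way serving code handles inputs longer than the window is to keep only the most recent tokens. A hypothetical sketch (the function name and default are illustrative):

```python
def fit_to_context(token_ids, max_len=8):
    """Enforce a context window by dropping the oldest tokens
    when the input exceeds max_len (illustrative policy)."""
    return token_ids[-max_len:] if len(token_ids) > max_len else token_ids

ids = list(range(12))
print(fit_to_context(ids))  # [4, 5, 6, 7, 8, 9, 10, 11]
```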
Self-attention: attention in which queries, keys, and values are all derived from the same sequence, enabling token-to-token interactions.
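The defining feature, sketched in NumPy, is that Q, K, and V are all projections of the same input X (weight shapes here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 16))                   # one sequence: 5 tokens, dim 16
Wq, Wk, Wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))

# Queries, keys, and values all come from the same sequence X,
# so every token can attend to every other token.
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(16)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)          # rows sum to 1
out = weights @ V
print(out.shape)  # (5, 16)
```

In cross-attention, by contrast, K and V would come from a different sequence than Q.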
Throughput: how many requests or tokens can be processed per unit time; affects scalability and serving cost.
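The metric itself is a simple ratio; a sketch with made-up serving numbers:

```python
def throughput(total_tokens, elapsed_seconds):
    """Tokens processed per second over a measurement interval."""
    return total_tokens / elapsed_seconds

# e.g. 32 requests of 512 tokens each, served in 4 seconds (illustrative figures):
print(throughput(32 * 512, 4.0))  # 4096.0 tokens/s
```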
Transformer: architecture built from self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
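A stripped-down NumPy sketch of one such layer, pairing a self-attention sublayer with a feedforward sublayer, each wrapped in a residual connection (layer normalization and multiple heads are omitted; all shapes are illustrative):

```python
import numpy as np

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    """Minimal block: self-attention then feedforward,
    each with a residual connection (norms omitted)."""
    d = X.shape[-1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    s = Q @ K.T / np.sqrt(d)
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    X = X + w @ V                        # attention sublayer + residual
    X = X + np.maximum(X @ W1, 0) @ W2   # ReLU feedforward sublayer + residual
    return X

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 8))
Ws = [rng.standard_normal(s) * 0.1 for s in [(8, 8)] * 3 + [(8, 32), (32, 8)]]
out = transformer_block(X, *Ws)
print(out.shape)  # (4, 8): same shape in and out, so blocks can be stacked
```

Because the output shape matches the input shape, many such blocks are stacked to form the full model.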
Vocabulary: the set of tokens a model can represent; impacts efficiency, multilinguality, and handling of rare strings.
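A toy sketch of the idea, using a word-level mapping with an unknown-token fallback (real tokenizers operate on subwords; this vocabulary and the `encode` helper are purely illustrative):

```python
# Strings outside the vocabulary fall back to <unk>, which is why
# vocabulary design affects how rare strings are handled.
vocab = {"<unk>": 0, "the": 1, "model": 2, "attends": 3}

def encode(words):
    """Map each word to its token id, or to <unk> if absent."""
    return [vocab.get(w, vocab["<unk>"]) for w in words]

print(encode(["the", "model", "attends", "zxq"]))  # [1, 2, 3, 0]
```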