Results for "layers"
Vanishing gradients: Gradients shrink as they propagate backward through layers, slowing learning in early layers; mitigated by ReLU activations, residual connections, and normalization.
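A minimal sketch of why this happens: backpropagation multiplies one local derivative per layer, and the sigmoid derivative is bounded by 0.25, so the product shrinks geometrically with depth. The depth of 10 and the fixed pre-activation 0.5 are illustrative assumptions.

```python
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # bounded above by 0.25

grad = 1.0
for _ in range(10):            # 10 hypothetical sigmoid layers
    grad *= sigmoid_grad(0.5)  # fixed illustrative pre-activation
print(grad)                    # many orders of magnitude below 1

# By contrast, ReLU's derivative is 1 on positive pre-activations,
# so the product does not shrink on the active path.
relu_grad = 1.0
for _ in range(10):
    relu_grad *= 1.0
print(relu_grad)
```

This is why swapping sigmoids for ReLUs (or adding identity paths, as residual connections do) keeps early layers trainable.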
Neural network: A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Weight initialization: Methods for setting starting weights so that signal and gradient magnitudes are preserved across layers.
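One common scale-preserving scheme is He initialization, which draws weights with standard deviation sqrt(2/fan_in) to compensate for ReLU zeroing half the activations. The sketch below pushes a random vector through a deep ReLU stack and checks that its magnitude stays of order 1; the width of 256 and depth of 10 are illustrative assumptions.

```python
import math
import random

def he_layer(fan_in, fan_out, rng):
    """Weight matrix with He initialization: std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def relu_forward(x, W):
    """One layer: relu(W @ x), on plain lists."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W]

rng = random.Random(0)
width, depth = 256, 10  # illustrative sizes
x = [rng.gauss(0.0, 1.0) for _ in range(width)]
for _ in range(depth):
    x = relu_forward(x, he_layer(width, width, rng))

mean_sq = sum(v * v for v in x) / width
print(mean_sq)  # stays O(1) rather than exploding or vanishing
```

With a naive scale (e.g. std = 1), the same loop would blow up by a factor of roughly the width per layer.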
Transformer: Architecture based on self-attention and feedforward layers; the foundation of modern LLMs and many multimodal models.
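The core self-attention operation can be sketched in a few lines: each query scores all keys, the scaled scores are softmaxed into weights, and the output is the weighted average of the values. Toy dimensions and plain lists here; a real implementation uses batched tensor ops and learned Q/K/V projections.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention on plain lists (toy sizes)."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)  # weights sum to 1
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs (illustrative numbers).
out = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(out)
```

Because the weights are a convex combination, each output coordinate lies between the minimum and maximum of the corresponding value column.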
LoRA (Low-Rank Adaptation): Parameter-efficient fine-tuning (PEFT) method that injects trainable low-rank matrices into existing layers, so only a small fraction of parameters needs training.
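A sketch of where the savings come from: instead of updating a full d x d weight matrix, LoRA trains two factors A (d x r) and B (r x d) with rank r much smaller than d, and the effective weight is W + A @ B. The hidden size 512 and rank 8 below are illustrative assumptions.

```python
d, r = 512, 8                    # illustrative hidden size and LoRA rank
full_update_params = d * d       # fine-tuning W directly
lora_params = d * r + r * d      # trainable factors A (d x r) and B (r x d)
print(full_update_params, lora_params)
print(full_update_params / lora_params)  # compression ratio d / (2 * r)
```

In practice the low-rank update is typically applied as A @ (B @ x) during the forward pass, so the product A @ B never needs to be materialized.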
Residual (skip) connections: Allow gradients to bypass layers, enabling very deep networks.
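A numeric sketch of the effect: through a plain stack the gradient is a product of small local derivatives, while each residual block y = x + f(x) contributes a factor of (1 + f'(x)) because of the identity path, so the product cannot collapse to zero. The transform f and the fixed evaluation point are illustrative assumptions.

```python
import math

def f_prime(x):
    # derivative of a hypothetical small learned transform f(x) = 0.1 * tanh(x)
    return 0.1 * (1.0 - math.tanh(x) ** 2)

x = 0.7  # fixed illustrative point; real backprop evaluates f' at each layer's input
plain_grad, resid_grad = 1.0, 1.0
for _ in range(10):                  # 10 hypothetical blocks
    plain_grad *= f_prime(x)         # plain stack: small derivatives multiply
    resid_grad *= 1.0 + f_prime(x)   # residual: identity path adds a 1 to each factor
print(plain_grad, resid_grad)
```

The plain product is vanishingly small after 10 blocks, while the residual product stays above 1, which is the mechanism that lets networks with hundreds of layers train.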
Depth vs. width: Tradeoffs between stacking many layers (depth) and using many neurons per layer (width).