Results for "decoding"
Search algorithm for generation that keeps the top-k partial sequences at each step; can improve likelihood but reduce diversity.
Raw model outputs before conversion to probabilities (typically via softmax); manipulated during decoding and calibration.
Stores the attention keys and values of past tokens to speed up autoregressive decoding.
Samples from the k highest-probability tokens to limit unlikely outputs.
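
As an illustration of the last entry, here is a minimal sketch of sampling from the k highest-probability tokens, starting from raw model outputs. The function names `softmax` and `top_k_sample` and the example scores are hypothetical, chosen for illustration only; real decoders usually do this on tensors and often combine it with temperature scaling.

```python
import math
import random


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_sample(scores, k, rng=random):
    """Sample one token index from the k highest-scoring tokens.

    scores: raw model outputs, one per vocabulary entry (hypothetical input).
    k: number of candidates to keep; all other tokens get zero probability.
    rng: any object with a `choices` method, e.g. random.Random(seed).
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices of the k largest scores, highest first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize over the kept tokens only.
    probs = softmax([scores[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]


if __name__ == "__main__":
    example_scores = [2.0, 1.0, 0.1, -3.0]  # made-up values
    token = top_k_sample(example_scores, k=2, rng=random.Random(0))
    print(token)  # always index 0 or 1, never 2 or 3
```

With k=1 this reduces to always picking the highest-scoring token; larger k allows more varied output at the cost of admitting less likely tokens.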