Alignment
IntermediateEnsuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
AdvertisementAd space — term-top
Definition
Full Definition
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.