Results for "instruction exposure"
A high-priority instruction layer setting overarching behavior constraints for a chat model.
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Task instruction without examples.
Differences between training and inference conditions.
Controlling robots via language.