Search: instruction exposure — Dictionary of AI

Exposure Bias Intermediate

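Mismatch between training and inference conditions: a sequence model is trained on ground-truth prefixes but at inference must condition on its own earlier outputs, so early errors can compound.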

Model Failure Modes

System Prompt Intermediate

A high-priority instruction layer setting overarching behavior constraints for a chat model.

Reinforcement Learning

SFT Intermediate

Supervised fine-tuning: training on (prompt, response) pairs to align a model with instruction-following behaviors.

Foundations & Theory

Zero-Shot Prompting Intro

A task instruction given to the model with no worked examples in the prompt.

Prompting & Instructions

Privacy Attack Intermediate

Attacks that infer whether specific records were in training data, or reconstruct sensitive training examples.

Foundations & Theory

Secure Inference Intermediate

Methods to protect model/data during inference (e.g., trusted execution environments) from operators/attackers.

Foundations & Theory

Risk Model Intermediate

Quantifying financial risk.

AI Economics & Strategy

Safety Filter Intermediate

Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).

Foundations & Theory

One-Shot Prompting Intro

A single worked example included in the prompt to guide the output.

Prompting & Instructions

Few-Shot Prompting Intro

Multiple worked examples included in the prompt to guide the output.

Prompting & Instructions

Natural Language Instruction Frontier

Controlling robots via natural-language instructions.

World Models & Cognition
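
The Zero-Shot, One-Shot, and Few-Shot Prompting entries above differ only in how many worked examples precede the task. A minimal sketch of that difference follows; the helper name `build_prompt`, the `Input:`/`Output:` layout, and the sentiment task are illustrative assumptions, not something the dictionary specifies.

```python
def build_prompt(instruction, examples, query):
    """Assemble a prompt: instruction, then zero or more worked examples, then the query.

    `examples` is a list of (input, output) pairs. An empty list yields a
    zero-shot prompt, one pair a one-shot prompt, several pairs a few-shot prompt.
    The "Input:/Output:" layout is an assumed convention for illustration.
    """
    parts = [instruction.strip()]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final block leaves "Output:" open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


instruction = "Classify the sentiment of each input as positive or negative."
demos = [
    ("I loved this film.", "positive"),
    ("The plot was dull.", "negative"),
]
query = "The acting was superb."

zero_shot = build_prompt(instruction, [], query)        # no examples
one_shot = build_prompt(instruction, demos[:1], query)  # one example
few_shot = build_prompt(instruction, demos, query)      # multiple examples

print(few_shot)
```

The three prompts share the same instruction and query; only the number of demonstration pairs changes.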
