Safety Filter
Intermediate
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Full Definition
A safety filter automatically detects and prevents disallowed outputs, such as toxic content, self-harm content, or instructions for illegal activity.
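
As a rough illustration of the detect-then-prevent pattern described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the category names, the `BLOCKLIST` patterns, and the `check_output` and `safe_respond` helpers are hypothetical, and real systems typically use trained classifiers rather than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical patterns, for illustration only. Keyword lists are easy to
# evade and prone to false positives; they stand in for a real classifier.
BLOCKLIST = {
    "toxicity": [r"\bidiot\b", r"\bstupid\b"],
    "self_harm": [r"\bhurt myself\b"],
    "illegal_instruction": [r"\bhow to pick a lock\b"],
}


@dataclass
class FilterResult:
    allowed: bool          # True when no category matched
    categories: List[str]  # names of the categories that matched


def check_output(text: str) -> FilterResult:
    """Detection step: report which disallowed categories the text matches."""
    lowered = text.lower()
    hits = [
        category
        for category, patterns in BLOCKLIST.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    ]
    return FilterResult(allowed=not hits, categories=hits)


def safe_respond(candidate: str, refusal: str = "Sorry, I can't help with that.") -> str:
    """Prevention step: pass the candidate through, or replace it with a refusal."""
    return candidate if check_output(candidate).allowed else refusal
```

Keeping detection (`check_output`) separate from prevention (`safe_respond`) means the same check can be reused for logging or review without changing what the user sees.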