Eval Harness
Intermediate
System for running consistent evaluations across tasks, versions, prompts, and model settings.
Definition
Full Definition
An eval harness is a system for running consistent evaluations across tasks, versions, prompts, and model settings.
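To make the definition concrete, here is a minimal sketch of what such a harness might look like in Python. Everything in it is a hypothetical illustration rather than the API of any real library: the `Task` and `RunConfig` types, the `run_harness` function, the exact-match scoring rule, and the `toy_model` stand-in are all assumptions chosen for brevity. The point it illustrates is that every task is run under every configuration through the same loop and the same scoring rule, so that results are comparable.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    """A named set of (input, expected output) pairs. Hypothetical type."""
    name: str
    examples: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class RunConfig:
    """One combination of model version, prompt, and settings. Hypothetical type."""
    model_version: str
    prompt_template: str  # must contain an "{input}" placeholder
    temperature: float


# Assumed model interface: takes a rendered prompt and a config, returns text.
ModelFn = Callable[[str, RunConfig], str]


def run_harness(
    tasks: Sequence[Task],
    configs: Sequence[RunConfig],
    model_fn: ModelFn,
) -> Dict[Tuple[str, RunConfig], float]:
    """Score every task under every config with the same exact-match metric."""
    results: Dict[Tuple[str, RunConfig], float] = {}
    for task, cfg in product(tasks, configs):
        if not task.examples:
            raise ValueError(f"task {task.name!r} has no examples")
        correct = 0
        for text, expected in task.examples:
            prompt = cfg.prompt_template.format(input=text)
            output = model_fn(prompt, cfg)
            correct += int(output.strip() == expected)
        results[(task.name, cfg)] = correct / len(task.examples)
    return results


def toy_model(prompt: str, cfg: RunConfig) -> str:
    """Stand-in for a real model: only version "v2" uppercases the last word."""
    word = prompt.split()[-1]
    return word.upper() if cfg.model_version == "v2" else word
```

Calling `run_harness` with one task and two configs that differ only in `model_version` returns one accuracy per (task name, config) pair, which is the kind of side-by-side comparison the definition describes. A real harness would typically also record seeds, raw outputs, and dataset versions; those are left out of this sketch.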