RLHF
IntermediateReinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
AdvertisementAd space — term-top
Definition
Full Definition
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.