On-Policy Learning

Intermediate

Definition
Learning only from data generated by the current policy.

Keywords
policy consistency

Domains
AI Economics & Strategy

Related Terms
Exploration-Exploitation Tradeoff: Balancing learning new behaviors against exploiting known rewards.
Agent Loop: Continuous cycle of observation, reasoning, action, and feedback.
Toolformer: Models trained to decide when to call tools.
Planner-Executor: Separates planning from execution in agent architectures.
Bellman Equation: Fundamental recursive relationship defining optimal value functions.
Policy Gradient: Optimizing policies directly via gradient ascent on expected reward.
Actor-Critic: Combines value estimation (critic) with policy learning (actor).
Off-Policy Learning: Learning from data generated by a different policy.
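
To make the definition concrete, here is a minimal sketch of one on-policy update using SARSA, a standard on-policy algorithm that this entry does not itself name. The function names, the dictionary-based Q-table, and the values of `alpha`, `gamma`, and the toy states are all assumptions chosen for illustration. The point is that the update target uses `a_next`, the action the current policy actually selected, rather than the best available action.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Current policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q.get((state, a), 0.0) for a in range(n_actions)]
    return values.index(max(values))


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """One SARSA step. The target depends on a_next, the action that the
    current policy chose in s_next, which is what makes it on-policy."""
    target = r + gamma * q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]


# Hypothetical toy Q-table: two actions available in state 1.
q = {(1, 0): 2.0, (1, 1): 4.0}
rng = random.Random(0)

# With epsilon=0.0 the policy is purely greedy, so it picks action 1 in state 1.
a_next = epsilon_greedy(q, 1, 2, epsilon=0.0, rng=rng)
new_value = sarsa_update(q, s=0, a=0, r=1.0, s_next=1, a_next=a_next)

# Had the policy picked action 0 instead, the same transition would
# produce a different update, because the target follows the policy's choice.
q_alt = {(1, 0): 2.0, (1, 1): 4.0}
alt_value = sarsa_update(q_alt, s=0, a=0, r=1.0, s_next=1, a_next=0)
```

An off-policy method such as Q-learning would instead take the maximum over next actions in the target, regardless of which action the data-collecting policy chose.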