Q-learning is a model-free, off-policy reinforcement learning algorithm that learns the value of taking a given action in a given state, enabling an agent to derive an optimal policy through trial and error.
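As a rough illustration of the definition above, here is a minimal sketch of tabular Q-learning. The environment is an assumption made up for this example: a hypothetical one-dimensional chain of five states where the agent moves left or right and receives a reward of 1 on reaching the last state. The function name `q_learning` and the values chosen for `alpha`, `gamma`, and `epsilon` are likewise illustrative. The "off-policy" part shows up in the update target, which takes the maximum over next actions regardless of which action the epsilon-greedy behavior policy goes on to choose.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain environment.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Behavior policy: epsilon-greedy with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            # Model-free: the agent only observes the sampled transition.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Off-policy target: bootstrap from the greedy value of the next
            # state, not from the action the behavior policy will take there.
            # The goal row is never updated, so its value stays 0 (terminal).
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state

    return q


q_table = q_learning()
# Greedy policy derived from the learned values (non-terminal states only).
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(len(q_table) - 1)]
```

In this toy setting the derived greedy policy should come to prefer moving right in every non-terminal state, since that is the shortest path to the reward. Real problems generally need a proper environment interface and more careful tuning of the learning rate and exploration schedule.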