Search Results
2 results for “reward”
Foundations
Reinforcement Learning
A machine learning paradigm in which an agent learns to make sequential decisions by interacting with an environment and optimising for cumulative reward through trial and error.
7 min read · Updated June 2026
Foundations
Reinforcement Learning from Human Feedback
A machine learning technique that trains a reward model from human preference data and uses it to align large language models with human values, safety requirements, and intended behaviour through reinforcement learning.
7 min read · Updated May 2026