Reinforcement Learning from AI Feedback (RLAIF) is an alignment technique in which an AI model, rather than a human annotator, provides the preference labels used to train a reward model, replacing or supplementing the expensive human annotation that RLHF (Reinforcement Learning from Human Feedback) relies on.
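
The labeling-then-reward-model step described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a reference implementation: `ai_labeler` is a hypothetical stand-in for a real AI judge (here a simple keyword rule), `features` is an invented two-dimensional featurizer, and the reward model is a linear scorer trained with a Bradley-Terry (pairwise logistic) loss, which is one common choice for preference data.

```python
import math

def ai_labeler(response_a, response_b):
    """Hypothetical AI judge: returns 0 if A is preferred, 1 if B is preferred.
    A real RLAIF setup would query a language model here; this toy rule simply
    prefers the response that uses the word "because" more often (ties go to A)."""
    return 0 if response_a.count("because") >= response_b.count("because") else 1

def features(response):
    # Toy features (an assumption for illustration):
    # [contains "because", word count / 10]
    return [float("because" in response), len(response.split()) / 10.0]

def reward(w, x):
    # Linear reward model: dot product of weights and features.
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, epochs=200, lr=0.5):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for a, b in pairs:
            # The AI label, not a human label, decides which response is "chosen".
            chosen, rejected = (a, b) if ai_labeler(a, b) == 0 else (b, a)
            xc, xr = features(chosen), features(rejected)
            # Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)
            margin = reward(w, xc) - reward(w, xr)
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log p: d/dw log p = (1 - p) * (xc - xr)
            for i in range(len(w)):
                w[i] += lr * (1.0 - p) * (xc[i] - xr[i])
    return w

pairs = [
    ("Use a hash map because lookups are O(1).", "Use a hash map."),
    ("It fails.", "It fails because the index is out of range."),
    ("Sort first because binary search needs order.", "Just search it."),
]
w = train_reward_model(pairs)
```

In a full RLAIF pipeline, the trained reward model would then score a policy's outputs during reinforcement learning; that stage is omitted here to keep the sketch short.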