1 result for “post-training”
Reward modeling is the process of training a neural network to predict human or AI preferences over model outputs, providing a scalable reward signal for reinforcement learning-based alignment of language models.