Normalization Factor for a Reward-Weighted Policy
The normalization factor, often denoted $Z(x)$, converts an unnormalized, reward-weighted function into a valid probability distribution. It is calculated by summing (for discrete outputs) or integrating (for continuous outputs) the product of a reference policy, $\pi_{\text{ref}}(y \mid x)$, and an exponentiated, scaled reward, $\exp\left(\frac{1}{\beta} r(x, y)\right)$, over the entire domain of possible outputs $y$. The formula is: $Z(x) = \sum_{y} \pi_{\text{ref}}(y \mid x) \exp\left(\frac{1}{\beta} r(x, y)\right)$. Dividing the unnormalized function by this factor guarantees that the resulting distribution sums to one.
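For a finite set of outputs, the sum can be computed directly. The following is a minimal sketch, not a reference implementation: the function names and the toy values are hypothetical, and it assumes the scaling enters as the reward divided by a factor `beta`.

```python
import math


def normalization_factor(ref_probs, rewards, beta=1.0):
    """Return Z(x) = sum over y of pi_ref(y|x) * exp(r(x, y) / beta).

    ref_probs and rewards are parallel lists over a finite output set.
    Dividing the reward by beta is an assumed scaling convention.
    """
    return sum(p * math.exp(r / beta) for p, r in zip(ref_probs, rewards))


def reward_weighted_policy(ref_probs, rewards, beta=1.0):
    """Divide each unnormalized weight by Z(x) so the result sums to one."""
    z = normalization_factor(ref_probs, rewards, beta)
    return [p * math.exp(r / beta) / z for p, r in zip(ref_probs, rewards)]


# Hypothetical toy values: three outputs for a single input x.
ref_probs = [0.5, 0.3, 0.2]   # reference policy probabilities (sum to 1)
rewards = [1.0, 0.0, -1.0]    # reward assigned to each output

z = normalization_factor(ref_probs, rewards)
policy = reward_weighted_policy(ref_probs, rewards)

print("Z(x) =", z)
print("policy =", policy)
print("sum of policy =", sum(policy))
```

Outputs with higher reward end up with more probability mass than the reference policy gave them, and the printed sum should equal one up to floating-point rounding.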

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Normalization Factor for a Reward-Weighted Policy
A function assigns the following unnormalized scores to three possible discrete outcomes: score(A) = 12, score(B) = 7, and score(C) = 1. To transform these scores into a valid probability distribution P(outcome), each score must be divided by a normalization factor calculated from the sum of all scores. What is the resulting probability for outcome B, P(B)?
From Model Scores to Probabilities
Converting Model Scores to Probabilities
Learn After
Reward-Weighted Probability Distribution
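Consider a scenario where, for a given input $x$, there are only two possible outputs, $y_1$ and $y_2$. A reference model assigns probabilities $\pi_{\text{ref}}(y_1 \mid x)$ and $\pi_{\text{ref}}(y_2 \mid x)$. A reward function gives scores $r(x, y_1)$ and $r(x, y_2)$. Assuming the scaling factor $\beta$ is 1, what is the value of the normalization factor $Z(x)$, which is calculated as $Z(x) = \sum_{y} \pi_{\text{ref}}(y \mid x) \exp\left(\frac{1}{\beta} r(x, y)\right)$?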
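Consider the calculation of a normalization factor using the formula: $Z(x) = \sum_{y} \pi_{\text{ref}}(y \mid x) \exp\left(\frac{1}{\beta} r(x, y)\right)$. If the reward function $r(x, y)$ consistently returns a value of 0 for all possible outputs $y$, the normalization factor $Z(x)$ will always be equal to 1.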
Impact of Scaling Factor on Normalization