1Cademy - Consider a scenario where for a given input $\mathbf{x}$, there are only two possible outputs, $\mathbf{y}_1$ and $\mathbf{y}_2$. A reference model assigns probabilities $\pi_{\text{ref}}(\mathbf{y}_1|\mathbf{x}) = 0.6$ and $\pi_{\text{ref}}(\mathbf{y}_2|\mathbf{x}) = 0.4$. A reward function gives scores $r(\mathbf{x}, \mathbf{y}_1) = 2$ and $r(\mathbf{x}, \mathbf{y}_2) = 1$. Assuming the scaling factor $\beta$ is 1, what is the value of the normalization factor $Z(\mathbf{x})$, which is calculated as $Z(\mathbf{x}) = \sum_{\mathbf{y}} \pi_{\text{ref}}(\mathbf{y}|\mathbf{x}) \exp(r(\mathbf{x}, \mathbf{y}))$?

Learn Before

Normalization Factor for a Reward-Weighted Policy

Multiple Choice

Consider a scenario where, for a given input $\mathbf{x}$, there are only two possible outputs, $\mathbf{y}_1$ and $\mathbf{y}_2$. A reference model assigns probabilities $\pi_{\text{ref}}(\mathbf{y}_1|\mathbf{x}) = 0.6$ and $\pi_{\text{ref}}(\mathbf{y}_2|\mathbf{x}) = 0.4$. A reward function gives scores $r(\mathbf{x}, \mathbf{y}_1) = 2$ and $r(\mathbf{x}, \mathbf{y}_2) = 1$. Assuming the scaling factor $\beta$ is 1, what is the value of the normalization factor $Z(\mathbf{x})$, which is calculated as $Z(\mathbf{x}) = \sum_{\mathbf{y}} \pi_{\text{ref}}(\mathbf{y}|\mathbf{x}) \exp(r(\mathbf{x}, \mathbf{y}))$?

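The arithmetic behind the question can be checked with a short script. This is a minimal sketch, not part of the original question: the dictionary keys `y1` and `y2` and the helper name `normalization_factor` are hypothetical, and dividing the reward by $\beta$ inside the exponential is an assumption about the general form (it has no effect here because $\beta = 1$).

```python
import math

# Values taken directly from the question.
pi_ref = {"y1": 0.6, "y2": 0.4}   # reference-model probabilities
reward = {"y1": 2.0, "y2": 1.0}   # reward scores r(x, y)
beta = 1.0                        # scaling factor, stated to be 1


def normalization_factor(pi_ref, reward, beta=1.0):
    """Return Z(x) = sum_y pi_ref(y|x) * exp(r(x, y) / beta).

    The division by beta is an assumed general form; with beta = 1 it
    reduces to the formula given in the question.
    """
    return sum(p * math.exp(reward[y] / beta) for y, p in pi_ref.items())


Z = normalization_factor(pi_ref, reward, beta)

# Dividing each weighted term by Z yields a distribution that sums to 1,
# which is the role the normalization factor plays.
policy = {y: p * math.exp(reward[y] / beta) / Z for y, p in pi_ref.items()}

print(round(Z, 4))  # 0.6 * e^2 + 0.4 * e^1 ≈ 5.5207
```

Under these assumptions, $Z(\mathbf{x}) = 0.6\,e^{2} + 0.4\,e^{1} \approx 5.52$, and the values in `policy` sum to 1.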

Updated 2025-09-29

