Multiple Choice

Consider a scenario where, for a given input $\mathbf{x}$, there are only two possible outputs, $\mathbf{y}_1$ and $\mathbf{y}_2$. A reference model assigns probabilities $\pi_{\text{ref}}(\mathbf{y}_1|\mathbf{x}) = 0.6$ and $\pi_{\text{ref}}(\mathbf{y}_2|\mathbf{x}) = 0.4$. A reward function gives scores $r(\mathbf{x}, \mathbf{y}_1) = 2$ and $r(\mathbf{x}, \mathbf{y}_2) = 1$. Assuming the scaling factor $\beta$ is 1, what is the value of the normalization factor $Z(\mathbf{x})$, which is calculated as $Z(\mathbf{x}) = \sum_{\mathbf{y}} \pi_{\text{ref}}(\mathbf{y}|\mathbf{x}) \exp(r(\mathbf{x}, \mathbf{y}))$?
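As a quick check of the arithmetic, the sum can be evaluated directly: $Z(\mathbf{x}) = 0.6\,e^{2} + 0.4\,e^{1} \approx 4.4334 + 1.0873 \approx 5.52$. The following is a minimal Python sketch of that calculation. The function name `normalization_factor` and the dictionary keys `"y1"` and `"y2"` are illustrative choices, not part of the question. The last step, dividing each term by $Z(\mathbf{x})$ to obtain a normalized distribution, is an assumption about how the factor is used and is not stated in the question.

```python
import math

def normalization_factor(pi_ref, reward):
    """Sum of pi_ref(y|x) * exp(r(x, y)) over all outputs y (beta = 1)."""
    return sum(p * math.exp(reward[y]) for y, p in pi_ref.items())

# Values given in the question.
pi_ref = {"y1": 0.6, "y2": 0.4}
reward = {"y1": 2.0, "y2": 1.0}

Z = normalization_factor(pi_ref, reward)
print(round(Z, 4))  # 5.5207

# Assumed use of Z: rescale each weighted term so the results sum to 1.
policy = {y: pi_ref[y] * math.exp(reward[y]) / Z for y in pi_ref}
```

With these numbers, the rescaled weights shift probability toward $\mathbf{y}_1$, the output with the higher reward.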


Updated 2025-09-29

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science