Formula

Reward functions and performance metrics (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)

Different reward functions were used for different goals:

1. Goal: maximize the expected number of recalled items: $R(s, \cdot) = \sum_{i=1}^{n} P[Z_i = 1 \mid s]$
2. Goal: maximize the likelihood of recalling all items: $R(s, \cdot) = \sum_{i=1}^{n} \log P[Z_i = 1 \mid s]$

In the paper, the authors define the reward function as the average of the sum of the correct answers at every time step: $R(s, \cdot) = \sum_i Z_i$, where $Z_i \sim P_i(\cdot \mid s)$.

The reward function for the LSTM is $R_{RNN} = \sum_{i=0}^{n} P_{RNN}(Z_i^j \mid o_{0:j-1})$. Here $n$ denotes the number of items, $j$ the current interaction step, $P_{RNN}$ the probability that the user answers item $i$ correctly, and $o_t = (Z_i^j, i)$.
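The reward definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the recall probabilities are made-up numbers standing in for $P[Z_i = 1 \mid s]$, and the `eps` guard against $\log 0$ is an addition that is not part of the source formulas.

```python
import math
import random


def expected_recall_reward(recall_probs):
    """Expected number of recalled items: sum_i P[Z_i = 1 | s]."""
    return sum(recall_probs)


def log_likelihood_reward(recall_probs, eps=1e-12):
    """Log-likelihood of recalling all items: sum_i log P[Z_i = 1 | s].

    eps is an assumption added here to avoid log(0); it is not in the source.
    """
    return sum(math.log(max(p, eps)) for p in recall_probs)


def sampled_reward(recall_probs, rng):
    """Sampled reward sum_i Z_i, with each Z_i drawn as Bernoulli(p_i)."""
    return sum(1 for p in recall_probs if rng.random() < p)


# Hypothetical recall probabilities for n = 3 items in some state s.
recall_probs = [0.9, 0.5, 0.2]
rng = random.Random(0)  # fixed seed so the sampled reward is reproducible

print(expected_recall_reward(recall_probs))
print(log_likelihood_reward(recall_probs))
print(sampled_reward(recall_probs, rng))
```

The log-based reward penalizes any single item with a low recall probability far more heavily than the plain sum does, which is why it matches the goal of recalling all items. The LSTM reward has the same form as `expected_recall_reward`, with the probabilities supplied by the network's predictions instead of the environment's model.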

Updated 2026-05-16

Tags

Data Science
