Concept

Formulation (Accelerating Human Learning with Deep Reinforcement Learning)

When spaced repetition is optimized with model-free reinforcement learning, the environment is formulated as a partially observable Markov decision process (POMDP).

  • State Space ($S$): Depends on the student model.
    • For the EFC (exponential forgetting curve) model: $S = \mathbb{R}_{+}^{3n}$, encoding item difficulty, delay, and memory strength.
    • For the HLR (half-life regression) model: $S = \theta \times (\mathbb{R}_{+} \times X)^n$, encoding model parameters, delay, and memory strength.
    • For the GPL (generalized-power-law) model: $S = \mathbb{R} \times (\mathbb{R} \times \mathbb{N}^{2W})^n$, encoding student ability, item difficulty, number of attempts, and number of correct answers over $W$ windows for $n$ items.
  • Observation Space: The agent can only access observations, not the full state. At every step, the observation records whether the student recalled the shown item, where $\alpha$ denotes the item shown: $O(z \mid s, \alpha) = P[Z_{\alpha} = z \mid s]$
  • Agent/Action Space: Consists of the $n$ items that can be shown to the student.
  • Reward Function ($\mathcal{R}$): Depending on the goal, there are two distinct functions:
    • Maximizing expected items recalled: $\mathcal{R}(s, \bullet) = \sum_{i=1}^n P[Z_i = 1 \mid s]$
    • Maximizing the likelihood of recalling all items: $\mathcal{R}(s, \bullet) = \sum_{i=1}^n \log P[Z_i = 1 \mid s]$
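
The observation model and the two reward functions above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. It assumes the EFC recall probability takes the form exp(-difficulty * delay / strength), which is not stated on this card, and the names ItemState, recall_prob, observe, reward_expected_recall, and reward_log_likelihood are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ItemState:
    """Per-item slice of an EFC state (three non-negative reals per item)."""
    difficulty: float  # item difficulty, > 0
    delay: float       # time since the item was last reviewed, >= 0
    strength: float    # memory strength, > 0


def recall_prob(item: ItemState) -> float:
    # ASSUMPTION: exponential forgetting curve,
    # P[Z_i = 1 | s] = exp(-difficulty * delay / strength).
    return math.exp(-item.difficulty * item.delay / item.strength)


def observe(state: list, action: int, rng: random.Random) -> int:
    # O(z | s, a) = P[Z_a = z | s]: the agent only sees whether the
    # shown item (index `action`) was recalled (1) or forgotten (0).
    return 1 if rng.random() < recall_prob(state[action]) else 0


def reward_expected_recall(state: list) -> float:
    # Sum over items of P[Z_i = 1 | s]: expected number of items recalled.
    return sum(recall_prob(item) for item in state)


def reward_log_likelihood(state: list) -> float:
    # Sum over items of log P[Z_i = 1 | s]: log-likelihood of recalling all items.
    return sum(math.log(recall_prob(item)) for item in state)


if __name__ == "__main__":
    rng = random.Random(0)
    state = [ItemState(1.0, 0.5, 2.0), ItemState(2.0, 3.0, 1.0)]
    print("observation for item 0:", observe(state, 0, rng))
    print("expected-recall reward:", reward_expected_recall(state))
    print("log-likelihood reward:", reward_log_likelihood(state))
```

Note that neither reward depends on the action, matching the $\bullet$ placeholder in $\mathcal{R}(s, \bullet)$. The log-likelihood variant penalizes any single nearly forgotten item much more heavily than the expected-recall variant does.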

Note: The discount factor ($\gamma$) influences agent actions. A smaller $\gamma$ encourages intensive studying, while a larger $\gamma$ focuses on long-lasting learning.
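
To make the effect of the discount factor concrete, the sketch below compares the discounted return of two reward sequences under a small and a large $\gamma$. The sequences are made-up illustrative numbers, not results from the paper: "cramming" earns high reward early that fades quickly, while "spaced" starts lower but keeps growing. The function name discounted_return is hypothetical.

```python
def discounted_return(rewards, gamma):
    # Standard discounted sum: sum over t of gamma**t * r_t.
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# ASSUMPTION: hand-picked per-step rewards, for illustration only.
cramming = [3.0, 3.0, 1.0, 0.5, 0.2, 0.1]  # high recall now, forgotten later
spaced = [1.0, 1.5, 2.0, 2.5, 2.5, 2.5]    # slower start, durable recall

for gamma in (0.5, 0.99):
    print(
        f"gamma={gamma}: "
        f"cramming={discounted_return(cramming, gamma):.3f}, "
        f"spaced={discounted_return(spaced, gamma):.3f}"
    )
```

With these numbers, the cramming sequence has the higher return at $\gamma = 0.5$ and the spaced sequence has the higher return at $\gamma = 0.99$, which is the trade-off the note describes.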

Updated 2026-06-07

Tags

Data Science