Concept
Formulation (Accelerating Human Learning with Deep Reinforcement Learning)
In the context of optimizing spaced repetition via model-free reinforcement learning, the environment is formulated as a partially observable Markov decision process (POMDP).
- State Space ($\mathcal{S}$): Depends on the student model.
  - For the EFC (exponential forgetting curve) model: the state encodes item difficulty, delay, and memory strength.
  - For the HLR (half-life regression) model: the state encodes model parameters, delay, and memory strength.
  - For the GPL (generalized power-law) model: the state encodes student ability, item difficulty, number of attempts, and number of correct answers over windows for the items.
- Observation Space: The agent can access only observations, not the full state. At every step, the observation records whether the student recalled the shown item.
- Agent/Action Space: Consists of items that can be shown to the student.
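The state, observation, and action components above can be sketched as a small simulated environment. This is a minimal sketch, not the paper's implementation: the class name `EFCStudentEnv`, the recall formula `exp(-difficulty * delay / strength)`, the unit time step, and the strength update after a review are all assumptions made for illustration.

```python
import math
import random


class EFCStudentEnv:
    """Toy POMDP with an exponential-forgetting-curve student (illustrative only)."""

    def __init__(self, difficulties, seed=0):
        self.n = len(difficulties)
        self.difficulty = list(difficulties)  # hidden: per-item difficulty
        self.delay = [1.0] * self.n           # hidden: time since last review
        self.strength = [1.0] * self.n        # hidden: memory strength
        self.rng = random.Random(seed)

    def recall_prob(self, i):
        # Assumed forgetting curve: recall decays exponentially with delay.
        return math.exp(-self.difficulty[i] * self.delay[i] / self.strength[i])

    def step(self, item):
        """Show `item` (the action); return the binary observation (1 = recalled)."""
        recalled = 1 if self.rng.random() < self.recall_prob(item) else 0
        # Assumed dynamics: every item ages by one time unit ...
        self.delay = [d + 1.0 for d in self.delay]
        # ... while the reviewed item's delay restarts and its strength grows.
        self.delay[item] = 1.0
        self.strength[item] += 1.0
        return recalled


env = EFCStudentEnv(difficulties=[0.1, 0.5, 1.0])
observations = [env.step(item) for item in (0, 1, 2, 1)]
print(observations)  # a list of 0/1 outcomes; the hidden state is never returned
```

The agent only ever sees the return value of `step`, which is what makes the problem partially observable: difficulty, delay, and strength stay inside the environment.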
- Reward Function ($\mathcal{R}$): Depending on the goal, there are two distinct functions:
  - Maximizing expected items recalled: $\mathcal{R}(s, \bullet) = \sum_{i=1}^{n} P[Z_i = 1 \mid s]$
  - Maximizing the likelihood of recalling all items.
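The two objectives above can be compared numerically. The sketch below is hedged: the first function follows the sum of recall probabilities given in the text, while the second assumes that recall is independent across items, so that the likelihood of recalling everything is a product of per-item probabilities, and takes its logarithm. The function names and the `eps` guard are hypothetical.

```python
import math


def expected_recall_reward(recall_probs):
    """Sum of per-item recall probabilities P[Z_i = 1 | s]."""
    return sum(recall_probs)


def log_likelihood_reward(recall_probs, eps=1e-12):
    """Assumed form: log of the product of per-item recall probabilities."""
    # eps avoids log(0) when an item is certainly forgotten.
    return sum(math.log(max(p, eps)) for p in recall_probs)


uneven = [0.9, 0.5, 0.1]
even = [0.5, 0.5, 0.5]
for probs in (uneven, even):
    print(expected_recall_reward(probs), log_likelihood_reward(probs))
```

Both probability vectors have the same expected number of recalled items, but under the assumed log-likelihood form the uneven vector scores lower, because a single poorly remembered item drags the product down.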
Note: The discount factor ($\gamma$) influences agent actions. A smaller $\gamma$ encourages intensive studying, while a larger $\gamma$ favors long-lasting learning.
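The effect of the discount factor can be illustrated with a small calculation. The reward sequences below are invented for illustration, not taken from any experiment: `cramming` stands for a strategy with high immediate recall that fades, and `spaced` for one whose recall builds up over time.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical per-step rewards for two study strategies.
cramming = [3.0, 2.0, 1.0, 0.5, 0.2]  # high recall now, fades quickly
spaced = [1.0, 1.5, 2.0, 2.5, 3.0]    # lower recall now, grows over time

for gamma in (0.1, 0.99):
    c = discounted_return(cramming, gamma)
    s = discounted_return(spaced, gamma)
    print(gamma, "cramming" if c > s else "spaced")
```

With these made-up numbers, the small discount factor prefers the cramming sequence and the large one prefers the spaced sequence, which mirrors the trade-off described in the note.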
Updated 2026-06-07
Tags
Data Science