Formula

Location-Based Addressing (NTM Architecture)

When the content-based method is not well suited to a problem, location-based addressing is used instead. It focuses purely on memory location, using rotational shifting of the weighting. Before the rotational shift is performed, each read and write head produces a scalar interpolation gate $g_t$ (between 0 and 1). This value blends $w_{t-1}$ (the weighting produced by that head in the previous time step) with $w_t^c$ (the weighting generated by the content-based system) to return a gated weighting, $w_t^g$:

$$w_t^g \leftarrow g_t w_t^c + (1 - g_t) w_{t-1}$$
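The interpolation step can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `interpolate` and the four-slot example weightings are assumptions made here, and weightings are represented as plain lists of floats.

```python
def interpolate(w_content, w_prev, g):
    """Blend the content weighting with the previous weighting.

    Hypothetical helper: g is the scalar interpolation gate in [0, 1].
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("gate g must lie in [0, 1]")
    if len(w_content) != len(w_prev):
        raise ValueError("weightings must have the same length")
    # Element-wise: w_g(i) = g * w_c(i) + (1 - g) * w_prev(i)
    return [g * wc + (1.0 - g) * wp for wc, wp in zip(w_content, w_prev)]


# Illustrative weightings over N = 4 memory locations (made-up values).
w_c = [0.0, 1.0, 0.0, 0.0]      # content-based weighting
w_prev = [1.0, 0.0, 0.0, 0.0]   # weighting from the previous time step

print(interpolate(w_c, w_prev, 0.0))  # gate at 0: previous weighting is kept
print(interpolate(w_c, w_prev, 1.0))  # gate at 1: content weighting is kept
print(interpolate(w_c, w_prev, 0.5))  # gate at 0.5: equal blend of both
```

Because both inputs sum to 1 and the gate is a convex combination, the gated weighting also sums to 1.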

Depending on the value of $g_t$, the head can completely ignore either the weighting produced by the content-based system or its own weighting from the previous time step. More precisely, if $g_t$ is equal to 0, the content-based weighting is ignored, and if it is 1, the previous weighting is ignored. After this interpolation, a shift weighting $s_t$ is applied. $s_t$ is a normalized distribution over the allowed integer shifts, and the simplest way to define it is with a softmax over those shifts. The rotation applied to the gated weighting is a circular convolution, with all indices taken modulo $N$, the number of memory locations:

$$w_t^{\sim}(i) \leftarrow \sum_{j=0}^{N-1} w_t^g(j) \, s_t(i-j)$$
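As a rough sketch of this rotation, the circular convolution can be written directly from the sum above. The function name `circular_shift` and the dictionary representation of the shift weighting (integer offset mapped to probability) are assumptions chosen for readability; offsets missing from the dictionary are treated as having probability 0.

```python
def circular_shift(w_gated, shift_dist):
    """Rotate a gated weighting by a distribution over integer shifts.

    Hypothetical helper: shift_dist maps an integer offset to its
    probability, e.g. {-1: 0.1, 0: 0.8, 1: 0.1}.
    """
    n = len(w_gated)
    shifted = [0.0] * n
    for i in range(n):
        for offset, prob in shift_dist.items():
            # offset plays the role of (i - j), so j = i - offset, modulo N.
            j = (i - offset) % n
            shifted[i] += w_gated[j] * prob
    return shifted


w_g = [0.0, 1.0, 0.0, 0.0]  # focus on location 1 (made-up values)

# A sharp shift of +1 moves the focus from location 1 to location 2.
print(circular_shift(w_g, {1: 1.0}))

# The shift wraps around: a focus on the last slot moves to slot 0.
print(circular_shift([0.0, 0.0, 0.0, 1.0], {1: 1.0}))

# A shift weighting that is not sharp spreads the focus over neighbours.
print(circular_shift(w_g, {-1: 0.1, 0: 0.8, 1: 0.1}))
```

The last call shows the dispersion effect: a weighting that was focused on a single location comes out spread over three.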

If the shift weighting is not sharp, this convolution can cause the weighting to leak or disperse over time. To counter this, each head produces an additional scalar $\gamma_t \ge 1$ that is used to sharpen the final weighting:

$$w_t(i) \leftarrow \frac{w_t^{\sim}(i)^{\gamma_t}}{\sum_j w_t^{\sim}(j)^{\gamma_t}}$$
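The sharpening step can be illustrated in the same way. This is a small sketch under stated assumptions: the name `sharpen` is hypothetical, and the input is assumed to be a non-negative weighting with at least one non-zero entry, so that the normalizing sum is not zero.

```python
def sharpen(w_shifted, gamma):
    """Raise each weight to the power gamma and renormalize.

    Hypothetical helper: gamma >= 1, and w_shifted is assumed to be
    non-negative with a non-zero sum.
    """
    if gamma < 1.0:
        raise ValueError("gamma must be at least 1")
    powered = [w ** gamma for w in w_shifted]
    total = sum(powered)
    return [p / total for p in powered]


# A dispersed weighting (made-up values), such as a blurry shift might produce.
w_blurred = [0.1, 0.8, 0.1, 0.0]

print(sharpen(w_blurred, 1.0))  # gamma = 1 leaves the weighting unchanged
print(sharpen(w_blurred, 2.0))  # gamma = 2 concentrates mass on location 1
```

Larger values of `gamma` push more of the mass onto the largest entry, while the result still sums to 1.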

Updated 2026-06-07

Tags

Data Science