Learn Before
Concept

General Attention function

This is similar to dot attention but also adds a learning aspect to it because we first pass the encoder vector through a Dense layer before taking the dot product. In this case attention is also is subject to backpropagation and gradient descent

score(h, h'{t}) = (h'{t})^{T} * W_{a} * h

hh - encoder state hth'_{t} - decoder state

0

1

Updated 2020-10-10

Tags

Data Science