Learn Before
Concept
General Attention function
This is similar to dot attention but also adds a learning aspect to it because we first pass the encoder vector through a Dense layer before taking the dot product. In this case attention is also is subject to backpropagation and gradient descent
score(h, h'{t}) = (h'{t})^{T} * W_{a} * h
- encoder state - decoder state
0
1
Updated 2020-10-10
Tags
Data Science