Multi-dimensional LSTM: a Tensorflow implementation of the model described in Alex Graves' paper https://arxiv.org/pdf/0705.2011.pdf.
Basically, an LSTM that is multi-directional: for example, it can operate on a 2D grid. Here is a figure describing the way it works:
Example: 2D LSTM Architecture
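The recurrence behind the figure can be sketched in plain NumPy (this is a minimal illustration, not the repo's TensorFlow code): each cell update looks like a standard LSTM step, except that it receives the hidden and cell states of two predecessors (top and left) and has one forget gate per predecessor, as in Graves' paper. Function names and weight shapes below are my own choices.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def md_lstm_step(x, h_up, c_up, h_left, c_left, W, U, b):
    # One cell update: a standard LSTM step, but with the states of BOTH
    # predecessors (top and left) and one forget gate per predecessor.
    # W: (5*hidden, in_dim), U: (5*hidden, 2*hidden), b: (5*hidden,)
    z = W @ x + U @ np.concatenate([h_up, h_left]) + b
    i, o, g, f_up, f_left = np.split(z, 5)
    c = (sigmoid(f_up) * c_up + sigmoid(f_left) * c_left
         + sigmoid(i) * np.tanh(g))
    h = sigmoid(o) * np.tanh(c)
    return h, c

def run_2d(grid, W, U, b, hidden):
    # Sweep from top-left to bottom-right: cell (i, j) depends on
    # (i-1, j) and (i, j-1); border cells see zero states (the padding).
    height, width, _ = grid.shape
    h = np.zeros((height + 1, width + 1, hidden))
    c = np.zeros((height + 1, width + 1, hidden))
    for i in range(height):
        for j in range(width):
            h[i + 1, j + 1], c[i + 1, j + 1] = md_lstm_step(
                grid[i, j],
                h[i, j + 1], c[i, j + 1],   # state from above
                h[i + 1, j], c[i + 1, j],   # state from the left
                W, U, b)
    return h[1:, 1:]
```

Note that every cell's output depends (transitively) on the whole rectangle above and to its left, which is what makes the model multi-dimensional rather than a stack of 1D sequences.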
git clone [email protected]:philipperemy/tensorflow-multi-dimensional-lstm.git
cd tensorflow-multi-dimensional-lstm
# create a new virtual python environment
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt
# usage: trainer.py [-h] --model_type {MD_LSTM,HORIZONTAL_SD_LSTM,SNAKE_SD_LSTM}
python trainer.py --model_type MD_LSTM
python trainer.py --model_type HORIZONTAL_SD_LSTM
python trainer.py --model_type SNAKE_SD_LSTM
The random diagonal task consists of initializing a matrix with values very close to 0, except for two entries that are set to 1. Those two entries lie on a straight line parallel to the diagonal of the matrix. The task is to predict where those two values are. Here are some examples:
____________
| |
|x |
| x |
| |
|__________|
____________
| |
| |
| x |
| x |
|__________|
____________
| |
| x |
| x |
| |
|__________|
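A data generator for this task can be sketched as follows (a hypothetical helper, not the repo's own generator; the function name, the `eps` value, and the assumption that the two marked cells are diagonal neighbours are mine, based on the figures above):

```python
import numpy as np

def next_diagonal_batch(batch_size, size=8, eps=1e-3):
    # Inputs are near-zero, with two 1s on a line parallel to the main
    # diagonal. Here the two marked cells are assumed to be diagonal
    # neighbours, matching the examples above.
    x = np.random.uniform(0.0, eps, (batch_size, size, size))
    y = np.zeros((batch_size, size, size))
    for b in range(batch_size):
        # Pick a starting cell so that (i + 1, j + 1) stays in the grid.
        i = np.random.randint(0, size - 1)
        j = np.random.randint(0, size - 1)
        for r, c in ((i, j), (i + 1, j + 1)):
            x[b, r, c] = 1.0
            y[b, r, c] = 1.0
    return x, y
```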
A model performing on this task is considered successful if it can correctly predict the second x (it is impossible to predict the first x).
We compare the MD LSTM against two simpler baselines: a standard LSTM that reads the matrix row by row from left to right (HORIZONTAL_SD_LSTM), which should perform the worst, and an LSTM that traverses the matrix in a snake-like fashion, alternating direction on every row (SNAKE_SD_LSTM). After training on this task with 8x8 matrices, the losses look like this:
Overall loss of the random diagonal task (loss applied on all the elements of the inputs)
Loss of the random diagonal task (loss applied only on the location of the second x)
No surprise that the MD LSTM performs best here: it has direct connections between the grid cells that contain the first x and the second x (2 connections). The snake LSTM has width + 1 = 9 steps between the two x's. As expected, the horizontal LSTM does not learn anything apart from outputting values very close to 0.
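To see why the snake LSTM needs several sequence steps to connect the two x's, its traversal can be sketched as below (the direction convention, left-to-right on even rows, is an assumption):

```python
def snake_order(height, width):
    # Visit order of a snake-like traversal: left-to-right on even rows,
    # right-to-left on odd rows (direction convention assumed).
    order = []
    for i in range(height):
        cols = range(width) if i % 2 == 0 else range(width - 1, -1, -1)
        order.extend((i, j) for j in cols)
    return order

# Sequence distance between two diagonal neighbours in an 8x8 grid:
order = snake_order(8, 8)
steps = order.index((3, 4)) - order.index((2, 3))
```

The exact distance depends on where the pair sits in the grid, but it always grows with the width, whereas the MD LSTM connects diagonal neighbours in a constant number of steps.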
MD LSTM predictions (left) and ground truth (right) before training (predictions are all random).
MD LSTM predictions (left) and ground truth (right) after training. As expected, the MD LSTM can only predict the second x and not the first one. That means the task is solved correctly.
Note that a small epsilon is added to the inputs (inputs += eps) just in case, so that they are never exactly 0. Contributions are welcome!