Bidirectional LSTM + CRF (Conditional Random Fields) in TensorFlow
The notebook bi-lstm-crf-tensorflow.ipynb
contains an example of a Bidirectional LSTM + CRF (Conditional Random Fields) model in TensorFlow.
I tried to keep the problem and implementation as simple as possible, so that anyone can understand the model and adapt it to their own problem and data.
To make the example more realistic, the inputs have variable sequence lengths.
We will define a simple sequence classification problem to explore bidirectional LSTMs + CRF.
The input is a sequence of random values between 0 and 1.
A binary label (0 or 1) is associated with each timestep: the output values start at 0 and flip to 1 once the cumulative sum of the inputs exceeds a threshold.
The threshold is set to 1/4 of the sequence length.
For example, below is a sequence of 10 input timesteps (X):
0.63144003 0.29414551 0.91587952 0.95189228 0.32195638 0.60742236 0.83895793 0.18023048 0.84762691 0.29165514
In this case the threshold is 2.5 (one quarter of 10),
and the corresponding classification output (y) is:
0 0 0 1 1 1 1 1 1 1
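The labeling rule above can be sketched in a few lines of numpy. The function name `generate_example` and the fixed seed are illustrative, not taken from the notebook:

```python
import numpy as np

def generate_example(seq_len=10, seed=0):
    """Generate one (X, y) pair: y flips from 0 to 1 once the
    cumulative sum of X exceeds the threshold seq_len / 4."""
    rng = np.random.default_rng(seed)
    X = rng.random(seq_len)           # random values in [0, 1)
    threshold = seq_len / 4.0         # 2.5 for a length-10 sequence
    y = (np.cumsum(X) > threshold).astype(int)
    return X, y

X, y = generate_example()
print(X)
print(y)
```

Because the cumulative sum only grows, `y` is always a run of 0s followed by a run of 1s, which is what makes the problem a natural fit for a sequence tagger.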
Both `bidirectional_dynamic_rnn`
and `crf_log_likelihood`
accept the optional `sequence_length`
parameter.
This parameter holds the real sequence lengths
of the inputs (without the padding) and, when running the model, TensorFlow returns zero vectors for states and outputs past these lengths.
The weights are therefore never trained on the padding.
Note: the padding is necessary in order to batch sequences in TensorFlow, which speeds up the computations.
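The effect of `sequence_length` can be mimicked with a plain numpy mask. This is a minimal sketch, not the notebook's code, and the per-timestep loss values are made up for illustration:

```python
import numpy as np

# A hypothetical batch of 2 sequences with real lengths 3 and 5,
# zero-padded to a common length of 5.
sequence_lengths = np.array([3, 5])
max_len = 5

# Made-up per-timestep losses (e.g. negative log-likelihoods).
losses = np.array([
    [0.2, 0.1, 0.4, 0.9, 0.9],   # last two entries are padding
    [0.3, 0.2, 0.1, 0.5, 0.2],
])

# Build a boolean mask that is True only for real timesteps,
# mirroring what sequence_length does inside the TensorFlow ops.
mask = np.arange(max_len)[None, :] < sequence_lengths[:, None]

# Average the loss over real timesteps only; padded entries
# contribute nothing, so they never influence the gradients.
masked_loss = (losses * mask).sum() / mask.sum()
print(masked_loss)  # 0.25
```

Without the mask, the two padded entries (0.9 each) would inflate the loss and leak gradient signal into timesteps that carry no data.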