Implement Long Short-Term Memory (LSTM) with TensorFlow

This article is an excerpt from the book Deep Learning Essentials, written by Wei Di, Anurag Bhardwaj, and Jianing Wei. This book will help you get started with the essentials of deep learning and neural network modeling.

In today’s tutorial, we will look at an example of using LSTM in TensorFlow to perform sentiment classification.

The input to the LSTM is a sentence, that is, a sequence of words. The output is a binary value: 1 for positive sentiment and 0 for negative sentiment (in the code, this is represented as a two-class one-hot label). Because this problem maps multiple inputs onto a single output, we use a many-to-one LSTM architecture.

Figure LSTM: Basic cell architecture shows this architecture in more detail. The input is a sequence of word tokens (in this case, three words). Each token is fed in at its own time step, as the input to the hidden state for that time step.

For example, the word Book is input at time step t and is fed to the hidden state h_t.

[Figure: Sentiment analysis]
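For reference, the computation inside each LSTM cell can be written with the standard gate equations below, where x_t is the embedding of the word at time step t, σ is the logistic sigmoid, and ⊙ is element-wise multiplication. The symbol names are our own notation rather than the book's. In the many-to-one setup, only the final hidden state h_T (T being the sequence length) feeds the classifier.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t) \\
\hat{y} &= \operatorname{softmax}(h_T W + b) && \text{many-to-one prediction}
\end{aligned}
```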

To implement this model in TensorFlow, we need to first define a few variables as follows:

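batch_size = 4              # sequences per training batch
lstm_units = 16             # hidden units in the LSTM cell
num_classes = 2             # positive / negative
max_sequence_length = 4     # tokens per (padded or truncated) sequence
embedding_dimension = 64    # size of each word vector
num_iterations = 1000       # number of training iterations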

Here, batch_size is the number of token sequences fed to the network in one training batch. lstm_units is the number of hidden units in the LSTM cell, that is, the size of its hidden and cell state vectors. num_classes is the number of output classes (positive and negative), max_sequence_length is the maximum number of tokens in a sequence, embedding_dimension is the size of each word vector, and num_iterations is the number of training iterations. With these defined, we initialize the TensorFlow placeholders for the input data as follows:

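import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib)

# One-hot sentiment labels, one row per sequence in the batch
labels = tf.placeholder(tf.float32, [batch_size, num_classes])
# Integer word-token IDs, one row per sequence
raw_data = tf.placeholder(tf.int32, [batch_size, max_sequence_length])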
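The placeholders expect integer token IDs and one-hot labels, and because their shapes are fixed, every batch must contain exactly batch_size rows of exactly max_sequence_length IDs. A minimal sketch of how raw sentences might be turned into that form, assuming a hypothetical vocabulary VOCAB in which ID 0 is reserved for padding and unknown words are simply dropped (real pipelines usually map them to an unknown-word ID instead):

```python
# Hypothetical vocabulary: word -> integer ID; ID 0 is reserved for padding.
VOCAB = {"<pad>": 0, "this": 1, "book": 2, "is": 3, "great": 4, "boring": 5}
MAX_SEQUENCE_LENGTH = 4  # mirrors max_sequence_length above
NUM_CLASSES = 2          # mirrors num_classes above


def encode(sentence, vocab=VOCAB, max_len=MAX_SEQUENCE_LENGTH):
    """Map words to IDs (dropping unknown words), then truncate or
    right-pad with 0 so the result has exactly max_len entries."""
    ids = [vocab[w] for w in sentence.lower().split() if w in vocab]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


def one_hot(label, num_classes=NUM_CLASSES):
    """1 (positive) -> [0.0, 1.0]; 0 (negative) -> [1.0, 0.0]."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


raw_batch = [encode(s) for s in ["Book is great", "This book is boring"]]
label_batch = [one_hot(1), one_hot(0)]
print(raw_batch)    # [[2, 3, 4, 0], [1, 2, 3, 5]]
print(label_batch)  # [[0.0, 1.0], [1.0, 0.0]]
```

This toy batch has two rows for brevity; a batch actually fed to raw_data and labels would need batch_size (here 4) rows.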

Since we are working with word tokens, we want a good feature representation for them. Let us assume a word embedding that projects each word token onto an embedding space of dimension embedding_dimension. The two-dimensional input of raw word token IDs, of shape [batch_size, max_sequence_length], is thereby transformed into a three-dimensional tensor of shape [batch_size, max_sequence_length, embedding_dimension], where the added dimension holds the word embedding. We use pre-computed word embeddings, stored in a word_vectors matrix with one row per vocabulary word, and look up each token's vector as follows:

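# word_vectors: pre-computed embedding matrix of shape
# [vocab_size, embedding_dimension], assumed to be loaded beforehand.
# data has shape [batch_size, max_sequence_length, embedding_dimension].
data = tf.nn.embedding_lookup(word_vectors, raw_data)
data = tf.cast(data, tf.float32)  # dynamic_rnn below is run with float32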
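Conceptually, the embedding lookup just replaces each integer ID with the corresponding row of word_vectors. A framework-free sketch, using a tiny hypothetical embedding table (3 words, 2 dimensions), makes the 2D-to-3D shape change explicit:

```python
# Hypothetical pre-computed embeddings: vocab_size=3, embedding_dimension=2.
word_vectors = [
    [0.0, 0.0],  # ID 0 (padding)
    [0.1, 0.2],  # ID 1
    [0.3, 0.4],  # ID 2
]


def embedding_lookup(table, ids_batch):
    """Replace each integer ID with its row in `table`.

    Input shape:  [batch_size, max_sequence_length]
    Output shape: [batch_size, max_sequence_length, embedding_dimension]
    """
    return [[table[i] for i in ids] for ids in ids_batch]


raw_data = [[2, 1, 0], [1, 1, 2]]  # 2 sequences of 3 token IDs
data = embedding_lookup(word_vectors, raw_data)
print(data[0])  # [[0.3, 0.4], [0.1, 0.2], [0.0, 0.0]]
```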

Now that the input data is ready, we define the LSTM model. We create a single basic LSTM cell with lstm_units hidden units. To reduce overfitting, we wrap the cell in a dropout wrapper that keeps each output unit with probability 0.8. To perform a full temporal pass over the data, we unroll the LSTM with TensorFlow's dynamic_rnn routine. For the final classification layer, we also initialize a random weight matrix and a bias vector with a constant value of 0.1, as follows:

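# Final classification layer: maps the last LSTM output to class scores
weight = tf.Variable(tf.truncated_normal([lstm_units, num_classes]))
bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))

# One LSTM cell with lstm_units hidden units, with dropout on its outputs
lstm_cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)
wrapped_lstm_cell = tf.contrib.rnn.DropoutWrapper(cell=lstm_cell, output_keep_prob=0.8)

# Unroll over time; output has shape [batch_size, max_sequence_length, lstm_units]
output, state = tf.nn.dynamic_rnn(wrapped_lstm_cell, data, dtype=tf.float32)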
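To make the cell's computation concrete, here is a framework-free sketch of what one LSTM layer computes over a single sequence: the standard LSTM gate equations, with output dropout applied the way DropoutWrapper's output_keep_prob does (inverted dropout on the emitted outputs, while the recurrent state passed to the next step is not dropped). The helper names and the random initialization are hypothetical, and, unlike BasicLSTMCell, which by default adds forget_bias=1.0 to the forget gate, this sketch starts all biases at zero.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_dim, units, seed=0):
    """Hypothetical random init for gates i, f, o, g; rows act on [x_t, h_{t-1}]."""
    rng = random.Random(seed)
    return {
        gate: (
            [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + units)] for _ in range(units)],
            [0.0] * units,  # biases start at zero in this sketch
        )
        for gate in ("i", "f", "o", "g")
    }


def lstm_step(params, x_t, h_prev, c_prev):
    """One time step of the standard LSTM cell; returns the new (h, c)."""
    z = x_t + h_prev  # concatenate input and previous hidden state

    def gate(name, act):
        W, b = params[name]
        return [act(a + bb) for a, bb in zip(matvec(W, z), b)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def output_dropout(h, keep_prob, rng):
    """Inverted dropout: zero each unit with prob 1 - keep_prob, scale the rest by 1/keep_prob."""
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in h]


def run_lstm(params, sequence, units, keep_prob=1.0, seed=0):
    """Unroll over one sequence; return per-step outputs and the final (h, c)."""
    rng = random.Random(seed)
    h, c = [0.0] * units, [0.0] * units
    outputs = []
    for x_t in sequence:
        h, c = lstm_step(params, x_t, h, c)  # state is never dropped
        outputs.append(output_dropout(h, keep_prob, rng) if keep_prob < 1.0 else h)
    return outputs, (h, c)


units, embedding_dim = 3, 2
params = init_params(embedding_dim, units)
sentence = [[0.3, 0.4], [0.1, 0.2], [0.0, 0.0]]  # three embedded tokens
outputs, (h_T, c_T) = run_lstm(params, sentence, units)
print(len(outputs), len(h_T))  # 3 3
```

In the many-to-one setup only outputs[-1] feeds the classifier; dynamic_rnn performs this same loop for every sequence in the batch.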

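Once dynamic_rnn produces the output, we transpose it to shape [max_sequence_length, batch_size, lstm_units] so that time is the first axis, take the output at the last time step, multiply it by the weight matrix, and add the bias vector to compute the final prediction (the unnormalized class scores, or logits):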

# Time-major layout: [max_sequence_length, batch_size, lstm_units]
output = tf.transpose(output, [1, 0, 2])
# Output at the last time step: [batch_size, lstm_units]
last = tf.gather(output, int(output.get_shape()[0]) - 1)
# Logits: [batch_size, num_classes]
prediction = tf.matmul(last, weight) + bias
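The transpose-then-gather step simply selects each sequence's output at its final time step, which is equivalent to output[:, -1, :] in NumPy-style slicing. A framework-free sketch with small made-up numbers (2 sequences, 2 time steps, 2 units, 2 classes) shows that selection followed by the dense layer that produces the logits:

```python
def last_time_step(output):
    """output: [batch, time, units] -> [batch, units], by transposing to
    [time, batch, units] and taking index time - 1."""
    time_major = [list(step) for step in zip(*output)]  # [time][batch][units]
    return time_major[-1]


def dense(last, weight, bias):
    """last: [batch, units], weight: [units, classes], bias: [classes] -> logits [batch, classes]."""
    return [
        [sum(h * weight[u][k] for u, h in enumerate(row)) + bias[k] for k in range(len(bias))]
        for row in last
    ]


output = [
    [[1.0, 0.0], [0.5, 0.5]],  # sequence 0: outputs at t=0 and t=1
    [[0.0, 1.0], [0.2, 0.8]],  # sequence 1
]
last = last_time_step(output)        # [[0.5, 0.5], [0.2, 0.8]]
weight = [[1.0, -1.0], [2.0, 0.0]]   # [units, classes]
bias = [0.1, 0.1]
logits = dense(last, weight, bias)   # approximately [[1.6, -0.4], [1.9, -0.1]]
```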

To train the network, we define the loss as the mean softmax cross-entropy between the logits and the one-hot labels, and minimize it with the Adam optimizer:

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=labels))
optimizer = tf.train.AdamOptimizer().minimize(loss)
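For intuition, the per-example loss is -Σ_k y_k · log softmax(z)_k for logits z and one-hot label y, and reduce_mean averages it over the batch. A framework-free sketch (the function name is ours) using the numerically stable log-sum-exp form:

```python
import math


def softmax_cross_entropy(logits, labels):
    """Mean over the batch of -sum(labels * log(softmax(logits)))."""
    losses = []
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        losses.append(-sum(yk * (zk - log_sum_exp) for zk, yk in zip(z, y)))
    return sum(losses) / len(losses)


print(softmax_cross_entropy([[0.0, 0.0]], [[1.0, 0.0]]))  # ln 2, about 0.6931
```

Equal logits give a loss of ln 2 (a 50/50 guess); the loss approaches 0 as the correct class's logit dominates and grows without bound when the wrong class dominates.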

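These steps define an end-to-end LSTM network for sentiment classification. To train it, we run optimizer repeatedly (for example, for num_iterations iterations) in a TensorFlow session, feeding batches of token IDs and one-hot labels into the raw_data and labels placeholders. Because the input shape is fixed, sentences longer than max_sequence_length tokens must be truncated and shorter ones padded.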
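The graph defines the loss and the update step, but the training loop itself is left to the reader. A minimal sketch of the missing batching piece, assuming the data is already encoded as padded ID lists and one-hot labels; the batches helper and the train_ids/train_labels data are hypothetical, and the TensorFlow 1.x session loop appears only as comments:

```python
import itertools


def batches(raw_ids, one_hot_labels, batch_size):
    """Yield (raw_data, labels) batches of exactly batch_size rows forever,
    cycling through the dataset and wrapping around at the end."""
    pairs = itertools.cycle(zip(raw_ids, one_hot_labels))
    while True:
        chunk = [next(pairs) for _ in range(batch_size)]
        yield [ids for ids, _ in chunk], [lab for _, lab in chunk]


# With the TF 1.x graph above, training would then look roughly like:
# with tf.Session() as sess:
#     sess.run(tf.global_variables_initializer())
#     gen = batches(train_ids, train_labels, batch_size)
#     for step in range(num_iterations):
#         x, y = next(gen)
#         sess.run(optimizer, feed_dict={raw_data: x, labels: y})

train_ids = [[2, 3, 4, 0], [1, 2, 3, 5], [1, 2, 3, 4]]
train_labels = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
gen = batches(train_ids, train_labels, batch_size=4)
x, y = next(gen)  # 4 rows: the 3 examples, then wraps around to the first
```

Cycling keeps every batch at exactly batch_size rows, which the fixed-shape placeholders require; shuffling between passes would be a natural refinement.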

To summarize, we saw how to implement an LSTM network for sentiment classification using TensorFlow.

If you are interested in learning more, check out the book Deep Learning Essentials, which will help you take your first steps in training efficient deep learning models and applying them in various practical scenarios.