Solving an NLP Problem with Keras, Part 2

In this two-part post series, we are solving a Natural Language Processing (NLP) problem with Keras. In Part 1, we covered the problem and the ATIS dataset we are using. We also went over the word embeddings (mapping words to a vector) along with Recurrent Neural Networks that solve complicated word tagging problems. We passed the word embedding sequence as input into the RNN and we then started coding that up. Now, it is time in this post to start loading the data.

Loading Data

Let's load the data using data.load.atisfull(). It will download the data the first time it is run. Words and labels are encoded as indexes to a vocabulary. This vocabulary is stored in w2idx and labels2idx.

import numpy as np
import data.load

train_set, valid_set, dicts = data.load.atisfull()
w2idx, labels2idx = dicts['words2idx'], dicts['labels2idx']

train_x, _, train_label = train_set
val_x, _, val_label = valid_set

# Create index to word/label dicts
idx2w  = {w2idx[k]:k for k in w2idx}
idx2la = {labels2idx[k]:k for k in labels2idx}

# For conlleval script
words_train = [ list(map(lambda x: idx2w[x], w)) for w in train_x]
labels_train = [ list(map(lambda x: idx2la[x], y)) for y in train_label]
words_val = [ list(map(lambda x: idx2w[x], w)) for w in val_x]
labels_val = [ list(map(lambda x: idx2la[x], y)) for y in val_label]

n_classes = len(idx2la)
n_vocab = len(idx2w)

Let's print an example sentence and label:

print("Example sentence : {}".format(words_train[0]))
print("Encoded form: {}".format(train_x[0]))
print()
print("It's label : {}".format(labels_train[0]))
print("Encoded form: {}".format(train_label[0]))

Here is the output:

Example sentence : ['i', 'want', 'to', 'fly', 'from', 'boston', 'at', 'DIGITDIGITDIGIT', 'am', 'and', 'arrive', 'in', 'denver', 'at', 'DIGITDIGITDIGITDIGIT', 'in', 'the', 'morning']
Encoded form: [232 542 502 196 208  77  62  10  35  40  58 234 137  62  11 234 481 321]

It's label : ['O', 'O', 'O', 'O', 'O', 'B-fromloc.city_name', 'O', 'B-depart_time.time', 'I-depart_time.time', 'O', 'O', 'O', 'B-toloc.city_name', 'O', 'B-arrive_time.time', 'O', 'O', 'B-arrive_time.period_of_day']
Encoded form: [126 126 126 126 126  48 126  35  99 126 126 126  78 126  14 126 126  12]

Keras model

Next, we define the Keras model. Keras has an inbuilt Embedding layer for word embeddings. It expects integer indices. SimpleRNN is the recurrent neural network layer described in Part 1. We will have to use TimeDistributed to pass the output of RNN

O_t

At each time step:

To a fully connected layer. Otherwise, the output at the final time step will be passed on to the next layer.

from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import SimpleRNN
from keras.layers.core import Dense, Dropout
from keras.layers.wrappers import TimeDistributed
from keras.layers import Convolution1D

model = Sequential()
model.add(Embedding(n_vocab,100))
model.add(Dropout(0.25))
model.add(SimpleRNN(100,return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
model.compile('rmsprop', 'categorical_crossentropy')

Training

Now, let's start training our model. We will pass each sentence as a batch to the model. We cannot use model.fit() because it expects all of the sentences to be the same size. We will therefore use model.train_on_batch(). Training is very fast, since the dataset is relatively small. Each epoch takes 20 seconds on my Macbook Air.

import progressbar
n_epochs = 30

for i in range(n_epochs):
print("Training epoch {}".format(i))

    bar = progressbar.ProgressBar(max_value=len(train_x))
for n_batch, sent in bar(enumerate(train_x)):
        label = train_label[n_batch]
# Make labels one hot
        label = np.eye(n_classes)[label][np.newaxis,:] 
# View each sentence as a batch
        sent = sent[np.newaxis,:]

if sent.shape[1] >1: #ignore 1 word sentences
            model.train_on_batch(sent, label)

Evaluation

To measure the accuracy of the model, we use model.predict_on_batch() and metrics.accuracy.conlleval().

from metrics.accuracy import conlleval

labels_pred_val = []

bar = progressbar.ProgressBar(max_value=len(val_x))
for n_batch, sent in bar(enumerate(val_x)):
    label = val_label[n_batch]
    label = np.eye(n_classes)[label][np.newaxis,:]
    sent = sent[np.newaxis,:]

    pred = model.predict_on_batch(sent)
    pred = np.argmax(pred,-1)[0]
    labels_pred_val.append(pred)

labels_pred_val = [ list(map(lambda x: idx2la[x], y)) 
                                    for y in labels_pred_val]
con_dict = conlleval(labels_pred_val, labels_val, 
                            words_val, 'measure.txt')

print('Precision = {}, Recall = {}, F1 = {}'.format(
            con_dict['r'], con_dict['p'], con_dict['f1']))

With this model, I get a 92.36 F1 Score.

Precision = 92.07, Recall = 92.66, F1 = 92.36

Note that for the sake of brevity, I've not shown the logging part of the code. Loggging losses and accuracies are an important part of coding up an model. An improved model (described in the next section) with logging is at main.py. You can run it as :

$ python main.py

Improvements

One drawback with our current model is that there is no look ahead, that is, output:

o_t

This depends only on the current and previous words, but not on the words next to it. You can imagine clues about the properties of the current word that are also held by the next word.

Lookahead can easily be implemented by having a convolutional layer before RNN and word embeddings:

model = Sequential()
model.add(Embedding(n_vocab,100))
model.add(Convolution1D(128, 5, border_mode='same', activation='relu'))
model.add(Dropout(0.25))
model.add(GRU(100,return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
model.compile('rmsprop', 'categorical_crossentropy')

With this improved model, I get a 94.90F1 Score!

Conclusion

In this two-part post series, you learned about word embeddings and RNNs. We applied these to an NLP problem: ATIS. We also made an improvement to our model.

To improve the model further, you can try using word embeddings learned on a large site like Wikipedia. Also, there are variants of RNNs such as LSTM or GRU that can be experimented with.

About the author

Sasank Chilamkurthy works at Fractal Analytics. His work involves deep learning on medical images obtained from radiology and pathology. He is mainly interested in computer vision.