In this two-part series, we are solving a Natural Language Processing (NLP) problem with Keras. In Part 1, we covered the problem and the ATIS dataset we are using. We also went over word embeddings (which map words to vectors) and Recurrent Neural Networks (RNNs), which can solve complicated word tagging problems. We passed the sequence of word embeddings as input to the RNN and started coding that up. In this post, we continue by loading the data.
Loading Data
Let’s load the data using data.load.atisfull(). It will download the data the first time it is run. Words and labels are encoded as indexes to a vocabulary. This vocabulary is stored in w2idx and labels2idx.
import numpy as np
import data.load
train_set, valid_set, dicts = data.load.atisfull()
w2idx, labels2idx = dicts['words2idx'], dicts['labels2idx']
train_x, _, train_label = train_set
val_x, _, val_label = valid_set
# Create index to word/label dicts
idx2w = {w2idx[k]:k for k in w2idx}
idx2la = {labels2idx[k]:k for k in labels2idx}
# For conlleval script
words_train = [ list(map(lambda x: idx2w[x], w)) for w in train_x]
labels_train = [ list(map(lambda x: idx2la[x], y)) for y in train_label]
words_val = [ list(map(lambda x: idx2w[x], w)) for w in val_x]
labels_val = [ list(map(lambda x: idx2la[x], y)) for y in val_label]
n_classes = len(idx2la)
n_vocab = len(idx2w)
Let’s print an example sentence and label:
print("Example sentence : {}".format(words_train[0]))
print("Encoded form: {}".format(train_x[0]))
print()
print("Its label : {}".format(labels_train[0]))
print("Encoded form: {}".format(train_label[0]))
Here is the output:
Example sentence : ['i', 'want', 'to', 'fly', 'from', 'boston', 'at', 'DIGITDIGITDIGIT', 'am', 'and', 'arrive', 'in', 'denver', 'at', 'DIGITDIGITDIGITDIGIT', 'in', 'the', 'morning']
Encoded form: [232 542 502 196 208 77 62 10 35 40 58 234 137 62 11 234 481 321]
Its label : ['O', 'O', 'O', 'O', 'O', 'B-fromloc.city_name', 'O', 'B-depart_time.time', 'I-depart_time.time', 'O', 'O', 'O', 'B-toloc.city_name', 'O', 'B-arrive_time.time', 'O', 'O', 'B-arrive_time.period_of_day']
Encoded form: [126 126 126 126 126 48 126 35 99 126 126 126 78 126 14 126 126 12]
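The labels in this example follow the IOB tagging scheme: B- marks the first word of a slot, I- marks a continuation of that slot, and O marks a word outside any slot. As a rough sketch of how such a label sequence maps back to slots, the helper below groups tagged words together. The function name extract_slots is made up for this illustration; it is not part of the data loader or of the conlleval script.

```python
def extract_slots(words, labels):
    """Group IOB-tagged words into (slot_name, text) pairs."""
    slots = []
    current_name, current_words = None, []
    for word, label in zip(words, labels):
        if label.startswith("I-") and current_name == label[2:]:
            # Continuation of the slot that is currently open
            current_words.append(word)
            continue
        # Any other label closes the open slot, if there is one
        if current_name is not None:
            slots.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
        if label != "O":
            # A "B-" label (or a stray "I-" label) opens a new slot
            current_name, current_words = label[2:], [word]
    if current_name is not None:
        slots.append((current_name, " ".join(current_words)))
    return slots


# The example sentence and labels printed above
words = ['i', 'want', 'to', 'fly', 'from', 'boston', 'at', 'DIGITDIGITDIGIT',
         'am', 'and', 'arrive', 'in', 'denver', 'at', 'DIGITDIGITDIGITDIGIT',
         'in', 'the', 'morning']
labels = ['O', 'O', 'O', 'O', 'O', 'B-fromloc.city_name', 'O',
          'B-depart_time.time', 'I-depart_time.time', 'O', 'O', 'O',
          'B-toloc.city_name', 'O', 'B-arrive_time.time', 'O', 'O',
          'B-arrive_time.period_of_day']

slots = extract_slots(words, labels)
for name, text in slots:
    print("{:<28} {}".format(name, text))
```

Note how the two-word slot "DIGITDIGITDIGIT am" is recovered from a B- label followed by an I- label with the same slot name.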
Keras model
Next, we define the Keras model. Keras has a built-in Embedding layer for word embeddings, which expects integer indices. SimpleRNN is the recurrent neural network layer described in Part 1. We have to use TimeDistributed to pass the RNN output o_t at each time step t to a fully connected layer. Otherwise, only the output at the final time step would be passed on to the next layer.
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import SimpleRNN
from keras.layers.core import Dense, Dropout
from keras.layers.wrappers import TimeDistributed
from keras.layers import Convolution1D
model = Sequential()
model.add(Embedding(n_vocab,100))
model.add(Dropout(0.25))
model.add(SimpleRNN(100,return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
model.compile('rmsprop', 'categorical_crossentropy')
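To get a feel for what the time-distributed dense layer computes, here is a small NumPy sketch. It is not how Keras implements the layer; it only illustrates the idea that one shared set of dense weights is applied to the RNN output at every time step. The sizes and the random weights are toy values chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_hidden, n_out = 4, 6, 3   # toy sizes, not the ones used in the post

# Stand-in for the RNN output with return_sequences=True:
# one hidden vector per time step
rnn_out = rng.normal(size=(n_steps, n_hidden))

# One shared set of dense weights, reused at every time step
W = rng.normal(size=(n_hidden, n_out))
b = np.zeros(n_out)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Apply the same dense layer to each time step separately
per_step = np.stack([softmax(o_t @ W + b) for o_t in rnn_out])

# Equivalent computation over the whole sequence at once
all_steps = softmax(rnn_out @ W + b)

print(per_step.shape)                     # one probability vector per time step
print(np.allclose(per_step, all_steps))
```

Each row of the result is a probability distribution over the labels for one word, which is why the model can tag every word in the sentence.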
Training
Now, let’s start training our model. We will pass each sentence as a batch to the model. We cannot use model.fit() because it expects all of the sentences to be the same length. We will therefore use model.train_on_batch(). Training is very fast, since the dataset is relatively small: each epoch takes about 20 seconds on my MacBook Air.
import progressbar
n_epochs = 30
for i in range(n_epochs):
    print("Training epoch {}".format(i))
    bar = progressbar.ProgressBar(max_value=len(train_x))
    for n_batch, sent in bar(enumerate(train_x)):
        label = train_label[n_batch]
        # Make labels one hot
        label = np.eye(n_classes)[label][np.newaxis,:]
        # View each sentence as a batch
        sent = sent[np.newaxis,:]
        if sent.shape[1] > 1: # ignore 1 word sentences
            model.train_on_batch(sent, label)
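The two indexing tricks in the training loop are easier to follow on a toy example. The sketch below uses made-up values (5 classes and a 4-word sentence, not taken from ATIS) to show how indexing an identity matrix produces one-hot rows and how np.newaxis adds the batch dimension.

```python
import numpy as np

n_classes = 5                        # toy value; the post uses len(idx2la)
label = np.array([4, 4, 1, 0])       # hypothetical encoded labels for a 4-word sentence
sent = np.array([12, 7, 33, 2])      # hypothetical encoded words

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the label array gives one row per word
one_hot = np.eye(n_classes)[label]

# Add a leading axis of size 1 so each sentence looks like a batch of one
batch_label = one_hot[np.newaxis, :]
batch_sent = sent[np.newaxis, :]

print(one_hot)
print(batch_sent.shape, batch_label.shape)
```

The sentence ends up with shape (1, 4) and the labels with shape (1, 4, 5), that is (batch, time steps) and (batch, time steps, classes).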
Evaluation
To measure the accuracy of the model, we use model.predict_on_batch() and metrics.accuracy.conlleval().
from metrics.accuracy import conlleval
labels_pred_val = []
bar = progressbar.ProgressBar(max_value=len(val_x))
for n_batch, sent in bar(enumerate(val_x)):
    label = val_label[n_batch]
    label = np.eye(n_classes)[label][np.newaxis,:]
    sent = sent[np.newaxis,:]
    pred = model.predict_on_batch(sent)
    pred = np.argmax(pred,-1)[0]
    labels_pred_val.append(pred)
labels_pred_val = [ list(map(lambda x: idx2la[x], y))
                    for y in labels_pred_val]
con_dict = conlleval(labels_pred_val, labels_val,
                     words_val, 'measure.txt')
print('Precision = {}, Recall = {}, F1 = {}'.format(
    con_dict['p'], con_dict['r'], con_dict['f1']))
With this model, I get a 92.36 F1 Score.
Precision = 92.07, Recall = 92.66, F1 = 92.36
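The F1 score is the harmonic mean of precision and recall. As a quick sanity check on the numbers above, the sketch below recomputes it from the two reported values. The helper f1_score is written for this illustration and is not the conlleval implementation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(92.07, 92.66)
print("F1 = {:.2f}".format(f1))   # F1 = 92.36
```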
Note that for the sake of brevity, I’ve not shown the logging part of the code. Logging losses and accuracies is an important part of coding up a model. An improved model (described in the next section) with logging is at main.py. You can run it as:
$ python main.py
Improvements
One drawback of our current model is that there is no lookahead: the output o_t depends only on the current and previous words, not on the words that follow. You can imagine that the next word also holds clues about the properties of the current word.
Lookahead can easily be implemented by adding a convolutional layer between the word embeddings and the RNN:
from keras.layers.recurrent import GRU
model = Sequential()
model.add(Embedding(n_vocab,100))
model.add(Convolution1D(128, 5, border_mode='same', activation='relu'))
model.add(Dropout(0.25))
model.add(GRU(100,return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
model.compile('rmsprop', 'categorical_crossentropy')
With this improved model, I get a 94.90 F1 score!
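To see why a convolution gives lookahead, it helps to list which words feed into each output position. The sketch below assumes a filter of width 5 with 'same' padding, as in the model above, so each output is centered on its word and sees two words on either side. The helper window_positions and the sentence fragment are made up for this illustration.

```python
def window_positions(t, seq_len, kernel_size=5):
    """Word positions seen by a 'same'-padded 1D convolution when producing output t.

    Positions that fall outside the sentence are padding and are left out.
    Assumes an odd kernel size, so the window is centered on t.
    """
    half = kernel_size // 2
    return [p for p in range(t - half, t + half + 1) if 0 <= p < seq_len]


words = ['fly', 'from', 'boston', 'at', 'DIGITDIGITDIGIT', 'am']
for t, word in enumerate(words):
    context = [words[p] for p in window_positions(t, len(words))]
    print("{:>16} <- {}".format(word, context))
```

With this window, the features for 'boston' already include 'at' and 'DIGITDIGITDIGIT', so the recurrent layer receives information about the next two words at every step.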
Conclusion
In this two-part post series, you learned about word embeddings and RNNs. We applied these to an NLP problem: ATIS. We also made an improvement to our model.
To improve the model further, you can try using word embeddings learned on a large corpus such as Wikipedia. Also, there are variants of RNNs such as LSTM or GRU that can be experimented with.
About the author
Sasank Chilamkurthy works at Fractal Analytics. His work involves deep learning on medical images obtained from radiology and pathology. He is mainly interested in computer vision.