Solving an NLP Problem with Keras, Part 1

In a previous two-part series on Keras, I introduced Convolutional Neural Networks (CNNs) and the Keras deep learning framework. We used them to solve a Computer Vision (CV) problem involving traffic sign recognition. Now, in this two-part series, we will solve a Natural Language Processing (NLP) problem with Keras. Let’s begin.

The Problem and the Dataset

The problem we are going to tackle is Natural Language Understanding, where the aim is to extract the meaning of speech utterances. In its general form, this is still an unsolved problem, so we narrow it down to a practical, solvable one: understanding the speaker in a limited context. In particular, we want to identify the intent of a speaker asking for information about flights.

The dataset we are going to use is the Airline Travel Information System (ATIS) dataset, which was collected by DARPA in the early 90s. ATIS consists of spoken queries about flight-related information. An example utterance is "I want to go from Boston to Atlanta on Monday." Understanding this utterance is then reduced to identifying arguments like Destination and Departure Day. This task is called slot-filling.

Here is an example sentence and its labels. You will observe that labels are encoded in an Inside Outside Beginning (IOB) representation. Let’s look at the dataset:
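To make the IOB encoding concrete, here is a minimal sketch of how a tagged sentence can be turned back into slots. The sentence is a variant of the utterance above, and the label names are written by hand in the ATIS naming style for illustration; they are not copied from the dataset files. The helper iob_to_slots is hypothetical and is not part of the tutorial's code.

```python
def iob_to_slots(words, labels):
    """Group IOB-tagged words into (slot_name, phrase) pairs."""
    slots = []
    current_name, current_words = None, []
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            # A new slot begins; flush the previous one, if any.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = label[2:], [word]
        elif label.startswith("I-") and current_name == label[2:]:
            # Continuation of the slot that is currently open.
            current_words.append(word)
        else:
            # "O" (or a stray I- tag) closes any open slot.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
    if current_name is not None:
        slots.append((current_name, " ".join(current_words)))
    return slots


# Hand-written illustration: B- marks the beginning of a slot,
# I- marks a word inside it, and O marks a word outside any slot.
words = "i want to go from new york to atlanta on monday".split()
labels = ["O", "O", "O", "O", "O",
          "B-fromloc.city_name", "I-fromloc.city_name", "O",
          "B-toloc.city_name", "O", "B-depart_date.day_name"]

for name, phrase in iob_to_slots(words, labels):
    print(name, "=", phrase)
```

Every word gets exactly one label, so slot-filling becomes a per-word classification problem, which is what makes it a good fit for the sequence models discussed below.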

The ATIS official split contains 4,978/893 sentences for a total of 56,590/9,198 words (average sentence length is 15) in the train/test set. The number of classes (different slots) is 128, including the O label (NULL). Unseen words in the test set are encoded by the <UNK> token, and each digit is replaced with the string DIGIT; that is, 20 is converted to DIGITDIGIT.
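The token normalization described above can be sketched in a few lines. The dataset used in this post already ships in this form, so this is only an illustration; the function name normalize_token, the toy vocabulary, and the order of the two steps (digits first, then the unknown-word check) are assumptions.

```python
import re


def normalize_token(token, vocabulary):
    """Replace each digit with DIGIT, then map unseen words to <UNK>."""
    token = re.sub(r"\d", "DIGIT", token)
    return token if token in vocabulary else "<UNK>"


# Toy vocabulary standing in for the words seen in the training set.
vocabulary = {"flight", "boston", "DIGITDIGIT"}

print(normalize_token("20", vocabulary))      # each digit becomes DIGIT
print(normalize_token("zurich", vocabulary))  # not in the toy vocabulary
```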

Our approach to the problem is to use:

Word embeddings
Recurrent neural networks

I'll talk about these briefly in the following sections.

Word Embeddings

Word embeddings map each word to a vector in a high-dimensional space. These embeddings can capture semantic and syntactic information about words: similar words end up close to each other in this space, and dissimilar words end up far apart.

Embeddings can be learned either from large amounts of text such as Wikipedia, or specifically for a given problem. We will take the second approach here.

As an illustration, here are the nearest neighbors in the word embedding space for some of the words (the first row lists the query words, and each column lists that word's neighbors). This embedding space was learned by the model that we’ll define later in the post:

sunday	delta	california	boston	august	time	car
wednesday	continental	colorado	nashville	september	schedule	rental
saturday	united	florida	toronto	july	times	limousine
friday	american	ohio	chicago	june	schedules	rentals
monday	eastern	georgia	phoenix	december	dinnertime	cars
tuesday	northwest	pennsylvania	cleveland	november	ord	taxi
thursday	us	north	atlanta	april	f28	train
wednesdays	nationair	tennessee	milwaukee	october	limo	limo
saturdays	lufthansa	minnesota	columbus	january	departure	ap
sundays	midwest	michigan	minneapolis	may	sfo	later
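A nearest-neighbor lookup like the one behind this table can be sketched as follows. The vectors here are tiny, hand-written toy values rather than the learned ATIS embeddings, and cosine similarity is an assumed choice of distance; the post does not say which measure was used. Plain Python is used to keep the sketch dependency-free.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, embeddings, k=2):
    """Return the k words whose vectors are most similar to the query's."""
    scores = [(cosine_similarity(embeddings[query], vec), word)
              for word, vec in embeddings.items() if word != query]
    scores.sort(reverse=True)
    return [word for _, word in scores[:k]]


# Toy 3-dimensional vectors, written by hand (not learned from ATIS).
toy_embeddings = {
    "sunday":  [0.9, 0.1, 0.0],
    "monday":  [0.8, 0.2, 0.1],
    "friday":  [0.85, 0.15, 0.05],
    "boston":  [0.1, 0.9, 0.2],
    "atlanta": [0.0, 0.8, 0.3],
}

print(nearest_neighbors("sunday", toy_embeddings))
print(nearest_neighbors("boston", toy_embeddings, k=1))
```

With learned embeddings, the same lookup would run over the rows of the model's embedding weight matrix instead of a hand-written dictionary.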

Recurrent Neural Networks

Convolutional layers can be a great way to pool local information, but they do not really capture the sequential nature of data. Recurrent Neural Networks (RNNs) help us tackle sequential information like natural language.

If we are going to predict properties of the current word, we had better remember the words before it too. An RNN has an internal state (memory) that stores a summary of the sequence it has seen so far. This allows us to use RNNs to solve complicated word tagging problems such as Part Of Speech (POS) tagging or slot filling, as in our case.

The following diagram illustrates the internals of an RNN:

[Figure: diagram of the internals of an RNN]

Source: Nature RNN

Let's briefly go through the diagram:

x_1, x_2, ..., x_{t-1}, x_t, x_{t+1}, ... is the input to the RNN.
s_t is the hidden state of the RNN at step t. It is computed from the state at step t-1 as s_t = f(U x_t + W s_{t-1}). Here f is a nonlinearity such as tanh or ReLU.
o_t is the output at step t, computed as o_t = f(V s_t).
U, V, W are the learnable parameters of the RNN.
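The recurrence above can be written out directly. This is a minimal, dependency-free sketch of the forward pass only (no training), using tanh for f and a zero initial state, both of which are assumptions; the function names and the toy weights are hypothetical. In practice a Keras recurrent layer does this work for you.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, U, W, V, f=math.tanh):
    """Run a simple RNN over a sequence and return the outputs o_1..o_T."""
    state = [0.0] * len(W)  # s_0: initial hidden state, assumed to be zero
    outputs = []
    for x_t in inputs:
        # s_t = f(U x_t + W s_{t-1})
        pre = [a + b for a, b in zip(matvec(U, x_t), matvec(W, state))]
        state = [f(p) for p in pre]
        # o_t = f(V s_t)
        outputs.append([f(o) for o in matvec(V, state)])
    return outputs


# Toy weights: 2-dimensional inputs, 2 hidden units, 1 output unit.
U = [[0.5, -0.3], [0.1, 0.8]]
W = [[0.2, 0.0], [-0.4, 0.6]]
V = [[1.0, -1.0]]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for step, o_t in enumerate(rnn_forward(sequence, U, W, V), start=1):
    print(step, o_t)
```

Because the state is carried from one step to the next, the output at step t depends on every input up to and including x_t, not just on the current one.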

For our problem, we will pass a sequence of word embeddings as the input to the RNN.

Putting it all together

Now that we've set up the problem and have an understanding of the basic blocks, let's code it up.

Since we are using the IOB representation for labels, it's not simple to calculate the scores of our model. We therefore use the conlleval Perl script to compute the F1 scores. I've adapted the code from here for the data preprocessing and score calculation. The complete code is available at GitHub:

$ git clone https://github.com/chsasank/ATIS.keras.git
$ cd ATIS.keras

I recommend using Jupyter Notebook to run and experiment with the snippets from the tutorial.

$ jupyter notebook
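To give a feel for why scoring IOB labels is not a simple per-word accuracy, here is a simplified sketch of chunk-level F1 for a single sentence. It is not a replacement for conlleval: the real script aggregates over the whole corpus and handles edge cases (such as stray I- tags) differently. The function names are hypothetical.

```python
def extract_chunks(labels):
    """Return a set of (slot_name, start, end) spans from IOB labels."""
    chunks, name, start = set(), None, None
    # The trailing "O" is a sentinel that flushes a chunk ending the sentence.
    for i, label in enumerate(list(labels) + ["O"]):
        continues = label.startswith("I-") and label[2:] == name
        if name is not None and not continues:
            chunks.add((name, start, i))
            name = None
        if label.startswith("B-"):
            name, start = label[2:], i
    return chunks


def chunk_f1(true_labels, predicted_labels):
    """F1 over whole chunks: a chunk counts only if its name and span match."""
    true_chunks = extract_chunks(true_labels)
    pred_chunks = extract_chunks(predicted_labels)
    correct = len(true_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(true_chunks) if true_chunks else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hand-written labels: the prediction finds one of the two true slots.
true_labels = ["O", "B-fromloc.city_name", "O", "B-toloc.city_name"]
predicted = ["O", "B-fromloc.city_name", "O", "O"]
print(chunk_f1(true_labels, predicted))
```

In this example three of the four word labels are right, yet only one of the two slots is recovered, which is the kind of difference a chunk-level score is meant to expose.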

Conclusion

In part 2, we will load the data using data.load.atisfull(). We will also define the Keras model, and then we will train the model. To measure the accuracy of the model, we’ll use model.predict_on_batch() and metrics.accuracy.conlleval(). And finally, we will improve our model to achieve better results.

About the author

Sasank Chilamkurthy works at Fractal Analytics. His work involves deep learning
on medical images obtained from radiology and pathology. He is mainly
interested in computer vision.