Solving an NLP Problem with Keras, Part 1


In a previous two-part post series on Keras, I introduced Convolutional Neural Networks (CNNs) and the Keras deep learning framework. We used them to solve a Computer Vision (CV) problem involving traffic sign recognition. Now, in this two-part post series, we will solve a Natural Language Processing (NLP) problem with Keras. Let’s begin.

The Problem and the Dataset

The problem we are going to tackle is Natural Language Understanding: extracting the meaning of spoken utterances. In its general form, this is still an unsolved problem. We therefore narrow it down to a practical, solvable problem: understanding the speaker in a limited context. In particular, we want to identify the intent of a speaker asking for information about flights.

The dataset we are going to use is the Airline Travel Information System (ATIS) dataset, which was collected by DARPA in the early 90s. ATIS consists of spoken queries about flight-related information. An example utterance is "I want to go from Boston to Atlanta on Monday." Understanding this utterance is then reduced to identifying arguments like Destination and Departure Day. This task is called slot-filling.

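Here is an example sentence and its labels. You will observe that labels are encoded in an Inside Outside Beginning (IOB) representation: a B- label marks the first word of a slot, an I- label marks a word that continues the slot, and O marks a word outside any slot. Let’s look at the dataset: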


| Words  | Show | flights | from | Boston | to | New   | York  | today  |
|--------|------|---------|------|--------|----|-------|-------|--------|
| Labels | O    | O       | O    | B-dept | O  | B-arr | I-arr | B-date |
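
To make the IOB encoding concrete, here is a minimal sketch that groups the tagged words from the example back into slots. The helper name `iob_to_slots` is hypothetical and not part of the ATIS code. Treating a stray `I-` label as the start of a new slot is also an assumption of this sketch.

```python
def iob_to_slots(words, labels):
    """Group IOB-tagged words into (slot_name, phrase) pairs."""
    slots = []
    current_name, current_words = None, []
    for word, label in zip(words, labels):
        if label.startswith("I-") and current_name == label[2:]:
            # Continuation of the slot that is currently open.
            current_words.append(word)
            continue
        # Any other label closes the open slot, if there is one.
        if current_name is not None:
            slots.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
        if label != "O":
            # "B-x" opens a new slot; a stray "I-x" is treated the same way
            # (an assumption of this sketch).
            current_name, current_words = label[2:], [word]
    if current_name is not None:
        slots.append((current_name, " ".join(current_words)))
    return slots


words = ["Show", "flights", "from", "Boston", "to", "New", "York", "today"]
labels = ["O", "O", "O", "B-dept", "O", "B-arr", "I-arr", "B-date"]
slots = iob_to_slots(words, labels)
print(slots)  # [('dept', 'Boston'), ('arr', 'New York'), ('date', 'today')]
```

Note how "New York" comes out as a single arrival slot because York carries an I-arr label rather than a second B-arr.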

The ATIS official split contains 4,978/893 sentences for a total of 56,590/9,198 words (average sentence length is 15) in the train/test set. The number of classes (different slots) is 128, including the O label (NULL). Unseen words in the test set are encoded by the <UNK> token, and each digit is replaced with the string DIGIT; that is, 20 is converted to DIGITDIGIT.
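
The token normalization described above can be sketched as follows. The released dataset already comes preprocessed, so this is only an illustration. The function name `normalize_tokens`, the toy vocabulary, and the order of the two steps (digits first, then the vocabulary lookup) are assumptions.

```python
import re


def normalize_tokens(tokens, vocabulary):
    """Replace each digit with DIGIT, then map unseen words to <UNK>."""
    normalized = []
    for token in tokens:
        # "20" becomes "DIGITDIGIT": one DIGIT per digit character.
        token = re.sub(r"[0-9]", "DIGIT", token)
        normalized.append(token if token in vocabulary else "<UNK>")
    return normalized


# Toy vocabulary standing in for the words seen in the training set.
vocab = {"flight", "DIGITDIGIT", "to", "boston"}
result = normalize_tokens(["flight", "20", "to", "narnia"], vocab)
print(result)  # ['flight', 'DIGITDIGIT', 'to', '<UNK>']
```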

Our approach to the problem is to use:

  • Word embeddings
  • Recurrent neural networks

I’ll talk about these briefly in the following sections.

Word Embeddings

Word embeddings map each word to a vector in a high-dimensional space. These embeddings can capture semantic and syntactic information about words. For instance, similar words end up close to each other in this space, while dissimilar words end up far apart.

These embeddings can be learned either from large amounts of text like Wikipedia, or specifically for a given problem. We will take the second approach for this problem.

As an illustration, I have shown here the nearest neighbors in the word embedding space for some of the words (the first row lists the query words, and each column lists that word’s neighbors). This embedding space was learned by the model that we’ll define later in the post:

| sunday     | delta       | california   | boston      | august    | time       | car       |
|------------|-------------|--------------|-------------|-----------|------------|-----------|
| wednesday  | continental | colorado     | nashville   | september | schedule   | rental    |
| saturday   | united      | florida      | toronto     | july      | times      | limousine |
| friday     | american    | ohio         | chicago     | june      | schedules  | rentals   |
| monday     | eastern     | georgia      | phoenix     | december  | dinnertime | cars      |
| tuesday    | northwest   | pennsylvania | cleveland   | november  | ord        | taxi      |
| thursday   | us          | north        | atlanta     | april     | f28        | train     |
| wednesdays | nationair   | tennessee    | milwaukee   | october   | limo       | limo      |
| saturdays  | lufthansa   | minnesota    | columbus    | january   | departure  | ap        |
| sundays    | midwest     | michigan     | minneapolis | may       | sfo        | later     |
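
For intuition, a nearest-neighbor lookup like the one above can be sketched in a few lines. The three-dimensional vectors below are made up for illustration and are not the learned ATIS embeddings. Cosine similarity is also an assumption, since the post does not say which distance was used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    query = embeddings[word]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


# Hand-made toy vectors (hypothetical values, chosen so days and cities cluster).
toy_embeddings = {
    "sunday": [0.9, 0.1, 0.0],
    "monday": [0.8, 0.2, 0.1],
    "friday": [0.85, 0.15, 0.05],
    "boston": [0.1, 0.9, 0.2],
    "chicago": [0.15, 0.85, 0.25],
    "taxi": [0.0, 0.2, 0.9],
}
neighbors = nearest_neighbors("sunday", toy_embeddings)
print(neighbors)  # ['friday', 'monday']
```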

Recurrent Neural Networks

Convolutional layers can be a great way to pool local information, but they do not really capture the sequentiality of data. Recurrent Neural Networks (RNNs) help us tackle sequential information like natural language.

If we are going to predict properties of the current word, we had better remember the words before it too. An RNN has an internal state (or memory) that stores a summary of the sequence it has seen so far. This allows us to use RNNs to solve complicated word tagging problems such as Part Of Speech (POS) tagging or slot filling, as in our case.

The following diagram illustrates the internals of an RNN:

[Figure: RNN diagram. Source: Nature]

Let’s briefly go through the diagram:

  • $x_1, x_2, \ldots, x_{t-1}, x_t, x_{t+1}, \ldots$ is the input to the RNN.
  • $s_t$ is the hidden state of the RNN at step $t$. It is computed from the state at step $t-1$ as $s_t = f(U x_t + W s_{t-1})$. Here $f$ is a nonlinearity such as tanh or ReLU.
  • $o_t$ is the output at step $t$, computed as $o_t = f(V s_t)$.
  • $U, V, W$ are the learnable parameters of the RNN.
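
The recurrence described in the list above can be written out directly in NumPy. This is only a sketch of the forward pass with random, untrained parameters. The dimensions, the zero initial state, and the use of tanh for both nonlinearities are assumptions made for illustration.

```python
import numpy as np


def rnn_forward(inputs, U, W, V, f=np.tanh):
    """Run a simple RNN over a sequence and return all states and outputs."""
    state = np.zeros(W.shape[0])  # initial hidden state, assumed to be zero
    states, outputs = [], []
    for x_t in inputs:
        state = f(U @ x_t + W @ state)  # s_t = f(U x_t + W s_{t-1})
        states.append(state)
        outputs.append(f(V @ state))    # o_t = f(V s_t)
    return np.array(states), np.array(outputs)


rng = np.random.default_rng(0)
embedding_dim, hidden_dim, output_dim, seq_len = 4, 3, 2, 5
U = rng.normal(size=(hidden_dim, embedding_dim))  # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))     # hidden-to-hidden weights
V = rng.normal(size=(output_dim, hidden_dim))     # hidden-to-output weights
sequence = rng.normal(size=(seq_len, embedding_dim))  # stand-in for word embeddings

states, outputs = rnn_forward(sequence, U, W, V)
print(states.shape, outputs.shape)  # (5, 3) (5, 2)
```

The same three matrices are reused at every step, which is what lets the network handle sequences of any length.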

For our problem, we will pass a sequence of word embeddings as the input to the RNN.

Putting it all together

Now that we’ve set up the problem and have an understanding of the basic blocks, let’s code it up.

Since we are using the IOB representation for labels, it’s not simple to calculate the scores of our model. We therefore use the conlleval Perl script to compute the F1 scores. I’ve adapted the code from here for the data preprocessing and score calculation. The complete code is available at GitHub:

$ git clone https://github.com/chsasank/ATIS.keras.git
$ cd ATIS.keras

I recommend using Jupyter Notebook to run and experiment with the snippets from the tutorial.

$ jupyter notebook 
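
To see why scoring IOB output is not simple, consider that a slot only counts as correct when its whole span and its label both match. The sketch below computes a slot-level F1 on that basis. It is a simplified illustration and not a reimplementation of the conlleval script. The function names are hypothetical.

```python
def extract_chunks(labels):
    """Return the set of (slot_name, start, end) spans in an IOB sequence."""
    chunks, name, start = set(), None, None
    # A trailing "O" acts as a sentinel that closes the last open chunk.
    for i, label in enumerate(list(labels) + ["O"]):
        if label.startswith("I-") and label[2:] == name:
            continue  # still inside the currently open chunk
        if name is not None:
            chunks.add((name, start, i))
        if label == "O":
            name, start = None, None
        else:
            # "B-x" (or a stray "I-x", by assumption) opens a new chunk.
            name, start = label[2:], i
    return chunks


def chunk_f1(true_labels, predicted_labels):
    """F1 over whole slots rather than over individual tags."""
    true_chunks = extract_chunks(true_labels)
    predicted_chunks = extract_chunks(predicted_labels)
    correct = len(true_chunks & predicted_chunks)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted_chunks)
    recall = correct / len(true_chunks)
    return 2 * precision * recall / (precision + recall)


truth = ["O", "O", "O", "B-dept", "O", "B-arr", "I-arr", "B-date"]
predicted = ["O", "O", "O", "B-dept", "O", "B-arr", "O", "B-date"]
score = chunk_f1(truth, predicted)
print(round(score, 3))  # 0.667
```

In this example only one of the eight tags is wrong, yet the arrival slot is lost as a whole, so two of three slots match and the F1 drops to about 0.667.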

Conclusion

In part 2, we will load the data using data.load.atisfull(). We will also define the Keras model, and then we will train the model. To measure the accuracy of the model, we’ll use model.predict_on_batch() and metrics.accuracy.conlleval(). And finally, we will improve our model to achieve better results.

About the author

Sasank Chilamkurthy works at Fractal Analytics. His work involves deep learning
on medical images obtained from radiology and pathology. He is mainly
interested in computer vision.
