Intelligent mobile projects with TensorFlow: Build a basic Raspberry Pi robot that listens, moves, sees, and speaks [Tutorial]


According to Wikipedia, “The Raspberry Pi is a series of small single-board computers developed in the United Kingdom by the Raspberry Pi Foundation to promote the teaching of basic computer science in schools and in developing countries.” The official site of Raspberry Pi describes it as “a small and affordable computer that you can use to learn programming.” If you have never heard of or used Raspberry Pi before, just go to its website and chances are you’ll quickly fall in love with the cool little thing. It is little yet powerful: the developers of TensorFlow made TensorFlow available on Raspberry Pi from early versions, around mid-2016, so we can run complicated TensorFlow models on a tiny computer that you can buy for about $35.

In this article we will see how to set up TensorFlow on Raspberry Pi and use the TensorFlow image recognition and audio recognition models, along with text to speech and robot movement APIs, to build a Raspberry Pi robot that can move, see, listen, and speak.

This tutorial is an excerpt from a book written by Jeff Tang titled Intelligent Mobile Projects with TensorFlow.

Setting up TensorFlow on Raspberry Pi

To use TensorFlow in Python, we can install the TensorFlow 1.6 nightly build for Pi from the TensorFlow Jenkins continuous integration site (http://ci.tensorflow.org/view/Nightly/job/nightly-pi/223/artifact/output-artifacts):

sudo pip install http://ci.tensorflow.org/view/Nightly/job/nightly-pi/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.6.0-cp27-none-any.whl

This is the simpler and more common method. A more complicated method is to use the makefile, which is required when you need to build and use the TensorFlow library. The Raspberry Pi section of the official TensorFlow makefile documentation (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/makefile) has detailed steps to build the TensorFlow library, but it may not work with every release of TensorFlow. The steps there work perfectly with an earlier version of TensorFlow (0.10), but cause many “undefined reference to google::protobuf” errors with TensorFlow 1.6.


The following steps have been tested with the TensorFlow 1.6 release, downloadable at https://github.com/tensorflow/tensorflow/releases/tag/v1.6.0; you can certainly try a newer version from the TensorFlow releases page, or clone the latest TensorFlow source with git clone https://github.com/tensorflow/tensorflow, and fix any possible hiccups.

After you cd to your TensorFlow source root, run the following commands:

tensorflow/contrib/makefile/download_dependencies.sh
sudo apt-get install -y autoconf automake libtool gcc-4.8 g++-4.8
cd tensorflow/contrib/makefile/downloads/protobuf/
./autogen.sh
./configure
make CXX=g++-4.8
sudo make install
sudo ldconfig # refresh shared library cache
cd ../../../../..
export HOST_NSYNC_LIB=`tensorflow/contrib/makefile/compile_nsync.sh`
export TARGET_NSYNC_LIB="$HOST_NSYNC_LIB"

Make sure you run make CXX=g++-4.8 rather than just make, which is what the official TensorFlow makefile documentation shows. Protobuf must be compiled with the same gcc version that is used to build the TensorFlow library in the next step; this is what fixes those “undefined reference to google::protobuf” errors. Now try to build the TensorFlow library using the following command:

make -f tensorflow/contrib/makefile/Makefile HOST_OS=PI TARGET=PI \
 OPTFLAGS="-Os -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize" CXX=g++-4.8

After a few hours of building, you’ll likely get an error such as “virtual memory exhausted: Cannot allocate memory”, or the Pi board will just freeze because it has run out of memory. To fix this, we need to set up swap space, because without it an application that runs out of memory gets killed by the kernel. There are two ways to set up swap: a swap file and a swap partition. Raspbian uses a default swap file of 100 MB on the SD card, as shown here using the free command:

~/tensorflow-1.6.0 $ free -h
              total        used        free      shared  buff/cache   available
Mem:           927M         45M        843M        660K         38M        838M
Swap:           99M         74M         25M

To increase the swap file size to 1 GB, modify the /etc/dphys-swapfile file via sudo vi /etc/dphys-swapfile, changing CONF_SWAPSIZE=100 to CONF_SWAPSIZE=1024, then restart the swap file service:

sudo /etc/init.d/dphys-swapfile stop 
sudo /etc/init.d/dphys-swapfile start

After this, free -h will show the Swap total to be 1.0 GB.
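If you would rather script the edit than open vi, the change to CONF_SWAPSIZE can be sketched in Python. This is only an illustration: the helper name set_swap_size is hypothetical, it works on the file's text rather than the file itself, and writing the result back to /etc/dphys-swapfile (as root) and restarting the service are left to you.

```python
import re

# Matches an existing, uncommented CONF_SWAPSIZE line.
SWAPSIZE_LINE = re.compile(r"^CONF_SWAPSIZE=.*$", re.MULTILINE)


def set_swap_size(config_text, size_mb):
    """Return config_text with CONF_SWAPSIZE set to size_mb (in MB).

    Hypothetical helper: it only transforms text and never touches the disk.
    """
    new_line = "CONF_SWAPSIZE={}".format(size_mb)
    if SWAPSIZE_LINE.search(config_text):
        # Replace the first existing setting in place.
        return SWAPSIZE_LINE.sub(new_line, config_text, count=1)
    # No setting present: append one at the end of the file.
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__":
    sample = "# swap file settings\nCONF_SWAPSIZE=100\n"
    print(set_swap_size(sample, 1024))
```

Keeping the function free of file I/O makes it easy to check the result before overwriting a system file.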

A swap partition is created on a separate USB disk. It is preferred because a swap partition can’t get fragmented, whereas a swap file on the SD card can fragment easily, causing slower access. To set up a swap partition, plug a USB stick that holds no data you need into the Pi board, then run sudo blkid, and you’ll see something like this:

/dev/sda1: LABEL="EFI" UUID="67E3-17ED" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="622fddad-da3c-4a09-b6b3-11233a2ca1f6"
/dev/sda2: UUID="E67F-6EAB" TYPE="vfat" PARTLABEL="NO NAME" PARTUUID="a045107a-9e7f-47c7-9a4b-7400d8d40f8c"

/dev/sda2 is the partition we’ll use as the swap partition. Now unmount and format it to be a swap partition:

sudo umount /dev/sda2
sudo mkswap /dev/sda2
mkswap: /dev/sda2: warning: wiping old swap signature.
Setting up swapspace version 1, size = 29.5 GiB (31671701504 bytes)
no label, UUID=23443cde-9483-4ed7-b151-0e6899eba9de

You’ll see a UUID in the output of the mkswap command; run sudo vi /etc/fstab and add the following line to the fstab file, using that UUID value:

UUID=<UUID value> none swap sw,pri=5 0 0

Save and exit the fstab file and then run sudo swapon -a. Now if you run free -h again, you’ll see the Swap total to be close to the USB storage size. We definitely don’t need all that size for swap—in fact, the recommended maximum swap size for the Raspberry Pi 3 board with 1 GB memory is 2 GB, but we’ll leave it as is because we just want to successfully build the TensorFlow library.

With either of the swap setting changes, we can rerun the make command:

make -f tensorflow/contrib/makefile/Makefile HOST_OS=PI TARGET=PI \
 OPTFLAGS="-Os -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize" CXX=g++-4.8

After this completes, the TensorFlow library will be generated as tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a. Now we can build the image classification example using the library.

Image recognition and text to speech

There are two TensorFlow Raspberry Pi example apps (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/pi_examples) located in tensorflow/contrib/pi_examples: label_image and camera. We’ll modify the camera example app to integrate text to speech so the app can speak out its recognized images when moving around. Before we build and test the two apps, we need to install some libraries and download the pre-built TensorFlow Inception model file:

sudo apt-get install -y libjpeg-dev
sudo apt-get install libv4l-dev
curl https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015_stripped.zip -o /tmp/inception_dec_2015_stripped.zip

cd ~/tensorflow-1.6.0
unzip /tmp/inception_dec_2015_stripped.zip -d tensorflow/contrib/pi_examples/label_image/data/

To build the label_image and camera apps, run:

make -f tensorflow/contrib/pi_examples/label_image/Makefile
make -f tensorflow/contrib/pi_examples/camera/Makefile

You may encounter the following error when building the apps:

./tensorflow/core/platform/default/mutex.h:25:22: fatal error: nsync_cv.h: No such file or directory
 #include "nsync_cv.h"
 ^
 compilation terminated.

To fix this, run sudo cp tensorflow/contrib/makefile/downloads/nsync/public/nsync*.h /usr/include.

Then edit the tensorflow/contrib/pi_examples/label_image/Makefile or tensorflow/contrib/pi_examples/camera/Makefile file and add the following library search path and library flag before running the make command again:

-L$(DOWNLOADSDIR)/nsync/builds/default.linux.c++11 \

-lnsync \

To test run the two apps, run the apps directly:

tensorflow/contrib/pi_examples/label_image/gen/bin/label_image
tensorflow/contrib/pi_examples/camera/gen/bin/camera

Take a look at the C++ source code, tensorflow/contrib/pi_examples/label_image/label_image.cc and tensorflow/contrib/pi_examples/camera/camera.cc, and you’ll see that they use C++ code similar to that in the iOS apps from the book’s previous chapters to load the model graph file, prepare the input tensor, run the model, and get the output tensor.

By default, the camera example also uses the prebuilt Inception model unzipped in the label_image/data folder. But for your own specific image classification task, you can provide your own model retrained via transfer learning using the --graph parameter when running the two example apps.

In general, voice is a Raspberry Pi robot’s main UI to interact with us. Ideally, we should run a TensorFlow-powered natural-sounding Text-to-Speech (TTS) model such as WaveNet (https://deepmind.com/blog/wavenet-generative-model-raw-audio) or Tacotron (https://github.com/keithito/tacotron), but it’d be beyond the scope of this article to run and deploy such a model. It turns out that we can use a much simpler TTS library called Flite by CMU (http://www.festvox.org/flite), which offers pretty decent TTS, and it takes just one simple command to install it: sudo apt-get install flite. If you want to install the latest version of Flite to hopefully get a better TTS quality, just download the latest Flite source from the link and build it.

To test Flite with our USB speaker, run flite with the -t parameter followed by a double-quoted text string, such as flite -t "i recommend the ATM machine". If you don’t like the default voice, you can find other supported voices by running flite -lv, which should return Voices available: kal awb_time kal16 awb rms slt. Then you can specify the voice used for TTS: flite -voice rms -t "i recommend the ATM machine".

To let the camera app speak out the recognized objects, which should be the desired behavior when the Raspberry Pi robot moves around, you can use this simple pipe command:

tensorflow/contrib/pi_examples/camera/gen/bin/camera | xargs -n 1 flite -t

You’ll likely hear far more speech than you want. To fine-tune the TTS result of image classification, you can also modify the camera.cc file and add the following code to the PrintTopLabels function before rebuilding the example using make -f tensorflow/contrib/pi_examples/camera/Makefile:

std::string cmd = "flite -voice rms -t \"";
cmd.append(labels[label_index]);
cmd.append("\"");
system(cmd.c_str());
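An alternative that avoids rebuilding the app is to filter the labels in a small Python script placed between the camera app and flite. The sketch below is only an illustration under two assumptions: that the labels arrive one per line (the exact output format of the camera app is not described here), and that speaking a label only when it changes is the behavior you want. The helper names are hypothetical. Passing the text to flite as a list argument also avoids the quoting problems that building a shell string can cause.

```python
import subprocess


def flite_command(text, voice="rms"):
    """Build the flite argument list; no shell quoting is needed."""
    return ["flite", "-voice", voice, "-t", text]


def labels_to_speak(lines):
    """Yield a label only when it differs from the previously spoken one.

    Assumes one label per line; blank lines are ignored.
    """
    previous = None
    for line in lines:
        label = line.strip()
        if label and label != previous:
            previous = label
            yield label


def speak_all(lines, voice="rms"):
    """Speak each new label with flite (requires flite to be installed)."""
    for label in labels_to_speak(lines):
        subprocess.call(flite_command(label, voice))
```

On the Pi, speak_all(sys.stdin) could be called from a script that receives the camera app's output through a pipe.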

Now that we have completed the image classification and speech synthesis tasks, without using any Cloud APIs, let’s see how we can do audio recognition on Raspberry Pi.

Audio recognition and robot movement

To use the pre-trained audio recognition model in the TensorFlow tutorial (https://www.tensorflow.org/tutorials/audio_recognition), we’ll reuse a listen.py Python script from https://gist.github.com/aallan, and add the GoPiGo API calls to control the robot movement after it recognizes four basic audio commands: “left,” “right,” “go,” and “stop.” The other six commands supported by the pre-trained model (“yes,” “no,” “up,” “down,” “on,” and “off”) don’t apply well in our example.

To run the script, first download the pre-trained audio recognition model from http://download.tensorflow.org/models/speech_commands_v0.01.zip and unzip it to, for example, the Pi board’s /tmp directory, then run:

python listen.py --graph /tmp/conv_actions_frozen.pb --labels /tmp/conv_actions_labels.txt -I plughw:1,0

Or you can run:

python listen.py --graph /tmp/speech_commands_graph.pb --labels /tmp/conv_actions_labels.txt -I plughw:1,0

Note that the plughw value 1,0 should match the card number and device number of your USB microphone, which can be found using the arecord -l command.
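To illustrate how the card and device numbers map to the plughw value, here is a minimal Python sketch that extracts them from arecord -l output. The sample text in the usage example is an assumption about what typical ALSA output looks like, and the helper name plughw_devices is hypothetical; check the output on your own board.

```python
import re

# Matches lines such as:
#   card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+):.*, device (\d+):", re.MULTILINE)


def plughw_devices(arecord_output):
    """Return a plughw:<card>,<device> string for each capture device listed."""
    return ["plughw:{},{}".format(card, device)
            for card, device in CARD_LINE.findall(arecord_output)]


if __name__ == "__main__":
    # Assumed sample output, for illustration only.
    sample = (
        "**** List of CAPTURE Hardware Devices ****\n"
        "card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]\n"
        "  Subdevices: 1/1\n"
        "  Subdevice #0: subdevice #0\n"
    )
    print(plughw_devices(sample))
```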

The listen.py script also supports many other parameters. For example, we can use --detection_threshold 0.5 instead of the default detection threshold 0.8.

Let’s now take a quick look at how listen.py works before we add the GoPiGo API calls to make the robot move. listen.py uses the Popen class from Python’s subprocess module to spawn a new process that runs the arecord command with appropriate parameters. The Popen object’s stdout attribute is the file handle for the arecord command’s standard output, which can be used to read the recorded audio bytes.
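The Popen pattern described above can be sketched as follows. This is not the code from listen.py: the function names are hypothetical, and the arecord options shown (raw 16-bit mono audio at 16 kHz) are an assumption about reasonable settings rather than the script's actual parameters. The usage example runs a stand-in child process so the sketch can be tried on a machine without a microphone.

```python
import subprocess
import sys


def arecord_command(device="plughw:1,0", sample_rate=16000):
    """Build an arecord command line (assumed settings, not listen.py's).

    -D selects the device, -t raw writes headerless PCM to stdout,
    -f S16_LE is 16-bit little-endian, -r is the rate, -c 1 is mono.
    """
    return ["arecord", "-D", device, "-t", "raw", "-f", "S16_LE",
            "-r", str(sample_rate), "-c", "1"]


def read_chunks(command, chunk_bytes, max_chunks):
    """Spawn command and yield up to max_chunks blocks read from its stdout."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        for _ in range(max_chunks):
            chunk = proc.stdout.read(chunk_bytes)
            if not chunk:  # end of stream: the child closed its stdout
                break
            yield chunk
    finally:
        # Always clean up the child process, even if the caller stops early.
        proc.stdout.close()
        proc.terminate()
        proc.wait()


if __name__ == "__main__":
    # Stand-in for arecord: a child process that writes 10 bytes and exits.
    stand_in = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 10)"]
    for chunk in read_chunks(stand_in, chunk_bytes=4, max_chunks=5):
        print(len(chunk))
```

On the Pi, passing arecord_command() instead of the stand-in would stream microphone bytes in the same way.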

The Python code to load the trained model graph is as follows:

import tensorflow as tf

# Read the frozen graph file and import it into the default graph.
with tf.gfile.FastGFile(filename, 'rb') as f:
  graph_def = tf.GraphDef()
  graph_def.ParseFromString(f.read())
  tf.import_graph_def(graph_def, name='')

A TensorFlow session is created using tf.Session(). After the graph is loaded and the session is created, the recorded audio buffer is sent, along with the sample rate, as input data to the session’s run method, which returns the recognition prediction:

run(softmax_tensor, {
 self.input_samples_name_: input_data,
 self.input_rate_name_: self.sample_rate_
 })

Here, softmax_tensor is defined as the TensorFlow graph’s get_tensor_by_name(self.output_name_), and output_name_, input_samples_name_, and input_rate_name_ are defined as labels_softmax, decoded_sample_data:0, and decoded_sample_data:1, respectively.
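To make the last step concrete, the following sketch shows one way to turn a prediction into a label, assuming the run call yields one score per entry in the labels file. The function name and the label list in the usage example are illustrative assumptions, not code from listen.py.

```python
def top_prediction(scores, labels):
    """Return (label, score) for the highest-scoring entry.

    Assumes scores[i] is the model's score for labels[i].
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]


if __name__ == "__main__":
    # Illustrative values only.
    labels = ["_silence_", "_unknown_", "left", "right", "go", "stop"]
    scores = [0.02, 0.03, 0.85, 0.04, 0.03, 0.03]
    print(top_prediction(scores, labels))
```

The returned score is what a detection threshold such as --detection_threshold would be compared against.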

On Raspberry Pi, you can run TensorFlow models using either the TensorFlow Python API directly or the C++ API (as in the label_image and camera examples), although normally you’d still train the models on a more powerful computer. For the complete TensorFlow Python API documentation, see https://www.tensorflow.org/api_docs/python.

To use the GoPiGo Python API to make the robot move based on your voice command, first add the following two lines to listen.py:

import easygopigo3 as gpg
gpg3_obj = gpg.EasyGoPiGo3()

Then add the following code to the end of the add_data method:

if current_top_score > self.detection_threshold_ and time_since_last_top > self.suppression_ms_:
  self.previous_top_label_ = current_top_label
  self.previous_top_label_time_ = current_time_ms
  is_new_command = True
  logger.info(current_top_label)
  # Map the newly recognized command to a GoPiGo movement.
  if current_top_label == "go":
    gpg3_obj.drive_cm(10, False)
  elif current_top_label == "left":
    gpg3_obj.turn_degrees(-30, False)
  elif current_top_label == "right":
    gpg3_obj.turn_degrees(30, False)
  elif current_top_label == "stop":
    gpg3_obj.stop()
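If you want to try the command mapping without the robot attached, the same logic can be written as a lookup table and exercised against a stand-in object. This is a sketch, not part of listen.py: RecordingRobot is a hypothetical class that only mimics the three EasyGoPiGo3 methods used above and records the calls instead of driving motors.

```python
class RecordingRobot(object):
    """Hypothetical stand-in for EasyGoPiGo3 that records calls."""

    def __init__(self):
        self.calls = []

    def drive_cm(self, dist, blocking=True):
        self.calls.append(("drive_cm", dist, blocking))

    def turn_degrees(self, degrees, blocking=True):
        self.calls.append(("turn_degrees", degrees, blocking))

    def stop(self):
        self.calls.append(("stop",))


def make_actions(robot):
    """Map each supported voice command to a movement, as in the code above."""
    return {
        "go": lambda: robot.drive_cm(10, False),
        "left": lambda: robot.turn_degrees(-30, False),
        "right": lambda: robot.turn_degrees(30, False),
        "stop": robot.stop,
    }


def dispatch(label, actions):
    """Run the action for label; return False for commands we ignore."""
    action = actions.get(label)
    if action is None:
        return False
    action()
    return True


if __name__ == "__main__":
    robot = RecordingRobot()
    actions = make_actions(robot)
    for word in ["left", "yes", "go", "stop"]:
        dispatch(word, actions)
    print(robot.calls)
```

A table like this keeps labels such as “yes” or “_silence_” from triggering any movement, and the real EasyGoPiGo3 object can be passed to make_actions in place of the stand-in.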

Now put your Raspberry Pi robot on the ground, connect to it with ssh from your computer, and run the following script:

python listen.py --graph /tmp/conv_actions_frozen.pb --labels /tmp/conv_actions_labels.txt -I plughw:1,0 --detection_threshold 0.5

You’ll see output like this:

INFO:audio:started recording
INFO:audio:_silence_
INFO:audio:_silence_

Then you can say left, right, stop, go, and stop to see the commands get recognized and the robot move accordingly:

INFO:audio:left
INFO:audio:_silence_
INFO:audio:_silence_
INFO:audio:right
INFO:audio:_silence_
INFO:audio:stop
INFO:audio:_silence_
INFO:audio:go
INFO:audio:stop

You can run the camera app in a separate Terminal, so while the robot moves around based on your voice commands, it’ll recognize new images it sees and speak out the results. That’s all it takes to build a basic Raspberry Pi robot that listens, moves, sees, and speaks—what the Google I/O 2016 demo does but without using any Cloud APIs. It’s far from a fancy robot that can understand natural human speech, engage in interesting conversations, or perform useful and non-trivial tasks. But powered with pre-trained, retrained, or other powerful TensorFlow models, and using all kinds of sensors, you can certainly add more and more intelligence and physical power to the Pi robot we have built.

Google TensorFlow is used to train all the models deployed and running on mobile devices. This book covers 10 projects on the implementation of all major AI areas on iOS, Android, and Raspberry Pi: computer vision, speech and language processing, and machine learning, including traditional, reinforcement, and deep reinforcement.

If you liked this tutorial and would like to implement projects for major AI areas on iOS, Android, and Raspberry Pi, check out the book Intelligent Mobile Projects with TensorFlow.
