How to set up a Deep Learning System on Amazon Web Services (AWS)

Note: This article is an excerpt from the book Deep Learning Essentials, written by Wei Di, Anurag Bhardwaj, and Jianing Wei. The book covers popular Python libraries such as TensorFlow, Keras, and more, along with tips to train, deploy, and optimize deep learning models in the best possible manner.

Today, we will learn two different methods of setting up a deep learning system using Amazon Web Services (AWS).

Setup from scratch

We will illustrate how to set up a deep learning environment on an AWS EC2 GPU instance (g2.2xlarge) running Ubuntu Server 16.04 LTS. For this example, we will use a pre-baked Amazon Machine Image (AMI) that already has a number of software packages installed, which makes it easier to set up an end-to-end deep learning system. We will use the publicly available AMI ami-b03ffedf, which has the following packages pre-installed:

CUDA 8.0
Anaconda 4.2.0 with Python 3
Keras / Theano

1. The first step is to set up an AWS account and spin up a new EC2 GPU instance using the AWS web console (http://console.aws.amazon.com/), as shown in the figure Choose EC2 AMI:

[Figure: Choose EC2 AMI]

2. On the next page, we pick the g2.2xlarge instance type, as shown in the figure Choose instance type:

[Figure: Choose instance type]

3. After adding 30 GB of storage, as shown in the figure Choose storage, we launch the instance and assign it an EC2 key pair, which allows us to SSH into the box using the provided key pair file:

[Figure: Choose storage]
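The console steps above can also be scripted. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials and a default region are already configured; it is not part of the original walkthrough. The key pair name `my-key-pair` is hypothetical, and the root device name `/dev/sda1` is an assumption about this AMI. Note that AMI IDs are region-specific, so the AMI must exist in the region you launch in.

```python
def build_launch_params(key_name, ami_id="ami-b03ffedf",
                        instance_type="g2.2xlarge", volume_gb=30):
    """Return keyword arguments for boto3's EC2 run_instances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {
                # Assumption: the AMI's root device is /dev/sda1.
                "DeviceName": "/dev/sda1",
                "Ebs": {"VolumeSize": volume_gb},
            }
        ],
    }


def launch_instance(key_name):
    """Launch one instance; needs boto3 and configured AWS credentials."""
    import boto3  # imported lazily so build_launch_params works without it

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**build_launch_params(key_name))
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # "my-key-pair" is a hypothetical key pair name; this only prints the
    # request parameters and does not contact AWS.
    print(build_launch_params("my-key-pair"))
```

Keeping the parameter construction separate from the API call makes it possible to inspect the request before anything is launched (and billed).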

4. Once the EC2 box is launched, the next step is to install the relevant software packages. To ensure proper GPU utilization, the graphics drivers must be installed first. We upgrade and install the NVIDIA drivers as follows:

$ sudo add-apt-repository ppa:graphics-drivers/ppa -y
$ sudo apt-get update
$ sudo apt-get install -y nvidia-375 nvidia-settings

While the NVIDIA drivers ensure that the host GPU can now be used by any deep learning application, they do not provide application developers with an easy interface for programming the device.

Various software libraries exist today that help achieve this task reliably. Open Computing Language (OpenCL) and CUDA are the ones most commonly used in industry. In this book, we use CUDA as the application programming interface for accessing NVIDIA GPUs through the graphics drivers. To install CUDA, we first SSH into the EC2 instance, download CUDA 8.0 to our $HOME folder, and install it from there:

$ wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb

$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb

$ sudo apt-get update

$ sudo apt-get install -y cuda nvidia-cuda-toolkit

Once the installation is finished, you can run the following command to validate the installation:

$ nvidia-smi
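The same check can be done from Python, which is handy in setup scripts. This is a minimal sketch, not part of the original walkthrough: it calls `nvidia-smi` with its CSV query options and returns an empty list when the tool is missing or fails. The helper names `parse_gpu_lines` and `list_gpus` are made up for illustration.

```python
import shutil
import subprocess

# nvidia-smi's query mode prints one CSV line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader"]


def parse_gpu_lines(text):
    """Parse 'name, memory' CSV lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        name, _, memory = line.partition(",")
        gpus.append({"name": name.strip(), "memory": memory.strip()})
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    print(gpus if gpus else "No NVIDIA GPU detected")
```

An empty result usually means the driver is not installed or not loaded, in which case the driver installation step should be repeated before continuing.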

Your EC2 box is now fully configured for deep learning development. However, for someone who is not very familiar with deep learning implementation details, building a deep learning system from scratch can be a daunting task.

To ease this development, a number of advanced deep learning software frameworks exist, such as Keras and Theano. Both of these frameworks are based on a Python development environment, hence we first install a Python distribution on the box, such as Anaconda:

$ wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh

$ bash Anaconda3-4.2.0-Linux-x86_64.sh

Finally, Keras and Theano are installed using Python’s package manager pip:

$ pip install --upgrade --no-deps git+https://github.com/Theano/Theano.git

$ pip install keras

Once the pip installation has completed successfully, the box is fully set up for deep learning development.
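A quick way to confirm that the installation succeeded is to ask Python whether the packages can be found, without actually importing them. The sketch below uses only the standard library; the helper name `check_packages` is made up for illustration.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it is installed."""
    return {name: importlib.util.find_spec(name) is not None
            for name in names}


if __name__ == "__main__":
    for name, found in check_packages(["theano", "keras"]).items():
        print(name, "installed" if found else "MISSING")
```

Because `find_spec` only locates a package, this check is fast, but it does not prove that the package imports cleanly or can use the GPU.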

Setup using Docker

The previous section describes getting started from scratch, which can be tricky given continuous changes to software packages and changing links on the web. One way to avoid depending on such links is to use container technology such as Docker.

In this chapter, we will use the official NVIDIA-Docker image, which comes pre-packaged with all the necessary packages and deep learning frameworks to get you started quickly with deep learning application development.

1. We first install the NVIDIA drivers on the host:

$ sudo add-apt-repository ppa:graphics-drivers/ppa -y

$ sudo apt-get update

$ sudo apt-get install -y nvidia-375 nvidia-settings nvidia-modprobe

We now install Docker Community Edition as follows:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Verify that the key fingerprint is 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88

$ sudo apt-key fingerprint 0EBFCD88

$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

$ sudo apt-get update

$ sudo apt-get install -y docker-ce
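The fingerprint check in the commands above is easy to get wrong by eye, because tools print fingerprints with different spacing. The following sketch, which is not part of the original walkthrough, compares two fingerprints after removing whitespace and case differences; the helper names are made up for illustration.

```python
# Fingerprint quoted in the verification comment of the Docker install steps.
EXPECTED = "9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88"


def normalize(fingerprint):
    """Drop all whitespace and upper-case the hex digits."""
    return "".join(fingerprint.split()).upper()


def fingerprints_match(expected, actual):
    """Compare two fingerprints regardless of spacing or letter case."""
    return normalize(expected) == normalize(actual)


if __name__ == "__main__":
    # Paste the fingerprint printed by `apt-key fingerprint` here.
    printed = "9dc8 5822 9fc7 dd38 854a  e2d8 8d81 803c 0ebf cd88"
    print("match" if fingerprints_match(EXPECTED, printed) else "MISMATCH")
```

The short key ID `0EBFCD88` passed to `apt-key fingerprint` is simply the last eight hex digits of this fingerprint.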

2. We then install NVIDIA-Docker and its plugin:

$ wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb

$ sudo dpkg -i /tmp/nvidia-docker_1.0.1-1_amd64.deb && rm /tmp/nvidia-docker_1.0.1-1_amd64.deb

3. To validate that the installation succeeded, we run the following command:

$ sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

4. Once it is set up correctly, we can use the official TensorFlow or Theano Docker image:

$ sudo nvidia-docker run -it tensorflow/tensorflow:latest-gpu bash

5. We can run a simple Python program to check if TensorFlow works properly:

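import tensorflow as tf

# Two constants to add on the computation graph
a = tf.constant(5, tf.float32)
b = tf.constant(5, tf.float32)

# tf.Session is the TensorFlow 1.x graph execution API
with tf.Session() as sess:
    output = sess.run(tf.add(a, b))  # output is 10.0
    print("Output of graph computation is = ", output)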

You should now see the TensorFlow output on the screen, as shown in the figure TensorFlow sample output:

[Figure: TensorFlow sample output]
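A correct sum only shows that TensorFlow runs, not that it runs on the GPU. As a hedged follow-up check, not part of the original walkthrough, the sketch below lists the devices TensorFlow can see using its `device_lib` module; the helper name `list_tf_devices` is made up for illustration, and the function returns `None` when TensorFlow is not installed.

```python
def list_tf_devices():
    """Return (name, device_type) pairs, or None if TensorFlow is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [(d.name, d.device_type)
            for d in device_lib.list_local_devices()]


if __name__ == "__main__":
    devices = list_tf_devices()
    if devices is None:
        print("TensorFlow is not installed in this environment")
    else:
        for name, device_type in devices:
            print(device_type, name)
```

Inside the GPU container, the list should include at least one entry whose device type is GPU; if only CPU entries appear, the container was most likely started without `nvidia-docker`.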