Some Basic Concepts of Theano

February 21, 2018 - 12:00 am

In this article, Christopher Bourez, the author of the book Deep Learning with Theano, presents Theano as a compute engine and the basics of symbolic computing with Theano. Symbolic computing consists of building graphs of operations that are later optimized for a specific architecture, using the computation libraries available for that architecture.

Although this article might seem far from practical applications, it covers the fundamentals needed to use Theano. Theano may be defined as a library for scientific computing; it has been available since 2007 and is particularly suited for deep learning. Two important features are at the core of any deep learning library: tensor operations, and the capability to run the code on CPU or GPU interchangeably. These two features enable us to work with massive amounts of multi-dimensional data. Moreover, Theano provides automatic differentiation, a very useful feature that can solve a much wider range of numerical optimization problems than deep learning problems alone.

The content of the article covers the following points:

Installing and loading Theano
Tensors and algebra
Symbolic programming

Need for tensor

Usually, input data is represented with multi-dimensional arrays:

Images have three dimensions: The number of channels, the width and height of the image
Sounds and time series have one dimension: The time length
Natural language sequences can be represented by two dimensional arrays: The time length and the alphabet length or the vocabulary length
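As an illustration, the NumPy snippet below builds arrays with these shapes. The sizes (a 3×32×48 image, 1,000 time steps, a 5-token sentence over a 10-word vocabulary) are arbitrary, the channels-first (channels, height, width) ordering is an assumed convention, and one-hot encoding is only one common way, among others, to turn text into a two-dimensional array.

```python
import numpy as np

# An RGB image with 3 channels, height 32 and width 48 (channels-first layout).
image = np.zeros((3, 32, 48), dtype=np.float32)

# A time series (or mono sound) of 1,000 time steps.
signal = np.zeros(1000, dtype=np.float32)

# A 5-token sentence, each token one-hot encoded over a 10-word vocabulary:
# selecting rows of the identity matrix gives shape (time length, vocabulary length).
token_ids = np.array([4, 1, 7, 1, 9])
sentence = np.eye(10, dtype=np.float32)[token_ids]

for name, arr in [("image", image), ("signal", signal), ("sentence", sentence)]:
    print(name, arr.ndim, arr.shape)
```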

In Theano, multi-dimensional arrays are implemented with an abstraction class named tensor, which offers many more transformations than the traditional arrays of a programming language like Python.

At each stage of a neural net, computations such as matrix multiplications involve multiple operations on these multi-dimensional arrays.

Classical arrays in programming languages do not have enough built-in functionality to handle multi-dimensional computations and manipulations efficiently.

Computations on multi-dimensional arrays have a long history of optimization, with many dedicated libraries and hardware designs. One of the most important gains in speed has come from the massively parallel architecture of the Graphics Processing Unit (GPU), which computes on a large number of cores, from a few hundred to a few thousand.

Compared to a traditional CPU, for example a quad-core, 12-core or 32-core processor, the gain with a GPU can range from a 5x to a 100x speedup, even if part of the code is still executed on the CPU (data loading, GPU piloting, result output). The main bottleneck when using a GPU is usually the transfer of data between CPU memory and GPU memory; still, when well programmed, the use of a GPU brings a significant increase in speed, of about an order of magnitude. Getting results in days rather than months, or hours rather than days, is an undeniable benefit for experimentation.

The Theano engine has been designed from the beginning to address these two challenges: multi-dimensional arrays and architecture abstraction.

There is another undeniable benefit of Theano for scientific computation: the automatic differentiation of functions of multi-dimensional arrays, a well-suited feature for inferring model parameters by minimizing an objective function. Such a feature facilitates experimentation by relieving the pain of computing derivatives by hand, which might not be so complicated but is prone to many errors.
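To give an idea of what "automatic" differentiation means, here is a minimal forward-mode sketch in plain Python using dual numbers. This is a different technique from the one Theano uses: Theano differentiates its symbolic graph in reverse mode, which scales better to functions of many parameters. The Dual class and the sin and derivative helpers are hypothetical names for illustration only.

```python
import math


class Dual:
    """A number value + deriv * eps, with eps ** 2 == 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Plain numbers are constants: their derivative is 0.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (u v)' = u' v + u v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(x, 1.0)).deriv


# f(x) = x sin(x) + 3x, whose derivative is sin(x) + x cos(x) + 3.
f = lambda x: x * sin(x) + 3 * x
print(derivative(f, 2.0))
print(math.sin(2.0) + 2.0 * math.cos(2.0) + 3)
```

The derivative is obtained exactly (up to floating-point rounding) without writing sin(x) + x cos(x) + 3 by hand, which is the kind of error-prone step automatic differentiation removes.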

Installing and loading Theano

Conda package and environment manager

The easiest way to install Theano is to use conda, a cross-platform package and environment manager.

If conda is not already installed on your operating system, the fastest way to install it is to download the Miniconda installer from https://conda.io/miniconda.html. For example, for 64-bit Linux and Python 2.7:

wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh

chmod +x Miniconda2-latest-Linux-x86_64.sh

bash ./Miniconda2-latest-Linux-x86_64.sh

Conda enables you to create new environments in which the Python version (2 or 3) and the installed packages may differ. The conda root environment uses the Python version that ships with the installer (Python 2.7 for Miniconda2).

Install and run Theano on CPU

Now, let's install Theano:

conda install theano

Run a Python session and try the following commands to check your configuration:

>>> import theano

>>> theano.config.device

'cpu'

>>> theano.config.floatX

'float64'

>>> print(theano.config)

The last command prints all the configuration of Theano. The theano.config object contains keys to many configuration options.

To determine its configuration options, Theano first looks at the ~/.theanorc file, then at the environment variables (such as THEANO_FLAGS), which override the file, and last at values set in the code, which take precedence over both:

>>> theano.config.floatX = 'float32'

Some properties are read-only and cannot be changed in the code, but floatX, which sets the default floating-point precision, is among the properties that can be changed directly in the code.
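The precedence rule can be modelled in a few lines of plain Python. This is a simplified sketch, not Theano's own configuration code: parse_flags and resolve_config are hypothetical helpers, the file values are given as an already-parsed dictionary, and all values are kept as strings, whereas Theano also validates and converts each option.

```python
def parse_flags(flags):
    """Parse a THEANO_FLAGS-style string such as 'device=cuda,floatX=float32'."""
    result = {}
    for item in flags.split(","):
        item = item.strip()
        if item:
            key, _, value = item.partition("=")
            result[key.strip()] = value.strip()
    return result


def resolve_config(rc_file_values, env_flags, code_values):
    """Merge the three sources; each later source overrides the earlier ones."""
    config = dict(rc_file_values)          # lowest precedence: ~/.theanorc
    config.update(parse_flags(env_flags))  # overrides the file: THEANO_FLAGS
    config.update(code_values)             # highest precedence: set in the code
    return config


config = resolve_config(
    rc_file_values={"device": "cpu", "floatX": "float64"},
    env_flags="device=cuda,floatX=float16",
    code_values={"floatX": "float32"},
)
print(config)
```

Here device comes from the environment variable, which overrides the file, while floatX comes from the code, which overrides both.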

It is advised to use float32: GPUs long lacked float64 support, float64 execution on GPU is slower, sometimes much slower (2x to 32x on the latest-generation Pascal hardware), and float32 precision is enough in practice.
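The memory side of this trade-off is easy to check with NumPy: a float32 array takes half the bytes of its float64 counterpart, which also halves the data to transfer between CPU and GPU memory, at the cost of a coarser precision (about 7 significant decimal digits instead of about 16). The array size below is arbitrary.

```python
import numpy as np

n = 1_000_000
a64 = np.ones(n, dtype=np.float64)
a32 = a64.astype(np.float32)

# Bytes to store, and to copy between CPU and GPU memory.
print(a64.nbytes, a32.nbytes)

# Machine epsilon: the gap between 1.0 and the next representable value.
print(np.finfo(np.float64).eps, np.finfo(np.float32).eps)
```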

GPU drivers and libraries

Theano enables the use of GPUs (graphics processing units), the units usually used to compute the graphics displayed on the computer screen.

To have Theano work on the GPU as well, a GPU backend library is required on your system.

The CUDA library (for NVIDIA GPU cards only) is the main choice for GPU computations. There is also the OpenCL standard, which is open source, but far less developed, and its support in Theano is much more experimental and rudimentary.

Most of the scientific computations still occur on NVIDIA cards today.

If you have an NVIDIA GPU card, download CUDA from the NVIDIA website at https://developer.nvidia.com/cuda-downloads and install it. The installer first installs the latest version of the GPU drivers if they are not already installed, then installs the CUDA library in the /usr/local/cuda directory.

Install the cuDNN library, also by NVIDIA, which offers faster implementations of some operations on the GPU. To install it, I usually copy the /usr/local/cuda directory to a new directory /usr/local/cuda-{CUDA_VERSION}-cudnn-{CUDNN_VERSION}, so that I can choose the version of CUDA and cuDNN depending on the deep learning technology I use and its compatibility.

In your .bashrc profile, add the following lines to set the $PATH and $LD_LIBRARY_PATH variables:

export PATH=/usr/local/cuda-8.0-cudnn-5.1/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda-8.0-cudnn-5.1/lib64:/usr/local/cuda-8.0-cudnn-5.1/lib:$LD_LIBRARY_PATH
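When several CUDA and cuDNN combinations are kept side by side, a small helper can generate these export lines for any version pair, following the /usr/local/cuda-{CUDA_VERSION}-cudnn-{CUDNN_VERSION} naming convention described above. cuda_env_lines is a hypothetical helper, and the bin, lib64 and lib subdirectories are taken from the lines above; your installation layout may differ.

```python
def cuda_env_lines(cuda_version, cudnn_version, prefix="/usr/local"):
    """Return the .bashrc export lines for a versioned CUDA + cuDNN directory."""
    root = f"{prefix}/cuda-{cuda_version}-cudnn-{cudnn_version}"
    path_line = f"export PATH={root}/bin:$PATH"
    # Entries are joined with single colons: an empty entry ("::") would
    # implicitly add the current directory to the library search path.
    libs = ":".join([f"{root}/lib64", f"{root}/lib", "$LD_LIBRARY_PATH"])
    return [path_line, f"export LD_LIBRARY_PATH={libs}"]


for line in cuda_env_lines("8.0", "5.1"):
    print(line)
```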

Install and run Theano on GPU

N-dimensional GPU arrays have been implemented in Python by six different GPU libraries (Theano/CudaNdarray, PyCUDA/GPUArray, CUDAMAT/CUDAMatrix, PyOpenCL/GPUArray, Clyther, Copperhead), each offering a subset of the functionality of NumPy's ndarray. Libgpuarray is a backend library that aims to provide this functionality through a common interface with the same properties.

To install libgpuarray with conda:

conda install pygpu

To run Theano in GPU mode, you need to configure the config.device variable before execution, since it becomes read-only once the code is running. For example, with the THEANO_FLAGS environment variable:

THEANO_FLAGS="device=cuda,floatX=float32" python

>>> import theano

Using cuDNN version 5110 on context None

Mapped name None to device cuda: Tesla K80 (0000:83:00.0)

>>> theano.config.device

'cuda'

>>> theano.config.floatX

'float32'

The lines printed on import show that the GPU device has been correctly detected, and specify which GPU is used.

By default, Theano activates CNMeM, a faster CUDA memory allocator; an initial preallocation can be specified with the gpuarray.preallocate option. In the end, my launch command is:

THEANO_FLAGS="device=cuda,floatX=float32,gpuarray.preallocate=0.8" python

>>> import theano

Using cuDNN version 5110 on context None

Preallocating 9151/11439 Mb (0.800000) on cuda

Mapped name None to device cuda: Tesla K80 (0000:83:00.0)

The first line confirms that cuDNN is active, and the second confirms memory preallocation. The third line gives the default context name (None when the flag device=cuda is set) and the model of the GPU used, while the default context name for the CPU is always cpu.
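The preallocation figures in the logs of this section can be checked by hand: 0.8 × 11439 ≈ 9151 and 0.8 × 11471 ≈ 9177, that is, the requested fraction of the total memory reported for each K80 GPU, rounded to the nearest megabyte. The helper below is a hypothetical illustration of that arithmetic; the rounding rule is inferred from the logs, not taken from Theano's source.

```python
def preallocated_mb(fraction, total_mb):
    """Megabytes reserved when a fraction of the total GPU memory is preallocated."""
    return round(fraction * total_mb)


print(preallocated_mb(0.8, 11439))  # 9151, as in the single-GPU log
print(preallocated_mb(0.8, 11471))  # 9177, as in the multi-GPU logs
```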

It is possible to specify a GPU other than the first one by setting the device to cuda0, cuda1, … on multi-GPU computers. It is also possible to run a program on multiple GPUs, in parallel or in sequence (when the memory of one GPU is not sufficient), in particular when training very deep neural nets. In this case, the context flag contexts=dev0->cuda0;dev1->cuda1;dev2->cuda2;dev3->cuda3 activates multiple GPUs instead of one and assigns a context name to each GPU device to be used in the code. For example, on a 4-GPU instance:

THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1;dev2->cuda2;dev3->cuda3,floatX=float32,gpuarray.preallocate=0.8" python

>>> import theano

Using cuDNN version 5110 on context None

Preallocating 9177/11471 Mb (0.800000) on cuda0

Mapped name dev0 to device cuda0: Tesla K80 (0000:83:00.0)

Using cuDNN version 5110 on context dev1

Preallocating 9177/11471 Mb (0.800000) on cuda1

Mapped name dev1 to device cuda1: Tesla K80 (0000:84:00.0)

Using cuDNN version 5110 on context dev2

Preallocating 9177/11471 Mb (0.800000) on cuda2

Mapped name dev2 to device cuda2: Tesla K80 (0000:87:00.0)

Using cuDNN version 5110 on context dev3

Preallocating 9177/11471 Mb (0.800000) on cuda3

Mapped name dev3 to device cuda3: Tesla K80 (0000:88:00.0)

To assign computations to a specific GPU in this multi-GPU setting, the names we chose, dev0, dev1, dev2 and dev3, have been mapped to the devices cuda0, cuda1, cuda2 and cuda3.

This name mapping enables you to write code that is independent of the underlying GPU assignments and libraries (CUDA or other).
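The contexts flag is essentially a name-to-device table. The plain-Python sketch below parses the flag value used in the 4-GPU example into a dictionary; parse_contexts is a hypothetical helper for illustration, not part of Theano. Code that refers only to dev0 to dev3 keeps working if the flag is later changed to map those names to other devices.

```python
def parse_contexts(flag_value):
    """Map context names to devices from a 'name->device;name->device' string."""
    mapping = {}
    for entry in flag_value.split(";"):
        name, arrow, device = entry.partition("->")
        if not arrow:
            raise ValueError(f"malformed context entry: {entry!r}")
        mapping[name.strip()] = device.strip()
    return mapping


contexts = parse_contexts("dev0->cuda0;dev1->cuda1;dev2->cuda2;dev3->cuda3")
print(contexts["dev2"])  # cuda2
```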

To keep these configuration flags active in every Python session or execution without using environment variables, save your configuration in the ~/.theanorc file as:

[global]

floatX = float32

device = cuda0

[gpuarray]

preallocate = 1

Now you can simply run the python command, and you are all set.
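The ~/.theanorc file uses the INI format, so Python's standard configparser module can read it, which is handy to check what a given machine is configured with. In this sketch, flatten_theanorc is a hypothetical helper; the names it produces (bare names for the [global] section, section.option otherwise) match the flag names used with THEANO_FLAGS in this article, such as gpuarray.preallocate. Theano reads the file itself; this code only shows how the sections map onto flag names.

```python
import configparser

THEANORC = """
[global]
floatX = float32
device = cuda0

[gpuarray]
preallocate = 1
"""


def flatten_theanorc(text):
    """Return options as flag-style names: 'floatX' for [global],
    'section.option' (e.g. 'gpuarray.preallocate') for other sections."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of keys such as floatX
    parser.read_string(text)
    flat = {}
    for section in parser.sections():
        for option, value in parser.items(section):
            key = option if section == "global" else f"{section}.{option}"
            flat[key] = value
    return flat


print(flatten_theanorc(THEANORC))
```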

Tensors

In Python, some scientific libraries such as NumPy provide multi-dimensional arrays. Theano doesn't replace NumPy but works in concert with it. In particular, NumPy is used to initialize tensors.

To perform the computation on CPU and GPU interchangeably, variables are symbolic and represented by the tensor class, an abstraction; writing numerical expressions consists of building a computation graph of Variable nodes and Apply nodes. Depending on the platform for which the computation graph is compiled, tensors are replaced either:

By a TensorType variable, whose data has to be on the CPU
By a GpuArrayType variable, whose data has to be on the GPU

That way, the code can be written independently of the platform where it will be executed.

Here are a few tensor objects:

Object class	Number of dimensions	Example
theano.tensor.scalar	0-dimensional array	1, 2.5
theano.tensor.vector	1-dimensional array	[0,3,20]
theano.tensor.matrix	2-dimensional array	[[2,3],[1,5]]
theano.tensor.tensor3	3-dimensional array	[[[2,3],[1,5]],[[1,2],[3,4]]]
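Since NumPy is used to provide the numerical values fed to these tensors, a quick NumPy check confirms the number of dimensions of each example in the table, from 0 for a scalar to 3 for a tensor3. This runs with NumPy alone and does not need Theano.

```python
import numpy as np

# Example values from the table, keyed by the matching Theano tensor kind.
examples = {
    "scalar": 2.5,
    "vector": [0, 3, 20],
    "matrix": [[2, 3], [1, 5]],
    "tensor3": [[[2, 3], [1, 5]], [[1, 2], [3, 4]]],
}

for kind, value in examples.items():
    arr = np.asarray(value)
    print(kind, arr.ndim, arr.shape)
```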

Playing with these Theano objects in the Python shell gives a better idea:

>>> import theano.tensor as T

>>> T.scalar()

<TensorType(float32, scalar)>

>>> T.iscalar()

<TensorType(int32, scalar)>

>>> T.fscalar()

<TensorType(float32, scalar)>

>>> T.dscalar()

<TensorType(float64, scalar)>

With an i, l, f or d letter in front of the object name, you create a tensor of the given type: int32, int64, float32 or float64. For real-valued (floating-point) data, it is advised to use the direct form T.scalar() instead of the f or d variants, since the direct form uses your current configuration for floats:

>>> theano.config.floatX = 'float64'

>>> T.scalar()

<TensorType(float64, scalar)>

>>> T.fscalar()

<TensorType(float32, scalar)>

>>> theano.config.floatX = 'float32'

>>> T.scalar()

<TensorType(float32, scalar)>
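The rule shown in this session can be written as a small lookup table. In this sketch, PREFIX_DTYPES and scalar_dtype are hypothetical names that mirror the behaviour described above using NumPy dtypes; they are not Theano functions. A prefixed constructor always gets its fixed type, while the unprefixed one follows floatX.

```python
import numpy as np

# Dtypes selected by the letter prefixes described above.
PREFIX_DTYPES = {"i": np.int32, "l": np.int64, "f": np.float32, "d": np.float64}


def scalar_dtype(prefix, floatX):
    """Dtype of a scalar built with the given prefix ('' for the direct form)."""
    if prefix == "":
        return np.dtype(floatX)  # the direct form follows the floatX setting
    return np.dtype(PREFIX_DTYPES[prefix])


print(scalar_dtype("", "float64"), scalar_dtype("f", "float64"))
print(scalar_dtype("", "float32"), scalar_dtype("d", "float32"))
```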

Symbolic variables either:

Play the role of placeholders, as a starting point to build your graph of numerical operations (such as addition, multiplication): they receive the flow of the incoming data during the evaluation, once the graph has been compiled
Represent intermediate or output results

Symbolic variables and operations are both part of a computation graph that will be compiled for either CPU or GPU for fast execution. Let's write a first computation graph consisting of a simple addition:

>>> x = T.matrix('x')

>>> y = T.matrix('y')

>>> z = x + y

>>> theano.pp(z)

'(x + y)'

>>> z.eval({x: [[1, 2], [1, 3]], y: [[1, 0], [3, 4]]})

array([[ 2., 2.],
       [ 4., 7.]], dtype=float32)

First, two symbolic variables, or Variable nodes, are created with the names x and y, and an addition operation, an Apply node, is applied between them to create a new symbolic variable, z, in the computation graph.

The pretty-print function pp prints the expression represented by Theano symbolic variables. The eval method evaluates the value of the output variable z when the first two variables, x and y, are given two numerical 2-dimensional arrays.
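To see what Variable and Apply nodes amount to, here is a toy re-implementation of the idea in plain Python with NumPy. Variable, Apply and pp here are hypothetical, simplified stand-ins that only support addition; Theano's real classes also carry types, and Theano optimizes and compiles the graph rather than interpreting it node by node as this sketch does.

```python
import numpy as np


class Variable:
    """A symbolic node: either a named placeholder or the output of an Apply node."""

    def __init__(self, name=None, owner=None):
        self.name = name
        self.owner = owner  # Apply node that produced this variable (None for inputs)

    def __add__(self, other):
        # Building an expression only adds nodes to the graph; nothing is computed yet.
        return Apply(np.add, "+", [self, other]).output

    def eval(self, inputs):
        if self.owner is None:
            # Placeholder: its value comes from the inputs dictionary.
            return np.asarray(inputs[self], dtype=np.float32)
        # Intermediate or output variable: evaluate the inputs of its Apply node first.
        node = self.owner
        return node.op(*(v.eval(inputs) for v in node.inputs))


class Apply:
    """An operation applied to input variables, producing one output variable."""

    def __init__(self, op, symbol, inputs):
        self.op = op
        self.symbol = symbol
        self.inputs = inputs
        self.output = Variable(owner=self)


def pp(var):
    """Pretty-print the expression that produced a variable."""
    if var.owner is None:
        return var.name if var.name is not None else "<unnamed>"
    node = var.owner
    return "(" + f" {node.symbol} ".join(pp(v) for v in node.inputs) + ")"


x = Variable("x")
y = Variable("y")
z = x + y
print(pp(z))
print(z.eval({x: [[1, 2], [1, 3]], y: [[1, 0], [3, 4]]}))

# Rebinding the Python name x creates a new node; the placeholder named "x" is unchanged.
x = x + x
print(pp(x))
```

As in the Theano session, the graph only records operations until it is evaluated, and node names are what make the printed expression readable.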

The following example highlights the difference between the Python variables that hold symbolic variables, such as x and y, and the names given to those symbolic variables, such as 'x' and 'y':

>>> a = T.matrix()

>>> b = T.matrix()

>>> theano.pp(a + b)

'(<TensorType(float32, matrix)> + <TensorType(float32, matrix)>)'

Without names, it is more complicated to trace the nodes in a large graph. When printing the computation graph, names significantly help diagnose problems, while Python variables are only used to handle the objects in the graph:

>>> x = T.matrix('x')

>>> x = x + x

>>> theano.pp(x)

'(x + x)'

Here the original symbolic variable, named x, does not change and stays part of the computation graph. x + x creates a new symbolic variable we assign to the Python variable x.

Note also that, with names, the plural form initializes multiple tensors at the same time:

>>> x, y, z = T.matrices('x', 'y', 'z')

Now, let’s have a look at the different functions to display the graph.

Summary

This article gave a brief idea of how to install and configure Theano on CPU and GPU, and how to start building computation graphs with symbolic tensors.
