Python Data Science Up and Running

February 16, 2016 - 12:00 am

In this article, we will learn how to install NumPy, SciPy, matplotlib, and IPython, build a simple application with NumPy, and use IPython as a shell.

Building NumPy, SciPy, matplotlib, and IPython from source

As a last resort or if we want to have the latest code, we can build from source. In practice, it shouldn’t be that hard, although depending on your operating system, you might run into problems. As operating systems and related software are rapidly evolving, in such cases, the best you can do is search online or ask for help. In this chapter, we give pointers on good places to look for help.

The source code can be retrieved with git or as an archive from GitHub. The steps to install NumPy from source are straightforward and given here. We can retrieve the source code for NumPy with git as follows:

$ git clone https://github.com/numpy/numpy.git numpy

There are similar commands for SciPy, matplotlib, and IPython (see the table below). The IPython source code can also be downloaded from https://github.com/ipython/ipython/releases as a source archive or ZIP file. You can then unpack it with your favorite tool or with the following command:

$ tar -xzf ipython.tar.gz

Please refer to the following table for the git commands and source archive/zip links:

Library	Git command	Tarball/zip URL
NumPy	git clone https://github.com/numpy/numpy.git numpy	https://github.com/numpy/numpy/releases
SciPy	git clone https://github.com/scipy/scipy.git scipy	https://github.com/scipy/scipy/releases
matplotlib	git clone https://github.com/matplotlib/matplotlib.git	https://github.com/matplotlib/matplotlib/releases
IPython	git clone --recursive https://github.com/ipython/ipython.git	https://github.com/ipython/ipython/releases

Install to /usr/local with the following commands, run from the source code directory:

$ python setup.py build
$ sudo python setup.py install --prefix=/usr/local

To build, we need a C compiler such as GCC and the Python header files in the python-dev or python-devel package.
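Before starting a build, it can help to confirm that these prerequisites are present. The following is a minimal sketch using only the standard library; the function name check_build_prerequisites and the list of candidate compiler names are assumptions made for illustration, not part of any of the libraries discussed here.

```python
import os
import shutil
import sysconfig


def check_build_prerequisites(compilers=("gcc", "cc", "clang")):
    """Report whether a C compiler and the Python headers can be found."""
    # First candidate compiler name that exists on PATH, or None.
    compiler = next((c for c in compilers if shutil.which(c)), None)
    # Directory where this interpreter expects its C header files.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return {
        "compiler": compiler,
        "python_header": header,
        "header_found": os.path.isfile(header),
    }


if __name__ == "__main__":
    for key, value in check_build_prerequisites().items():
        print(key, "->", value)
```

If header_found is False, installing the python-dev or python-devel package mentioned above is the likely fix.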

Installing with setuptools

If you have setuptools or pip, you can install NumPy, SciPy, matplotlib, and IPython with the following commands. For each library, we give two commands, one for setuptools and one for pip. You only need to choose one command per pair:

$ easy_install numpy
$ pip install numpy

$ easy_install scipy
$ pip install scipy

$ easy_install matplotlib
$ pip install matplotlib

$ easy_install ipython
$ pip install ipython

It may be necessary to prepend sudo to these commands if your current user doesn’t have sufficient rights on your system.
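Once the commands have finished, a quick way to confirm what was installed is to ask Python for the package versions. This is a small sketch that assumes Python 3.8 or later (for importlib.metadata); the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "scipy", "matplotlib", "ipython")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```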

NumPy arrays

After going through the installation of NumPy, it’s time to have a look at NumPy arrays. NumPy arrays are more efficient than Python lists when it comes to numerical operations. NumPy arrays are, in fact, specialized objects with extensive optimizations. NumPy code requires fewer explicit loops than equivalent Python code. This is based on vectorization.

If we go back to high school mathematics, then we should remember the concepts of scalars and vectors. The number 2, for instance, is a scalar. When we add 2 to 2, we are performing scalar addition. We can form a vector out of a group of scalars. In Python programming terms, we will then have a one-dimensional array. This concept can, of course, be extended to higher dimensions. Performing an operation on two arrays, such as addition, can be reduced to a group of scalar operations. In straight Python, we will do that with loops going through each element in the first array and adding it to the corresponding element in the second array. However, this is more verbose than the way it is done in mathematics. In mathematics, we treat the addition of two vectors as a single operation. That’s the way NumPy arrays do it too, and there are certain optimizations using low-level C routines, which make these basic operations more efficient.
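As a small illustration of the difference, the following sketch adds two short vectors both ways. The values in x and y are arbitrary and chosen only for the example.

```python
import numpy as np

x = [1, 2, 3]
y = [10, 20, 30]

# Plain Python: one scalar addition per element, written as a loop.
loop_sum = [xi + yi for xi, yi in zip(x, y)]

# NumPy: the whole addition is written as a single array expression.
array_sum = np.array(x) + np.array(y)

print(loop_sum)   # [11, 22, 33]
print(array_sum)  # [11 22 33]
```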

Simple application

Imagine that we want to add two vectors called a and b. The word vector is used here in the mathematical sense, which means a one-dimensional array. The vector a holds the squares of integers 0 to n - 1; for instance, if n is equal to 3, a contains 0, 1, and 4. The vector b holds the cubes of integers 0 to n - 1, so if n is equal to 3, then the vector b contains 0, 1, and 8. How would you do that using plain Python? After we come up with a solution, we will compare it with the NumPy equivalent.

The following function solves the vector addition problem using pure Python without NumPy:

def pythonsum(n):
    # list() is needed on Python 3, where range() is not a mutable list
    a = list(range(n))
    b = list(range(n))
    c = []

    for i in range(len(a)):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])

    return c

The following is a function that solves the vector addition problem with NumPy:

import numpy

def numpysum(n):
    a = numpy.arange(n) ** 2
    b = numpy.arange(n) ** 3
    c = a + b
    return c

Notice that numpysum() does not need a for loop. Also, we used the arange() function from NumPy, which creates a NumPy array for us with integers from 0 to n - 1. The arange() function lives in the numpy module; that is why it is prefixed with numpy.
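To make the behavior of arange() concrete, here is a short sketch for n equal to 3, matching the example values given earlier. The variable name indices is used only for illustration.

```python
import numpy as np

n = 3
indices = np.arange(n)   # integers 0, 1, ..., n - 1
a = indices ** 2         # elementwise squares
b = indices ** 3         # elementwise cubes

print(indices)             # [0 1 2]
print(a)                   # [0 1 4]
print(b)                   # [0 1 8]
print((a + b).tolist())    # [0, 2, 12]
```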

Now comes the fun part. Remember that it was mentioned in the Preface that NumPy is faster when it comes to array operations. How much faster is NumPy, though? The following program shows us by measuring the elapsed time in microseconds for the numpysum() and pythonsum() functions. It also prints the last two elements of the vector sum, so that we can check that we get the same answers using Python and NumPy:

#!/usr/bin/env python
"""
 This program demonstrates vector addition the Python way.
 Run from the command line as follows

  python vectorsum.py n

 where n is an integer that specifies the size of the vectors.

 The first vector to be added contains the squares of 0 up to n.
 The second vector contains the cubes of 0 up to n.
 The program prints the last 2 elements of the sum and the elapsed time.
"""

import sys
from datetime import datetime
import numpy as np

def numpysum(n):
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    c = a + b

    return c

def pythonsum(n):
    # list() is needed on Python 3, where range() is not a mutable list
    a = list(range(n))
    b = list(range(n))
    c = []

    for i in range(len(a)):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])

    return c

size = int(sys.argv[1])

# Note: delta.microseconds is only the sub-second part of the elapsed time,
# which is sufficient for the small vector sizes used here.
start = datetime.now()
c = pythonsum(size)
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
print("PythonSum elapsed time in microseconds", delta.microseconds)

start = datetime.now()
c = numpysum(size)
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
print("NumPySum elapsed time in microseconds", delta.microseconds)

The output of the program for 1000, 2000, and 4000 vector elements is as follows:

$ python vectorsum.py 1000
The last 2 elements of the sum [995007996, 998001000]
PythonSum elapsed time in microseconds 707
The last 2 elements of the sum [995007996 998001000]
NumPySum elapsed time in microseconds 171

$ python vectorsum.py 2000
The last 2 elements of the sum [7980015996, 7992002000]
PythonSum elapsed time in microseconds 1420
The last 2 elements of the sum [7980015996 7992002000]
NumPySum elapsed time in microseconds 168

$ python vectorsum.py 4000
The last 2 elements of the sum [63920031996, 63968004000]
PythonSum elapsed time in microseconds 2829
The last 2 elements of the sum [63920031996 63968004000]
NumPySum elapsed time in microseconds 274

Clearly, NumPy is much faster than the equivalent normal Python code. One thing is certain; we get the same results whether we are using NumPy or not. However, the result that is printed differs in representation. Notice that the result from the numpysum() function does not have any commas. How come? Obviously, we are not dealing with a Python list but with a NumPy array.
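A single datetime measurement can be noisy, so a more repeatable comparison averages many runs. The following is a minimal sketch using the standard timeit module; it is not the article's program. The pure-Python function is condensed into a list comprehension, the explicit 64-bit dtype is an assumption added to avoid overflow on platforms with 32-bit default integers, and the values of n and repeats are arbitrary.

```python
import timeit

import numpy as np


def pythonsum(n):
    # Pure-Python version: build the sum element by element.
    return [i ** 2 + i ** 3 for i in range(n)]


def numpysum(n):
    # A 64-bit dtype keeps the cubes from overflowing where the
    # default integer type is only 32 bits wide.
    i = np.arange(n, dtype=np.int64)
    return i ** 2 + i ** 3


n = 1000
# Both versions must agree before the timings mean anything.
# tolist() converts the NumPy array back into a plain Python list.
assert pythonsum(n) == numpysum(n).tolist()

repeats = 100
t_python = timeit.timeit(lambda: pythonsum(n), number=repeats) / repeats
t_numpy = timeit.timeit(lambda: numpysum(n), number=repeats) / repeats
print("pythonsum: %.1f microseconds per call" % (t_python * 1e6))
print("numpysum:  %.1f microseconds per call" % (t_numpy * 1e6))
```

The absolute numbers depend on the machine and the library versions, so no expected timings are shown here.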

Using IPython as a shell

Scientists, data analysts, and engineers are used to experimenting. IPython was created by scientists with experimentation in mind. The interactive environment that IPython provides is viewed by many as a direct answer to MATLAB, Mathematica, and Maple.

The following is a list of features of the IPython shell:

Tab completion, which helps you find a command
History mechanism
Inline editing
Ability to call external Python scripts with %run
Access to system commands
The pylab switch
Access to the Python debugger and profiler

The following list describes how to use the IPython shell:

The pylab switch: The pylab switch automatically imports all the SciPy, NumPy, and matplotlib packages. Without this switch, we would have to import these packages ourselves.

All we need to do is enter the following instruction on the command line:

$ ipython --pylab
Type "copyright", "credits" or "license" for more information.

IPython 2.0.0-dev -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra  details.

Welcome to pylab, a matplotlib-based Python environment  [backend: MacOSX].
For more information, type 'help(pylab)'.

In [1]: quit()

The quit() function or Ctrl + D quits the IPython shell.

Saving a session: We might want to be able to go back to our experiments. In IPython, it is easy to save a session for later use, with the following command:

In [1]: %logstart
Activating auto-logging. Current session state plus future  input saved.
Filename       : ipython_log.py
Mode           : rotate
Output logging : False
Raw input log  : False
Timestamping   : False
State          : active

Logging can be switched off as follows:

In [9]: %logoff
Switching logging OFF

Executing system shell command: Execute a system shell command in the default IPython profile by prefixing the command with the ! symbol. For instance, the following input will get the current date:

```
In [1]: !date
```

In fact, any line prefixed with ! is sent to the system shell. Also, we can store the command output as shown here:

```
In [2]: thedate = !date

In [3]: thedate
```

Displaying history: We can show the history of commands with the %hist command, for example:

```
In [1]: a = 2 + 2

In [2]: a
Out[2]: 4

In [3]: %hist
a = 2 + 2
a
%hist
```

This is a common feature in Command Line Interface (CLI) environments. We can also search through the history with the -g switch as follows:

```
In [5]: %hist -g a = 2
1: a = 2 + 2
```

Downloading the example code

You can download the example code files for all the Packt books you have purchased from your account at https://www.packtpub.com. If you purchased this book elsewhere, you can visit https://www.packtpub.com/books/content/support and register to have the files e-mailed directly to you.

We saw a number of so-called magic functions in action. These functions start with the % character. If a magic function is used on a line by itself, the % prefix is optional (this is IPython's default automagic behavior).
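Outside IPython, the effect of storing command output with the ! prefix can be approximated in plain Python. The sketch below is only a rough analog, not how IPython implements it: the helper name capture_lines is hypothetical, it assumes Python 3.7 or later, and it runs the current interpreter as a portable stand-in for the date command.

```python
import subprocess
import sys


def capture_lines(command):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# A portable stand-in for `date`: ask the current interpreter to print a line.
lines = capture_lines([sys.executable, "-c", "print('hello from a subprocess')"])
print(lines)  # ['hello from a subprocess']
```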

Summary

In this article, we installed NumPy, SciPy, matplotlib, and IPython. We got a vector addition program working and convinced ourselves that NumPy offers superior performance. In addition, we explored the IPython shell and some of its magic functions.

You can also refer to the following books on similar topics:

Mastering Python Data Analysis: (https://www.packtpub.com/big-data-and-business-intelligence/mastering-python-data-analysis)
Python Data Analysis Cookbook: (https://www.packtpub.com/big-data-and-business-intelligence/python-data-analysis-cookbook)
Getting Started with Python Data Analysis: (https://www.packtpub.com/big-data-and-business-intelligence/getting-started-python-data-analysis)
