Getting started with NumPy

NumPy is built around its multidimensional array object, numpy.ndarray. A NumPy array is a collection of elements of the same data type; this fundamental restriction allows NumPy to pack the data efficiently in memory. Storing the data this way lets NumPy perform arithmetic and mathematical operations at high speed.

Creating arrays

You can create NumPy arrays using the numpy.array function. It takes a list-like object (or another array) as input and, optionally, a string expressing its data type. You can interactively test array creation using an IPython shell as follows:

In [1]: import numpy as np
In [2]: a = np.array([0, 1, 2])

Every NumPy array has a data type that can be accessed through the dtype attribute. In the following example, dtype is a 64-bit integer (the default integer width can differ between platforms):

In [3]: a.dtype
Out[3]: dtype('int64')

If we want those numbers to be treated as floats, we can either pass the dtype argument to the np.array function or cast the array to another data type using the astype method, as shown in the following code:

In [4]: a = np.array([0, 1, 2], dtype='float32')
In [5]: a.astype('float32')
Out[5]: array([ 0.,  1.,  2.], dtype=float32)
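
The two approaches can be compared side by side. The following is a minimal sketch; the variable names b and c are chosen only for illustration, and the integer width reported for a depends on the platform, so only its kind is checked here.

```python
import numpy as np

# Integer input gives an integer dtype (its exact width is platform dependent).
a = np.array([0, 1, 2])

# astype returns a new array; the original keeps its dtype.
b = a.astype('float32')

# Passing dtype at creation time builds the float array directly.
c = np.array([0, 1, 2], dtype='float32')

print(a.dtype.kind)       # 'i' means signed integer
print(b.dtype, c.dtype)   # float32 float32
print(b.itemsize)         # bytes per element: 4 for float32
```

Because astype returns a new array, a itself is still an integer array after the cast.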

To create an array with two dimensions (an array of arrays), we can initialize the array using a nested sequence, as follows:

In [6]: a = np.array([[0, 1, 2], [3, 4, 5]])
In [7]: print(a)
[[0 1 2]
 [3 4 5]]

The array created in this way has two dimensions—axes in NumPy's jargon. Such an array is like a table that contains two rows and three columns. We can access the axes structure using the ndarray.shape attribute:

In [7]: a.shape
Out[7]: (2, 3)

Arrays can also be reshaped, as long as the product of the new shape dimensions is equal to the total number of elements in the array. For example, we can reshape an array containing 16 elements in the following ways: (2, 8), (4, 4), or (2, 2, 4). To reshape an array we can either use the ndarray.reshape method, which returns a reshaped array, or directly assign to the ndarray.shape attribute, which changes the array in place. The following code illustrates the use of the ndarray.reshape method:

In [7]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8,
                       9, 10, 11, 12, 13, 14, 15])
In [7]: a.shape
Out[7]: (16,)
In [8]: a.reshape(4, 4) # In-place alternative: a.shape = (4, 4)
Out[8]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Thanks to this property you are also free to add dimensions of size one. You can reshape an array with 16 elements to (16, 1), (1, 16), (16, 1, 1), and so on.
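
The reshaping rule can be checked with a short sketch. It uses np.arange to build the 16-element array, and the shape (3, 5) is picked here only as an example of a shape whose product does not match.

```python
import numpy as np

a = np.arange(16)          # 16 elements, shape (16,)

b = a.reshape(2, 2, 4)     # 2 * 2 * 4 == 16, so this is allowed
col = a.reshape(16, 1)     # adds a trailing dimension of size one
row = a.reshape(1, 16)     # adds a leading dimension of size one

# A shape whose product is not 16 is rejected with a ValueError.
try:
    a.reshape(3, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(b.shape, col.shape, row.shape, reshape_failed)
```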

NumPy provides convenience functions, shown in the following code, to create arrays filled with zeros, filled with ones, or without an initialization value (empty—their actual value is meaningless and depends on the memory state). Those functions take the array shape as a tuple and optionally its dtype.

In [8]: np.zeros((3, 3))
In [9]: np.empty((3, 3))
In [10]: np.ones((3, 3), dtype='float32')

In our examples we will use the numpy.random module to generate random floating point numbers in the [0, 1) interval, as follows:

In [11]: np.random.rand(3, 3)

Sometimes it is convenient to initialize arrays that have a similar shape to other arrays. Again, NumPy provides some handy functions for that purpose such as zeros_like, empty_like, and ones_like. These functions are as follows:

In [12]: np.zeros_like(a)
In [13]: np.empty_like(a)
In [14]: np.ones_like(a)
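
As a rough illustration of what these functions return, the sketch below inspects shapes and dtypes. The name template is hypothetical; any existing array would do.

```python
import numpy as np

template = np.array([[0, 1, 2], [3, 4, 5]])

z = np.zeros((3, 3))                    # dtype defaults to float64
o = np.ones((3, 3), dtype='float32')    # dtype given explicitly

zl = np.zeros_like(template)    # same shape and dtype as template
ol = np.ones_like(template)
el = np.empty_like(template)    # contents are arbitrary; only shape and dtype are defined

print(z.dtype, o.dtype)         # float64 float32
print(zl.shape, zl.dtype == template.dtype)
```

Since the contents of an empty array depend on the memory state, it should be filled before its values are read.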

Accessing arrays

The NumPy array interface is, on a shallow level, similar to that of Python lists. Arrays can be indexed using integers and can also be iterated over using a for loop.

In [15]: A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In [16]: A[0]
Out[16]: 0
In [17]: [a for a in A]
Out[17]: [0, 1, 2, 3, 4, 5, 6, 7, 8]

It is also possible to index an array in multiple dimensions. If we take a (3,3) array (an array containing 3 triplets) and we index the first element, we obtain the first triplet shown as follows:

In [18]: A = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
In [19]: A[0]
Out[19]: array([0, 1, 2])

We can index the triplet again by adding the other index separated by a comma. To get the second element of the first triplet we can index using [0, 1] as shown in the following code:

In [20]: A[0, 1]
Out[20]: 1

NumPy allows you to slice arrays in single and multiple dimensions. If we slice on the first dimension we will get a collection of triplets, shown as follows:

In [21]: A[0:2]
Out[21]: array([[0, 1, 2],
                [3, 4, 5]])

If we also slice the second dimension with [0:2], we extract the first two elements of every selected triplet, resulting in a (2, 2) array, as shown in the following code:

In [22]: A[0:2, 0:2]
Out[22]: array([[0, 1],
                [3, 4]])

As you might expect, you can update values in the array using both integer indexes and slices. The syntax is as follows:

In [23]: A[0, 1] = 8
In [24]: A[0:2, 0:2] = [[1, 1], [1, 1]]
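
To see the effect of these two assignments, the following sketch applies them to the (3, 3) array used above and prints the result.

```python
import numpy as np

A = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

A[0, 1] = 8                       # update a single element
A[0:2, 0:2] = [[1, 1], [1, 1]]    # update a (2, 2) block; this overwrites A[0, 1] again

print(A)
# [[1 1 2]
#  [1 1 5]
#  [6 7 8]]
```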

    
        Unlock access to the largest independent learning library in Tech for FREE!
        
            
                Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
                Renews at $19.99/month. Cancel anytime

Indexing with the slicing syntax is fast because it doesn't make copies of the array. In NumPy terminology, it returns a view over the same memory area. If we take a slice of the original array and then change one of its values, the original array will be updated as well. The following code illustrates this:

In [25]: a = np.array([1, 1, 1, 1])
In [26]: a_view = a[0:2]
In [27]: a_view[0] = 2
In [28]: print(a)
[2 1 1 1]
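
When an independent array is needed instead, the slice can be copied explicitly. The sketch below contrasts a view with a copy; it uses np.shares_memory, which is not mentioned in the text above, to confirm which arrays share data.

```python
import numpy as np

a = np.array([1, 1, 1, 1])

view = a[0:2]           # slicing returns a view on the same memory
copy = a[0:2].copy()    # an explicit copy owns its own data

view[0] = 2             # visible through a
copy[1] = 9             # not visible through a

print(a)                            # [2 1 1 1]
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copy))    # False
```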

We can take a look at another example that shows how the slicing syntax can be used in a real-world scenario. We define an array r_i, shown in the following line of code, which contains a set of 10 coordinates (x, y); its shape will be (10, 2):

In [29]: r_i = np.random.rand(10, 2)

A typical operation is extracting the x component of each coordinate. In other words, you want to extract the items [0, 0], [1, 0], [2, 0], and so on, resulting in an array with shape (10,). It is helpful to think that the first index is moving while the second one is fixed (at 0). With this in mind, we will slice every index on the first axis (the moving one) and take the first element (the fixed one) on the second axis, as shown in the following line of code:

In [30]: x_i = r_i[:, 0]

On the other hand, the following expression keeps the first index fixed and the second index moving, giving the first (x, y) coordinate:

In [31]: r_0 = r_i[0, :]

Slicing all the indexes over the last axis is optional; using r_i[0] has the same effect as r_i[0, :].
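
Because random data makes the result hard to inspect, the sketch below repeats the same slices on a small hand-written coordinate array. The values and the name y_i are made up for illustration.

```python
import numpy as np

# A deterministic stand-in for r_i: four (x, y) coordinates.
r_i = np.array([[0.0, 10.0],
                [1.0, 11.0],
                [2.0, 12.0],
                [3.0, 13.0]])

x_i = r_i[:, 0]    # every row, first column: the x components
y_i = r_i[:, 1]    # every row, second column: the y components
r_0 = r_i[0]       # same as r_i[0, :], the first (x, y) pair

print(x_i)                    # [0. 1. 2. 3.]
print(x_i.shape, r_0.shape)   # (4,) (2,)
```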

NumPy allows you to index an array by using another NumPy array made of either integer or Boolean values, a feature called fancy indexing.

If you index with an array of integers, NumPy will interpret the integers as indexes and will return an array containing their corresponding values. If we index an array containing 10 elements with [0, 2, 3], we obtain an array of size 3 containing the elements at positions 0, 2 and 3. The following code gives us an illustration of this concept:

In [32]: a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
In [33]: idx = np.array([0, 2, 3])
In [34]: a[idx]
Out[34]: array([9, 7, 6])

You can use fancy indexing on multiple dimensions by passing an array for each dimension. If we want to extract the elements [0, 2] and [3, 1], we have to pack all the indexes acting on the first axis in one array, and the ones acting on the second axis in another. Each index must be valid for its axis; here the array has shape (4, 3), so the second index cannot exceed 2. This can be seen in the following code:

In [35]: a = np.array([[0, 1, 2], [3, 4, 5],
                       [6, 7, 8], [9, 10, 11]])
In [36]: idx1 = np.array([0, 3])
In [37]: idx2 = np.array([2, 1])
In [38]: a[idx1, idx2]
Out[38]: array([ 2, 10])
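
One way to read this kind of indexing is that the index arrays are paired up element by element. The following sketch, with hypothetical names rows and cols, compares the fancy-indexed result with an explicit loop over the index pairs.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5],
              [6, 7, 8], [9, 10, 11]])    # shape (4, 3)

rows = np.array([0, 3])    # indexes on the first axis, each below 4
cols = np.array([2, 1])    # indexes on the second axis, each below 3

picked = a[rows, cols]     # the elements a[0, 2] and a[3, 1]
print(picked)              # [ 2 10]

# The same selection written as a plain loop over the index pairs.
manual = np.array([a[i, j] for i, j in zip(rows, cols)])
print(np.array_equal(picked, manual))    # True
```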

You can also use normal lists as index arrays, but not tuples. For example the following two statements are equivalent:

>>> a[np.array([0, 1])] # is equivalent to
>>> a[[0, 1]]

However, if you use a tuple, NumPy will interpret the following statement as an index on multiple dimensions:

>>> a[(0, 1)] # is equivalent to
>>> a[0, 1]
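
The difference is easy to see on a two-dimensional array. The sketch below assumes the (4, 3) array from the previous example.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5],
              [6, 7, 8], [9, 10, 11]])

by_list = a[[0, 1]]     # fancy indexing: rows 0 and 1, shape (2, 3)
by_tuple = a[(0, 1)]    # multidimensional index: the single element a[0, 1]

print(by_list.shape)    # (2, 3)
print(by_tuple)         # 1
```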

The index arrays are not required to be one-dimensional; we can extract elements from the original array in any shape. For example we can select elements from the original array to form a (2,2) array shown as follows:

In [39]: idx1 = [[0, 1], [3, 2]]
In [40]: idx2 = [[0, 2], [1, 1]]
In [41]: a[idx1, idx2]
Out[41]: array([[ 0,  5],
                [10,  7]])

The array slicing and fancy indexing features can be combined. For example, this is useful if we want to swap the x and y columns in a coordinate array. In the following code, the first index will be running over all the elements (a slice), and for each of those we extract the element in position 1 (the y) first and then the one in position 0 (the x):

In [42]: r_i = np.random.rand(10, 2)
In [43]: r_i[:, [0, 1]] = r_i[:, [1, 0]]
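
The swap can be verified on a small array with known values. This is a sketch with made-up coordinates; the name original exists only to keep a reference copy for comparison.

```python
import numpy as np

r_i = np.array([[0.0, 10.0],
                [1.0, 11.0],
                [2.0, 12.0]])
original = r_i.copy()

# Fancy indexing on the right-hand side produces a copy, so the
# assignment does not read values it has already overwritten.
r_i[:, [0, 1]] = r_i[:, [1, 0]]

print(r_i)
# [[10.  0.]
#  [11.  1.]
#  [12.  2.]]
```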

When the index array contains Boolean values, slightly different rules apply. The Boolean array acts like a mask; every element corresponding to True is extracted and put in the output array. This procedure is shown as follows:

In [44]: a = np.array([0, 1, 2, 3, 4, 5])
In [45]: mask = np.array([True, False, True, False, False, False])
In [46]: a[mask]
Out[46]: array([0, 2])

The same rules apply when dealing with multiple dimensions. Furthermore, if the index array has the same shape as the original array, the elements corresponding to True will be selected and put in the resulting array.
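
In practice, masks are often built from a comparison rather than typed by hand; comparison operators on arrays return Boolean arrays. The sketch below shows that, together with the multidimensional cases; all variable names are illustrative.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
mask = a % 2 == 0        # [ True False  True False  True False]
evens = a[mask]
print(evens)             # [0 2 4]

b = np.array([[0, 1, 2], [3, 4, 5]])

# A mask with the same shape as b selects individual elements;
# the result is always one-dimensional.
big = b[b > 2]
print(big)               # [3 4 5]

# A one-dimensional mask acts on the first axis and selects whole rows.
row_mask = np.array([True, False])
first_row = b[row_mask]
print(first_row.shape)   # (1, 3)
```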

Indexing in NumPy is a reasonably fast operation. When speed is critical, however, you can use the numpy.take and numpy.compress functions, which can be slightly faster. The first argument of numpy.take is the array we want to operate on, and the second is the list of indexes we want to extract. The last argument is axis; if not provided, the indexes will act on the flattened array, otherwise they will act along the specified axis.

In [47]: r_i = np.random.rand(100, 2)
In [48]: idx = np.arange(50) # integers 0 to 49
In [49]: %timeit np.take(r_i, idx, axis=0)
1000000 loops, best of 3: 962 ns per loop
In [50]: %timeit r_i[idx]
100000 loops, best of 3: 3.09 us per loop

The equivalent function for Boolean arrays is numpy.compress, which works in the same way and was also faster in our timings. The use of numpy.compress is shown as follows:

In [51]: idx = np.ones(100, dtype='bool') # all True values
In [52]: %timeit np.compress(idx, r_i, axis=0)
1000000 loops, best of 3: 1.65 us per loop
In [53]: %timeit r_i[idx]
100000 loops, best of 3: 5.47 us per loop
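
Timings like these vary with the NumPy version and the machine, so it is worth measuring on your own setup. What can be checked independently of speed is that the functions return the same values as plain indexing. The sketch below does that on a deterministic array used as a stand-in for r_i.

```python
import numpy as np

# Deterministic stand-in for r_i: the values 0.0 to 199.0 in 100 rows.
r_i = np.arange(200.0).reshape(100, 2)
idx = np.arange(50)                      # integers 0 to 49

taken = np.take(r_i, idx, axis=0)        # first 50 rows, like r_i[idx]
flat = np.take(r_i, idx)                 # no axis: indexes the flattened array

mask = np.ones(100, dtype='bool')
mask[50:] = False                        # keep only the first 50 rows
compressed = np.compress(mask, r_i, axis=0)    # like r_i[mask]

print(taken.shape, flat.shape, compressed.shape)    # (50, 2) (50,) (50, 2)
print(np.array_equal(taken, r_i[idx]))              # True
print(np.array_equal(compressed, r_i[mask]))        # True
```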