Building Machine Learning Systems with Python
上QQ阅读APP看书,第一时间看更新

Learning NumPy

So, let's import NumPy and play with it a bit. For that, we need to start the Python interactive shell:

>>> import numpy
>>> numpy.version.full_version
1.13.3

As we do not want to pollute our namespace, we certainly should not use the following code:

>>> from numpy import *

If we do this, then, for instance, numpy.array will potentially shadow the array package that is included in standard Python. Instead, we will use the following convenient shortcut:

>>> import numpy as np
>>> a = np.array([0,1,2,3,4,5])
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a.ndim
1
>>> a.shape
(6,)

With the previous code snippet, we created an array in the same way that we would create a list in Python. However, the NumPy arrays have additional information about the shape. In this case, it is a one-dimensional array of six elements. That's no surprise so far.

We can now transform this array into a two-dimensional matrix:

>>> b = a.reshape((3,2))
>>> b
array([[0, 1],
[2, 3],
[4, 5]])
>>> b.ndim
2
>>> b.shape
(3, 2)

It is important to realize just how much the NumPy package is optimized. For example, doing the following avoids copies wherever possible:

>>> b[1][0] = 77
>>> b
array([[ 0, 1],
[77, 3],
[ 4, 5]])
>>> a
array([ 0, 1, 77, 3, 4, 5])

In this case, we have modified the value 2 to 77 in b, and we immediately see the same change reflected in a, as well. Keep in mind that whenever you need a true copy, you can always perform the following:

>>> c = a.reshape((3,2)).copy()
>>> c
array([[ 0, 1],
[77, 3],
[ 4, 5]])
>>> c[0][0] = -99
>>> a
array([ 0, 1, 77, 3, 4, 5])
>>> c
array([[-99, 1],
[ 77, 3],
[ 4, 5]])

Note that, here, c and a are totally independent copies.

Another big advantage of NumPy arrays is that the operations are propagated to the individual elements. For example, multiplying a NumPy array will result in an array of the same size (including all of its elements) being multiplied:

>>> d = np.array([1,2,3,4,5])
>>> d*2
array([ 2, 4, 6, 8, 10])

This is also true for other operations:

>>> d**2
array([ 1, 4, 9, 16, 25])

Contrast that with ordinary Python lists:

>>> [1,2,3,4,5]*2
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> [1,2,3,4,5]**2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

Of course, by using NumPy arrays, we sacrifice the agility Python lists offer. Simple operations, such as adding or removing elements, are a bit complex for NumPy arrays. Luckily, we have both at our hands, and we will use the right one for the task at hand.