Applying Math with Python

NumPy arrays

NumPy provides high-performance array types and routines for manipulating these arrays in Python. These arrays are useful for processing large datasets where performance is crucial. NumPy forms the base for the numerical and scientific computing stack in Python. Under the hood, NumPy makes use of low-level libraries for working with vectors and matrices, such as the Basic Linear Algebra Subprograms (BLAS) package and the Linear Algebra Package (LAPACK), which contains more advanced routines for linear algebra.

Traditionally, the NumPy package is imported under the shorter alias np, which can be accomplished using the following import statement:

import numpy as np

In particular, this convention is used in the NumPy documentation and in the wider scientific Python ecosystem (SciPy, Pandas, and so on).

The basic type provided by the NumPy library is the ndarray type (henceforth referred to as a NumPy array). Generally, you won't create your own instances of this type, and will instead use one of the helper routines such as array to set up the type correctly. The array routine creates NumPy arrays from an array-like object, which is typically a list of numbers or a list of lists (of numbers). For example, we can create a simple array by providing a list with the required elements:

ary = np.array([1, 2, 3, 4])  # array([1, 2, 3, 4])

The NumPy array type (ndarray) is a Python wrapper around an underlying C array structure. The array operations are implemented in C and optimized for performance. NumPy arrays must consist of homogeneous data (all elements have the same type), although this type could be a pointer to an arbitrary Python object. NumPy will infer an appropriate data type during creation if one is not explicitly provided using the dtype keyword argument:

np.array([1, 2, 3, 4], dtype=np.float32)
# array([1., 2., 3., 4.], dtype=float32)
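
As a small sketch of the type inference described above (the variable names ints, mixed, and objs are illustrative only), the following shows what NumPy chooses when no dtype is given, and how dtype=object stores arbitrary Python objects. The exact default integer width is assumed to depend on the platform, so only its kind is printed:

```python
import numpy as np

# Integers only: NumPy picks its default integer type (platform dependent).
ints = np.array([1, 2, 3])

# Mixing integers with a float promotes every element to float64.
mixed = np.array([1, 2.5, 3])

# dtype=object stores references to arbitrary Python objects.
objs = np.array([1, "two", 3.0], dtype=object)

print(ints.dtype.kind)  # i (a signed integer type)
print(mixed.dtype)      # float64
print(objs.dtype)       # object
```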

Under the hood, a NumPy array of any shape is a buffer containing the raw data as a flat (one-dimensional) array, and a collection of additional metadata that specifies details such as the type of the elements.
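
The split between raw data and metadata can be inspected directly. The following is a minimal sketch, assuming an array created from nested lists (which NumPy stores in row-major order by default); the name grid is illustrative, and the dtype is fixed to int64 so that the byte counts are predictable:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(grid.shape)     # (2, 3)
print(grid.itemsize)  # 8 (bytes per int64 element)
print(grid.strides)   # (24, 8): bytes to step to the next row, next column
print(grid.ravel())   # [1 2 3 4 5 6], the elements in storage order
```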

After creation, the data type can be accessed using the dtype attribute of the array. Modifying the dtype attribute will have undesirable consequences, since the raw bytes that constitute the data in the array will simply be reinterpreted as the new data type. For example, if we create an array using Python integers, NumPy will (on most platforms) convert those to 64-bit integers in the array. Changing the dtype value will cause NumPy to reinterpret these 64-bit integers as the new data type:

arr = np.array([1, 2, 3, 4])
print(arr.dtype)  # int64
arr.dtype = np.float32
print(arr)
# [1.e-45 0.e+00 3.e-45 0.e+00 4.e-45 0.e+00 6.e-45 0.e+00]

Each 64-bit integer has been reinterpreted as two 32-bit floating-point numbers, which clearly gives nonsense values. Instead, if you wish to change the data type after creation, use the astype method to specify the new type. The correct way to change the data type is shown here:

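arr = np.array([1, 2, 3, 4])  # start again from the original integer array
arr = arr.astype(np.float32)  # converts each value and returns a new array
print(arr)
# [1. 2. 3. 4.]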

NumPy also provides a number of routines for creating various standard arrays. The zeros routine creates an array of the specified shape in which every element is 0, and the ones routine creates an array in which every element is 1.
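
For instance, a short sketch of these two routines for one-dimensional arrays might look as follows (the variable names are illustrative). Both routines produce floating-point arrays unless a dtype is given:

```python
import numpy as np

z = np.zeros(4)              # array([0., 0., 0., 0.])
o = np.ones(3)               # array([1., 1., 1.])
zi = np.zeros(4, dtype=int)  # array([0, 0, 0, 0])

print(z.dtype)  # float64
```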

Element access

NumPy arrays support the getitem protocol, so elements in an array can be accessed as if it were a list. They also support all of the arithmetic operations, which are performed component-wise. This means we can use the usual index notation to retrieve the element at a specified index as follows:

ary = np.array([1, 2, 3, 4])
ary[0] # 1
ary[2] # 3

This also includes the usual slice syntax for extracting an array of data from an existing array. A slice of an array is again an array, containing the elements specified by the slice. For example, we can retrieve an array containing the first two elements of ary, or an array containing the elements at even indexes, as follows:

first_two = ary[:2]  # array([1, 2])
even_idx = ary[::2] # array([1, 3])

The syntax for a slice is start:stop:step. We can omit start to take elements from the beginning, omit stop to take elements through to the end, or omit both. We can also omit the step parameter, in which case we also drop the trailing :. The step parameter describes which elements from the chosen range should be selected. A value of 1 selects every element and, as in the preceding example, a value of 2 selects every second element (starting from 0 gives the elements at even indexes). This syntax is the same as for slicing Python lists.
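
The following sketch runs through several combinations of start, stop, and step on a slightly longer array (the arrays and names here are illustrative). One difference from Python lists worth knowing is that a basic slice of a NumPy array is a view onto the same data rather than a copy, which the last few lines demonstrate:

```python
import numpy as np

ary = np.array([10, 20, 30, 40, 50, 60])

middle = ary[1:4]      # array([20, 30, 40])
head = ary[:3]         # array([10, 20, 30])
tail = ary[3:]         # array([40, 50, 60])
odd_idx = ary[1::2]    # array([20, 40, 60])
backwards = ary[::-1]  # array([60, 50, 40, 30, 20, 10])

# Unlike list slices, basic NumPy slices are views: writing through the
# slice changes the array it was taken from.
other = np.array([1, 2, 3, 4])
view = other[:2]
view[0] = 99
print(other[0])  # 99
```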

Array arithmetic and functions

NumPy provides a number of universal functions (ufuncs), which are routines that can operate efficiently on NumPy array types. In particular, all of the basic mathematical functions discussed in the Basic mathematical functions section have analogues in NumPy that can operate on NumPy arrays. Universal functions can also perform broadcasting, which allows them to operate on arrays of different, but compatible, shapes.
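
As a brief illustration, the sketch below applies the np.sqrt universal function to a whole array and then uses broadcasting to add a column of shape (3, 1) to a row of shape (3,). The arrays are chosen purely for illustration:

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0])
roots = np.sqrt(x)  # array([0., 1., 2., 3.])

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,)
table = col + row                  # broadcast to shape (3, 3)
print(table)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]
```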

The arithmetic operations on NumPy arrays are performed component-wise. This is best illustrated by the following example:

arr_a = np.array([1, 2, 3, 4])
arr_b = np.array([1, 0, -3, 1])
arr_a + arr_b # array([2, 2, 0, 5])
arr_a - arr_b # array([0, 2, 6, 3])
arr_a * arr_b # array([ 1, 0, -9, 4])
arr_b / arr_a # array([ 1. , 0. , -1. , 0.25])
arr_b**arr_a # array([1, 0, -27, 1])

Note that the arrays in this example have the same shape, which for one-dimensional arrays means that they have the same length. Using an arithmetic operation on arrays whose shapes are not compatible (in the sense of broadcasting) will result in a ValueError. Adding, subtracting, multiplying, or dividing by a number will result in an array where the operation has been applied to each component. For example, we can multiply all elements in an array by 2 by using the following command:

arr = np.array([1, 2, 3, 4])
new = 2*arr
print(new)
# [2 4 6 8]
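
The ValueError mentioned above for mismatched shapes can be seen in a short sketch such as the following, where two one-dimensional arrays of lengths 4 and 3 are added. The names a, b, and error are illustrative, and the exact wording of the error message may vary between NumPy versions:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([1, 2, 3])

error = None
try:
    a + b  # shapes (4,) and (3,) cannot be combined
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
```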

Useful array creation routines

To generate arrays of numbers at regular intervals between two given end points, you can use either the arange routine or the linspace routine. The difference between these two routines is that linspace generates a number (the default is 50) of values with equal spacing between the two end points, including both end points, while arange generates numbers at a given step size up to, but not including, the upper limit. The linspace routine generates values in the closed interval a ≤ x ≤ b and the arange routine generates values in the half-open interval a ≤ x < b:

np.linspace(0, 1, 5)  # array([0., 0.25, 0.5, 0.75, 1.0])
np.arange(0, 1, 0.3) # array([0.0, 0.3, 0.6, 0.9])

Note that the array generated using linspace has exactly 5 points, specified by the third argument, including the two end points, 0 and 1. The array generated by arange has 4 points, and does not include the right end point, 1; an additional step of 0.3 would equal 1.2, which is larger than 1.
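
The following sketch checks these properties and shows two common variations: the endpoint keyword argument of linspace, which excludes the right end point when set to False, and arange with integer arguments. The variable names are illustrative:

```python
import numpy as np

pts = np.linspace(0, 1, 5)
print(len(pts))         # 5
print(pts[0], pts[-1])  # 0.0 1.0

# endpoint=False drops the right end point, giving a spacing of 1/5.
half_open = np.linspace(0, 1, 5, endpoint=False)
# approximately array([0. , 0.2, 0.4, 0.6, 0.8])

# With integer arguments, arange behaves much like the built-in range.
evens = np.arange(0, 10, 2)  # array([0, 2, 4, 6, 8])
```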

Higher dimensional arrays

NumPy can create arrays with any number of dimensions, which are created using the same array routine as simple one-dimensional arrays. The number of dimensions of an array is specified by the number of nested lists provided to the array routine. For example, we can create a two-dimensional array by providing a list of lists, where each member of the inner list is a number, such as the following:

mat = np.array([[1, 2], [3, 4]])

NumPy arrays have a shape attribute, which describes the arrangement of the elements in each dimension. For a two-dimensional array, the shape can be interpreted as the number of rows and the number of columns of the array.

NumPy stores the shape as the shape attribute on the array object, which is a tuple. The number of elements in this tuple is the number of dimensions:

vec = np.array([1, 2])
mat.shape # (2, 2)
vec.shape # (2,)

Since the data in a NumPy array is stored in a flat (one-dimensional) array, an array can be reshaped with little cost by simply changing the associated metadata. This is done using the reshape method on a NumPy array:

mat.reshape(4,)  # array([1, 2, 3, 4])

Note that the total number of elements must remain unchanged. The matrix mat originally has shape (2, 2) with a total of 4 elements, and the latter is a one-dimensional array with shape (4,), which again has a total of 4 elements. Attempting to reshape when there is a mismatch in the total number of elements will result in a ValueError.
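
A short sketch of these points follows. It uses np.shares_memory to confirm that, for this array, reshape reuses the existing buffer rather than copying it, passes -1 to let NumPy infer one dimension from the total number of elements, and triggers the ValueError with an incompatible shape. The names flat, col, and error are illustrative:

```python
import numpy as np

mat = np.array([[1, 2], [3, 4]])

flat = mat.reshape(4)     # array([1, 2, 3, 4])
col = mat.reshape(-1, 1)  # shape (4, 1); -1 asks NumPy to infer the length

print(np.shares_memory(mat, flat))  # True: only the metadata changed

error = None
try:
    mat.reshape(3)  # 4 elements cannot fill a shape with 3 elements
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
```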

To create an array of higher dimensions, simply add more levels of nested lists. To make this clearer, in the following example, we separate out the lists for each element in the third dimension before we construct the array:

mat1 = [[1, 2], [3, 4]]
mat2 = [[5, 6], [7, 8]]
mat3 = [[9, 10], [11, 12]]
arr_3d = np.array([mat1, mat2, mat3])
arr_3d.shape # (3, 2, 2)

Note that the first element of the shape is the outermost, and the last element is the innermost.

This means that adding an additional dimension to an array is a simple matter of providing the relevant metadata. Using the array routine, the shape metadata is described by the length of each list in the argument. The length of the outermost list defines the corresponding shape parameter for that dimension, and so on.
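
To make the correspondence between nesting and shape concrete, the following sketch rebuilds the three-dimensional array from the example above and indexes into it, with the outermost index first. The names first and elem are illustrative:

```python
import numpy as np

mat1 = [[1, 2], [3, 4]]
mat2 = [[5, 6], [7, 8]]
mat3 = [[9, 10], [11, 12]]
arr_3d = np.array([mat1, mat2, mat3])

print(arr_3d.ndim)      # 3, the number of entries in arr_3d.shape

first = arr_3d[0]       # the 2 x 2 block built from mat1
elem = arr_3d[2, 1, 0]  # third block, second row, first column
print(elem)             # 11
```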

The size in memory of a NumPy array does not significantly depend on the number of dimensions, but only on the total number of elements, which is the product of the shape parameters. However, note that the total number of elements tends to be larger in higher dimensional arrays.
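
This can be checked with the size and nbytes attributes. In the sketch below, a one-dimensional array and a three-dimensional array with the same number of float64 elements occupy the same number of bytes for their data (the names flat and cube are illustrative):

```python
import numpy as np

flat = np.zeros(24, dtype=np.float64)
cube = np.zeros((2, 3, 4), dtype=np.float64)

print(flat.size, cube.size)      # 24 24
print(flat.nbytes, cube.nbytes)  # 192 192 (24 elements x 8 bytes each)
```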

To access an element in a multi-dimensional array, you use the usual index notation, but rather than providing a single number, you need to provide the index in each dimension. For a 2 × 2 matrix, this means specifying the row and column for the desired element:

mat[0, 0]  # 1 - top left element
mat[1, 1] # 4 - bottom right element

The index notation also supports slicing in each dimension, so we can extract all members of a single column by using the slice mat[:, 0] like so:

mat[:, 0]
# array([1, 3])

Note that the result of the slice is a one-dimensional array.
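
Rows can be extracted in the same way, and using a slice of length one in place of a single index keeps the result two-dimensional. The following sketch illustrates both with the same matrix (the names row and col_2d are illustrative):

```python
import numpy as np

mat = np.array([[1, 2], [3, 4]])

row = mat[1, :]       # array([3, 4]), the second row
col_2d = mat[:, 0:1]  # array([[1], [3]]), the first column kept as a column

print(row.shape)     # (2,)
print(col_2d.shape)  # (2, 1)
```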

The array creation functions, zeros and ones, can create multi-dimensional arrays by simply specifying a shape with more than one dimension parameter.
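
For example, passing a tuple as the shape gives a multi-dimensional array, as in this short sketch (the names z and o are illustrative):

```python
import numpy as np

z = np.zeros((2, 3))    # 2 rows, 3 columns, every element 0.0
o = np.ones((2, 2, 2))  # three-dimensional array, every element 1.0

print(z.shape)  # (2, 3)
print(o.shape)  # (2, 2, 2)
print(o.sum())  # 8.0
```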