Applying Math with Python

Matrices

NumPy arrays also serve as matrices, which are fundamental in mathematics and computational programming. A matrix is simply a two-dimensional array. Matrices are central in many applications, such as geometric transformations and simultaneous equations, but also appear as useful tools in other areas, such as statistics. Matrices only become distinctive (compared to any other array) once we equip them with matrix arithmetic. Matrices have element-wise addition and subtraction operations, just as for NumPy arrays; a third operation called scalar multiplication, where we multiply every element of the matrix by a constant number; and a different notion of matrix multiplication. Matrix multiplication is fundamentally different from other notions of multiplication, as we will see later.
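These arithmetic operations behave just as they do for other NumPy arrays. The following is a small illustrative sketch using two arbitrary 2 × 2 matrices:

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A + B # element-wise addition
# array([[ 6,  8],
# [10, 12]])
3 * A # scalar multiplication
# array([[ 3,  6],
# [ 9, 12]])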

One of the most important attributes of a matrix is its shape, defined exactly as for NumPy arrays. A matrix with m rows and n columns is usually described as an m × n matrix. A matrix that has the same number of rows as columns is said to be a square matrix, and these matrices play a special role in the theory of vectors and matrices.

The identity matrix (of size n) is the n × n matrix where the (i, i)-th entry is 1, and the (i, j)-th entry is zero for i ≠ j. There is an array creation routine that gives an n × n identity matrix for a specified n value:

np.eye(3)
# array([[1., 0., 0.],
# [0., 1., 0.],
# [0., 0., 1.]])

Basic methods and properties

There are a large number of terms and quantities associated with matrices. We only mention two such properties here, since they will be useful later. These are the transpose of a matrix, where rows and columns are interchanged, and the trace of a square matrix, which is the sum of the elements along the leading diagonal. The leading diagonal consists of the elements a_{i,i} along the line from the top left of the matrix to the bottom right.

NumPy arrays can be easily transposed by calling the transpose method on the array object. In fact, since this is such a common operation, arrays have a convenience property T that returns the transpose of the matrix. Transposition reverses the shape of a matrix (array), so that rows become columns and columns become rows; for example, the transpose of a 3 × 2 matrix (3 rows, 2 columns) is a 2 × 3 matrix. The following example transposes a 2 × 2 matrix:

mat = np.array([[1, 2], [3, 4]])
mat.transpose()
# array([[1, 3],
# [2, 4]])
mat.T
# array([[1, 3],
# [2, 4]])

Another quantity associated with matrices that is occasionally useful is the trace. The trace of a square matrix A, with entries as in the preceding code, is defined to be the sum of the elements along the leading diagonal, which consists of the elements starting from the top left diagonally to the bottom right. The formula for the trace is given as

tr(A) = a_{1,1} + a_{2,2} + … + a_{n,n}

NumPy arrays have a trace method that returns the trace of a matrix:

A = np.array([[1, 2], [3, 4]])
A.trace() # 5

The trace can also be accessed using the np.trace function, which is not bound to the array.
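For example, applying it to the array defined in the preceding code gives the same result:

np.trace(A) # 5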

Matrix multiplication

Matrix multiplication is an operation performed on two matrices, which preserves some of the structure and character of both matrices. Formally, if A is an l × m matrix with entries a_{i,j}, and B is an m × n matrix with entries b_{i,j}, then the matrix product C of A and B is an l × n matrix whose (p, q)-th entry is given by

c_{p,q} = a_{p,1}b_{1,q} + a_{p,2}b_{2,q} + … + a_{p,m}b_{m,q}

Note that the number of columns of the first matrix must match the number of rows of the second matrix in order for matrix multiplication to be defined. We usually write AB for the matrix product of A and B, if it is defined. Matrix multiplication is a peculiar operation: unlike most other arithmetic operations, it is not commutative, so even if AB and BA can both be computed, they need not be equal. In practice, this means that the order of multiplication matters for matrices. This arises from the origins of matrix algebras as representations of linear maps, where multiplication corresponds to the composition of functions.
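To see how this formula works, the following small sketch computes a matrix product entry by entry using plain Python loops (the two matrices are arbitrary examples); the built-in NumPy operator introduced next should be preferred in practice:

A = np.array([[1, 2], [3, 4]])
B = np.array([[-1, 1], [0, 1]])
rows, inner = A.shape
inner_b, cols = B.shape # inner and inner_b must be equal
C = np.zeros((rows, cols))
for p in range(rows):
    for q in range(cols):
        # the (p, q)-th entry is the sum of a_{p,k} * b_{k,q} over k
        C[p, q] = sum(A[p, k] * B[k, q] for k in range(inner))
C
# array([[-1., 3.],
# [-3., 7.]])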

Python has an operator reserved for matrix multiplication, @, which was added in Python 3.5. NumPy arrays implement this operator to perform matrix multiplication. Note that this is fundamentally different from the component-wise multiplication of arrays performed by the * operator:

A = np.array([[1, 2], [3, 4]])
B = np.array([[-1, 1], [0, 1]])
A @ B
# array([[-1, 3],
# [-3, 7]])
A * B # different from A @ B
# array([[-1, 2],
# [ 0, 4]])

The identity matrix is a neutral element under matrix multiplication. That is, if A is any l × m matrix, and I is the m × m identity matrix, then AI = A. This can be easily checked for specific examples using NumPy arrays:

A = np.array([[1, 2], [3, 4]])
I = np.eye(2)
A @ I
# array([[1., 2.],
# [3., 4.]])

Determinants and inverses

The determinant of a square matrix is important in most applications because of its strong link with finding the inverse of a matrix. (Recall that a matrix is square if the number of rows and columns are equal.) In particular, a matrix that has a non-zero determinant has a (unique) inverse, which translates to unique solutions of certain systems of equations. The determinant of a matrix is defined recursively. For a 2 × 2 matrix A, with first row (a, b) and second row (c, d), the determinant of A is defined by the formula

det(A) = ad - bc

For a general n × n matrix A with entries a_{i,j}, where n > 2, we define the submatrix A_{i,j}, for 1 ≤ i, j ≤ n, to be the result of deleting the ith row and jth column from A. The submatrix A_{i,j} is an (n-1) × (n-1) matrix, so we can compute its determinant. We then define the determinant of A to be the quantity

det(A) = a_{1,1}det(A_{1,1}) - a_{1,2}det(A_{1,2}) + … + (-1)^{1+n}a_{1,n}det(A_{1,n})

In fact, the index 1 that appears in the preceding equation can be replaced by any i with 1 ≤ i ≤ n and the result will be the same.
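To make the recursive definition concrete, the following short sketch implements it directly, expanding along the first row (the helper name det_recursive and the 3 × 3 matrix are purely illustrative, and this approach is far too slow for large matrices):

def det_recursive(M):
    # base case: the determinant of a 1 x 1 matrix is its single entry
    if M.shape == (1, 1):
        return M[0, 0]
    total = 0.0
    for j in range(M.shape[1]):
        # A_{1,j}: delete the first row and the jth column
        sub = np.delete(np.delete(M, 0, axis=0), j, axis=1)
        # (-1)**j matches the sign (-1)^{1+j} of the formula (j is 0-based here)
        total += (-1) ** j * M[0, j] * det_recursive(sub)
    return total

M = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 10.]])
det_recursive(M) # -3.0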

The NumPy routine for computing the determinant of a matrix is contained in a separate module called linalg. This module contains many common routines for linear algebra, which is the branch of mathematics that covers vector and matrix algebra. The routine for computing the determinant of a square matrix is the det routine:

from numpy import linalg
linalg.det(A) # -2.0000000000000004

Note that a floating-point rounding error has occurred in the calculation of the determinant.

The SciPy package, if installed, also offers a linalg module that extends NumPy's linalg. The SciPy version not only includes additional routines, but it is also always compiled with BLAS and LAPACK support, while for the NumPy version, this is optional. Thus, the SciPy variant may be preferable, depending on how NumPy was compiled, if speed is important.
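Assuming SciPy is installed, its linalg module can be used as a drop-in replacement for this calculation (the alias slinalg is chosen here only to avoid clashing with NumPy's linalg):

from scipy import linalg as slinalg
slinalg.det(A) # -2.0, up to a small floating-point rounding error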

The inverse of an n × n matrix A is the (necessarily unique) n × n matrix B, such that AB = BA = I, where I denotes the n × n identity matrix and the multiplication performed here is matrix multiplication. Not every square matrix has an inverse; those that do not are sometimes called singular matrices. In fact, a matrix is non-singular (that is, has an inverse) if, and only if, the determinant of that matrix is not 0. When A has an inverse, it is customary to denote it by A^{-1}.

The inv routine from the linalg module computes the inverse of a matrix, if it exists:

linalg.inv(A)
# array([[-2. , 1. ],
# [ 1.5, -0.5]])

We can check that the matrix given by the inv routine is indeed the matrix inverse of A by matrix multiplying (on either side) by the inverse and checking that we get the 2 × 2 identity matrix:

Ainv = linalg.inv(A)
Ainv @ A
# Approximately
# array([[1., 0.],
# [0., 1.]])
A @ Ainv
# Approximately
# array([[1., 0.],
# [0., 1.]])

There will be a floating-point error in these computations, which has been hidden away behind the Approximately comment, due to the way that matrix inverses are computed.

The linalg package also contains a number of other methods such as norm, which computes various norms of a matrix. It also contains functions for decomposing matrices in various ways and solving systems of equations.

There are also matrix analogues of the exponential function (expm), the logarithm (logm), sine (sinm), cosine (cosm), and tangent (tanm); these are provided by SciPy's linalg module. Note that these functions are not the same as the standard exp, log, sin, cos, and tan functions in the base NumPy package, which apply the corresponding function on an element-by-element basis. In contrast, the matrix exponential function is defined using a "power series" of matrices

exp(A) = I + A + (1/2!)A^2 + (1/3!)A^3 + …

where A is an n × n matrix and A^k is the kth matrix power of A; that is, the matrix A multiplied by itself k times. Note that this "power series" always converges in an appropriate sense. By convention, we take A^0 = I, where I is the n × n identity matrix. This is completely analogous to the usual power series definition of the exponential function for real or complex numbers, but with matrices and matrix multiplication in place of numbers and (regular) multiplication. The other functions are defined in a similar fashion, but we will skip the details.
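As a brief illustration, SciPy's linalg module provides the matrix exponential as expm; the matrix below is a hypothetical example chosen so that all powers beyond the first vanish, making the series easy to check by hand:

from scipy.linalg import expm

N = np.array([[0., 1.], [0., 0.]]) # N @ N is the zero matrix
expm(N) # exp(N) = I + N, since all higher powers vanish
# array([[1., 1.],
# [0., 1.]])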

Systems of equations

Solving systems of (linear) equations is one of the main motivations for studying matrices in mathematics. Problems of this type occur frequently in a variety of applications. We start with a system of linear equations written as

a_{1,1}x_1 + a_{1,2}x_2 + … + a_{1,n}x_n = b_1
a_{2,1}x_1 + a_{2,2}x_2 + … + a_{2,n}x_n = b_2
⋮
a_{n,1}x_1 + a_{n,2}x_2 + … + a_{n,n}x_n = b_n

where n is at least two, the a_{i,j} and b_i are known values, and the x_i values are the unknown values that we wish to find.

Before we can solve such a system of equations, we need to convert the problem into a matrix equation. This is achieved by collecting together the coefficients a_{i,j} into an n × n matrix and using the properties of matrix multiplication to relate this matrix to the system of equations. So, let A be the n × n matrix whose (i, j)-th entry is the coefficient a_{i,j} taken from the equations. Then, if we take x to be the unknown (column) vector containing the x_i values and b to be the (column) vector containing the known values b_i, we can rewrite the system of equations as the single matrix equation

Ax = b

which we can now solve using matrix techniques. In this situation, we view a column vector as an n × 1 matrix, so the multiplication in the preceding equation is matrix multiplication. To solve this matrix equation, we use the solve routine in the linalg module. To illustrate the technique, we will solve the following system of equations as an example:

3x_1 - 2x_2 + x_3 = 7
x_1 + x_2 - 2x_3 = -4
-3x_1 - 2x_2 + x_3 = 1

These equations have three unknown values, x_1, x_2, and x_3. First, we create the matrix of coefficients and the vector b. Since we are using NumPy as our means of working with matrices and vectors, we create a two-dimensional NumPy array for the matrix A and a one-dimensional array for b:

import numpy as np
from numpy import linalg

A = np.array([[3, -2, 1], [1, 1, -2], [-3, -2, 1]])
b = np.array([7, -4, 1])

Now, the solution to the system of equations can be found using the solve routine:

linalg.solve(A, b)  # array([ 1., -1., 2.])

This is indeed the solution to the system of equations, which can be easily verified by computing A @ x and checking the result against the b array. There may be a floating-point rounding error in this computation.

The solve function expects two inputs, which are the matrix of coefficients A and the right-hand side vector b. It solves the system of equations using LAPACK routines that decompose matrix A into simpler matrices to quickly reduce to an easier problem that can be solved by simple substitution. This technique for solving matrix equations is extremely powerful and efficient, and is less prone to the floating-point rounding errors that dog some other methods. For instance, the solution to a system of equations could be computed by multiplying (on the left) by the inverse of the matrix A, if the inverse is known. However, this is generally not as good as using the solve routine since it may be slower or result in larger numerical errors.
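Both of these checks can be carried out on the example from this section; the following short sketch uses np.allclose to compare values up to floating-point tolerance:

x = linalg.solve(A, b)
np.allclose(A @ x, b) # True - x really does solve the system
x_via_inverse = linalg.inv(A) @ b
np.allclose(x, x_via_inverse) # True for this well-conditioned example, but solve is preferred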

In the example we used, the coefficient matrix A was square. That is, there are the same number of equations as there are unknown values. In this case, the system of equations has a unique solution if (and only if) the determinant of this matrix A is not 0. In cases where the determinant of A is 0, one of two things can happen: the system can have no solution, in which case we say that the system is inconsistent; or there can be infinitely many solutions. The difference between a consistent and inconsistent system is usually determined by the vector b. For example, consider the following systems of equations:

x + y = 2        x + y = 2
2x + 2y = 4      2x + 2y = 5

The left-hand system of equations is consistent and has infinitely many solutions; for instance, taking x = 1 and y = 1 or x = 0 and y = 2 are both solutions. The right-hand system of equations is inconsistent, and there are no solutions. In both of the above cases, the solve routine will fail because the coefficient matrix is singular.
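For example, attempting to call solve with a singular coefficient matrix (a representative singular matrix like the one above) raises an error rather than returning a solution:

A_singular = np.array([[1., 1.], [2., 2.]])
b_s = np.array([2., 4.])
try:
    linalg.solve(A_singular, b_s)
except linalg.LinAlgError as e:
    print(e) # Singular matrix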

The coefficient matrix does not need to be square for the system to be solvable. For example, there could be more equations than there are unknown values (the coefficient matrix has more rows than columns). Such a system is said to be over-specified and, provided that it is consistent, it will have a solution. If there are fewer equations than there are unknown values, then the system is said to be under-specified. Under-specified systems of equations generally have infinitely many solutions if they are consistent, since there is not enough information to uniquely specify all the unknown values. Unfortunately, the solve routine will not be able to find solutions for systems where the coefficient matrix is not square, even if the system does have a solution.

Eigenvalues and eigenvectors

Consider the matrix equation Ax = λx, where A is a square (n × n) matrix, x is a vector, and λ is a number. Numbers λ for which there is a non-zero x that solves this equation are called eigenvalues, and the corresponding vectors x are called eigenvectors. Pairs of eigenvalues and corresponding eigenvectors encode information about the matrix A, and are therefore important in many applications where matrices appear.

We will demonstrate computing eigenvalues and eigenvectors using the following matrix:

(  3  -1   4 )
( -1   0  -1 )
(  4  -1   2 )

We must first define this as a NumPy array:

import numpy as np
from numpy import linalg
A = np.array([[3, -1, 4], [-1, 0, -1], [4, -1, 2]])

The eig routine in the linalg module is used to find the eigenvalues and eigenvectors of a square matrix. This routine returns a pair (v, B) where v is a one-dimensional array containing the eigenvalues and B is a two-dimensional array whose columns are the corresponding eigenvectors:

v, B = linalg.eig(A)

It is perfectly possible for a matrix with only real entries to have complex eigenvalues and eigenvectors. For this reason, the return type of the eig routine will sometimes be a complex number type such as complex64 or complex128. In some applications, complex eigenvalues have a special meaning, while in others we only consider the real eigenvalues.

We can extract an eigenvalue/eigenvector pair from the output of eig using the following sequence:

i = 0 # first eigenvalue/eigenvector pair
lambda0 = v[i]
print(lambda0)
# 6.823156164525971
x0 = B[:, i] # ith column of B
print(x0)
# array([ 0.73271846, -0.20260301, 0.649672352])

The eigenvectors returned by the eig routine are normalized so that they have norm (length) 1. (The Euclidean norm is defined to be the square root of the sum of the squares of the members of the array.) We can check that this is the case by evaluating the norm of the vector using the norm routine from linalg:

linalg.norm(x0)  # 1.0  - eigenvectors are normalized.

Finally, we can check that these values do indeed satisfy the definition of an eigenvalue/eigenvector pair by computing the product A @ x0 and checking that, up to floating-point precision, this is equal to lambda0*x0:

lhs = A @ x0
rhs = lambda0*x0
linalg.norm(lhs - rhs) # 2.8435583831733384e-15 - very small.

The norm computed here represents the "distance" between the left-hand side lhs and the right-hand side rhs of the equation Ax = λx. Since this distance is extremely small (zero to 14 decimal places), we can be fairly confident that they are actually the same. The fact that this is not exactly zero is likely due to floating-point precision error.

The eig routine is a wrapper around the low-level LAPACK routines for computing eigenvalues and eigenvectors. The theoretical procedure for finding eigenvalues and eigenvectors is to first find the eigenvalues by solving the equation

det(A - λI) = 0

where I is the appropriate identity matrix, to find the values λ. The equation determined by the left-hand side is a polynomial in λ and is called the characteristic polynomial of A. The corresponding eigenvectors can then be found by solving the matrix equation

(A - λ_j I)x = 0

where λ_j is one of the eigenvalues already found. In practice, this process is somewhat inefficient, and there are alternative strategies for computing eigenvalues and eigenvectors numerically more efficiently.
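The connection can be illustrated (though this is not how eig works internally) using NumPy's poly routine, which returns the coefficients of the characteristic polynomial of a square array, and roots, which finds the roots of a polynomial:

p = np.poly(A) # coefficients of the characteristic polynomial of A
char_roots = np.roots(p) # its roots are the eigenvalues of A
np.allclose(np.sort(char_roots), np.sort(v)) # True, up to floating-point error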

One key application of eigenvalues and eigenvectors is in principal component analysis, which is a key technique for reducing a large, complex dataset to better understand the internal structure.

We can only compute eigenvalues and eigenvectors for square matrices; for non-square matrices, the definition does not make sense. There is a generalization of eigenvalues and eigenvectors to non-square matrices called singular values.
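For example, the singular values of a (possibly non-square) matrix can be computed using the svd routine from the linalg module; the 2 × 3 matrix here is just an illustrative example:

M = np.array([[1., 0., 1.], [0., 1., 0.]])
U, s, Vt = linalg.svd(M)
s # the singular values of M, in decreasing order
# array([1.41421356, 1. ])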

Sparse matrices

Systems of linear equations such as those discussed earlier are extremely common throughout mathematics and, in particular, in mathematical computing. In many applications, the coefficient matrix will be extremely large, with thousands of rows and columns, and will likely be obtained from an alternative source rather than being entered by hand. In many cases, it will also be a sparse matrix, where most of the entries are 0.

A matrix is sparse if a large number of the elements are zero. The exact number of elements that need to be zero in order to call a matrix sparse is not well defined. Sparse matrices can be represented more efficiently, for example, by simply storing the indexes (i, j) and the values ai,j that are non-zero. There are entire collections of algorithms for sparse matrices that offer great improvements in performance, assuming the matrix is indeed sufficiently sparse.

Sparse matrices appear in many applications, and often follow some kind of pattern. In particular, several techniques for solving partial differential equations (PDEs) involve solving sparse matrix equations (see Chapter 3, Calculus and Differential Equations), and matrices associated with networks are often sparse. There are additional routines for sparse matrices associated with networks (graphs) contained in the sparse.csgraph module. We will discuss these further in Chapter 5, Working with Trees and Networks.

The sparse module contains several different classes representing the different means of storing a sparse matrix. The most basic means of storing a sparse matrix is to store three arrays: two containing integers representing the indices of the non-zero elements, and a third containing the data of the corresponding elements. This is the format of the coo_matrix class. Then there are the compressed sparse column CSC (csc_matrix) and compressed sparse row CSR (csr_matrix) formats, which provide efficient column or row slicing, respectively. There are several additional sparse matrix classes in sparse, including dia_matrix, which efficiently stores matrices where the non-zero entries appear along a diagonal band.

The sparse module from SciPy contains routines for creating and working with sparse matrices. We import the sparse module from SciPy using the following import statement:

import numpy as np
from scipy import sparse

A sparse matrix can be created from a full (dense) matrix, or some other kind of data structure. This is done using the constructor for the specific format in which you wish to store the sparse matrix.

For example, we can take a dense matrix and store it in CSR format by using the following command:

A = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
sp_A = sparse.csr_matrix(A)
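A matrix can also be built directly in COO format by supplying the row indices, column indices, and values of the non-zero entries; the arrays here are purely illustrative:

rows = np.array([0, 1, 2])
cols = np.array([1, 0, 2])
values = np.array([4., 5., 6.])
A_coo = sparse.coo_matrix((values, (rows, cols)), shape=(3, 3))
A_coo.toarray()
# array([[0., 4., 0.],
# [5., 0., 0.],
# [0., 0., 6.]])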

If you are generating a sparse matrix by hand, the matrix probably follows some kind of pattern, such as the following tridiagonal matrix:

(  2  -1   0   0   0 )
( -1   2  -1   0   0 )
(  0  -1   2  -1   0 )
(  0   0  -1   2  -1 )
(  0   0   0  -1   2 )

Here, the non-zero entries appear on the diagonal and on either side of the diagonal, and the non-zero entries in each row follow the same pattern. To create such a matrix, we could use one of the array creation routines in sparse such as diags, which is a convenience routine for creating matrices with diagonal patterns:

T = sparse.diags([-1, 2, -1], (-1, 0, 1), shape=(5, 5), format="csr")

This will create the matrix T as described previously and store it as a sparse matrix in compressed sparse row (CSR) format. The first argument specifies the values that should appear in the output matrix, and the second argument gives the positions, relative to the main diagonal, at which those values should be placed. So, the 0 offset in the tuple represents the main diagonal, -1 is the position to the left of the diagonal in each row, and +1 is the position to the right of the diagonal in each row. The shape keyword argument gives the dimensions of the matrix produced, and the format specifies the storage format for the matrix. If no format is provided using the optional argument, then a reasonable default will be used. The array T can be expanded to a full (dense) matrix using the toarray method:

T.toarray()
# array([[ 2, -1, 0, 0, 0],
# [-1, 2, -1, 0, 0],
# [ 0, -1, 2, -1, 0],
# [ 0, 0, -1, 2, -1],
# [ 0, 0, 0, -1, 2]])

When the matrix is small (as it is here), there is little difference in performance between the sparse solving routine and the usual solving routines.

Once a matrix is stored in a sparse format, we can use the sparse solving routines in the linalg submodule of sparse. For example, we can solve a matrix equation using the spsolve routine from this module. The spsolve routine will convert the matrix into CSR or CSC, which may add additional time to the computation if it is not provided in one of these formats:

from scipy.sparse import linalg
linalg.spsolve(T.tocsr(), np.array([1, 2, 3, 4, 5]))
# array([ 5.83333333, 10.66666667, 13.5 , 13.33333333, 9.16666667])

The sparse.linalg module also contains sparse-aware versions of many of the routines found in the linalg module of NumPy (or SciPy) that accept sparse matrices instead of full NumPy arrays, such as eigs (the sparse analogue of eig) and inv.