## Transcript — Introduction

We have now seen that symmetric matrices are wonderful because they can be orthogonally diagonalized. However, in many real-world applications, the matrices we get are not symmetric. Well, if the matrix is not symmetric, then we still may be able to diagonalize it, which is almost as good. But we have seen in Linear Algebra I that not all matrices are diagonalizable, so if the matrix is not diagonalizable, then what do we do? Well, we have seen that as long as it has all real eigenvalues, then we can triangularize it.

This all seems pretty good. However, we are missing one important detail. For all of these I just mentioned, the matrix has to be square. What do we do if we get an m-by-n matrix with m not equal to n? This is what we will figure out during this lecture and the next.

To start figuring this out, let’s look at an example that uses a non-square matrix in a problem similar to the ones we solved with quadratic forms. Example: Let A be the matrix [1, 2; -2, 1; 1, -2], and let L be the linear mapping with standard matrix A. What are the maximum and minimum of the length of L(x) subject to the constraint the length of x is equal to 1? Solution: At first, it is not clear how we will find the maximum or minimum of the length of L(x). However, most of the time when using length, we square it to remove the square root sign, so let’s do that again and see what we have. We get (the length of L(x))-squared is equal to (the length of (Ax))-squared, which, by definition of length, is equal to (Ax) dot product (Ax). Using our very important formula, this is (Ax)-transpose matrix-matrix multiplied by Ax, which, using properties of transposes, gives us (x-transpose)((A-transpose)A)x.
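The identity derived above — that (the length of L(x))-squared equals the quadratic form (x-transpose)((A-transpose)A)x — can be spot-checked numerically. Here is a minimal sketch using NumPy with the matrix A from the example and one arbitrarily chosen unit vector (the choice of x is mine, not from the lecture):

```python
import numpy as np

# The matrix A from the example (3x2, so L maps R^2 into R^3).
A = np.array([[1.0, 2.0],
              [-2.0, 1.0],
              [1.0, -2.0]])

# An arbitrary unit vector x (any unit vector works here).
x = np.array([3.0, 4.0])
x = x / np.linalg.norm(x)

# ||Ax||^2 should equal the quadratic form x^T (A^T A) x.
lhs = np.linalg.norm(A @ x) ** 2
rhs = x @ (A.T @ A) @ x
print(np.isclose(lhs, rhs))  # True
```

The same check passes for any x, since the identity is algebraic, not specific to this matrix.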

Wow! What is (x-transpose)((A-transpose)A)x? It is a quadratic form with corresponding symmetric matrix (A-transpose)A. So, we can solve this easily by using Theorem 10.5.1. Thus, the maximum of the quadratic form is the largest eigenvalue of (A-transpose)A, and the minimum is its smallest eigenvalue. We have (A-transpose)A = [6, -2; -2, 9], which has eigenvalues lambda1 = 10 and lambda2 = 5. Thus, the largest eigenvalue of (A-transpose)A is 10, so the maximum of (the length of L(x))-squared is 10. To find the maximum of the length of L(x), we need to take the square root of this. So, we get the maximum of the length of L(x) subject to the length of x is equal to 1 is the square root of 10. Similarly, the minimum of the length of L(x) subject to the length of x is equal to 1 is the square root of 5.

At which vectors x do these occur? At unit eigenvectors corresponding to the respective eigenvalues. We find that a unit eigenvector for lambda1 = 10 is v1 = [-1/(root 5); 2/(root 5)], and v2 = [2/(root 5); 1/(root 5)] is a unit eigenvector for lambda2 = 5. We can easily verify that the length of L(v1) is equal to root 10, and the length of L(v2) is equal to the square root of 5.
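Every number in this example can be verified in a few lines of NumPy. The sketch below recomputes the eigenvalues of (A-transpose)A and checks the lengths of L(v1) and L(v2) claimed above:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [-2.0, 1.0],
              [1.0, -2.0]])

# Eigendecomposition of the symmetric matrix A^T A.
# eigh returns eigenvalues in ascending order, so we expect [5, 10].
evals, evecs = np.linalg.eigh(A.T @ A)
print(evals)  # approximately [5., 10.]

# The unit eigenvectors from the example.
v1 = np.array([-1.0, 2.0]) / np.sqrt(5)  # eigenvector for lambda1 = 10
v2 = np.array([2.0, 1.0]) / np.sqrt(5)   # eigenvector for lambda2 = 5

# ||L(v1)|| = sqrt(10) and ||L(v2)|| = sqrt(5), as claimed.
print(np.isclose(np.linalg.norm(A @ v1), np.sqrt(10)))  # True
print(np.isclose(np.linalg.norm(A @ v2), np.sqrt(5)))   # True
```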

Since we had to take the square root of the eigenvalues of (A-transpose)A, it is very lucky that they weren’t negative. What would we do if one of the eigenvalues of (A-transpose)A was negative? We don’t have to worry about it because math is awesome. We will now prove that the eigenvalues of (A-transpose)A have to be non-negative.

Theorem 10.6.1: If A is an m-by-n matrix, and lambda1 to lambda_n are the eigenvalues of (A-transpose)A with corresponding unit eigenvectors v1 to vn, then lambda1 to lambda_n are all non-negative, and the length of (Avi) is equal to the square root of lambda_i. Proof: For i from 1 to n, we are assuming that vi is an eigenvector of (A-transpose)A, so this means that (A-transpose)Avi is equal to (lambda_i)vi. Thus, we get (the length of (Avi))-squared is equal to (Avi) dot product (Avi), which, by our important formula, is (Avi)-transpose matrix-matrix multiplied by Avi, which is (vi-transpose)((A-transpose)A)vi. Using the fact that vi is an eigenvector of (A-transpose)A, this is equal to (vi-transpose)((lambda_i)vi), which equals (lambda_i)(vi dot product vi). But since vi is a unit vector, this is just equal to lambda_i. Therefore, lambda_i is equal to the non-negative number (the length of (Avi))-squared, and the result follows.
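Theorem 10.6.1 is easy to test numerically on a matrix that is not from the lecture. The sketch below draws a random 5-by-3 matrix (my choice of size and seed) and confirms both conclusions: the eigenvalues of (A-transpose)A are non-negative, and the length of each Avi equals the square root of lambda_i:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # an arbitrary 5x3 matrix

# eigh is appropriate here because A^T A is symmetric;
# the columns of V are the unit eigenvectors v_i.
lambdas, V = np.linalg.eigh(A.T @ A)

# All eigenvalues are non-negative (up to floating-point round-off)...
print(np.all(lambdas >= -1e-12))  # True

# ...and ||A v_i|| = sqrt(lambda_i) for each unit eigenvector v_i.
for lam, v in zip(lambdas, V.T):
    print(np.isclose(np.linalg.norm(A @ v), np.sqrt(lam)))  # True each time
```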

Observe from the example we did before the theorem that the square roots of the eigenvalues of (A-transpose)A are the maximum and minimum values of the length of (Ax) subject to the length of x is equal to 1. So these are behaving like the eigenvalues of a symmetric matrix. Thus, we make the following definition.

Definition: The singular values sigma1 to sigma_n of an m-by-n matrix A are the square roots of the eigenvalues of (A-transpose)A arranged so that sigma1 is greater than or equal to sigma2 is greater than or equal to all the way down to sigma_n. Note that, by definition, we must arrange the singular values from greatest to least. The reason for this is similar to the reason we did this in the proof of Theorem 10.5.1. We really want to know where the largest and the smallest ones are. In particular, we need to ensure all of the 0 singular values are at the end.

Example: Find the singular values of A = [-1, 2, 1; 1, 1, -1]. Solution: We have (A-transpose)A = [2, -1, -2; -1, 5, 1; -2, 1, 2]. We find that the eigenvalues of (A-transpose)A are 0, 6, and 3. Thus, taking the square roots of these eigenvalues, and remembering to arrange them in the order from greatest to least, we get that the singular values of A are sigma1 = the square root of 6, sigma2 = the square root of 3, and sigma3 = 0.
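The definition translates directly into code: form (A-transpose)A, take the square roots of its eigenvalues, and sort from greatest to least. A sketch for the matrix in this example:

```python
import numpy as np

A = np.array([[-1.0, 2.0, 1.0],
              [1.0, 1.0, -1.0]])

# Singular values as defined: square roots of the eigenvalues of A^T A,
# arranged from greatest to least. eigvalsh returns ascending order,
# so we reverse; clip guards against tiny negative round-off before sqrt.
evals = np.linalg.eigvalsh(A.T @ A)
sigmas = np.sqrt(np.clip(evals, 0.0, None))[::-1]
print(sigmas)  # approximately [sqrt(6), sqrt(3), 0]
```

Note that NumPy's built-in `np.linalg.svd(A, compute_uv=False)` returns only min(m, n) = 2 singular values for this 2-by-3 matrix, namely the square root of 6 and the square root of 3; the definition above lists n of them, padding with zeros.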

## Expanding the Definition of Singular Values

We know that the number of non-zero eigenvalues of a square matrix A equals the rank of A. In the example and the check-in, we got that the number of non-zero singular values of a non-square matrix A equaled the rank of A. To prove that this is always the case, we just need to prove that the rank of (A-transpose)A equals the rank of A.

Lemma 10.6.2: If A is an m-by-n matrix, then the nullspace of (A-transpose)A equals the nullspace of A. Proof: Assume that x is in the nullspace of A. Then by definition, Ax = the 0 vector, and so multiplying on both sides by A-transpose, we get (A-transpose)Ax = (A-transpose)(the 0 vector), which equals the 0 vector, and hence, x is in the nullspace of (A-transpose)A. So, the nullspace of A is a subset of the nullspace of (A-transpose)A. On the other hand, assume that x is in the nullspace of (A-transpose)A. We want to prove that Ax is also equal to the 0 vector. And once again, we use our very clever trick: a good way of showing that a vector equals the 0 vector is to show that it has length 0. So, if (A-transpose)Ax = the 0 vector, then (the length of (Ax))-squared is equal to (Ax) dot product (Ax), which, as usual, is equal to (Ax)-transpose matrix-matrix multiplied by (Ax), which is (x-transpose)(A-transpose)Ax, which gives us (x-transpose)(the 0 vector), which is 0. Hence, Ax is equal to the 0 vector, and so x is also in the nullspace of A. Thus, the nullspace of (A-transpose)A is a subset of the nullspace of A, and thus, we have shown that the nullspace of A equals the nullspace of (A-transpose)A.

Theorem 10.6.3: If A is an m-by-n matrix, then the rank of (A-transpose)A equals the rank of A. Proof: Using Lemma 10.6.2 and the Dimension Theorem, we get the rank of (A-transpose)A is equal to n minus the dimension of the nullspace of (A-transpose)A, which is equal to n minus the dimension of the nullspace of A, which equals the rank of A.

Corollary 10.6.4: If A is an m-by-n matrix, and the rank of A is equal to r, then A has r non-zero singular values.

We see that the singular values of a matrix A have a lot of the same properties that eigenvalues have. This shouldn’t be surprising since we have defined singular values in terms of eigenvalues—they are the square roots of the eigenvalues of (A-transpose)A.

In the next lecture, we will continue making singular values look more like eigenvalues by defining singular vectors. We will then finally achieve our goal of mimicking diagonalization for any m-by-n matrix.

This ends this lecture.