## Transcript — Introduction

In the last lecture, we defined singular values of an m-by-n matrix to behave like eigenvalues of a square matrix. In this lecture, we will extend the similarity of singular values and eigenvalues by defining singular vectors, and then use these to mimic diagonalization for m-by-n matrices.

## Singular Vectors

Let A be an m-by-n matrix. To stress what we are doing here, let’s assume that m is not equal to n, although, in the end, this will work even in the case where m does equal n. We want to mimic eigenvalues and eigenvectors. For an n-by-n matrix B, we have the eigenvalue-eigenvector equation Bv = (lambda)v. However, observe that we cannot have Av = (lambda)v since Av is a vector in Rm and (lambda)v is a vector in Rn. Hence, we see that we need to adjust this to Av = (sigma)u where v is a non-zero vector in Rn and u is a non-zero vector in Rm.

But this equation is not very meaningful. For every non-zero vector v in Rn, we can find a value of sigma and a non-zero vector u that satisfies the equation—in fact, infinitely many of them. So this does not match eigenvectors at all. We need to make an extra condition on v and u so that these have some special meaning. In particular, we want them to behave like eigenvectors.

By definition, for any non-zero singular value sigma of A, there is a vector v not equal to the 0 vector such that (A-transpose)Av = (sigma-squared)v. Thus, if we have Av = (sigma)u, then we have (A-transpose)Av = (A-transpose)((sigma)u), and so (sigma-squared)v = (sigma)(A-transpose)u. Since we have assumed sigma is non-zero, we can divide by sigma to get (A-transpose)u = (sigma)v.

Moreover, by Theorem 10.6.1, if v is a unit eigenvector of (A-transpose)A, then we get that the vector u = (1/sigma)Av is also a unit vector. Therefore, for a non-zero singular value sigma of A, we will want unit vectors v and u such that Av = (sigma)u and (A-transpose)u = (sigma)v. However, our derivation does not work for sigma = 0. In this case, we only need one of these conditions to be satisfied.

Definition: Let A be an m-by-n matrix. If v in Rn and u in Rm are unit vectors, and sigma not equal to 0 is a singular value of A such that Av = (sigma)u and (A-transpose)u = (sigma)v, then we say that u is a left singular vector of A and v is a right singular vector of A. Additionally, if u is a unit vector such that (A-transpose)u = the 0 vector, then u is a left singular vector of A. Similarly, if v is a unit vector such that Av = the 0 vector, then v is a right singular vector of A. This definition of right singular vectors not only preserves our relationship of these vectors with the eigenvectors of (A-transpose)A, but we also get the corresponding result for the left singular vectors and the eigenvectors of A(A-transpose).

## Mimicking Orthogonal Diagonalization

Theorem 10.6.5: Let A be an m-by-n matrix. A vector v in Rn is a right singular vector of A if and only if v is an eigenvector of (A-transpose)A. A vector u in Rm is a left singular vector of A if and only if u is an eigenvector of A(A-transpose).

Now that we have singular values and singular vectors mimicking eigenvalues and eigenvectors, we can mimic orthogonal diagonalization. Let A be an m-by-n matrix. As with the case of singular vectors, if m is not equal to n, then (P-transpose)AP is not defined for a square matrix P. Thus, we want to find an m-by-m orthogonal matrix U and an n-by-n orthogonal matrix V such that (U-transpose)AV = Sigma, where Sigma is an m-by-n “diagonal” matrix. For orthogonal diagonalization, the columns of the matrix P are an orthonormal basis for Rn of eigenvectors of the matrix. In this case, we will need the columns of U to be an orthonormal basis for Rm of left singular vectors of A and the columns of V to be an orthonormal basis for Rn of right singular vectors of A. We now prove that such orthonormal bases always exist.

We begin with a lemma. Lemma 10.6.6: Let A be an m-by-n matrix with rank(A) = r. If {v1 to vn} is an orthonormal basis for Rn consisting of the eigenvectors of (A-transpose)A arranged so that the corresponding eigenvalues lambda1 to lambda_n are arranged from greatest to least, and sigma1 to sigma_n are the singular values of A, then the set {(1/sigma1)Av1 up to (1/sigma_r)Avr} is an orthonormal basis for the columnspace of A.

Proof: By Corollary 10.6.4, we have that A has r non-zero singular values, so sigma1 to sigma_r are all non-zero, and sigma_(r+1) = up to sigma_n = 0. Now observe that for all i and j from 1 to r, with i not equal to j, we have (Avi) dot product (Avj) = ((Avi)-transpose)(Avj), which is (vi-transpose)(A-transpose)Avj, which is equal to, since vj is an eigenvector of (A-transpose)A, (vi-transpose)((lambda_j)vj), which is (lambda_j)((vi-transpose)vj), which is (lambda_j)(vi dot product vj), which is equal to 0 since {v1 to vn} is an orthonormal set. Hence, {Av1 up to Avr} is orthogonal, and so the set {(1/sigma1)Av1 up to (1/sigma_r)Avr} is orthonormal by Theorem 10.6.1. Moreover, we know the dimension of the columnspace of A equals r, and so this set is an orthonormal basis for the columnspace of A.

Theorem 10.6.7: If A is an m-by-n matrix with rank r, then there exists an orthonormal basis {v1 to vn} for Rn of right singular vectors of A and an orthonormal basis {u1 to um} for Rm of left singular vectors of A.

Proof: By Theorem 10.6.5, all eigenvectors of (A-transpose)A are right singular vectors of A. Since (A-transpose)A is symmetric, we can find an orthonormal basis {v1 to vn} for Rn of right singular vectors of A. By Lemma 10.6.6, the left singular vectors ui = (1/sigma_i)Avi for i from 1 to r form an orthonormal basis for the columnspace of A. Also, by definition, a left singular vector uj of A corresponding to sigma = 0 lies in the nullspace of A-transpose. Hence, by the Fundamental Theorem of Linear Algebra, if {u(r+1) up to um} is an orthonormal basis for the nullspace of A-transpose, then {u1 to um} is an orthonormal basis for Rm of left singular vectors of A.

Thus, for any m-by-n matrix A with rank r, we have an orthonormal basis {v1 to vn} for Rn of right singular vectors of A corresponding to the n singular values sigma1 to sigma_n of A and an orthonormal basis {u1 to um} for Rm of left singular vectors of A such that A[v1 to vn] = [Av1 to Avn], which, by definition of singular vectors, is [(sigma1)u1 up to (sigma_n)un]. And so this is equal to the matrix [(sigma)u1 up to (sigma_r)ur, with the remaining columns all being the 0 vector]. And thus, we can factor this as the matrix [u1 to um] times the matrix Sigma, where Sigma is the m-by-n matrix with Sigma_ii = sigma_i for i from 1 to r, and all other entries of Sigma are 0.

Hence, we have that there exists an orthogonal matrix V and an orthogonal matrix U such that AV = U(Sigma). Instead of writing this as (U-transpose)AV = Sigma, we typically write this as A = U(Sigma)(V-transpose) to get a matrix decomposition of A.

## Singular Value Decompostion

Definition: A singular value decomposition of an m-by-n matrix A is a factorization of the form A = U(Sigma)(V-transpose), where U is an orthogonal matrix containing the left singular vectors of A, V is an orthogonal matrix containing right singular vectors of A, and Sigma is an m-by-n matrix with Sigma_ii = sigma_i for i running from 1 to the rank of A, and all other entries of Sigma are 0.

To find a singular value decomposition of a matrix A with rank(A) = r, we just follow what we did above.

- Step 1: Find the eigenvalues lambda1 to lambda_n of (A-transpose)A arranged from greatest to least, and a corresponding set of orthonormal eigenvectors {v1 to vn}.
- Step 2: Let V be the matrix with columns v1 to vn, and let Sigma be the m-by-n matrix whose first r diagonal entries are the non-zero singular values sigma1 to sigma_r of A arranged from greatest to least.
- Step 3: Find the left singular vectors of A by computing ui = (1/sigma_i)Avi for i from 1 to r, and then extend the set {u1 to ur} to an orthonormal basis for Rm, and let U is equal to the matrix [u1 to um].

Then, A = U(Sigma)(V-transpose).

Let’s demonstrate this with a couple of examples.

Example: Find a singular value decomposition of A = [1, -1; -2, 2; 2, -2]. Solution: We have (A-transpose)A = [9, -9; -9, 9], which has eigenvalues (ordered from greatest to least) lambda1 = 18 and lambda2 = 0. We have corresponding orthonormal eigenvectors v1 = [1/(root 2); -1/(root 2)] and v2 = [1/(root 2); 1/(root 2)]. Thus, we take V = [1/(root 2), 1/(root 2); -1/(root 2), 1/(root 2)]. The singular values of A are sigma1 = (root 18) and sigma2 = 0, so we take Sigma to be the matrix which is the same size as A where the first rank(A) diagonal entries of Sigma are the non-zero singular values of A. Thus, in this case, we get Sigma is the 3-by-2 matrix [(the square root of 18), 0; 0, 0; 0, 0].

Next, we compute u1 = (1/sigma1)Av1, which is equal to [1/3; -2/3; 2/3]. And from our work above, the set containing u1 is an orthonormal basis for the columnspace of A. Now, we need to extend this to an orthonormal basis for all of R3. To find u2 and u3, we can find an orthonormal basis for the nullspace of A-transpose. Row reducing A-transpose, we get [1, -2, 2; 0, 0, 0], and hence, a basis for the nullspace of A-transpose is {[2; 1; 0], [-2; 0; 1]}. But we need an orthonormal basis for the nullspace of A-transpose, and so we apply the Gram-Schmidt procedure. We get u2 = [2/(root 5); 1/(root 5); 0] and u3 = [-2/(root 45); 4/(root 45); 5/(root 45)]. Thus, we take U = [u1 u2 u3], and we have a singular value decomposition A = U(Sigma)(V-transpose).

Example: Find a singular value decomposition of B = [2, -4; 2, 2; -4, 0; 1, 4]. Solution: We have (B-transpose)B = [25, 0; 0, 36]. Thus, the eigenvalues are lambda1 = 36 and lambda2 = 25 with corresponding unit eigenvectors v1 = [0; 1] and v2 = [1; 0]. So we take V = [0, 1; 1, 0]. Now, the singular values of B are sigma1 = 6 and sigma2 = 5, so we get Sigma is the matrix [6, 0; 0, 5; 0, 0; 0, 0]. We now compute u1 = (1/sigma1)Bv1, which is [-2/3; 1/3; 0; 2/3], and u2 = (1/sigma2)Bv2, which is [2/5; 2/5; -4/5; 1/5]. Next, we extend the set {u1, u2} to an orthonormal basis for R4 by adding on a basis for the nullspace of B-transpose. Applying the Gram-Schmidt procedure to a basis for the nullspace of B-transpose, we get vectors u3 = [1/3; -2/3; 0; 2/3] and u4 = [8/15; 8/15; 3/5; 4/15]. We let U be the matrix [u1 u2 u3 u4], and then we have singular value decomposition U(Sigma)(V-transpose) of B.

This ends this lecture.