## Transcript

The goal of this lecture is to show that every symmetric matrix is orthogonally diagonalizable. Now, in the last lecture we spent time looking at the process of finding an orthogonal matrix P to diagonalize a symmetric matrix A. During that process, we could only guarantee finding an orthonormal basis for each of the eigenspaces of A, but in the end, we always found that forming a matrix P from these orthonormal basis vectors created an orthogonal matrix.

Now, since all of the vectors from our orthonormal bases have norm 1, it is not surprising that the columns of P are unit vectors. It is also clear that columns of P drawn from the same eigenspace are orthogonal, as they are part of the same orthonormal basis. The thing that separates our situation from an arbitrary diagonalization, however, is that the vectors from one eigenspace are orthogonal to the vectors from a different eigenspace. In order to prove this crucial fact, we first want to show the following.

An n-by-n matrix A is symmetric if and only if x dot (Ay) = (Ax) dot y for all x and y in Rn.

To prove this lemma, we will need to prove both directions of the if-and-only-if statement. So first, let’s look at the direction that says that if A is symmetric, then x dotted with (Ay) will equal (Ax) dotted with y for all x and y in Rn. So we will suppose that A is symmetric, and let x and y be vectors from Rn. Then we can rewrite the dot product x dot (Ay) as the matrix product (x-transpose)(Ay). Next, we use the fact that since A is symmetric, A-transpose = A, to get that x dot (Ay) = (x-transpose)(Ay), which equals (x-transpose)(A-transpose)y. Now regrouping, we can see that (x-transpose)(A-transpose)y = ((Ax)-quantity-transpose)y, which we can write as the dot product (Ax) dot y. So we have, in fact, shown that x dotted with (Ay) = (Ax) dotted with y, as desired.
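Before moving on, here is a small numerical sanity check of the identity we just proved, written as a NumPy sketch with randomly generated hypothetical example data:

```python
import numpy as np

# Check: for a symmetric A, x . (Ay) == (Ax) . y.
# (Hypothetical example data; a sanity check, not a proof.)
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2            # symmetrize an arbitrary matrix
x = rng.standard_normal(4)
y = rng.standard_normal(4)

lhs = x @ (A @ y)            # x . (Ay) = (x-transpose)(Ay)
rhs = (A @ x) @ y            # (Ax) . y = ((Ax)-transpose)y
assert np.isclose(lhs, rhs)
```

The two sides agree up to floating-point roundoff, exactly as the algebraic argument predicts.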

Now we need to prove the other direction of the if-and-only-if statement, which is to say that if x dot (Ay) = (Ax) dot y for all x and y in Rn, then the matrix A is symmetric. So this time, we will suppose that x dot (Ay) = (Ax) dot y for all x and y in Rn. As before, we can rewrite these dot products as matrix products, getting that (x-transpose)(Ay) = ((Ax)-quantity-transpose)y for all x and y in Rn. What if we were to set a matrix B = (x-transpose)A, and the matrix C = the quantity (Ax)-transpose? Well, then our equality says that By will equal Cy for all y in Rn, and by a theorem way back, we know that this must mean that B = C. So that is, we’ve shown that (x-transpose)A = (Ax)-quantity-transpose. Now taking the transpose of both sides, we get that the transpose of ((x-transpose)A) equals the transpose of (Ax)-transpose, which, of course, equals Ax. So we’ve shown that Ax = ((x-transpose)A)-quantity-transpose, but this equals (A-transpose)((x-transpose)-transpose), which is simply (A-transpose)x. Now since Ax = (A-transpose)x for all x in Rn, then again, by use of our matrix property, we know that A = A-transpose, and thus, we have shown that A is symmetric.
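A complementary way to see why the identity forces symmetry, alongside the proof above: plugging standard basis vectors into the identity picks out individual entries of A, since e_i dotted with (A e_j) is the i,j entry while (A e_i) dotted with e_j is the j,i entry. The sketch below illustrates this on a hypothetical example matrix:

```python
import numpy as np

# With standard basis vectors e_i and e_j:
#   e_i . (A e_j) = A[i, j]   and   (A e_i) . e_j = A[j, i],
# so requiring the identity for all x, y forces A[i, j] = A[j, i].
A = np.arange(9.0).reshape(3, 3)     # an arbitrary, non-symmetric matrix
i, j = 0, 2
e_i, e_j = np.eye(3)[i], np.eye(3)[j]

assert np.isclose(e_i @ (A @ e_j), A[i, j])
assert np.isclose((A @ e_i) @ e_j, A[j, i])
```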

I’m sure this lemma seems a bit weird out of context, but we will, in fact, use this lemma to prove our theorem.

If v1 and v2 are eigenvectors of a symmetric matrix A corresponding to distinct eigenvalues lambda1 and lambda2, then v1 is orthogonal to v2. Again, this simply states that if you take eigenvectors from different eigenspaces, then they will be orthogonal in the case that A is a symmetric matrix.

Let’s look at the proof of this theorem. By the definition of eigenvalues and eigenvectors, we have that Av1 = (lambda1)v1 and Av2 = (lambda2)v2. Our goal is to show that v1 dot v2 = 0. Well, in what we’ll call a leap of creativity, what we’re actually going to show is that (lambda1 – lambda2)(v1 dot v2) = 0. Now, note that this is equivalent: since we assumed that lambda1 is not equal to lambda2, the value (lambda1 – lambda2) is not 0, so the product of (lambda1 – lambda2) with (v1 dot v2) will equal 0 if and only if (v1 dot v2) itself equals 0.

Now, to show that our product (lambda1 – lambda2)(v1 dot v2) = 0, we will first note that (lambda1)(v1 dot v2) = ((lambda1)v1) dotted with v2, which equals (Av1) dotted with v2. Meanwhile, lambda2 times the product (v1 dot v2) equals v1 dotted with ((lambda2)v2), which equals v1 dotted with (Av2). Now this is where our lemma comes in because we know that (Av1) dotted with v2 equals v1 dotted with (Av2). And so we see that (lambda1)(v1 dotted with v2) = (lambda2)(v1 dotted with v2), which means that ((lambda1)(v1 dotted with v2)) – ((lambda2)(v1 dotted with v2)), which is, of course, (lambda1 – lambda2)(v1 dot v2), must equal 0 as desired.
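The theorem is easy to observe numerically. The sketch below uses a hypothetical example matrix and NumPy's symmetric eigensolver to check that eigenvectors belonging to distinct eigenvalues are orthogonal:

```python
import numpy as np

# Hypothetical example: a symmetric matrix with three distinct eigenvalues.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
eigvals, eigvecs = np.linalg.eigh(A)   # eigh is designed for symmetric matrices

# Eigenvectors from different eigenspaces should be orthogonal.
for a in range(3):
    for b in range(a + 1, 3):
        if not np.isclose(eigvals[a], eigvals[b]):
            assert np.isclose(eigvecs[:, a] @ eigvecs[:, b], 0.0)
```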

Now, this theorem is an important piece of why symmetric matrices are orthogonally diagonalizable. But if we forget about the “orthogonally” part for a minute, it is pretty amazing just to know that every symmetric matrix is diagonalizable. And in order for this to be true, we need to know the following.

If A is symmetric, then all of the eigenvalues of A are real.
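While we postpone the proof, the statement itself is easy to check numerically. Here is a sketch on a randomly generated hypothetical example: even NumPy's general eigenvalue routine, which is free to return complex numbers, finds only (numerically) real eigenvalues for a symmetric matrix:

```python
import numpy as np

# Hypothetical example: symmetrize a random matrix, then compute eigenvalues
# with the general-purpose routine (which may return complex values).
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                      # symmetric

eigvals = np.linalg.eigvals(A)
assert np.allclose(eigvals.imag, 0.0)  # imaginary parts vanish, up to roundoff
```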

Now, the proof of this theorem requires properties of something known as the complex numbers, which we will be studying after this, so we will postpone the proof for now. But at this point, we have the major building blocks for our proof, so now we need to put them together. We’ll begin with the following lemma.

Suppose that lambda1 is an eigenvalue of the n-by-n symmetric matrix A with corresponding unit eigenvector v1. Then there is an orthogonal matrix P whose first column is v1 such that (P-transpose)AP = [lambda1, O(1, n-1); O(n-1, 1), A1], where A1 is an (n-1)-by-(n-1) symmetric matrix, and O(m, n) denotes the m-by-n zero matrix. That is to say, (P-transpose)AP is a matrix whose 1,1 entry is lambda1, the remaining entries in the first row and first column are all zeroes, and the (n-1)-by-(n-1) submatrix we get by removing the first row and first column is a symmetric matrix.

Now let’s prove our lemma. First, we know that we can extend the set containing only v1 to a basis {v1 through vn} for Rn, and then we can use the Gram-Schmidt procedure on this basis to get an orthonormal basis {w1 through wn} for Rn. But since v1 is already a unit vector, the Gram-Schmidt procedure will set w1 = v1, so B = {v1, then w2 through wn} is an orthonormal basis for Rn.

Let P be the matrix whose columns are v1, w2, through wn. First, we need to note that (P-transpose)AP is a symmetric matrix: the transpose of ((P-transpose)AP) equals (P-transpose)(((P-transpose)A)-transpose), which equals (P-transpose)(A-transpose)((P-transpose)-transpose), and since A-transpose = A, this is, of course, (P-transpose)AP. But what is the matrix (P-transpose)AP? Well, if we multiply it out, we get that (P-transpose)AP will be (the matrix whose rows are v1-transpose, w2-transpose, through wn-transpose) times A times (the matrix whose columns are v1, w2, through wn). Well, this becomes the product [v1-transpose; w2-transpose; through wn-transpose][Av1, Aw2, through Awn]. And this becomes [(v1-transpose)(Av1), (v1-transpose)(Aw2), so on and so on], but we can rewrite those matrix products as dot products. So we see that our 1,1 entry is v1 dotted with (Av1). Our 1,2 entry would be v1 dotted with (Aw2). Our 1,n entry would be v1 dotted with (Awn). Now, our 2,1 entry would be w2 dotted with (Av1), and so forth, so that in general, the i,j entry would be wi dotted with (Awj), where we take w1 = v1.

Now, since v1 is a unit eigenvector corresponding to lambda1, we know that v1 dotted with (Av1) equals v1 dotted with ((lambda1)v1), which equals (lambda1)(v1 dot v1), but v1 dot v1 must equal 1 because v1 is a unit vector, so this equals lambda1. So we’ve shown that the 1,1 entry of (P-transpose)AP is lambda1. Well, what about the rest of the first column? Well, since B is orthonormal, we know that wi dotted with (Av1), which equals wi dotted with ((lambda1)v1), which is (lambda1)(wi dotted with v1), must equal 0 because wi dotted with v1 equals 0. So this means that all of the i,1 entries, that is, the remaining entries of the first column of (P-transpose)AP, are all 0. And since we already know that (P-transpose)AP is symmetric, this tells us that the remaining 1,i entries, the rest of our first row, are also all 0. And so, we know that (P-transpose)AP has 1,1 entry lambda1, the rest of the first row is 0, the rest of the first column is 0, and because (P-transpose)AP is symmetric, the remaining submatrix A1 must also be symmetric.
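The construction in this proof can be carried out numerically. The sketch below uses a hypothetical example matrix, extends a unit eigenvector to an orthonormal basis via QR factorization (which performs Gram-Schmidt for us), and checks the block structure the lemma promises:

```python
import numpy as np

# Hypothetical example: A has eigenvalue 6 with unit eigenvector v1.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]])
lam1 = 6.0
v1 = np.ones(3) / np.sqrt(3)           # A v1 = 6 v1, and v1 is a unit vector

# QR of a full-rank matrix whose first column is v1 gives an orthogonal P
# whose first column is v1 (up to sign) -- Gram-Schmidt in effect.
P, _ = np.linalg.qr(np.column_stack([v1, np.eye(3)[:, :2]]))
B = P.T @ A @ P

assert np.isclose(B[0, 0], lam1)                   # 1,1 entry is lambda1
assert np.allclose(B[0, 1:], 0.0)                  # rest of first row is 0
assert np.allclose(B[1:, 0], 0.0)                  # rest of first column is 0
assert np.allclose(B[1:, 1:], B[1:, 1:].T)         # A1 is symmetric
```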

So this lemma lets us take the first step in finding P, and then leaves us with the problem of diagonalizing the smaller matrix A1. If we repeat this process on the increasingly smaller matrices, it will come to an end. And so with this in mind, we finally prove our main result, using induction on the size of the symmetric matrix A.

This theorem is known as the Principal Axis Theorem. Suppose A is an n-by-n symmetric matrix. Then there exists an orthogonal matrix P and a diagonal matrix D such that (P-transpose)AP = D. That is, every symmetric matrix is orthogonally diagonalizable.

As mentioned before, the proof is by induction on n, the size of our symmetric matrix A. The base case is when n = 1, which means that A is the single-element matrix whose element is a. Well, then A is diagonalized by the orthogonal matrix P whose only element is 1, since (P-transpose)AP = [1][a][1] = [a].

For our induction step, this will get more interesting. Suppose we know that all (n-1)-by-(n-1) symmetric matrices are orthogonally diagonalizable, and let A be an n-by-n symmetric matrix. Now, by Theorem 8.1.3, we know that A has a real eigenvalue lambda1 because all of its eigenvalues are real, and we’ll let v1 be a unit eigenvector corresponding to lambda1. Now, by the lemma we just proved, we know that there is an orthogonal matrix R whose columns are v1, w2, through wn such that (R-transpose)AR will equal [lambda1, followed by all zeroes; all zeroes underneath lambda1, and then a symmetric submatrix A1].

Now, by our inductive hypothesis, we know that there is an (n-1)-by-(n-1) orthogonal matrix P1 and an (n-1)-by-(n-1) diagonal matrix D1 such that (P1-transpose)A1P1 = D1. Let’s use this to define the n-by-n matrix P2 as follows. We’ll let P2 be the matrix whose 1,1 entry is a 1, the remaining entries in both the first row and first column will all be zeroes, and then the remaining (n-1)-by-(n-1) submatrix will be P1.
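This block construction is easy to illustrate. In the sketch below (with a hypothetical 2-by-2 rotation standing in for P1), embedding an orthogonal P1 with a 1 in the corner produces an orthogonal P2:

```python
import numpy as np

# Hypothetical P1: a 2x2 rotation, which is orthogonal.
theta = 0.3
P1 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])

# Build P2: 1,1 entry is 1, rest of first row/column are 0, submatrix is P1.
P2 = np.eye(3)
P2[1:, 1:] = P1

# P2 inherits orthogonality: its columns are orthonormal.
assert np.allclose(P2.T @ P2, np.eye(3))
```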

Now, note that the first column of P2 is the vector e1, the vector [1; 0; …; 0], so it is a unit vector. Moreover, all of the other columns of P2 are also unit vectors, as their norms are not changed from those of the columns of P1 by adding a 0 as their first entry. Also, the columns of P2 are mutually orthogonal, either because their orthogonality is inherited from P1 or because they have no non-zero entries in common with the first column. As such, P2 is also an orthogonal matrix.

Now, since the product of orthogonal matrices is orthogonal, the matrix P = RP2 is an n-by-n orthogonal matrix. Now let’s look at what P does for us. If we look at (P-transpose)AP, well, this will equal ((RP2)-quantity-transpose)A(RP2). Well, this is the same as (P2-transpose)((R-transpose)AR)P2. That is to say that this is P2-transpose times our matrix (R-transpose)AR, which has a lambda1 for its 1,1 entry, followed by all zeroes in the rest of the row, followed by all zeroes in the rest of the column, and has this remaining (n-1)-by-(n-1) submatrix of A1, and then we’ll multiply by P2 on the right.

Well, let’s take a closer look at what P2-transpose and P2 are. P2-transpose is the matrix whose 1,1 entry is 1, with the remaining first row and first column all zeroes, and whose submatrix is P1-transpose. And then we’ll multiply by ((R-transpose)AR), and then by P2, which is the matrix with a 1 in its 1,1 entry, with the remaining first row and first column all zeroes, and the remaining (n-1)-by-(n-1) submatrix being P1. So if we multiply our (R-transpose)AR times the P2 on the right, that becomes a matrix whose 1,1 entry is lambda1; we still get that the remaining first row and first column are all zeroes, but now our (n-1)-by-(n-1) submatrix is going to be the product A1P1. And similarly, when we then multiply by P2-transpose on the left, we still end up with a 1,1 entry of lambda1, the rest of the first row and the rest of the first column are still all zeroes, and our (n-1)-by-(n-1) submatrix is (P1-transpose)A1P1. But of course, the whole point of picking that P1 was that (P1-transpose)A1P1 is the diagonal matrix D1. So we’ve seen that (P-transpose)AP is a matrix whose 1,1 entry is lambda1, the rest of the first row and first column are all zeroes, and the remaining (n-1)-by-(n-1) submatrix is the diagonal matrix D1. Well, this means that, of course, the entire large matrix is now a diagonal matrix, so this is our diagonal matrix D. And that ends the proof of the Principal Axis Theorem.
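The inductive proof is constructive, so it can be turned directly into a recursive algorithm. The sketch below mirrors the proof: peel off one eigenpair, build an orthogonal R whose first column is the unit eigenvector (here via a Householder reflection rather than Gram-Schmidt, a standard substitute for extending to an orthonormal basis), recurse on the smaller symmetric block, and embed the result. It is an illustration of the proof's structure, not a production eigensolver; the function name and example matrix are hypothetical:

```python
import numpy as np

def orth_diag(A):
    """Orthogonally diagonalize symmetric A, mirroring the induction in the
    proof of the Principal Axis Theorem. Returns (P, D) with P^T A P = D."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1), A.copy()          # base case: [1][a][1] = [a]
    # A symmetric matrix has a real eigenvalue with a unit eigenvector.
    eigvals, eigvecs = np.linalg.eigh(A)
    lam1, v1 = eigvals[0], eigvecs[:, 0]
    # Orthogonal R with first column v1: a Householder reflection sending
    # e1 to v1 (this stands in for extending v1 to an orthonormal basis).
    e1 = np.zeros(n); e1[0] = 1.0
    u = v1 - e1
    if np.allclose(u, 0.0):
        R = np.eye(n)
    else:
        R = np.eye(n) - 2.0 * np.outer(u, u) / (u @ u)
    B = R.T @ A @ R                         # [[lam1, 0...], [0..., A1]]
    P1, D1 = orth_diag(B[1:, 1:])           # inductive hypothesis on A1
    P2 = np.eye(n); P2[1:, 1:] = P1         # embed P1 as in the proof
    P = R @ P2                              # product of orthogonal matrices
    return P, P2.T @ B @ P2                 # diag(lam1, D1)

# Hypothetical example matrix:
M = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 0.0, 3.0]])
P, D = orth_diag(M)
assert np.allclose(P.T @ P, np.eye(3))                  # P is orthogonal
assert np.allclose(D, np.diag(np.diag(D)), atol=1e-10)  # D is diagonal
assert np.allclose(P.T @ M @ P, D)                      # P^T M P = D
```

The recursion terminates because each step hands the next call a strictly smaller symmetric matrix, exactly as the proof's induction argument requires.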