## Transcript — Introduction

In Linear Algebra I, and earlier in this course, we saw that diagonalization finds a basis B for a linear operator L such that the matrix of L with respect to the basis B is diagonal. That is, diagonalization finds a geometrically natural basis for L.

Hmm, what could be better than a geometrically natural basis? An orthonormal geometrically natural basis. Although this would be really nice to have, we have seen that not every matrix is diagonalizable, which means that not every linear operator has a geometrically natural basis. Thus, we would expect that even fewer will have an orthonormal geometrically natural basis. Over the next two lectures, we will look at the theory of orthogonal similarity and orthogonal diagonalization, and find which matrices are orthogonally diagonalizable.

We begin with a definition. Two matrices A and B are said to be orthogonally similar if there exists an orthogonal matrix P such that (P-transpose)AP = B. Notice that since P is orthogonal, we have that P-transpose = P-inverse. Hence, if A and B are orthogonally similar, then they are similar, and so all the properties of similar matrices still hold. That is, rank A = rank B, trace A = trace B, determinant A = determinant B, and A and B have the same eigenvalues.
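As a quick numerical illustration (a NumPy sketch, not part of the original lecture), we can conjugate a matrix by a randomly generated orthogonal P and check that rank, trace, determinant, and eigenvalues are all unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[-1., 3., -1.],
              [0., 1., 2.],
              [0., 2., 1.]])

# Any orthogonal P gives an orthogonally similar matrix B = P^T A P.
# A convenient way to get an orthogonal P: the Q factor of a QR decomposition.
P, _ = np.linalg.qr(rng.standard_normal((3, 3)))
B = P.T @ A @ P

# The invariants listed above are preserved.
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # True
print(np.isclose(np.trace(A), np.trace(B)))                   # True
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))         # True
print(np.allclose(np.sort(np.linalg.eigvals(A).real),
                  np.sort(np.linalg.eigvals(B).real),
                  atol=1e-6))                                  # True
```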

Although, as discussed, most matrices are not orthogonally similar to a diagonal matrix, the following theorem shows us the amazing result that every matrix with all real eigenvalues is orthogonally similar to an upper triangular matrix. The Triangularization Theorem: If A is an n-by-n matrix with all real eigenvalues, then A is orthogonally similar to an upper triangular matrix T.

The proof of this theorem is a little complex. Take your time and work through it slowly. After I give the proof, I will demonstrate how the proof works with an example. Proof: We prove the result by induction on n. If n = 1, then A is upper triangular, and therefore, we can take P to be the orthogonal matrix P = [1], and the result follows. Now, assume the result holds for all (n – 1)-by-(n – 1) matrices, and consider an n-by-n matrix A with all real eigenvalues. Since the only thing we know about the matrix A is that it has all real eigenvalues, let’s pick one of them. So let lambda1 be an eigenvalue of A with corresponding unit eigenvector v1. We are trying to construct an orthogonal matrix P, and so we will need an orthonormal basis for Rn, so let’s extend the set containing v1 to a basis for Rn, and apply the Gram-Schmidt procedure to produce an orthonormal basis {v1, w2, up to wn} for Rn.

What do we know about the relationship of the vectors w2 to wn and the matrix A? We know absolutely nothing. Because we could have extended the set {v1} to a basis for Rn in infinitely many different ways, there is no relationship between the vectors w2 to wn and the matrix A.

We now have that the matrix P1, defined to be the matrix with columns v1, w2, up to wn, is orthogonal. Calculating (P1-transpose)AP1, we get the matrix [(v1 dot product Av1), (v1 dot product Aw2), up to (v1 dot product Awn); (w2 dot product Av1), (w2 dot product Aw2), up to (w2 dot product Awn); all the way down to (wn dot product Av1), (wn dot product Aw2), up to (wn dot product Awn)]. Let’s now consider the entries in the first column. Since v1 is an eigenvector of A with corresponding eigenvalue lambda1, we have Av1 = (lambda1)v1, and so we get (wi dot product Av1) equals (wi dot product (lambda1)v1), which equals (lambda1)(wi dot product v1), which equals 0 since {v1, w2, up to wn} is orthonormal. Similarly, we find that (v1 dot product Av1) equals (lambda1)(v1 dot product v1), which equals lambda1 since v1 is a unit vector.

Since there is no relationship between the matrix A and the vectors w2 to wn, we know absolutely nothing about the entries in the rest of the matrix. So, we will represent the rest of the first row of the matrix (P1-transpose)AP1 by b-transpose, and the remaining (n – 1)-by-(n – 1) submatrix by A1.
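To see this step concretely, here is a NumPy sketch of one induction step (an editorial addition, not part of the lecture). It uses the matrix from the example later in the lecture, with the eigenpair lambda1 = 3, v1 = [1; 2; 2]/3, and a QR factorization standing in for the Gram-Schmidt procedure:

```python
import numpy as np

A = np.array([[-1., 3., -1.],
              [0., 1., 2.],
              [0., 2., 1.]])
n = A.shape[0]

# A real eigenpair of A: A @ [1, 2, 2] = 3 * [1, 2, 2],
# so lambda1 = 3 and v1 = [1, 2, 2] / 3 is a unit eigenvector.
lam1 = 3.0
v1 = np.array([1., 2., 2.]) / 3.0

# Extend {v1} to an orthonormal basis of R^n.  QR plays the role of
# Gram-Schmidt: the Q factor has orthonormal columns, the first being +/- v1.
P1 = np.linalg.qr(np.column_stack([v1, np.eye(n)]))[0]
if P1[:, 0] @ v1 < 0:
    P1[:, 0] = -P1[:, 0]          # fix the sign so the first column is v1

B = P1.T @ A @ P1
# The first column of B is (lambda1, 0, ..., 0), exactly as in the proof:
print(np.isclose(B[0, 0], lam1))  # True
print(np.allclose(B[1:, 0], 0))   # True
bT = B[0, 1:]                     # the row vector b-transpose
A1 = B[1:, 1:]                    # the (n-1)-by-(n-1) submatrix A1
```

The entries of bT and A1 depend on how {v1} was extended, which matches the remark above: nothing ties them to A.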

We have that A is orthogonally similar to the matrix [lambda1, b-transpose; the 0 vector, A1]. Since the characteristic polynomial of this block triangular matrix is (lambda1 minus lambda) times the characteristic polynomial of A1, all the eigenvalues of A1 are eigenvalues of A, and hence, all the eigenvalues of A1 are real. Therefore, by the inductive hypothesis, there exists an (n – 1)-by-(n – 1) orthogonal matrix Q such that (Q-transpose)(A1)Q = T1 is upper triangular.

We now need to somehow combine this with our previous matrix P1 so that we can show that, in fact, A is orthogonally similar to an upper triangular matrix. The problem is that our matrix Q is (n – 1)-by-(n – 1), so how can we combine it with the n-by-n matrix P1? The trick is to enlarge Q by adding a new first row and first column, each consisting of a 1 followed by all zeroes. That is, we let P2 be the matrix whose first row is 1 followed by zeroes, whose first column is 1 followed by zeroes, and whose remaining (n – 1)-by-(n – 1) block is the matrix Q. Since the columns of Q are orthonormal, the columns of P2 are also orthonormal, and hence, P2 is an orthogonal matrix. Consequently, the matrix P = P1P2 is also orthogonal by Theorem 9.2.7.

Then, using block multiplication, we get that (P-transpose)AP equals the block matrix [lambda1, (b-transpose)Q; the 0 vector, T1]. This is upper triangular since T1 is upper triangular, and so the result follows by induction.
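Because the proof is constructive, it translates directly into a recursive algorithm. The following NumPy sketch (an editorial addition, with numpy.linalg.eig supplying eigenpairs and QR standing in for Gram-Schmidt) is a teaching aid rather than a numerically robust factorization:

```python
import numpy as np

def triangularize(A):
    """Follow the induction in the proof: return an orthogonal P such that
    P.T @ A @ P is upper triangular.  Assumes A has all real eigenvalues."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1)                      # base case: P = [1]
    # Pick a (nearly) real eigenvalue lambda1 with unit eigenvector v1.
    w, V = np.linalg.eig(A)
    k = int(np.argmin(np.abs(np.imag(w))))
    v1 = np.real(V[:, k])
    v1 = v1 / np.linalg.norm(v1)
    # Extend {v1} to an orthonormal basis: QR in place of Gram-Schmidt.
    P1 = np.linalg.qr(np.column_stack([v1, np.eye(n)]))[0]
    if P1[:, 0] @ v1 < 0:
        P1[:, 0] = -P1[:, 0]
    B = P1.T @ A @ P1                         # [[lambda1, b^T], [0, A1]]
    Q = triangularize(B[1:, 1:])              # inductive step on A1
    P2 = np.eye(n)                            # P2 = [[1, 0^T], [0, Q]]
    P2[1:, 1:] = Q
    return P1 @ P2                            # P = P1 P2 is orthogonal

A = np.array([[-1., 3., -1.],
              [0., 1., 2.],
              [0., 2., 1.]])
P = triangularize(A)
T = P.T @ A @ P
print(np.allclose(np.tril(T, -1), 0, atol=1e-6))  # True: T is upper triangular
print(np.allclose(P.T @ P, np.eye(3)))            # True: P is orthogonal
```

The recursion mirrors the induction exactly: one eigenpair peels off lambda1, and the inductive hypothesis becomes the recursive call on A1.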

Note that if A is orthogonally similar to an upper triangular matrix T, then A and T must have the same eigenvalues. Thus, since T is upper triangular, the eigenvalues of A must appear along the main diagonal of T. Notice that since the proof is by construction, it gives us a method for finding an orthogonal matrix P such that (P-transpose)AP = T is upper triangular. However, since the proof is by induction, this method leads to a recursive algorithm. We demonstrate this with an example. As I do the example, you should follow along carefully with the steps in the proof, as this will really help you understand the proof of the theorem.

## Example

Example: Let A be the matrix [-1, 3, -1; 0, 1, 2; 0, 2, 1]. Find an orthogonal matrix P and an upper triangular matrix T such that (P-transpose)AP = T. Solution: The first step of the proof tells us to pick a real eigenvalue of A and a corresponding unit eigenvector v1. From our knowledge of diagonalization, we observe that -1 is an eigenvalue of A with unit eigenvector e1 = [1; 0; 0]. Next, the proof tells us to extend the set containing e1 to an orthonormal basis for R3. Since e1 is just a standard basis vector, we can extend it to the standard basis for R3, and so we take P1 to be the identity matrix I. Thus, (P1-transpose)AP1 = A has -1 in the top-left entry with zeroes beneath it; we call the rest of the first row b-transpose, and the remaining 2-by-2 submatrix A1. That is, we have b = [3; -1], and A1 = [1, 2; 2, 1].

We now repeat the procedure on the submatrix A1. Calculating its eigenvalues, we find that 3 is a real eigenvalue of A1, with corresponding unit eigenvector v1 = [1/(root 2); 1/(root 2)]. Again, we extend the set containing v1 to an orthonormal basis for R2. In this case, we can clearly take v2 = [1/(root 2); -1/(root 2)]. Hence, our matrix Q is [1/(root 2), 1/(root 2); 1/(root 2), -1/(root 2)].
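We can spot-check this inner step numerically (a NumPy sketch; the lecture does this by hand):

```python
import numpy as np

A1 = np.array([[1., 2.],
               [2., 1.]])
r2 = np.sqrt(2.0)
Q = np.array([[1/r2,  1/r2],
              [1/r2, -1/r2]])

# Q is orthogonal, and Q^T A1 Q is upper triangular -- in fact diagonal here,
# because A1 is symmetric and v2 happens to be an eigenvector of A1 as well.
print(np.allclose(Q.T @ Q, np.eye(2)))                    # True
print(np.allclose(Q.T @ A1 @ Q, [[3., 0.], [0., -1.]]))   # True
```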

And so next, according to the proof, we take P2 to be the matrix [1, 0, 0; 0, 1/(root 2), 1/(root 2); 0, 1/(root 2), -1/(root 2)]. As mentioned in the proof, this is clearly orthogonal. Finally, we take P = P1P2, which equals P2 since P1 is the identity matrix. We find that (P-transpose)AP = [-1, (root 2), 2(root 2); 0, 3, 0; 0, 0, -1].
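As a final numerical spot-check of the answer (again a NumPy sketch, not part of the lecture):

```python
import numpy as np

A = np.array([[-1., 3., -1.],
              [0., 1., 2.],
              [0., 2., 1.]])
r2 = np.sqrt(2.0)
P = np.array([[1., 0.,    0.  ],
              [0., 1/r2,  1/r2],
              [0., 1/r2, -1/r2]])
T = np.array([[-1., r2,  2*r2],
              [ 0., 3.,  0.  ],
              [ 0., 0., -1.  ]])

print(np.allclose(P.T @ P, np.eye(3)))  # True: P is orthogonal
print(np.allclose(P.T @ A @ P, T))      # True: (P-transpose)AP = T
# As noted earlier, the eigenvalues of A (-1, 3, -1) sit on the diagonal of T.
```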

I will conclude this lecture with one final remark. We have seen a few times previously in this course that we can use examples to help us figure out proofs. This is a very good example of the opposite: we can use a proof to teach us how to solve a computational problem.

This ends this lecture.