Lesson: Orthogonal Diagonalization

Transcript — Introduction

In the last lecture, we saw that every square matrix with all real eigenvalues is orthogonally triangularizable. In this lecture, we will determine which n-by-n matrices have the very special property that they are orthogonally similar to a diagonal matrix. That is, there is an orthonormal basis for Rn consisting of eigenvectors of the matrix.

We begin with a definition. Definition: A matrix A is said to be orthogonally diagonalizable if it is orthogonally similar to a diagonal matrix.

As mentioned, our goal is to figure out which matrices are orthogonally diagonalizable. But how do we even start looking for such matrices? A good trick in mathematics for finding objects that satisfy a property is to work in reverse; that is, to assume that you have the property, and see what conditions this puts on the objects. So, we assume that we have an orthogonally diagonalizable matrix A. Then, by definition, this means that there exists an orthogonal matrix P such that (P-transpose)AP = D is diagonal. We are trying to find a condition on the matrix A, so let's solve this equation for A to get A = PD(P-transpose). Since A and D are similar, they share many important properties. So, we need to think about what special properties D has since it is diagonal, and check to see whether A has the same property.

Since D is diagonal, one nice property that it has is that D-transpose = D. Observe that this implies that A-transpose = (PD(P-transpose))-all-transposed. Using properties of transposes, this is ((P-transpose)-transpose)(D-transpose)(P-transpose), which equals PD(P-transpose), which is just A. So the fact that D = D-transpose, and A and D are similar, gives us that A also satisfies A = A-transpose.
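As a quick aside, this implication is easy to check numerically. The following is a minimal sketch, assuming NumPy is available (the random construction of P and D is purely illustrative and is not part of the lecture): it builds A = PD(P-transpose) from an orthogonal P and a diagonal D, and confirms that A equals its own transpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random orthogonal matrix P via a QR factorization.
P, _ = np.linalg.qr(rng.standard_normal((4, 4)))
# Choose any diagonal matrix D.
D = np.diag(rng.standard_normal(4))

# A is orthogonally diagonalizable by construction: (P-transpose)AP = D.
A = P @ D @ P.T
# Theorem 10.2.1 predicts that A is symmetric.
print(np.allclose(A, A.T))  # True
```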

We have proven the following theorem. Theorem 10.2.1: If A is orthogonally diagonalizable, then A-transpose = A.

This theorem says that a necessary condition for A to be orthogonally diagonalizable is that A-transpose = A. We want to know whether this condition is also sufficient. That is, we want to determine whether every matrix A such that A-transpose = A is orthogonally diagonalizable. This result is indeed true, and it is extremely important.

The Principal Axis Theorem

The Principal Axis Theorem: If A is a matrix such that A-transpose = A, then A is orthogonally diagonalizable.

How are we going to prove this? We don't seem to have enough information to even try to diagonalize the matrix, let alone to find an orthonormal basis of eigenvectors for the matrix. The key here is to think of the Triangularization Theorem. However, to use the Triangularization Theorem, we first need to check the hypothesis that the matrix has all real eigenvalues.

Lemma 10.2.2: If A is a matrix such that A-transpose = A, then A has all real eigenvalues. The proof of this lemma requires properties of complex eigenvalues, and so we will delay the proof until Module 11.

We can now very easily prove the Principal Axis Theorem. Proof: By Lemma 10.2.2, A has all real eigenvalues, and so by the Triangularization Theorem, we have that there exists an orthogonal matrix P such that (P-transpose)AP = T is upper triangular. As we did in the proof of Theorem 10.2.1, we show that T must have the same properties as A since they are similar. We get that T-transpose = ((P-transpose)AP)-all-transposed, which, by properties of transposes, is (P-transpose)(A-transpose)P, which equals (P-transpose)AP since A-transpose = A. Hence, we have that T-transpose = T. But since T is upper triangular, taking the transpose gives us a lower triangular matrix, and so T is both upper and lower triangular, and so T is, in fact, diagonal.
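If you would like to see the Principal Axis Theorem in action computationally, here is a minimal sketch, assuming NumPy is available; it uses NumPy's symmetric eigen-solver eigh as a stand-in for the argument above, and checks that the resulting P is orthogonal and that (P-transpose)AP is diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T  # a symmetric matrix, built for illustration

# eigh is NumPy's eigen-solver for symmetric matrices; its eigenvector
# matrix P has orthonormal columns.
eigenvalues, P = np.linalg.eigh(A)

print(np.allclose(P.T @ P, np.eye(4)))                  # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))   # True: (P-transpose)AP is diagonal
```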

Since the condition A-transpose = A is obviously very important, we give it a name. Definition: A matrix A such that A-transpose = A is called symmetric.

We have now proven that a matrix A is orthogonally diagonalizable if and only if it is symmetric. Although our proof of the Principal Axis Theorem is very quick and easy thanks to the Triangularization Theorem, there is one problem with it. It does not give us a good method for actually orthogonally diagonalizing a symmetric matrix. To figure out a good method for orthogonally diagonalizing a symmetric matrix, we prove a couple more theorems.

Theorem 10.2.4: A matrix A is symmetric if and only if (x dot product with Ay) is equal to (Ax dot product with y) for all vectors x and y in Rn. Proof: Suppose that A is symmetric. Then for any vectors x and y in Rn, we have that (x dot Ay) is equal to the matrix product (x-transpose)(Ay), which equals (x-transpose)(A-transpose)y since A is symmetric. Using properties of transposes, this gives us ((Ax)-transpose)y, which is (Ax dot y), as required.

On the other hand, if (x dot Ay) = (Ax dot y), then we have (x-transpose)Ay = ((Ax)-transpose)y, which equals (x-transpose)(A-transpose)y. Since this is valid for all vectors y in Rn, we can apply Theorem 3.1.4 to get (x-transpose)A = (x-transpose)(A-transpose). Now taking the transpose of both sides, we get (A-transpose)x = Ax. Now, since this is also valid for all x in Rn, we have that A-transpose = A by Theorem 3.1.4. Poof.

Take a minute to read over the proof, and make sure that you understand all the steps. Review Theorem 3.1.4 if necessary, and make sure that you really understand how it is being applied, and in particular the necessity of taking the transpose at the end.
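One more way to convince yourself of the identity in Theorem 10.2.4 is to test it numerically. The following is a minimal sketch, assuming NumPy is available, that compares x dot (Ay) with (Ax) dot y for a randomly generated symmetric A.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
A = M + M.T  # symmetric by construction

x = rng.standard_normal(3)
y = rng.standard_normal(3)

# For a symmetric A, the two dot products agree.
print(np.isclose(x @ (A @ y), (A @ x) @ y))  # True
```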

Theorem 10.2.5: If A is a symmetric matrix with eigenvectors v1 and v2 corresponding to distinct eigenvalues lambda1 and lambda2, then v1 and v2 are orthogonal. Proof: We need to prove that v1 dot v2 = 0. We have (lambda1)(v1 dot v2) = ((lambda1)v1) dot v2. We have assumed that v1 is an eigenvector corresponding to lambda1, so (lambda1)v1 = Av1, and hence, we have (Av1) dot v2. By Theorem 10.2.4, since A is symmetric, this equals v1 dot (Av2), which gives v1 dot ((lambda2)v2) since v2 is an eigenvector of A with corresponding eigenvalue lambda2. And finally, we get that this equals (lambda2)(v1 dot v2). Hence, (lambda1 – lambda2)(v1 dot v2) = 0, and since lambda1 is not equal to lambda2, this is only possible if v1 dot v2 = 0, as required.

This theorem is extremely helpful. It says that eigenvectors corresponding to different eigenvalues of a symmetric matrix A are necessarily orthogonal. Thus, finding an orthonormal basis of eigenvectors of a symmetric matrix should be relatively easy, since the eigenvectors corresponding to different eigenvalues are automatically orthogonal.
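Here is a tiny concrete illustration of Theorem 10.2.5, as a minimal sketch assuming NumPy is available; the 2-by-2 matrix and its eigenvectors below are chosen purely for illustration and do not come from the lecture.

```python
import numpy as np

# A small symmetric matrix with distinct eigenvalues 1 and 3.
A = np.array([[2., 1.],
              [1., 2.]])
v1 = np.array([1., -1.])  # eigenvector for lambda = 1
v2 = np.array([1., 1.])   # eigenvector for lambda = 3

print(np.allclose(A @ v1, 1.0 * v1))  # True
print(np.allclose(A @ v2, 3.0 * v2))  # True
print(v1 @ v2)                        # 0.0: orthogonal, as the theorem predicts
```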

Examples

We will now do two examples of orthogonally diagonalizing a symmetric matrix. Example: Orthogonally diagonalize the symmetric matrix A = [4, 0, 0; 0, 1, -2; 0, -2, 1]. Solution: We begin by diagonalizing the matrix like normal, so first, we find and factor the characteristic polynomial, and then find the eigenvalues and corresponding eigenvectors. We have the characteristic polynomial is the determinant of (A – (lambda)I), which equals –(lambda – 4)(lambda – 3)(lambda + 1). Thus, the eigenvalues are lambda1 = 4, lambda2 = 3, and lambda3 = -1, each with algebraic multiplicity 1.

For lambda1 = 4, we get (A – (lambda1)I) = [0, 0, 0; 0, -3, -2; 0, -2, -3], which row reduces to [0, 1, 0; 0, 0, 1; 0, 0, 0]. Thus, a basis for the eigenspace of lambda1 is {[1; 0; 0]}. For lambda2 = 3, we get (A – (lambda2)I) = [1, 0, 0; 0, -2, -2; 0, -2, -2], which clearly row reduces to [1, 0, 0; 0, 1, 1; 0, 0, 0]. And thus, a basis for the eigenspace of lambda2 is {[0; -1; 1]}. Finally, for lambda3 = -1, we get (A – (lambda3)I) = [5, 0, 0; 0, 2, -2; 0, -2, 2], which row reduces to [1, 0, 0; 0, 1, -1; 0, 0, 0]. Thus, a basis for the eigenspace of lambda3 is {[0; 1; 1]}.
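As a quick check of these computations, one can verify numerically that each basis vector found above satisfies Av = (lambda)v; the following is a minimal sketch assuming NumPy is available.

```python
import numpy as np

A = np.array([[4., 0., 0.],
              [0., 1., -2.],
              [0., -2., 1.]])

# Check that A v = lambda v for each eigenvalue/eigenvector pair found above.
pairs = [(4, np.array([1., 0., 0.])),
         (3, np.array([0., -1., 1.])),
         (-1, np.array([0., 1., 1.]))]
for lam, v in pairs:
    print(np.allclose(A @ v, lam * v))  # True, True, True
```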

Observe that, as predicted by Theorem 10.2.5, the eigenvectors [1; 0; 0], [0; -1; 1], and [0; 1; 1] do form an orthogonal set. Hence, we normalize them (we must normalize them, because we need an orthonormal basis of eigenvectors of A in order to form an orthogonal matrix P that diagonalizes A), and we get that A is orthogonally diagonalized by the orthogonal matrix P = [1, 0, 0; 0, -1/(root 2), 1/(root 2); 0, 1/(root 2), 1/(root 2)] to D = [4, 0, 0; 0, 3, 0; 0, 0, -1].
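It is also easy to verify this orthogonal diagonalization numerically; here is a minimal sketch, assuming NumPy is available, that checks that P is orthogonal and that (P-transpose)AP = D.

```python
import numpy as np

A = np.array([[4., 0., 0.],
              [0., 1., -2.],
              [0., -2., 1.]])
s = 1 / np.sqrt(2)
P = np.array([[1., 0., 0.],
              [0., -s, s],
              [0., s, s]])   # normalized eigenvectors as columns
D = np.diag([4., 3., -1.])

print(np.allclose(P.T @ P, np.eye(3)))  # True: P is orthogonal
print(np.allclose(P.T @ A @ P, D))      # True: (P-transpose)AP = D
```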

It is important for you to be good at diagonalization. You should be able to easily find all eigenvalues of a given matrix, and you should be able to find a basis for the eigenspace of an eigenvalue lambda by just looking at the reduced row echelon form of (A – (lambda)I).

Example: Orthogonally diagonalize the symmetric matrix A = [5, -4, -2; -4, 5, -2; -2, -2, 8]. Solution: We have the characteristic polynomial is equal to the determinant of (A – (lambda)I), which is the determinant of |5 – lambda, -4, -2; -4, 5 – lambda, -2; -2, -2, 8 – lambda|. Using elementary row and column operations to simplify the determinant, and then finding the determinant, we get the characteristic polynomial is equal to –lambda((lambda – 9)-squared). Thus, the eigenvalues are lambda1 = 9, with algebraic multiplicity 2, and lambda2 = 0, with algebraic multiplicity 1.

For lambda1 = 9, we get (A – (lambda1)I) = [-4, -4, -2; -4, -4, -2; -2, -2, -1], which row reduces to [1, 1, 1/2; 0, 0, 0; 0, 0, 0]. Thus, a basis for the eigenspace of lambda1 is {w1, w2} = {[-1; 1; 0], [-1; 0; 2]}. Wait, we have a problem. These eigenvectors are not orthogonal to each other. Notice that Theorem 10.2.5 only guarantees that eigenvectors corresponding to different eigenvalues are orthogonal. But we shouldn't panic. We know that the eigenspace of lambda1 is a subspace of R3. What we really need is an orthogonal basis for this eigenspace. How could we find that? Yes, that's right: we use the Gram-Schmidt procedure. Applying the Gram-Schmidt procedure to the basis {w1, w2} for the eigenspace of lambda1, we get that an orthogonal basis for the eigenspace of lambda1 is {[-1; 1; 0], [-1; -1; 4]}.
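For reference, the Gram-Schmidt step above can be spelled out numerically as follows; this is a minimal sketch assuming NumPy is available, using the basis vectors w1 and w2 found in this example.

```python
import numpy as np

w1 = np.array([-1., 1., 0.])
w2 = np.array([-1., 0., 2.])

v1 = w1
# Subtract from w2 its projection onto v1.
v2 = w2 - ((w2 @ v1) / (v1 @ v1)) * v1

print(v2)        # [-0.5 -0.5  2. ], a scalar multiple of [-1; -1; 4]
print(v1 @ v2)   # 0.0: the new basis for the eigenspace is orthogonal
```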

For lambda2 = 0, we get (A – (lambda2)I) is [5, -4, -2; -4, 5, -2; -2, -2, 8], which row reduces to give [1, 0, -2; 0, 1, -2; 0, 0, 0]. Thus, a basis for the eigenspace of lambda2 is {[2; 2; 1]}. Notice, it is easy to check that this vector is, in fact, orthogonal to the orthogonal basis for the eigenspace of lambda1, as predicted by Theorem 10.2.5. Hence, we now do have an orthogonal basis for R3 of eigenvectors of A. Normalizing these vectors, we take P = [2/3, -1/(root 2), -1/(root 18); 2/3, 1/(root 2), -1/(root 18); 1/3, 0, 4/(root 18)], and we get that (P-transpose)AP = D, which is [0, 0, 0; 0, 9, 0; 0, 0, 9].
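As with the first example, this orthogonal diagonalization can be verified numerically; the following is a minimal sketch assuming NumPy is available.

```python
import numpy as np

A = np.array([[5., -4., -2.],
              [-4., 5., -2.],
              [-2., -2., 8.]])
P = np.column_stack([np.array([2., 2., 1.]) / 3,
                     np.array([-1., 1., 0.]) / np.sqrt(2),
                     np.array([-1., -1., 4.]) / np.sqrt(18)])
D = np.diag([0., 9., 9.])

print(np.allclose(P.T @ P, np.eye(3)))  # True: P is orthogonal
print(np.allclose(P.T @ A @ P, D))      # True: (P-transpose)AP = D
```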

These examples demonstrate that the procedure for orthogonally diagonalizing a symmetric matrix is exactly the same as normal diagonalization, except for a couple of key points. First, if we have a symmetric matrix A, then we automatically know by the Principal Axis Theorem that A is diagonalizable. Understanding the theory of diagonalization can lead to some shortcuts in finding eigenvalues, and it can help you catch computational errors. Second, Theorem 10.2.5 only guarantees that eigenvectors corresponding to different eigenvalues are orthogonal. Whenever we get an eigenvalue of geometric multiplicity greater than 1, remember to apply the Gram-Schmidt procedure to the basis for that eigenspace, if necessary, to get an orthogonal basis for the eigenspace. Third, when making the diagonalizing matrix P, make sure that you do have an orthonormal basis for Rn of eigenvectors of A. In particular, make sure that you remember to normalize the eigenvectors.
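To pull the whole procedure together, here is a minimal sketch in NumPy; the function name orthogonally_diagonalize is our own, and NumPy's eigh routine is used as a stand-in for the hand computations described above (it already returns normalized, mutually orthogonal eigenvectors).

```python
import numpy as np

def orthogonally_diagonalize(A, tol=1e-10):
    """Return (P, D) with P orthogonal and (P-transpose)AP = D, for symmetric A."""
    if not np.allclose(A, A.T, atol=tol):
        raise ValueError("A must be symmetric")
    # eigh returns orthonormal eigenvectors, so the normalization and
    # Gram-Schmidt steps of the hand procedure are already built in.
    eigenvalues, P = np.linalg.eigh(A)
    return P, np.diag(eigenvalues)

A = np.array([[5., -4., -2.],
              [-4., 5., -2.],
              [-2., -2., 8.]])
P, D = orthogonally_diagonalize(A)
print(np.allclose(P.T @ A @ P, D))  # True
```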

With enough practice, these should become very easy. Note that much of the rest of the course is about orthogonal diagonalization and its uses, so it is very important that you do practice enough.

This ends this lecture.
