## Transcript

A matrix A is symmetric if A-transpose = A, or equivalently, if aij = aji for all i and j. Note that the definition of a symmetric matrix requires that it be a square matrix, since if A is an m-by-n matrix, then A-transpose is an n-by-m matrix, so A-transpose = A would mean that m = n.
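As a quick illustration in code (this NumPy check is my own sketch, not part of the lecture), symmetry can be tested by comparing A with its transpose; the example matrices here are made up for the demonstration:

```python
import numpy as np

# A hypothetical example: entries mirror across the diagonal (a_ij = a_ji).
A = np.array([[1, 2, -3],
              [2, 5, 0],
              [-3, 0, 4]])

# A matrix is symmetric exactly when it equals its own transpose.
print(np.array_equal(A, A.T))  # True

# A non-square matrix can never be symmetric: its transpose has swapped dimensions.
B = np.array([[1, 2, 3],
              [4, 5, 6]])
print(B.shape, B.T.shape)  # (2, 3) (3, 2)
```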

Here are some examples of symmetric matrices.

A matrix A is said to be orthogonally diagonalizable if there exists an orthogonal matrix P and a diagonal matrix D such that (P-transpose)AP = D. Now, recall that if P is an orthogonal matrix, then P-transpose = P-inverse, so this definition is just the same as saying that there is an orthogonal matrix P that diagonalizes A.

But before we prove that every symmetric matrix is orthogonally diagonalizable, we will do some examples and some practice problems of diagonalizing symmetric matrices, as this hands-on work will help us understand the theoretical proof.

Let’s diagonalize the symmetric matrix A = [1, 2; 2, -2]. To do this, we first need to find the eigenvalues, which means we need to find the roots of the characteristic polynomial (determinant of (A – (lambda)I)). So the matrix A – (lambda)I is [1 – lambda, 2; 2, -2 – lambda]. If we take the determinant of this matrix, it will be (1 – lambda)(-2 – lambda) – 2(2). Multiplying this out, this equals -6 + lambda + lambda-squared, which factors as (2 – lambda)(-3 – lambda). So we see that the eigenvalues for A are lambda1 = 2 and lambda2 = -3.
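These eigenvalues can be double-checked numerically; this NumPy sketch is my own addition, using eigvalsh, which returns the eigenvalues of a symmetric matrix in ascending order:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -2.0]])

# Roots of det(A - lambda*I) = lambda^2 + lambda - 6 = (2 - lambda)(-3 - lambda).
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)  # approximately [-3.  2.]
```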

Our next step is to find a basis for the eigenspaces. For lambda1 = 2, we’ll need to find the nullspace of (A – (lambda1)I), which is [1 – lambda1, 2; 2, -2 – lambda1], which equals [-1, 2; 2, -4]. Row reducing, we see that this matrix will be row equivalent to the matrix [1, -2; 0, 0], so our nullspace consists of all solutions to the equation x1 – 2x2 = 0. If we replace the variable x2 with the parameter s, then we have that x1 = 2s and x2 = s, so the general solution is (s times the vector [2; 1]), which means that the single-vector set {[2; 1]} is a basis for the eigenspace for lambda1.

For lambda2 = -3, we need to find the nullspace of (A – (lambda2)I), which is [1 – lambda2, 2; 2, -2 – lambda2], which equals [4, 2; 2, 1]. Row reducing, we can quickly see that this matrix is row equivalent to the matrix [2, 1; 0, 0], so our nullspace consists of all solutions to the equation 2x1 + x2 = 0. If we replace the variable x1 with the parameter s, then we have that x1 = s and x2 = -2s, so the general solution to our equation is s[1; -2]. This means that the set containing only the vector [1; -2] is a basis for the eigenspace for lambda2.

Now, our study of diagonalization—specifically, the Diagonalization Theorem—tells us that A can be diagonalized by a matrix P whose columns are the basis vectors for the eigenspaces. That is, we know that the matrix P = [2, 1; 1, -2] is an invertible matrix such that (P-inverse)AP = D, where D will be the matrix [lambda1, 0; 0, lambda2], which, in our case, is [2, 0; 0, -3].
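As a check (my own sketch, assuming NumPy), we can verify numerically that this P really does diagonalize A:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -2.0]])
# Columns of P are the eigenspace basis vectors [2; 1] and [1; -2].
P = np.array([[2.0, 1.0],
              [1.0, -2.0]])

# P^{-1} A P should equal D = diag(2, -3), matching the column order of P.
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))  # diag(2, -3)
```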

So that settles the issue of A being diagonalizable, but what about being orthogonally diagonalizable? A quick check shows that the columns of P are orthogonal, but not orthonormal. What if we normalize these vectors?

Well, since the set containing the vector [2; 1] is a basis for the eigenspace for lambda1, the set containing s[2; 1] is also a basis for that eigenspace for any nonzero scalar s, which means, for example, that the set containing the vector [2/(root 5); 1/(root 5)] is a basis for the eigenspace for lambda1. Similarly, the set containing the vector [1/(root 5); -2/(root 5)] is a basis for the eigenspace for lambda2. Going back to our knowledge of diagonalization, this means that the matrix Q whose first column is [2/(root 5); 1/(root 5)], our basis vector for lambda1, and whose second column is [1/(root 5); -2/(root 5)], our basis vector for lambda2, is an invertible matrix such that (Q-inverse)AQ = D, where D is still [2, 0; 0, -3]. Moreover, we see that Q is orthogonal, so we have shown that A is orthogonally diagonalizable.
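To confirm (a NumPy sketch of my own), we can check both that Q is orthogonal and that Q-transpose times A times Q gives D:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -2.0]])
s = np.sqrt(5.0)
# Columns are the normalized eigenvectors [2; 1]/sqrt(5) and [1; -2]/sqrt(5).
Q = np.array([[2 / s, 1 / s],
              [1 / s, -2 / s]])

# Orthogonal matrix: Q^T Q = I, so Q^T serves as Q^{-1}.
print(np.allclose(Q.T @ Q, np.eye(2)))                 # True
print(np.allclose(Q.T @ A @ Q, np.diag([2.0, -3.0])))  # True
```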

Let’s look at a bigger example. We’ll try to diagonalize the symmetric matrix A = [1, -2, -2; -2, 1, 2; -2, 2, 1]. To do this, we first need to find the eigenvalues, which means we need to find the roots of the characteristic polynomial, the determinant of (A – (lambda)I). So first, we’ll subtract lambda from our diagonal elements, and then proceed to compute the determinant. Expanding along the first row, we get (1 – lambda) times the determinant of the submatrix [1 – lambda, 2; 2, 1 – lambda], then minus -2 times the determinant of the submatrix [-2, 2; -2, 1 – lambda], and then plus -2 times the determinant of the submatrix [-2, 1 – lambda; -2, 2]. If we multiply all of this out, we’ll see that our characteristic polynomial is 5 + 9(lambda) + 3(lambda-squared) – (lambda-cubed). A quick check shows that lambda = -1 is a root of this polynomial. If we factor out a (-1 – lambda) term, we are left with the polynomial -5 – 4(lambda) + (lambda-squared), which factors as (-1 – lambda)(5 – lambda). So the characteristic polynomial for A is ((-1 – lambda)-quantity-squared)(5 – lambda), and this means that the eigenvalues for A are lambda1 = -1 and lambda2 = 5.
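Again, a numerical check of the eigenvalues (my own NumPy sketch); the repeated root -1 should show up twice:

```python
import numpy as np

A = np.array([[1.0, -2.0, -2.0],
              [-2.0, 1.0, 2.0],
              [-2.0, 2.0, 1.0]])

# Characteristic polynomial (-1 - lambda)^2 (5 - lambda) has roots -1, -1, 5.
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)  # approximately [-1. -1.  5.]
```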

Now, the next step is to find a basis for the eigenspaces. For lambda1 = -1, we need to find the nullspace of our matrix (A – (lambda1)I), which is the matrix [2, -2, -2; -2, 2, 2; -2, 2, 2]. If we row reduce this matrix, we see that it is row equivalent to the matrix [1, -1, -1; 0, 0, 0; 0, 0, 0]. The nullspace of this matrix will consist of all solutions to the equation x1 – x2 – x3 = 0. If we replace the variable x2 with the parameter s, and the variable x3 with the parameter t, then we have that x1 = s + t, x2 = s, and x3 = t, so our general solution can be written as (s times the vector [1; 1; 0]) + (t times the vector [1; 0; 1]). This means that the set containing the vectors [1; 1; 0] and [1; 0; 1] is a basis for the eigenspace for lambda1.

For our other eigenvalue, lambda2 = 5, we again need to find the nullspace of the matrix (A – (lambda2)I), which, in this case, is the matrix [-4, -2, -2; -2, -4, 2; -2, 2, -4]. This row reduction is a little more complicated, so I’ll show some of the intermediate steps, but we will find that this matrix is row equivalent to the matrix [1, 0, 1; 0, 1, -1; 0, 0, 0]. The nullspace of this matrix will consist of all solutions to the system x1 + x3 = 0, x2 – x3 = 0. If we replace the variable x3 with the parameter s, then we have that x1 = -s, x2 = s, and x3 = s. So we can write the general solution of this system as (s times the vector [-1; 1; 1]), which means that the set containing the vector [-1; 1; 1] is a basis for the eigenspace for lambda2.

Again, our study of diagonalization tells us that A can be diagonalized by a matrix P whose columns are the basis vectors for the eigenspaces. That is, we know that this matrix P = [1, 1, -1; 1, 0, 1; 0, 1, 1] is an invertible matrix such that (P-inverse)AP = D, where D is the matrix whose diagonal entries are lambda1, lambda1, and lambda2, or in our case, [-1, 0, 0; 0, -1, 0; 0, 0, 5].

Now, again, we’ve shown that A is diagonalizable, but what about being orthogonally diagonalizable? In this particular case, the columns of P are not even orthogonal, much less orthonormal. We could apply the Gram-Schmidt procedure to the columns of P, but our theory of diagonalization requires that the columns of P be basis vectors for the corresponding eigenspaces, not simply any basis vectors for R3. So, as we did in the previous example, we will go back to our eigenspaces and find orthonormal bases for them.

Looking at the eigenspace for lambda2 first, since it is the span of a single vector, we simply need to normalize this vector to get an orthonormal basis. And so we see that the set containing only the vector [-1/(root 3); 1/(root 3); 1/(root 3)] is an orthonormal basis for the eigenspace for lambda2.

But the eigenspace for lambda1 has two basis vectors, and these basis vectors are not orthogonal, so we will need to use the Gram-Schmidt procedure to find first an orthogonal, and then an orthonormal, basis for this eigenspace. We keep our first vector as [1; 1; 0], and then, by the Gram-Schmidt procedure, our second vector becomes [1; 0; 1] – ((([1; 0; 1] dotted with [1; 1; 0])/((the norm of [1; 1; 0])-squared)) times the vector [1; 1; 0]). That is to say, it’s the vector [1; 0; 1] – (1/2)[1; 1; 0], which equals [1/2; -1/2; 1]. So we know that the set {[1; 1; 0], [1/2; -1/2; 1]} is an orthogonal basis for the eigenspace, and then, by normalizing the vectors, we see that the set {[1/(root 2); 1/(root 2); 0], [1/(root 6); -1/(root 6); 2/(root 6)]} is an orthonormal basis for the eigenspace for lambda1.
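The Gram-Schmidt computation above can be written out in code; this is my own minimal sketch for just these two vectors, not a general-purpose implementation:

```python
import numpy as np

# Basis for the eigenspace for lambda1 = -1, as found above.
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])

# Gram-Schmidt: keep v1, then subtract from v2 its projection onto v1.
u1 = v1
u2 = v2 - (np.dot(v2, u1) / np.dot(u1, u1)) * u1
print(u2)  # [ 0.5 -0.5  1. ]

# Normalize both vectors to get the orthonormal basis.
q1 = u1 / np.linalg.norm(u1)  # [1/sqrt(2), 1/sqrt(2), 0]
q2 = u2 / np.linalg.norm(u2)  # [1/sqrt(6), -1/sqrt(6), 2/sqrt(6)]
print(np.dot(q1, q2))  # 0.0 (orthogonal)
```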

Now, returning to our knowledge of diagonalization, we know that the matrix Q whose columns are these new basis vectors, with first column [1/(root 2); 1/(root 2); 0] and second column [1/(root 6); -1/(root 6); 2/(root 6)], our two basis vectors for lambda1, and third column [-1/(root 3); 1/(root 3); 1/(root 3)], our basis vector for lambda2, will diagonalize A. But have we, in fact, ended up with a Q that is orthogonal? A quick check verifies that it is: we have already made sure that every column is a unit vector, and a quick look shows that the columns are orthogonal to each other as well. So we have shown that A is orthogonally diagonalizable.
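Finally, the whole 3-by-3 example can be verified numerically; this closing check is my own NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, -2.0, -2.0],
              [-2.0, 1.0, 2.0],
              [-2.0, 2.0, 1.0]])

r2, r3, r6 = np.sqrt(2.0), np.sqrt(3.0), np.sqrt(6.0)
# Columns: the two orthonormal basis vectors for lambda1 = -1,
# then the orthonormal basis vector for lambda2 = 5.
Q = np.array([[1 / r2, 1 / r6, -1 / r3],
              [1 / r2, -1 / r6, 1 / r3],
              [0.0, 2 / r6, 1 / r3]])

print(np.allclose(Q.T @ Q, np.eye(3)))                       # True: Q is orthogonal
print(np.allclose(Q.T @ A @ Q, np.diag([-1.0, -1.0, 5.0])))  # True: Q^T A Q = D
```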

Again, I will be assigning problems on showing that various symmetric matrices are orthogonally diagonalizable before we prove that this is always the case, as I feel that working hands-on with these matrices will help motivate the steps in the proof. At this point, all you need to know is that if you find an orthonormal basis for each of the eigenspaces, then you will end up with an orthogonal matrix P that diagonalizes A.