# Lesson: The Cayley-Hamilton Theorem

1 point

## Transcript — Introduction

To conclude the course, we will look at the very interesting—and surprising—Cayley-Hamilton Theorem. I feel this is a very good demonstration of two things—first, that math is awesome, and second, some of what research mathematicians do.

Consider the matrix A = [1, 2; 3, 2]. We can find that the characteristic polynomial of A is C(lambda) = lambda-squared – 3(lambda) – 4. Now Cayley, being a great mathematician, did what great mathematicians do: he played. Being familiar with the idea of substituting objects in place of scalars from other areas, he decided to try something. He decided to see what would happen if he substituted a matrix A for lambda in its characteristic polynomial. For our matrix A, we get A-squared – 3A – 4, except we better multiply the 4 by the identity matrix so that we’re just adding and subtracting matrices. So calculating this, we get [7, 6; 9, 10] – 3[1, 2; 3, 2] – [4, 0; 0, 4]. So calculating this equals 7 – 3 – 4 is 0, 6 – 3(2) – 0 is 0, 9 – 3(3) – 0 is 0, and 10 – 3(2) – 4 is 0. So A is a root of its characteristic polynomial. Wow! Of course, we now prove that this is true for every matrix A.

The Cayley-Hamilton Theorem: If C(lambda) is the characteristic polynomial of an n-by-n matrix A, then C(A) = the 0 matrix. Proof: Let C(lambda) = ((-1)-to-the-n)(lambda-to-the-n) + (c(n-1))(lambda-to-the-(n-1)) + up to c1(lambda) + c0. We now consider the polynomial C(X) = ((-1)-to-the-n)(X-to-the-n) + (c(n-1))(X-to-the-(n-1)) + up to c1X + c0(the identity matrix), whose argument is a matrix X.

We need to prove that C(A) = the 0 matrix. But how are we going to do this when we know nothing about the individual entries of A? Moreover, since the characteristic polynomial gives the eigenvalues of A, we obviously need to somehow relate A to its eigenvalues. Aha! This makes us think of Schur’s Theorem. Schur’s Theorem will give us an upper triangular matrix T. For this matrix T, we have lots of information about the entries. We know that almost half of the entries of T are 0, and more importantly, the diagonal entries of T are the eigenvalues of A.

So, by Schur’s Theorem, there exists a unitary matrix U and upper triangular matrix T such that (U-star)AU = T. Now it should be much easier to show that T satisfies C(T) = the 0 matrix, and once we have done this, we will show that since A is similar to T, then it has this same property as T.

Since the eigenvalues lambda1 to lambda_n of A are the diagonal entries of T, we have T has the form [lambda1, t12 to t1n; 0, lambda2, up to t2n; all the way down to all zeroes and lambda_n]. Moreover, we know that the roots of the characteristic polynomial are the eigenvalues of A, and so we can factor C(X) as C(X) = (X – (lambda1)I)(X – (lambda2)I)(times up to)(X – (lambda_n)I). Therefore, C(T) = (T – (lambda1)I)(T – (lambda2)I)(times up to)(T – (lambda_n)I). We will show that C(T) is the 0 matrix by showing that the first k columns of (T – (lambda1)I)(times up to)(T – (lambda_k)I) only contain zeroes for k from 1 to n.

First, observe that the first column of T – (lambda1)I is 0 since the first column of T is [lambda1 and all zeroes]. Assume that all entries of the first k columns of (T – (lambda1)I)(times up to)(T – (lambda_k)I) only contain zeroes. Then it is easy to verify that the first k+1 columns of (T – (lambda1)I)(times up to)(T – (lambda_k)I)(T – (lambda_(k+1))I) only contain zeroes by just looking at the definition of matrix-matrix multiplication. Thus, by induction, C(T) is equal to the 0 matrix.

We now observe that since A = UT(U-star), we have C(A) = C(UT(U-star)). Plugging this into the definition of the characteristic polynomial, this equals ((-1)-to-the-n)((UT(U-star))-to-the-n) + up to c1(UT(U-star)) + c0(the identity matrix). Using the fact that U is unitary, we can write this as ((-1)-to-the-n)U(T-to-the-n)(U-star) + up to c1UT(U-star) + c0U(U-star). Factoring out the U on the left and the U-star on the right, this is equal to U(((-1)-to-the-n)(T-to-the-n) + up to c1T + c0(the identity matrix))(U-star), which is U(C(T))(U-star), which is the 0 matrix as required.

It is not uncommon for students to have problems understanding the inductive step part of this proof. If you follow through the steps of the proof with a simple 3-by-3 matrix, then it will be a lot more clear.

A natural question to ask is if the converse of the Cayley-Hamilton Theorem is true. That is, if p(x) is a polynomial such that p(A) = the 0 matrix, then must p(x) be the characteristic polynomial of A? Unfortunately, the answer is no. The converse of the Cayley-Hamilton Theorem is not true. For example, the matrix A = [0, 0, 1; 0, 0, 0; 0, 0, 0] satisfies A-squared = the 0 matrix, but the characteristic polynomial of A is not lambda-squared—it is C(lambda) = lambda-cubed.

## Inverse of a Matrix

The Cayley-Hamilton Theorem has a variety of uses. To conclude the course, we will look at one of these. What does the Cayley-Hamilton Theorem say about the inverse of a matrix? Assume that A is an invertible matrix with characteristic polynomial C(lambda) = ((-1)-to-the-n)(lambda-to-the-n) + up to c1(lambda) + c0. Observe that since A is invertible, we must have c0 is not equal to 0. Then, the Cayley-Hamilton Theorem gives ((-1)-to-the-n)(A-to-the-n) + (c(n-1))(A-to-the-(n-1)) + up to c1A + c0(the identity matrix) = the 0 matrix. Hence, we can solve this equation for I. We get I = –(1/c0)(((-1)-to-the-n)(A-to-the-n) + (c(n-1))(A-to-the-(n-1)) + up to c2(A-squared) + c1A). Factoring out a matrix A gives I = –(1/c0)(((-1)-to-the-n)(A-to-the-(n-1)) + up to c2A + c1(the identity matrix))(all times A). Thus, we get that A-inverse = –(1/c0)(((-1)-to-the-n)(A-to-the-(n-1)) + up to c2A + c1(the identity matrix)). And so the Cayley-Hamilton Theorem gives us a formula for the inverse of an invertible matrix A as a linear combination of powers of A.

Example: Let A be the matrix [1, 1, -3; 2, 0, 2; 1, 1, 1]. Write the inverse of A as a linear combination of powers of A. Solution: We can find that the characteristic polynomial of A is lambda-cubed – 2(lambda-squared) + 8. Thus, by the Cayley-Hamilton Theorem, we have A-cubed – 2(A-squared) + 8(the identity matrix) is the 0 matrix. So, A-cubed – 2(A-squared) = -8I, and so (-1/8)(A-cubed) + (1/4)(A-squared) = the identity matrix. Consequently, A-inverse = (-1/8)(A-squared) + (1/4)A.

This concludes this lecture and the course. Make sure to watch the conclusion video and study lots for your final exam.