Lesson: Orthonormal Bases and Orthogonal Matrices

Transcript — Introduction

In the last lecture, we achieved our goal of defining bases for inner product spaces that have the same properties as the standard basis vectors in Rn. In particular, we defined an orthogonal basis {v1 to vn} for an inner product space V to be a basis for V such that the inner product of vi and vj equals 0 for all i not equal to j. We also defined an orthonormal basis {v1 to vn} for an inner product space V to be a basis for V such that it is an orthogonal basis, and each vi is a unit vector. In this lecture, we will show that, indeed, these bases are easy to use and very useful.

Orthogonal and Orthonormal Bases

We begin by showing that it is quite easy to find the coordinates of a vector with respect to an orthogonal basis. Theorem 9.2.4: If B = {v1 to vn} is an orthogonal basis for an inner product space V, and v is any vector in V, then the coefficient of vi when v is written as a linear combination of the vectors in B is (the inner product of v and vi) divided by ((the length of vi)-squared). In particular, v = ((the inner product of v and v1) divided by ((the length of v1)-squared))v1 + up to ((the inner product of v and vn) divided by ((the length of vn)-squared))vn.

Before I prove this theorem, I will note that even though the inner product is symmetric, it is important to memorize this formula as it is written. Later in this course, we will have to use it in this form.

Proof: This proof gives another good example of how studying previous proofs can help. To prove this theorem, we use the same technique as in the proof of Theorem 9.2.3 in the last lecture. Since B is a basis, we can write v = c1v1 + up to cnvn. Taking the inner product with vi gives that the inner product of v and vi equals the inner product of (c1v1 + up to cnvn) and vi. Since an inner product is bilinear, we get that this equals (c1 times the inner product of v1 and vi) + up to (cn times the inner product of vn and vi). But vi is orthogonal to all of the other vectors, and the inner product of vi with itself is (the length of vi)-squared. And so we have the inner product of v and vi is equal to ci((the length of vi)-squared). Since vi is not equal to the 0 vector, (the length of vi)-squared is non-zero, so we may divide by it, and the result follows.
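
To make the formula in Theorem 9.2.4 concrete, here is a minimal computational sketch in Python. It is not part of the course materials: the function name orthogonal_coordinates and the inner argument are hypothetical, and inner is assumed to implement whatever inner product the space V carries.

  # A sketch of Theorem 9.2.4 (hypothetical helper, not from the lecture):
  # given an orthogonal basis and an inner product function, the i-th
  # coordinate of v is <v, v_i> divided by the squared length <v_i, v_i>.
  def orthogonal_coordinates(v, basis, inner):
      return [inner(v, vi) / inner(vi, vi) for vi in basis]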

Notice that if {v1 to vn} is an orthonormal basis, then the formula for finding the coordinates of a vector with respect to this basis is even easier since the lengths of all the vectors are 1.

Corollary 9.2.5: If B = {v1 to vn} is an orthonormal basis for an inner product space V, and v is any vector in V, then v = (the inner product of v and v1)v1 + up to (the inner product of v and vn)vn.

Example: Verify that the set B = {1, x, 3x-squared – 2} is an orthogonal basis for P2(R) with inner product, the inner product of p and q is equal to p(-1)q(-1) + p(0)q(0) + p(1)q(1), and find the coordinates of 1 + x + x-squared with respect to the basis B. Solution: First, we have that the inner product of 1 and x, plugging into the definition of the inner product, is 1(-1) + 1(0) + 1(1), which is 0. The inner product of 1 and (3x-squared – 2), by definition, is 1(1) + 1(-2) + 1(1), which also is 0. And finally, the inner product of x and (3x-squared – 2) is equal to (-1)(1) + 0(-2) + 1(1), which is 0. Hence, B is an orthogonal set of three non-zero vectors, and therefore is a linearly independent set of three vectors in a 3-dimensional inner product space, and so B is an orthogonal basis for P2(R).

To find the coordinates of 1 + x + x-squared, we use Theorem 9.2.4. We get the inner product of (1 + x + x-squared) and 1 is equal to 1(1) + 1(1) + 3(1), which is 5. The inner product of (1 + x + x-squared) and x is equal to 1(-1) + 1(0) + 3(1), which is 2. And the inner product of (1 + x + x-squared) and (3x-squared – 2) is 1(1) + 1(-2) + 3(1), which is 2. We also need the lengths of the basis vectors squared, and so we get (the length of 1)-squared is the inner product of 1 and 1, which is 3. (The length of x)-squared is the inner product of x with itself, which is 2. And (the length of (3x-squared – 2))-squared is the inner product of (3x-squared – 2) with itself, which is 6. And thus, we get 1 + x + x-squared = (5/3)(1) + 1x + (1/3)(3x-squared – 2). It is easy to check that we do have the correct answer.
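
If you want to check these numbers by machine, here is a small Python sketch of the calculation above. It is my own illustration, not part of the lecture; in particular, representing a polynomial by its coefficient list [a0, a1, a2] is just one convenient choice.

  # Polynomials in P2(R) represented by coefficient lists [a0, a1, a2]
  # for a0 + a1*x + a2*x^2 (a representation chosen only for illustration).
  def ev(p, t):
      return p[0] + p[1]*t + p[2]*t**2

  def inner(p, q):
      # <p, q> = p(-1)q(-1) + p(0)q(0) + p(1)q(1)
      return sum(ev(p, t) * ev(q, t) for t in (-1, 0, 1))

  basis = [[1, 0, 0], [0, 1, 0], [-2, 0, 3]]   # 1, x, 3x^2 - 2
  v = [1, 1, 1]                                # 1 + x + x^2
  print([inner(v, b) / inner(b, b) for b in basis])   # approximately [5/3, 1, 1/3], matching the answer above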

Example: The set B = {v1, v2, v3}, where v1 = [1/(square root 3); 1/(square root 3); -1/(square root 3)], v2 = [1/(square root 2); 0; 1/(square root 2)], and v3 = [1/(square root 6); -2/(square root 6); -1/(square root 6)], is an orthonormal basis for R3. Find the coordinates of x = [1; 2; 3] with respect to B. Solution: Notice that putting this into a matrix and row reducing to find the coordinates would be terrible because of the square roots. But thanks to Corollary 9.2.5, we get that x = (the inner product of x and v1)v1 + (the inner product of x and v2)v2 + (the inner product of x and v3)v3. Evaluating the inner products using the standard inner product for R3—i.e., the dot product—we get x = 0v1 + (2 square root 2)v2 – (square root 6)v3. Again, it is easy to check that we do have the correct answer.
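
A quick numerical check of this example is also easy. The following NumPy sketch is only an illustration of Corollary 9.2.5, not the method the course asks you to use.

  import numpy as np

  v1 = np.array([1, 1, -1]) / np.sqrt(3)
  v2 = np.array([1, 0, 1]) / np.sqrt(2)
  v3 = np.array([1, -2, -1]) / np.sqrt(6)
  x = np.array([1, 2, 3])

  coords = [x @ v1, x @ v2, x @ v3]                    # Corollary 9.2.5: coordinates are dot products
  print(coords)                                        # approximately [0, 2*sqrt(2), -sqrt(6)]
  print(coords[0]*v1 + coords[1]*v2 + coords[2]*v3)    # recovers [1, 2, 3]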

Compare the calculations in the two previous examples. Notice that it was actually harder to do the calculations by hand with the orthonormal basis since these vectors contained square roots. So, when doing calculations by hand, we actually generally prefer to have an orthogonal basis whose vectors do not contain fractions. However, of course, for theoretical purposes, it is better to have an orthonormal basis so that all the lengths are 1.

Orthogonal Matrices

We now look at a very important orthonormal basis. Example: Consider the set B = {[cos theta; sin theta], [-sin theta; cos theta]}. Observe that [cos theta; sin theta] dot [-sin theta; cos theta] is equal to 0. The length of the first vector is the square root of ((cos-squared theta) + (sin-squared theta)), which is 1. And the length of the second vector is the square root of ((-sin theta)-squared + (cos-squared theta)), which also equals 1. And hence, B is an orthonormal basis for R2.

You may recall from Linear Algebra I that this corresponds to a rotation of the coordinate axes by angle theta. Let’s see how to change coordinates between this basis for R2 and the standard basis S for R2. We first recall from Linear Algebra I that if B = {v1 to vn} and C are bases for a vector space V, then the change of coordinates matrix from B-coordinates to C-coordinates is the matrix whose columns are the C-coordinates of the vectors in B. It satisfies the C-coordinate vector of x equals the change of coordinates matrix times the B-coordinate vector of x.

So, in our example, to find the change of coordinates matrix from B-coordinates to standard coordinates, we find the coordinates of the vectors in B with respect to the standard basis. Of course, this gives the matrix [cos theta, -sin theta; sin theta, cos theta], which we again recognize as a rotation matrix. Now we get to the amazing part. Recall that the change of coordinates matrix from S-coordinates to B-coordinates is the inverse of the change of coordinates matrix from B-coordinates to S-coordinates. Using our formula for the inverse of a 2-by-2 matrix, we get that the change of coordinates matrix from S-coordinates to B-coordinates is [cos theta, sin theta; -sin theta, cos theta]. Wow! Notice that this is just the transpose of the change of coordinates matrix from B-coordinates to standard coordinates.
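
If you would like to see this numerically, here is a short NumPy sketch; the particular angle is arbitrary and chosen only for illustration.

  import numpy as np

  theta = 0.7   # arbitrary sample angle
  P = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
  print(np.allclose(P.T @ P, np.eye(2)))       # True: P-transpose times P is the identity
  print(np.allclose(P.T, np.linalg.inv(P)))    # True: the transpose is the inverse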

It is pretty amazing that we can find the inverse of a matrix by just taking the transpose—as normally, if you recall from Linear Algebra I, it is quite time-consuming, in general, to find the inverse of a matrix. But we have to ask ourselves, is this just a special case, or is this true for every matrix whose columns form an orthonormal basis for Rn? The answer is, of course it’s true, because math is awesome.

Theorem 9.2.6: If P is a matrix in M(n-by-n)(R), then the following are equivalent:

  1. The columns of P form an orthonormal basis for Rn.
  2. P-transpose = P-inverse.
  3. The rows of P also form an orthonormal basis for Rn.

Recall that when we say the statements are equivalent, we mean that if one is true, then they are all true, or if one is false, then they are all false.

Proof: To prove the three statements are equivalent, we will prove (1) if and only if (2), and then (2) if and only if (3). First, since statement (1) is about the columns of P, we should write P in terms of its columns. So let P be the matrix with columns v1 to vn. By definition of matrix-matrix multiplication, we get (P-transpose)P = [(v1 dot v1), (v1 dot v2), up to (v1 dot vn); (v2 dot v1), (v2 dot v2), up to (v2 dot vn); down to (vn dot v1), (vn dot v2), up to (vn dot vn)]. We now observe that this equals the identity matrix if and only if the dot products along the main diagonal—vi dot vi—equal 1 for all i, and all of the other dot products are 0. That is, (P-transpose)P = I if and only if {v1 to vn} is an orthonormal set of n vectors in Rn, and hence an orthonormal basis for Rn. Finally, since P is square, (P-transpose)P = I is equivalent to P-transpose = P-inverse, which completes the proof of (1) if and only if (2).

To prove (2) if and only if (3), we use the same technique, but this time we are working with the rows. So let P be the matrix with rows w1-transpose to wn-transpose. Notice, to make the matrix-matrix multiplication work out nicely this time, we actually need to look at P(P-transpose) instead. We have P(P-transpose) = [(w1 dot w1), (w1 dot w2), up to (w1 dot wn); down to (wn dot w1), (wn dot w2), up to (wn dot wn)]. Thus, as above, P(P-transpose) = I if and only if {w1 to wn} is an orthonormal basis for Rn.
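
Here is a small NumPy sketch of the computation in this proof. The particular matrix is a sample of my own, not one from the lecture; its columns are orthogonal, and normalizing them produces an orthogonal matrix.

  import numpy as np

  P = np.array([[1,  1,  1],
                [1,  0, -2],
                [1, -1,  1]], dtype=float)
  P = P / np.linalg.norm(P, axis=0)         # divide each column by its length
  print(np.allclose(P.T @ P, np.eye(3)))    # True: the columns are orthonormal
  print(np.allclose(P @ P.T, np.eye(3)))    # True: so the rows are orthonormal as well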

Take a minute to read over the proof. Make sure that you understand why we needed to use (P-transpose)P in the first half of the proof and P(P-transpose) in the second half. This theorem shows that n-by-n matrices whose columns form an orthonormal basis for Rn are wonderful, as we can find an inverse for the matrix by just taking the transpose. We will also see throughout the rest of the course that these matrices are extremely useful.

Thus, of course, we give them a name. Let P be an n-by-n matrix whose columns form an orthonormal basis for Rn. Then P is said to be an orthogonal matrix. Normally, in mathematics, things are extremely well-named. A rotation matrix is a matrix for a rotation. An orthogonal basis is a basis whose vectors are orthogonal. Notice, however, that this is a very bad name. A much better name would have been an orthonormal matrix, but, unfortunately, in mathematics, we call it an orthogonal matrix. So be sure that you remember that an orthogonal matrix has orthonormal columns and rows.

Example: Which of the following are orthogonal matrices? (a) The matrix A = [0, 0, 1; 1, 0, 0; 0, 1, 0]. Since its columns are standard basis vectors, they form an orthonormal basis for R3, and hence, A is orthogonal. (b) The matrix B = [1/(square root 3), 1/(square root 3), 1/(square root 3); 1/(square root 2), 0, 1/(square root 2); -1/(square root 6), 2/(square root 6), -1/(square root 6)]. Observe that the dot product of the first and second row is not 0, so the rows are not orthogonal, and hence, B is not orthogonal. (c) C = [1/(square root 2), -1/(square root 2); 1/(square root 2), 1/(square root 2)]. Calculating (C-transpose)C, we get the identity matrix, so C-transpose = C-inverse, and hence, C is orthogonal.

Take a minute to compare the calculations required to multiply (C-transpose)C and the calculations required to verify the columns form an orthonormal basis for R2.
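
A machine check agrees with the hand calculations above. This is only a sketch; the helper name is_orthogonal is my own.

  import numpy as np

  def is_orthogonal(P):
      # P is orthogonal exactly when P-transpose times P is the identity
      return np.allclose(P.T @ P, np.eye(P.shape[0]))

  A = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
  B = np.array([[ 1/np.sqrt(3), 1/np.sqrt(3),  1/np.sqrt(3)],
                [ 1/np.sqrt(2), 0,             1/np.sqrt(2)],
                [-1/np.sqrt(6), 2/np.sqrt(6), -1/np.sqrt(6)]])
  C = np.array([[1/np.sqrt(2), -1/np.sqrt(2)],
                [1/np.sqrt(2),  1/np.sqrt(2)]])
  print(is_orthogonal(A), is_orthogonal(B), is_orthogonal(C))   # True False True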

Of course, orthogonal matrices have even more very nice properties. Theorem 9.2.7: If P and Q are n-by-n orthogonal matrices, and x and y are vectors in Rn, then

  1. The dot product of Px and Py is equal to the dot product of x and y.
  2. The length of Px equals the length of x.
  3. The determinant of P is either 1 or -1.
  4. All real eigenvalues of P must be 1 or -1.
  5. PQ is also an orthogonal matrix.

We will prove (1) and leave the others as exercises. For (1), using the amazingly useful formula that relates the dot product to matrix-matrix multiplication, we have that (Px) dot (Py) equals (Px)-transpose times Py. By a property of the transpose, this equals (x-transpose)(P-transpose)Py. But P-transpose = P-inverse since P is orthogonal, and hence this equals (x-transpose)(P-inverse)Py = (x-transpose)y, which is x dot y.
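
For those who like to experiment, the properties in Theorem 9.2.7 are easy to see numerically. The NumPy sketch below is only an illustration: it builds sample orthogonal matrices from a QR factorization, which is one convenient way to produce them but is not a construction from this lecture.

  import numpy as np

  rng = np.random.default_rng(0)
  P, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # sample orthogonal matrices
  Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
  x = rng.standard_normal(3)
  y = rng.standard_normal(3)

  print(np.isclose((P @ x) @ (P @ y), x @ y))                      # property (1)
  print(np.isclose(np.linalg.norm(P @ x), np.linalg.norm(x)))      # property (2)
  print(np.isclose(abs(np.linalg.det(P)), 1))                      # property (3)
  evals = np.linalg.eigvals(P)
  real_evals = evals[np.isclose(np.imag(evals), 0)]
  print(np.allclose(np.abs(np.real(real_evals)), 1))               # property (4), for the real eigenvalues
  print(np.allclose((P @ Q).T @ (P @ Q), np.eye(3)))               # property (5): PQ is orthogonal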

We have now seen that orthogonal and orthonormal bases are wonderful. But we still have to ask one very important question. Does every finite-dimensional subspace of an inner product space have an orthogonal basis? Since math is awesome, the answer, of course, is yes. We will prove this in the next lecture. This ends this lecture.
