# Lesson: Projections Onto a Subspace


## Transcript

Now that we have explored what it means to be orthogonal to a set, we can return to our original question of how to make an orthonormal basis. We will construct such a basis one vector at a time, so for now, let us assume that we have an orthonormal set {v1 through vk}, and we want to find a vector v(k+1) such that our new set {v1 through v(k+1)} is orthonormal. Well, if we can find any vector x such that x is not in the span of {v1 through vk}, then we can split x into two pieces: the part of x that is in the span of {v1 through vk}, and the part of x that is orthogonal to the span of {v1 through vk}. We did something similar to this in Linear Algebra I when we looked at the values of proj(y)(x) and perp(y)(x). So what we want to do is expand these definitions to now look at the projection of a vector x onto a subspace S, instead of just a vector.

Now, recalling that proj(y)(x) is a scalar multiple of y, we will now define proj(S)(x) to be a linear combination of the basis vectors for S, as this would be the generalization of a scalar multiple of one vector. But of course, proj(y)(x) wasn’t just any scalar multiple of y. The scalar was (x dot y)/((the norm of y)-squared). So, to make sure that our definitions coincide in the case that S is the span of a single vector, we will take the scalars in our expanded definition to be (x dot vi)/((the norm of vi)-squared).

Let S be a k-dimensional subspace of Rn, and let B = {v1 through vk} be an orthogonal basis of S. If x is any vector in Rn, the projection of x onto S is defined to be proj(S)(x), which equals ((x dot v1)/((the norm of v1)-squared))v1 + ((x dot v2)/((the norm of v2)-squared))v2 + through to ((x dot vk)/((the norm of vk)-squared))vk. Note that this definition only works if B is an orthogonal basis. We will not consider projections based on arbitrary bases.
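The definition translates almost directly into code. What follows is a minimal Python sketch, not part of the lesson itself: the helper names `dot` and `proj_subspace` are hypothetical, vectors are plain lists, and exact fractions are used so that rounding does not obscure the result. It assumes the basis handed in is orthogonal and contains no zero vector, and it rejects a non-orthogonal basis, since the formula is not valid in that case.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def proj_subspace(basis, x):
    """proj(S)(x) for S = span(basis); the basis must be orthogonal."""
    # The formula is only valid for an orthogonal basis, so check first.
    for i in range(len(basis)):
        for j in range(i + 1, len(basis)):
            if dot(basis[i], basis[j]) != 0:
                raise ValueError("basis vectors must be mutually orthogonal")
    result = [Fraction(0)] * len(x)
    for v in basis:
        # Scalar (x . v) / ||v||^2, the same scalar as in the one-vector case.
        coeff = Fraction(dot(x, v)) / Fraction(dot(v, v))
        result = [r + coeff * vi for r, vi in zip(result, v)]
    return result

# With a single basis vector this reduces to the familiar proj(y)(x).
single = proj_subspace([[1, 2]], [3, 4])
print([str(c) for c in single])  # ['11/5', '22/5']
```

Here x dot y = 11 and (the norm of y)-squared = 5, so the result is (11/5)[1; 2], which agrees with the single-vector definition.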

Continuing to parallel our original construction of proj(y) and perp(y), we now define perp(S) as follows. The projection of x perpendicular to S is defined to be perp(S)(x) = x – proj(S)(x).

Now, our hope is that perp(S)(x) will, in fact, be an element of S-perp, and it turns out that it is. To verify this, we will show that perp(S)(x) dotted with vi = 0 for every vector vi in our basis, which is enough because every vector in S is a linear combination of the basis vectors. perp(S)(x) dot vi = (x – proj(S)(x)) dotted with vi, which we can distribute to get (x dot vi) – (proj(S)(x) dot vi). Let’s expand out our proj(S)(x), and then again distribute the dot product, so that we’re looking at (x dot vi) – [((x dot v1)/((the norm of v1)-squared))(v1 dot vi) + through to ((x dot vk)/((the norm of vk)-squared))(vk dot vi)]. Now, recalling that B is an orthogonal set, we know that vj dot vi = 0 if j does not equal i, so all but one of these terms go to 0. So we still have our x dot vi initially, but now we’re subtracting a whole bunch of zeroes and ((x dot vi)/((the norm of vi)-squared))(vi dot vi). The first factor is just a scalar, and vi dot vi is (the norm of vi)-squared, which cancels with the (norm of vi)-squared in the denominator. And so we see that we have (x dot vi) – (x dot vi), which is 0, as desired.
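Written out symbolically, the verification above reads as follows. The arrow and subscript notation is a typesetting choice made here, not something taken from the lesson's slides.

```latex
\begin{aligned}
\operatorname{perp}_S(\vec{x}) \cdot \vec{v}_i
  &= \bigl(\vec{x} - \operatorname{proj}_S(\vec{x})\bigr) \cdot \vec{v}_i \\
  &= \vec{x} \cdot \vec{v}_i
     - \sum_{j=1}^{k} \frac{\vec{x} \cdot \vec{v}_j}{\|\vec{v}_j\|^2}\,
       (\vec{v}_j \cdot \vec{v}_i) \\
  &= \vec{x} \cdot \vec{v}_i
     - \frac{\vec{x} \cdot \vec{v}_i}{\|\vec{v}_i\|^2}\,\|\vec{v}_i\|^2
     && \text{since } \vec{v}_j \cdot \vec{v}_i = 0 \text{ for } j \neq i \\
  &= 0 .
\end{aligned}
```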

Now let’s look at this in action. Let B be the set of vectors {[1; 2; 3], [1; 1; -1]}. Since [1; 2; 3] dotted with [1; 1; -1] equals 1 + 2 – 3 = 0, B is an orthogonal basis for the subspace S that it spans. Let’s look at another vector x = [4; 5; -2]. To determine proj(S)(x) and perp(S)(x), we will first want to do the following calculations. Let’s look at the dot product of x with our first basis vector. This is [4; 5; -2] dotted with [1; 2; 3], which equals 8. Then we can look at the dot product of x with our second basis vector, so that’s [4; 5; -2] dotted with [1; 1; -1], which equals 11. We will also need to know the norm-squared of our first basis vector, which, in this case, is 14, and the norm-squared of our second basis vector, which, in this case, is 3. Then we have that proj(S)(x) will equal (8/14)[1; 2; 3] + (11/3)[1; 1; -1], which equals [89/21; 101/21; -41/21]. And then this means that perp(S)(x) = x – proj(S)(x), so we’re just looking at [4; 5; -2] – [89/21; 101/21; -41/21], which equals [-5/21; 4/21; -1/21].

Now we can go ahead and verify our calculation by making sure that the perp(S)(x) really is orthogonal to our basis vectors. So we can look at [1; 2; 3] dotted with [-5/21; 4/21; -1/21], and see that it equals 0, and that [1; 1; -1] dotted with [-5/21; 4/21; -1/21] also equals 0.
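The arithmetic in this example can be reproduced exactly. The sketch below uses Python's `fractions` module; the variable names are illustrative only.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

v1, v2 = [1, 2, 3], [1, 1, -1]   # the orthogonal basis B
x = [4, 5, -2]

c1 = Fraction(dot(x, v1), dot(v1, v1))   # 8/14
c2 = Fraction(dot(x, v2), dot(v2, v2))   # 11/3

proj = [c1 * a + c2 * b for a, b in zip(v1, v2)]   # proj(S)(x)
perp = [xi - pi for xi, pi in zip(x, proj)]        # perp(S)(x)

print([str(c) for c in proj])   # ['89/21', '101/21', '-41/21']
print([str(c) for c in perp])   # ['-5/21', '4/21', '-1/21']
print(dot(perp, v1), dot(perp, v2))   # 0 0
```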

Before we finally move on to our algorithm for constructing an orthonormal basis, we want to notice one last feature of proj(S)(x), and that is that it is the vector in S that is closest to x. We see this in the Approximation Theorem. Let S be a subspace of Rn. Then for any x in Rn, the unique vector s in S that minimizes the distance ||x – s|| is s = proj(S)(x).
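A numerical spot check can make the theorem plausible before proving it. The sketch below reuses the earlier example (S spanned by [1; 2; 3] and [1; 1; -1], with x = [4; 5; -2]) and compares the squared distance from x to proj(S)(x) against the squared distance from x to other elements of S. The grid of coefficients is an arbitrary choice, and a check like this illustrates the claim without proving it.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def dist_sq(u, v):
    """Squared distance ||u - v||^2."""
    d = [a - b for a, b in zip(u, v)]
    return dot(d, d)

v1, v2 = [1, 2, 3], [1, 1, -1]   # orthogonal basis of S
x = [4, 5, -2]

def combo(a, b):
    """The element a*v1 + b*v2 of S."""
    return [a * p + b * q for p, q in zip(v1, v2)]

best = combo(Fraction(8, 14), Fraction(11, 3))   # proj(S)(x)
best_d = dist_sq(x, best)
print(best_d)   # 2/21

# Other elements of S, with coefficients on a grid of half-integers.
others = [combo(Fraction(a, 2), Fraction(b, 2))
          for a in range(-4, 5) for b in range(0, 12)]
closer = [s for s in others if dist_sq(x, s) <= best_d]
print(closer)   # []
```

None of the grid points is as close to x as the projection, which is what the theorem predicts.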

To prove this, let’s start by having {v1 through vk} be an orthonormal basis for S, and we’ll let {v(k+1) through vn} be an orthonormal basis for S-perp, so that our set {v1 through vn} is an orthonormal basis for all of Rn. Then for any x in Rn, there are scalars x1 through xn such that we can write x = x1v1 + through to xnvn. Let’s let s, which we can write as s1v1 + through to skvk, be an element of our subspace. Then we also have that s = s1v1 + through to skvk + 0v(k+1) + through to 0vn. So to write x – s, we take our coordinates for x and subtract our coordinates for s, and we see that we get (x1 – s1)v1 + through to (xk – sk)vk, then just + x(k+1)v(k+1) + through to xnvn.

Now, in order to minimize the distance ||x – s||, we will actually minimize the easier-to-calculate (norm of (x – s))-squared. Also recall that, since {v1 through vn} is an orthonormal basis, and as we have written x – s in terms of coordinates with respect to this basis, we can still calculate (the norm of (x – s))-squared the usual way. That is, we sum the squares of the coefficients. And so we see that (the norm of (x – s))-squared = (x1 – s1)-squared + through to (xk – sk)-squared + (x(k+1))-squared + through to xn-squared. Every term here is nonnegative, and only the first k terms depend on s, so this value is minimized exactly when si = xi, so that our terms xi – si = 0 for i from 1 to k. And this means we have shown that the unique vector s in S that minimizes the distance ||x – s|| is s = x1v1 + through to xkvk.
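In symbols, the minimization step is the following inequality. The summation notation is a typesetting choice made here rather than the lesson's own.

```latex
\|\vec{x} - \vec{s}\|^2
  = \sum_{i=1}^{k} (x_i - s_i)^2 + \sum_{i=k+1}^{n} x_i^2
  \;\ge\; \sum_{i=k+1}^{n} x_i^2 ,
\qquad \text{with equality exactly when } s_i = x_i \text{ for } i = 1, \dots, k .
```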

Now, to see that this is equal to proj(S)(x), let’s first recall that proj(S)(x) = (x dot v1)v1 + through to (x dot vk)vk since, again, our {v1 through vk} is, in fact, an orthonormal basis for S, so each (norm of vi)-squared is 1. And then we notice that for any i = 1 through k, we can expand out x in terms of our basis {v1 through vn}, dot it with vi, distribute the dot product, and pull out the scalars, so that x dot vi becomes x1(v1 dot vi) + through to xi(vi dot vi) + through to xk(vk dot vi) + (x(k+1))(v(k+1) dotted with vi) and all the way through to xn(vn dot vi). Well, since our {v1 through vn} is an orthonormal basis for Rn, all of those dot products except vi dot vi go to 0, and we are simply left with xi times (vi dot vi), which is xi((the norm of vi)-squared). But again, being orthonormal, the norm of vi is 1, so this simply equals xi.

So we see that x dot vi = xi, and thus, that the proj(S)(x), which equals (x dot v1)v1 + through to (x dot vk)vk, simply equals x1v1 + through to xkvk, which was our chosen vector s.
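The key fact of this last step, that x dot vi recovers the coordinate xi when the basis is orthonormal, can be checked on a small example. The orthonormal basis of R3 below (each vector has entries in thirds) and the vector x are chosen here purely for illustration and do not come from the lesson. S is taken to be the span of the first two vectors, so the third spans S-perp.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

third = Fraction(1, 3)
v1 = [third * c for c in (1, 2, 2)]    # v1, v2 span S
v2 = [third * c for c in (2, 1, -2)]
v3 = [third * c for c in (2, -2, 1)]   # v3 spans S-perp
x = [3, 0, 6]

# For an orthonormal basis, the coordinates of x are just dot products.
coords = [dot(x, v) for v in (v1, v2, v3)]
print([str(c) for c in coords])   # ['5', '-2', '4']

# Rebuilding x from all three coordinates gives x back.
rebuilt = [coords[0] * a + coords[1] * b + coords[2] * c
           for a, b, c in zip(v1, v2, v3)]

# Keeping only the first k = 2 coordinates gives proj(S)(x).
proj = [coords[0] * a + coords[1] * b for a, b in zip(v1, v2)]
print([str(c) for c in proj])     # ['1/3', '8/3', '14/3']
```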