Lesson: The Gram-Schmidt Procedure

Transcript

We are almost ready to start generating orthonormal bases for a subspace, but we need to make one last observation first.

Suppose that v1 through vk are in Rn. Then the span of {v1 through vk} is the same as the span of {v1 through v(k-1), and vk + t1v1 + ... + t(k-1)v(k-1)} for any scalars t1 through t(k-1) in R. So the idea is that you can replace the vector vk with (vk + a linear combination of the preceding vectors v1 through v(k-1)).

Now, the secret to this theorem is to realize the following: that a1v1 + through to a(k-1)v(k-1) + ak(vk + t1v1 + ... + t(k-1)v(k-1)) = the quantity (a1 + akt1)v1 + through to the quantity (a(k-1) + akt(k-1))v(k-1) + akvk. So on the left, we have written a general vector of the span of our bigger set, {v1 through v(k-1), and then the last vector is vk + the linear combination}. And then by regrouping our terms, we’ve shown that it is in the span of {v1 through vk}. So we’ve easily been able to show that the span of our modified set is contained in the span of our original set.

But now, what if we were to set bi = ai + akti for all i = 1 through k-1, and just leave bk = ak? Well, then we’ve written the right side as a general vector in the span of our original set, {v1 through vk}. And notice that any choice of b1 through bk can be produced this way: just take ak = bk and ai = bi – bkti. So, reading the equality in the other direction, we see that every vector in the span of our original set must still be a vector in the span of our modified set {v1 through v(k-1), (vk + the linear combination)}.
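The identity above is easy to check numerically. Here is a small sketch in pure Python; the vectors, the scalar t1, and the coefficients a1, a2 are my own illustrative choices (for the case k = 2), not from the lecture:

```python
# Numeric sanity check of the span-replacement identity for k = 2:
# a1*v1 + a2*(v2 + t1*v1) = (a1 + a2*t1)*v1 + a2*v2

def scale(c, v):
    return [c * x for x in v]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

v1 = [1.0, 0.0, 2.0]
v2 = [3.0, 1.0, -1.0]
t1 = 4.0           # any scalar works
a1, a2 = 2.0, 5.0  # coefficients of a generic vector in the modified span

# Left side: a general vector of the span of the modified set
left = add(scale(a1, v1), scale(a2, add(v2, scale(t1, v1))))
# Right side: the same vector written as a combination of the original set
right = add(scale(a1 + a2 * t1, v1), scale(a2, v2))

assert left == right
```

Since the two sides agree for every choice of coefficients, each span is contained in the other.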

And so, with this theorem in hand, at long last, we come to the Gram-Schmidt Procedure for using any spanning set to find an orthogonal basis for a subspace. So let’s look at the setup. We want to start with a set {w1 through wk} that’s a spanning set for a subspace S of Rn. And yes, all we need is a spanning set; we don’t need it to be linearly independent or to have the correct number of vectors. It’ll turn out that if our set contains any dependent members, the procedure will produce the 0 vector, which we will promptly toss aside, and continue.

Our first step will simply be to let v1 = w1. Any old vector will do to start. Then the span of {v1} is obviously equal to the span of {w1}, and we’ll set a subspace S1 equal to the span of {v1}.

Here’s where things start to get interesting. As our second step, we’re going to let v2 = perp(S1)(w2). So that means we’ll have that v2 = w2 – proj(S1)(w2), which equals w2 – ((w2 dot v1)/((the norm of v1)-squared))v1. Now, if we get that v2 = 0, then it must be the case that w2 = proj(S1)(w2), which means that w2 is, in fact, an element of S1. That is to say that w2 is in the span of {w1}, which means that the span of {w1} equals the span of {w1, w2}, so we simply have that the span of {v1} equals the span of {w1, w2}. And then {v1} is an orthogonal set such that the span of {v1} equals the span of {w1, w2}, and we would set S2 equal to the span of {v1}.

Now, of course, this is the unusual situation. With any luck, we’ll get that v2 is not equal to the 0 vector. Well, then we’ll note that since the span of {v1} equals the span of {w1}, we know that the span of {v1, w2} equals the span of {w1, w2}. And now we’ll use our new theorem to get that the span of {v1, w2} equals the span of {v1, w2 + a scalar multiple of v1}. Well, what’s our scalar? It’s t = –(w2 dot v1)/((the norm of v1)-squared). That is to say that the set {v1, w2 + tv1} equals the set {v1, v2}. So the set {v1, v2} is an orthogonal set such that the span of {v1, v2} equals the span of {w1, w2}, and we’re going to let S2 be this same span, the span of {v1, v2}.
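This second step is easy to sketch in code. A minimal pure-Python version, using example vectors of my own rather than the lecture’s:

```python
# Step 2 of Gram-Schmidt: v2 = w2 - proj_{S1}(w2), which is orthogonal to v1.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w1 = [1.0, 1.0, 0.0]  # illustrative vectors, not the lecture's example
w2 = [2.0, 0.0, 1.0]

v1 = w1
c = dot(w2, v1) / dot(v1, v1)            # (w2 . v1) / ||v1||^2
v2 = [wi - c * vi for wi, vi in zip(w2, v1)]

assert dot(v1, v2) == 0                  # v2 is orthogonal to v1
```

If v2 had come out as the zero vector, we would discard it, exactly as the lecture describes.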

So the idea here is that we’ve used a projection to find an orthogonal vector while preserving the span of the exact same vectors we started with, and we’re doing this incrementally. So at a generic ith step, we will by this point have an orthogonal set {v1 through vj}, where the precise value of j depends on how many times we might have gotten the 0 vector. But we’ll know that the span of {v1 through vj} equals the span of {w1 through w(i-1)}, and we will have set the subspace S(i-1) equal to the span of {v1 through vj}.

Then we’re going to set v(j+1) equal to perp(S(i-1))(wi). That is, we’ll have that v(j+1) = wi – proj(S(i-1))(wi), which equals wi – ((wi dot v1)/((the norm of v1)-squared))v1 – through to ((wi dot vj)/((the norm of vj)-squared))vj. As before, if this does turn out to be 0, then we’ll see that wi equals proj(S(i-1))(wi), which means that wi is in S(i-1), the span of {w1 through w(i-1)}. In this case, adding in wi would not change the span, so we can ignore this vector and see that {v1 through vj} is an orthogonal set such that the span of {v1 through vj} equals the span of {w1 through wi}. And we will set Si equal to the span of {v1 through vj}.

But if v(j+1) does not equal 0, then again we’ll notice that the span of {v1 through vj} equals the span of {w1 through w(i-1)}, so the span of {v1 through vj, wi} will equal the span of {w1 through w(i-1), wi}. But then our new theorem tells us that the span of {v1 through vj, wi} will equal the span of {v1 through vj, with wi replaced by (wi + a linear combination of v1 through vj)}. Again, what are the scalars we’re using in this linear combination? They’re the negatives of the projection coefficients: –(wi dot v1)/((the norm of v1)-squared), and so on. So we’ve seen that {v1 through v(j+1)} is an orthogonal set such that the span of {v1 through v(j+1)} equals the span of {w1 through wi}. And we will set Si equal to the span of {v1 through v(j+1)}.

So what does our kth step look like? The kth step is our last, and it proceeds the same as the generic ith step, but at the end, we will have an orthogonal set {v1 through vj} such that the span of {v1 through vj} equals the span of {w1 through wk}, which was defined to be S. Now, since we made sure that {v1 through vj} does not contain the 0 vector, we know that {v1 through vj} is also linearly independent, because it’s an orthogonal set of nonzero vectors, and that means that {v1 through vj} is an orthogonal basis for S.
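The whole procedure described above fits in a few lines of pure Python. This is a sketch; the function name and the small numerical tolerance standing in for the exact "v = 0" test are my own choices:

```python
# Gram-Schmidt: turn any spanning set into an orthogonal basis,
# discarding the (numerically) zero vectors produced by dependent inputs.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(ws, tol=1e-10):
    """Return an orthogonal basis for span{w1, ..., wk}."""
    vs = []
    for w in ws:
        v = list(w)
        for u in vs:
            c = dot(w, u) / dot(u, u)   # projection coefficient (w . u)/||u||^2
            v = [vi - c * ui for vi, ui in zip(v, u)]
        if dot(v, v) > tol:             # keep v only if it is not the zero vector
            vs.append(v)
    return vs
```

Note that this version keeps whatever fractional entries appear; the lecture rescales vectors by hand to clear denominators, which doesn’t affect orthogonality.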

Now, of course, I’ve explained this all in theory, but hopefully it will make much more sense once we work through an example.

So let’s find an orthogonal basis for the span of these vectors. We start with v1. Again, the v’s are the set we’re creating, and we’ll set v1 = w1, the first vector in our set. And we’ll simply let our first subspace S1 be the span of {the first vector}.

Now is when we have to start doing calculations, because v2 is going to be perp(S1)(w2), that is, w2 – proj(S1)(w2). Doing the calculations, we get that v2 equals w2, the second vector in our set, minus ((w2 dot v1)/((the norm of v1)-squared))v1, where, again, v1 is the same as w1, the first vector in our set. Plugging in the dot product of w2 and v1, and plugging in (the norm of v1)-squared, we see that we’re looking at this linear combination of vectors. Oh my, but that ends up being the 0 vector. Sure enough, a quick look shows that our second vector is simply 2 times our first vector, so w2 is already in the subspace S1, and we can discard it. We’ll simply set S2 = S1, the span of {our first vector}.

Now we want to find v2 again, but this time it’s going to be perp(S2)(w3), or w3 – proj(S2)(w3). Let’s go through our calculations again. So v2 is now going to equal w3, our third vector, minus ((w3 dot v1)/((the norm of v1)-squared))v1, where v1 is still our first vector. Plugging in the calculations for w3 dot v1 and (the norm of v1)-squared again, we see we’re looking at this linear combination of vectors, which, in this case, equals [4/3; -5/3; 5/3; 1]. Now, instead of simply leaving v2 as this vector, we’ll note that if {v1, v2} is an orthogonal set, then so is {v1, sv2} for any nonzero scalar s. So to simplify our calculations, we can multiply our v2 by (in this case) 3. We will set S3 equal to the span of {[1; -2; -1; -3], [4; -5; 5; 3]}.
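This rescaling trick can be checked exactly with the lecture’s own vectors, using Python’s exact rational arithmetic from the standard library:

```python
# If {v1, v2} is orthogonal, so is {v1, s*v2}: v1 . (s*v2) = s*(v1 . v2) = 0.
# Checked here with the example's v1 and the fractional v2 before rescaling.

from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v1 = [1, -2, -1, -3]
v2_frac = [Fraction(4, 3), Fraction(-5, 3), Fraction(5, 3), Fraction(1)]

assert dot(v1, v2_frac) == 0            # orthogonal before scaling

v2 = [3 * x for x in v2_frac]           # clear the denominators
assert v2 == [4, -5, 5, 3]              # the lecture's rescaled v2
assert dot(v1, v2) == 0                 # ...and still orthogonal
```

Using Fraction avoids any floating-point round-off, so the dot products come out exactly zero.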

And now we move on to the next step, to find v3, which will be equal to perp(S3)(w4), which equals w4 – proj(S3)(w4). Performing the calculations, we see that v3 equals w4, which is [-1; 4; 2; 3], minus ((w4 dot v1)/((the norm of v1)-squared))v1 minus ((w4 dot v2)/((the norm of v2)-squared))v2, remembering that v2 is not the same as w3. Let’s go ahead and plug in those dot products and those norms, and we’ll see we’re looking at this linear combination of vectors, which calculates to be [3/5; 1; 1; -4/5], or we could pull out 1/5 and see that it’s (1/5)[3; 5; 5; -4]. Again, we can go ahead and set v3 = [3; 5; 5; -4], since dropping the factor of 1/5 makes our calculations easier. And we now set S4 equal to the span of the vectors {[1; -2; -1; -3], [4; -5; 5; 3], [3; 5; 5; -4]}.

And there’s one last calculation to do, and that’s to find v4, which will be perp(S4)(w5), which equals w5 – proj(S4)(w5). Performing the calculations, we’ll see that v4 equals w5 minus ((w5 dot v1)/((the norm of v1)-squared))v1 minus ((w5 dot v2)/((the norm of v2)-squared))v2 minus ((w5 dot v3)/((the norm of v3)-squared))v3. If we plug in all those dot products and all those norms, we see that we’re looking at this linear combination of vectors, which again gives us the 0 vector. So this simply means that w5 is already contained in S4, so we can discard it and see that S5 equals S4, the span of our three vectors.

But since w5 was the last vector in our spanning set, our algorithm comes to an end, and we have that our set {[1; -2; -1; -3], [4; -5; 5; 3], [3; 5; 5; -4]} is an orthogonal basis for the original span.
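As a final sanity check, a few lines of Python confirm that the three basis vectors found in the example really are pairwise orthogonal:

```python
# Verify that every pair of vectors in the computed basis has dot product 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

basis = [
    [1, -2, -1, -3],
    [4, -5, 5, 3],
    [3, 5, 5, -4],
]

for i in range(len(basis)):
    for j in range(i + 1, len(basis)):
        assert dot(basis[i], basis[j]) == 0
```

Since the vectors are also nonzero, the set is linearly independent, so it is indeed an orthogonal basis for S.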