Transcript — Introduction
In the last lecture, we derived the method of least squares by looking at a scientist trying to find an mth-degree polynomial which best fits a set of experimental data. In this lecture, we will look at some examples of this, but first we will prove a condition under which the polynomial of best fit is unique.
Theorem 9.6.2: Let n data points (x_1, y_1), ..., (x_n, y_n) be given. Let y = [y_1, ..., y_n]^T, and let X be the matrix whose first column is all ones, whose second column is [x_1, ..., x_n]^T, whose third column is [x_1^2, ..., x_n^2]^T, and so on, up to the column of mth powers. If a = [a_0, ..., a_m]^T is any solution of the normal system X^T X a = X^T y, then the polynomial p(x) = a_0 + a_1 x + ... + a_m x^m is a best-fitting polynomial of degree m for the given data. Moreover, if at least m + 1 of the x_i values are distinct, then the matrix X^T X is invertible, and hence a is unique, with a = (X^T X)^(-1) X^T y.
Proof: Substituting the data points into p(x), we get the system of n equations in the m + 1 unknowns a_0, ..., a_m:
a_0 + a_1 x_1 + ... + a_m x_1^m = y_1
...
a_0 + a_1 x_n + ... + a_m x_n^m = y_n
This is the system Xa = y. Thus, from our work last class, any solution a of the normal system X^T X a = X^T y gives the coefficients of a polynomial of best fit.
To prove the second part of the theorem, we need to show that X^T X is invertible. We will do this in two stages. First, we prove that if at least m + 1 of the x_i values are distinct, then the columns of X are linearly independent. As usual, to prove linear independence, we consider a linear combination of the columns of X which equals the zero vector, with coefficients c_0, ..., c_m. Simplifying the left-hand side entry by entry, we get
c_0 + c_1 x_1 + ... + c_m x_1^m = 0
...
c_0 + c_1 x_n + ... + c_m x_n^m = 0
Notice that this says that x_1, ..., x_n are all roots of the polynomial q(x) = c_0 + c_1 x + ... + c_m x^m. Since at least m + 1 of the x_i are distinct, q(x) has at least m + 1 distinct roots. But by the Fundamental Theorem of Algebra, the only polynomial of degree at most m with more than m roots is the zero polynomial. Hence q(x) = c_0 + c_1 x + ... + c_m x^m = 0, so all of the coefficients c_0, ..., c_m equal 0, and consequently the columns of X are linearly independent.
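This linear-independence condition is easy to check numerically. The sketch below (using assumed sample values, since no specific data appear in this part of the lecture) confirms that the design matrix has full column rank when at least m + 1 of the x_i are distinct, and loses rank when too few are.

```python
import numpy as np

def design_matrix(x, m):
    """Columns 1, x, x^2, ..., x^m, as in the theorem."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x**k for k in range(m + 1)])

m = 2  # fitting a quadratic: the theorem needs at least m + 1 = 3 distinct x-values

x_good = [0.0, 1.0, 2.0, 3.0]   # 4 distinct values, so >= m + 1
x_bad = [1.0, 1.0, 2.0, 2.0]    # only 2 distinct values, so < m + 1

rank_good = np.linalg.matrix_rank(design_matrix(x_good, m))
rank_bad = np.linalg.matrix_rank(design_matrix(x_bad, m))

print(rank_good)  # 3: all m + 1 columns are linearly independent
print(rank_bad)   # 2: the columns are linearly dependent
```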
Now, to prove that X^T X is invertible, we will prove that the only solution of X^T X v = 0 is v = 0. We will do this by showing that this equation implies Xv = 0, using a very good trick that we will use a few more times in the course: to prove that a vector is the zero vector, prove that its length is 0. We have
||Xv||^2 = (Xv) · (Xv), by definition,
= (Xv)^T (Xv), by our very important formula,
= v^T X^T X v, by properties of transposes.
But X^T X v = 0, so this is v^T 0 = 0. Since ||Xv||^2 = 0, we get Xv = 0. Thus X^T X v = 0 implies Xv = 0; but the columns of X are linearly independent, so the only solution of Xv = 0 is v = 0, as required. Therefore X^T X is invertible, and we get a = (X^T X)^(-1) X^T y.
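The length trick in this proof can be sanity-checked numerically. This sketch (on an arbitrary assumed example, not data from the lecture) verifies that ||Xv||^2 = v^T X^T X v, and that X^T X is invertible when the columns of X are linearly independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Any X with linearly independent columns will do; a matrix of powers of
# distinct points, as in the theorem, is one example.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([x**k for k in range(3)])  # columns 1, x, x^2
v = rng.standard_normal(3)

# The trick: ||Xv||^2 = (Xv) . (Xv) = (Xv)^T (Xv) = v^T X^T X v.
lhs = np.linalg.norm(X @ v) ** 2
rhs = v @ (X.T @ X) @ v
print(np.isclose(lhs, rhs))  # True

# X^T X is a 3x3 matrix of full rank, hence invertible, so
# X^T X v = 0 forces v = 0.
print(np.linalg.matrix_rank(X.T @ X) == 3)  # True
```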
The matrix X is called the design matrix. Note that it is built from the form of the desired polynomial. It is important to understand how it is designed, because the desired polynomial is not always in standard form; I will demonstrate this in the second example.
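As a sketch of how a design matrix is built in code (the data values here are assumptions, not from the lecture): for the standard form a_0 + a_1 x + ... + a_m x^m, NumPy's vander with increasing=True produces exactly the matrix of the theorem, while for a non-standard form you assemble one column per term of the desired polynomial yourself.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # assumed sample x-values

# Standard form a0 + a1*x + a2*x^2: columns 1, x, x^2.
X_standard = np.vander(x, N=3, increasing=True)

# Non-standard form a*x^2 + b*x: columns x^2 and x, no constant column.
X_custom = np.column_stack([x**2, x])

print(X_standard)
print(X_custom)
```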
Example: Find a, b, and c to obtain the best-fitting equation of the form y = a + bx + cx^2 for the given data. Solution: We have y = [3; 1; 1; 2; 4], and the design matrix X has first column all ones, second column the x-values, and third column the x-values squared. Therefore, by Theorem 9.6.2, there is a unique solution a = (X^T X)^(-1) X^T y = [121/105; 1/5; 11/42]. And so the second-degree polynomial of best fit is p(x) = 121/105 + (1/5)x + (11/42)x^2.
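The same procedure can be carried out in code. The transcript does not show the example's x-values, so this sketch uses assumed data; it follows the recipe of Theorem 9.6.2 and cross-checks the answer against NumPy's own least-squares polynomial fit.

```python
import numpy as np

# Assumed data (the lecture's x-values are not in the transcript).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 1.0, 1.0, 2.0, 4.0])

X = np.vander(x, N=3, increasing=True)   # columns 1, x, x^2
a = np.linalg.solve(X.T @ X, X.T @ y)    # solve the normal system X^T X a = X^T y

# Cross-check against numpy's built-in least-squares polynomial fit.
check = np.polyfit(x, y, 2)[::-1]        # polyfit lists the highest degree first
print(np.allclose(a, check))  # True
```

In practice, solving the normal system directly (or using a least-squares routine) is preferable to forming (X^T X)^(-1) explicitly, which is more expensive and less numerically stable.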
Example: Find the best-fitting polynomial of the form p(x) = ax^2 + bx for the given data. Solution: In this case, based on the desired form of p(x), we need to pick the design matrix X to have first column the x-values squared and second column the x-values. We let y = [4; 1; 1] and then get [a; b] = (X^T X)^(-1) X^T y = [5/2; -3/2]. Therefore, the polynomial of the desired form of best fit is p(x) = (5/2)x^2 - (3/2)x.
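A non-standard form like this changes only the design matrix. Since the transcript does not show this example's x-values, the sketch below uses assumed ones, so its coefficients will not in general match the lecture's 5/2 and -3/2; the point is the shape of X.

```python
import numpy as np

# Assumed x-values (not shown in the transcript); y is from the example.
x = np.array([-1.0, 1.0, 2.0])
y = np.array([4.0, 1.0, 1.0])

# The form p(x) = a*x^2 + b*x has no constant term, so the design
# matrix has columns x^2 and x only -- no column of ones.
X = np.column_stack([x**2, x])
a, b = np.linalg.solve(X.T @ X, X.T @ y)

p = lambda t: a * t**2 + b * t
print(p(0.0))  # 0.0: with no constant term, the fit passes through the origin
```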
This ends this lecture.