Gottlob Gienger

## Polynomial L2 Approximation

### Relating Orthonormal Polynomials, Gram–Schmidt Orthonormalization, QR Factorization, Normal Equations, and Vandermonde and Hilbert Matrices

This didactic synthesis compares three solution methods for polynomial approximation and systematically presents their common characteristics and their close interrelations:

1. Classical Gram–Schmidt orthonormalization and Fourier approximation in L^2[a, b]
2. Linear least-squares solution via QR factorization on an equally spaced grid in [a, b]
3. Linear least-squares solution via the normal equations method in L^2[a, b] and on an equally spaced grid in [a, b]

The first two methods are linear least-squares systems with Vandermonde matrices; the normal equations contain matrices of Hilbert type. The solutions on equally spaced grids in [a, b] converge to the solutions in L^2[a, b]. All solution characteristics and their relations are illustrated by symbolic or numeric examples and graphs.

### Definitions and Notation

Let [a, b] ⊂ ℝ be a finite interval. Consider the Hilbert space L^2[a, b] of real-valued square-integrable functions (or L^2, for short), equipped with Lebesgue measure and scalar product

and the corresponding -norm

Scalar products in L^2[a, b] can be approximated by scalar products on discrete grids in [a, b] based on Riemann sums, and similarly for norms.

Partition the finite interval into subintervals by the points

and set

Suppose that is a bounded function on . Let be any point in the subinterval and define the grid

The Riemann sums on the partition and grid are defined by

Equation (3) is called the left-hand Riemann sum, (4) the right-hand Riemann sum and (5) the (composite) midpoint rule.
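The three Riemann sums can be sketched numerically; the following is an illustrative Python/NumPy version (the article's own code is in Mathematica), where the grid shift 0, 1/2 or 1 selects the left-hand sum, the midpoint rule or the right-hand sum.

```python
import numpy as np

def riemann_sum(f, a, b, n, rule="midpoint"):
    """Approximate the integral of f over [a, b] on n equal subintervals.

    rule: "left", "right", or "midpoint" (grid shift 0, 1, or 1/2).
    """
    h = (b - a) / n
    shift = {"left": 0.0, "right": 1.0, "midpoint": 0.5}[rule]
    x = a + (np.arange(n) + shift) * h   # grid points inside each subinterval
    return h * np.sum(f(x))

# Example: integrate f(x) = x^2 over [0, 1]; the exact value is 1/3.
approx = riemann_sum(lambda x: x**2, 0.0, 1.0, 1000, rule="midpoint")
```

The midpoint rule is already accurate to about h^2 here, whereas the one-sided sums converge only like h, matching the error orders discussed below.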

For an equally spaced partition, the step size is

The equally spaced partition points are

and the equally spaced grid of length (excluding the endpoint ) is

It is also possible to use the grid points or grid shifted by an amount , where so that , as

Let

For equally spaced grids, the Riemann sums simplify to

Setting , and gives the left-hand Riemann sum, the composite midpoint rule and the right-hand Riemann sum, respectively. The error of the Riemann sums is defined as

The set of continuous real-valued functions forms a dense subspace of , [1], Theorem (13.21). For , the restrictions and to this grid are well-defined. Define the -dimensional scalar product on this grid:

The -dimensional 2-norm is

The factor ensures that the norms of constant functions agree:

Denote the linear space of polynomials with real coefficients of degree at most by and define the polynomial by

The polynomial can be written as a scalar product (or dot product) of two -tuples, the monomials up to degree and the coefficients:

Introducing the Vandermonde matrix

every polynomial of degree can be written as the product of a matrix and a vector as

The product of a matrix and an -vector is regarded as a 1-vector, not a scalar, as in Mathematica.

Restricting the Vandermonde matrix to the interval gives an operator mapping into :

Whereas is an unbounded operator, is a bounded operator with respect to the
2-norms on and .

The polynomial approximates in the -norm, as measured by

In matrix-vector notation, this constitutes a linear least-squares problem for the coefficients :

where

Now take a discrete grid

and sample on this discrete grid:

The polynomial of degree approximates in the 2-norm on this grid as measured by

In matrix-vector notation, this constitutes a linear least-squares problem for the coefficients :

where

A rectangular or square matrix of this form is called a Vandermonde matrix.
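As a numerical sketch of the discrete least-squares problem above (in Python/NumPy rather than the article's Mathematica), the Vandermonde matrix is built on an equally spaced grid and the coefficient vector is obtained by a least-squares solve; the sample function exp is an assumption for illustration.

```python
import numpy as np

# Build the (m+1) x (n+1) Vandermonde matrix with entries x_i^j on an
# equally spaced grid, then solve min ||V c - y||_2 for the coefficients c.
a, b, n, m = -1.0, 1.0, 3, 200            # degree n, m+1 grid points
x = np.linspace(a, b, m + 1)
V = np.vander(x, n + 1, increasing=True)  # columns: 1, x, x^2, ..., x^n

f = np.exp                                # sample function to approximate
c, *_ = np.linalg.lstsq(V, f(x), rcond=None)
```

The entries of `c` are the monomial coefficients of the best-fitting polynomial of degree at most n on the grid.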

Let , be Hilbert spaces with scalar products , and let be a bounded linear operator defined everywhere on . Then the adjoint operator of is defined by the identity ([2], p. 196, Definition 1)

For Hilbert spaces , over the reals, one writes instead of .

### Theorems and Properties for Later Use

All Riemann sums integrate constant functions exactly on any grid, since

If is bounded on and continuous except for finitely many points, it has a Riemann integral. Consult [3], Chapter 2, for proofs and further references on quadrature formulas.

Theorem 1

If exists and is bounded and integrable over , then the errors of the right-hand Riemann sums satisfy

A similar result holds for the left-hand Riemann sums (with ).

If , the error term of the elementary midpoint formula is given in [3], (2.6.5):

Therefore the error of the composite midpoint formula can be bounded by

#### Convergence of Discrete to Scalar Products

By Theorem 1, for functions and , the discrete scalar product converges at least as fast as to the scalar product:

By equation (19), for functions , the discrete scalar product converges at least as fast as to the scalar product:

#### Singular Value Decomposition

See [4], sections 2.4 and 2.6.

Theorem 2

If is a real matrix, then there exist orthogonal matrices

such that

where .
The equivalence

is called the full singular value decomposition (SVD).
##### Matrix Properties from SVD

Define to be the index of the last positive singular value

Then

The condition number of a rectangular matrix with full column rank with respect to the 2-norm (in short, the 2-condition number) is defined as the quotient of its largest to its smallest singular value ([4], equation (2.6.5)):

By equation (20), the 2-condition number has the properties

If is an invertible matrix, . The SVD of is obtained from the SVD of as

If is a real matrix with orthonormal columns , it can be completed to an orthogonal matrix . Therefore the SVD of is

The 2-condition number of is

See [2], sections VII.1 and VII.2.

Theorem 3

If is a bounded linear operator, defined everywhere on , then is a bounded linear operator defined everywhere on and

.

#### Finite-Dimensional Linear Operators

For and a subset , linear combinations from the subset define a finite-dimensional linear operator

Obviously, and if and only if is linearly independent.

is bounded and has the matrix representation

Apply the definition of the adjoint operator and notice that the first scalar product is that of , ; then

Substituting into the preceding equation gives the representation of as an matrix (note the scalar products are taken in ):

Here is Hermitian positive semidefinite if is over the complex numbers, and symmetric positive semidefinite if is over the reals, and

#### Properties of Vandermonde Matrices

A polynomial of degree has at most distinct zeros, therefore the set of monomials is a linearly independent subset of and

By [5], the determinant of the Vandermonde matrix is the product

Therefore, the rectangular Vandermonde matrix (15) has full rank if and only if the points are pairwise distinct.
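The determinant formula can be verified numerically for a small set of distinct points; this is an illustrative Python/NumPy check (the point set is an arbitrary example).

```python
import numpy as np
from itertools import combinations

# Verify det(V) = prod_{i < j} (x_j - x_i) for a square Vandermonde matrix
# built from pairwise distinct points.
x = np.array([0.0, 1.0, 2.5, -0.5])
V = np.vander(x, increasing=True)      # row i: 1, x_i, x_i^2, x_i^3
det_direct = np.linalg.det(V)
det_product = np.prod([x[j] - x[i] for i, j in combinations(range(len(x)), 2)])
```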

### Mathematica Definitions for Polynomial Approximation

Names are chosen according to previous notation and terminology.

This defines the 2-condition number.

This defines symbolic integration with time constrained to 10 seconds.

This defines numerical integration.

The function (which is ) first attempts symbolic integration; if that is unsuccessful or takes too long, it performs numerical integration.

This defines the scalar product in .

Since is listable in and , it also implements the adjoint operator for a set of functions according to equation (32).

This defines the norm in .

This defines functions for discrete grids.

If , or is a machine number, the functions , , and return machine numbers.

This defines the functions , , and .

To avoid potential clashes with predefined values for the variables , and , the script letters , and are used for symbolic results.

These sample functions are used in the commands.

### Linear Least-Squares Solution via QR Factorization

Let . Given a rectangular data matrix and an observation vector ,

the linear least-squares (LS) problem is to find:

A comprehensive description of the QR factorization of an matrix via Householder, Givens, fast Givens, classical Gram–Schmidt and modified Gram–Schmidt methods is given in [4], section 5.2. Here only the essential steps are presented.

#### Principle of QR Factorization

Let be an orthogonal matrix. Such a matrix preserves lengths of vectors:

Given the real matrix , , the goal is to construct an orthogonal matrix such that

where is an upper-triangular matrix of the form

Obviously

#### Linear Least-Squares Solution by “Thin” QR Factorization

The Mathematica function QRDecomposition deviates from the full QR factorization as follows:

is output as an matrix. The rows of are orthonormal. Only the upper-triangular

submatrix is output.

Then the unique solution of the upper-triangular system is straightforward:

It gives the unique solution of the full-rank linear least-squares problem (34):

Since multiplication with an orthogonal matrix does not change the singular values, the condition numbers do not change either:

This holds in particular for the Vandermonde matrices, both in the discrete and continuous case.
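The thin-QR solution procedure above can be sketched as follows in Python/NumPy (an illustration with random data, not the article's Mathematica code): factor the data matrix, then solve the small upper-triangular system.

```python
import numpy as np

# Solve the full-rank least-squares problem min ||A c - y||_2 via thin QR:
# A = Q R with orthonormal columns in Q, then solve R c = Q^T y.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
y = rng.standard_normal(50)

Q, R = np.linalg.qr(A, mode="reduced")   # thin QR: Q is 50x4, R is 4x4
c_qr = np.linalg.solve(R, Q.T @ y)

# Agrees with the reference least-squares solver:
c_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
```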

#### Mathematica Definitions for QR Orthonormalization

This defines and .

### Classical Gram–Schmidt Orthonormalization

#### Polynomial Approximation via Classical Gram–Schmidt Orthonormalization

Applying the classical Gram–Schmidt orthonormalization process in a pre-Hilbert space, described in [2], p. 88 ff., to the monomials in gives an orthonormal system of polynomials

that satisfy

The Fourier coefficients of any function with respect to this orthonormal system are defined according to [2], p. 86, equation (1) (the dot here denotes the placeholder for the integration argument in the scalar product):

The best approximation to is given as the Fourier sum of terms:

#### Mathematica Definitions for Classical Gram–Schmidt Orthonormalization

The orthonormal system of polynomials is given by the function . The functions and are also defined.

#### Example Interval : Fourier Approximation by Legendre Polynomials

This defines the polynomials .

These polynomials differ from the classical Legendre polynomials LegendreP built into Mathematica only by normalization factors.
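This can be illustrated numerically (a Python/NumPy sketch, not the article's symbolic Mathematica computation): applying classical Gram–Schmidt to the monomials, with the L^2[-1, 1] scalar product approximated by the composite midpoint rule on a fine grid, recovers the orthonormal Legendre polynomials, e.g. sqrt(3/2)·x in degree one.

```python
import numpy as np

# Gram-Schmidt orthonormalization of the monomials 1, x, ..., x^n with
# respect to the L2[a, b] scalar product, approximated by the composite
# midpoint rule on a fine grid.
a, b, n, m = -1.0, 1.0, 3, 10_000
h = (b - a) / m
x = a + (np.arange(m) + 0.5) * h                 # midpoint grid

def ip(u, v):                                    # <u, v> ~ h * sum(u * v)
    return h * np.sum(u * v)

P = [x**k for k in range(n + 1)]                 # sampled monomials
Q = []
for p in P:                                      # classical Gram-Schmidt
    for q in Q:
        p = p - ip(q, p) * q
    Q.append(p / np.sqrt(ip(p, p)))
# On [-1, 1], Q[k] samples the orthonormal Legendre polynomial of degree k.
```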

This shows the Fourier approximation of a sample set of functions.

#### The Classical Gram–Schmidt Orthonormalization Interpreted as QR Factorization

Proposition 1

Define as the matrix of the monomial coefficients of the polynomials of equation (36). Then is an invertible lower-triangular matrix and

Proof

The Gram–Schmidt orthonormalization is an invertible linear mapping from the basis of monomials to the orthonormal basis , representable by a real matrix . By (37), the matrix is lower triangular. Transposing (38) gives (39).

Theorem 4

The classical Gram–Schmidt orthonormalization applied to the monomials is equivalent to the QR factorization of the Vandermonde operator ,

where is the -dimensional operator with orthonormal columns

Proof

This follows from Proposition 1 by interpreting matrices as operators.

For this QR factorization, the inverse of the upper-triangular matrix is already calculated by the Gram–Schmidt process:

Equations (26), (28) and (35) give the relations on the 2-condition numbers of the operators or matrices:

Corollary 1

The numerical instability of the classical Gram–Schmidt process in machine arithmetic is discussed in [4], section 5.2.8. However, since the Gram–Schmidt orthonormalization of the monomials with respect to the scalar product can be performed by symbolic calculations, numerical algorithm stability is not an issue here, contrary to the -dimensional scalar product .

#### Mathematica Definitions: Gram–Schmidt Orthonormalization as QR Factorization

This defines the Gram–Schmidt coefficient matrix .

This defines the lower- and upper-triangular matrices of QR decomposition.

This defines the orthogonal matrix .

#### Illustrating Proposition 1 and Theorem 4 for an Arbitrary Interval

Here is a set of orthonormal polynomials (, ).

This gives the Gram–Schmidt coefficient matrix , with .

Multiply the matrix of monomials from the right or left.

This reproduces orthogonal matrix .

By construction, the polynomials contained in the matrix columns are orthonormal with respect to the scalar product in .

This verifies the QR decomposition , as in Theorem 4.

#### Correspondence between Gram–Schmidt Orthonormalization and QR Factorization Interpretation

Select one of the sample functions and compare the results from the Gram–Schmidt orthonormalization and the QR factorization interpretation.

#### Examples: Condition Numbers of Vandermonde Matrices

For the continuous case, because , the singular values of equal the singular values of .

Here is the case of a discrete grid.

#### Example Interval: Relation to Hilbert Matrices

Obviously, is the lower-triangular Cholesky factor of the .

Consequently, the 2-condition number of the Vandermonde matrix of size on is the square root of the 2-condition number of .
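This relation can be sketched numerically in Python/NumPy (an illustration; the article works symbolically in Mathematica): on [0, 1] with a midpoint grid, h·VᵀV approximates the Hilbert matrix, and the squared 2-condition number of V matches the 2-condition number of that Hilbert-type matrix.

```python
import numpy as np

# On [0, 1], h * V^T V on a midpoint grid approximates the Hilbert matrix
# H[j, k] = 1 / (j + k + 1), so cond2(V) ~ sqrt(cond2(H)).
n, m = 4, 10_000                       # degree n, m grid points
h = 1.0 / m
x = (np.arange(m) + 0.5) * h           # midpoint grid on [0, 1]
V = np.vander(x, n + 1, increasing=True)

H_grid = h * (V.T @ V)                 # Hilbert-type matrix from the grid
j = np.arange(n + 1)
H = 1.0 / (j[:, None] + j[None, :] + 1)  # exact Hilbert matrix
```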

### Summary of Results for Orthogonalization Methods

This gives summary results.

#### Characterizing Orthogonalization Solutions

1. The system matrix is a Vandermonde matrix of dimension in the continuous setting and in the discrete setting.
2. has full rank in the continuous and discrete settings.
3. The discrete case is solved by the QR factorization of into an orthogonal matrix and an upper-triangular matrix .
4. In the continuous case, the Gram–Schmidt orthonormalization applied to the monomials in reads in matrix-vector notation as .
5. This is reformulated as QR factorization of the Vandermonde matrix: .
6. Consequently, the 2-condition number of equals the 2-condition number of the upper-triangular matrix :
7. a. For intervals symmetric around zero, , the columns of are alternating orthogonal. Therefore the 2-condition number is much smaller than for .
   b. The minimum condition number is achieved for interval .
   c. Therefore, to obtain the most precise numeric results, the approximation interval should be linearly mapped onto .
   d. is well-conditioned (i.e. less than about 10,000) for the interval and .
   e. is ill-conditioned for the interval , , and generally for intervals , .

#### The Normal Equations Method for Finite Dimensions

The approach for deriving the normal equations for the least-squares problem (34) is described in [4], section 5.3, for example. Define

A necessary condition for a minimum is , or equivalently,

These are called the normal equations. The minimum is unique if the Hessian matrix has full rank . Then there is a unique solution of the linear least-squares problem (34) or (40):

For an equally spaced , defined as in equation (5), the Vandermonde matrix has full rank, so the Hessian matrix for polynomial approximation has full rank as well.

If is rank deficient (), then there are infinitely many solutions to the least-squares problem. There is still a unique solution with minimum 2-norm, which can be written with the pseudo-inverse matrix ([4], section 5.5):

For full rank, .

#### The Normal Equations Algorithm for the Case

Suppose with (see [4] section 5.3.1). Then this algorithm computes the unique solution to the linear least-squares problem (34):

• Compute the lower-triangular portion of .
• Compute .
• Compute the Cholesky factorization (or via CholeskyDecomposition).
• Solve the triangular systems and (or and ).
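The four steps above can be sketched in Python/NumPy (an illustrative translation of the algorithm with random full-rank data, not the article's Mathematica code):

```python
import numpy as np

# Normal-equations algorithm: form C = A^T A and d = A^T y, Cholesky-factor
# C = G G^T, then solve the two triangular systems.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 3))       # m x n with full column rank, m >= n
y = rng.standard_normal(30)

C = A.T @ A
d = A.T @ y
G = np.linalg.cholesky(C)              # lower-triangular Cholesky factor
z = np.linalg.solve(G, d)              # forward substitution: G z = d
c = np.linalg.solve(G.T, z)            # back substitution: G^T c = z
```

For this well-conditioned example the result agrees with a QR-based least-squares solver; the conditioning caveats discussed below apply when A is a Vandermonde matrix.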

#### The Normal Equations Method on

The approach for deriving the normal equations for the case applies with one modification to the continuous least-squares approximation equation (12) (see [6]):

The matrix transpose has to be replaced by the adjoint operator :

A necessary condition for a minimum is , or equivalently,

These are called the normal equations.

is called the Hessian matrix. The minimum is unique if has full rank . The elements can be calculated via equation (31):

Obviously, the Hessian matrix is symmetric and positive semidefinite. Since has full rank for any nonzero interval , has full rank as well (by equation (32)) and is therefore positive definite. Then there exists a unique solution of equations (42) and (12).

Finally, calculate the elements on the right-hand side of via equation (30):

#### Convergence of Discrete Approximations to Continuous Approximations

This subsection investigates under which conditions and how fast the polynomial approximations on discrete grids converge to the continuous polynomial approximation in .

For an equally spaced , the normal equations, multiplied by the step size , read

Define

then by equation (7), the matrix elements of are just the Riemann sums for the matrix elements (integrals) of . Therefore, the Hessian matrices on the discrete grid converge to the continuous Hessian matrix in any matrix norm according to:

Define

then by equation (7), the elements of are just the Riemann sums for the elements of , the moments of . Therefore, the right-hand side of the normal equations on the discrete grid converges to the right-hand side of the continuous normal equations in any vector norm according to:

Proposition 2

The polynomial approximations on the discrete grids converge to the continuous polynomial approximation with the same order as the Riemann sums.

Proof

From equation (42), the solution of the polynomial approximation in is

From equation (44), the solution for the discrete grid is

For the matrix inverses

Expanding the difference of the solution coefficient vectors completes the proof:

#### The Relation of the Normal Equations Method to Gram–Schmidt Orthonormalization on

Theorem 5

The Hessian matrix of the normal equations on is related to the lower-triangular Gram–Schmidt coefficient matrix from Proposition 1 and the upper-triangular matrix from Theorem 4 by

Equivalently, is the lower-triangular Cholesky factor and the upper-triangular Cholesky factor of the symmetric positive-definite Hessian matrix of the normal equations method.

Proof

From Theorem 4 and because ,

By [4], Theorem 4.2.7, the Cholesky decomposition of a symmetric positive-definite square matrix is unique.

Equations (24) and (26) give the relation for the 2-condition numbers of the matrices.

Corollary 2

#### Definitions to Solve Normal Equations

This defines the continuous and discrete Hessian matrices.

These are the righthand sides of the continuous and discrete normal equations.

This gives the solution of the normal equations system.

This gives the approximation polynomials for the continuous and discrete cases.

#### Illustrating Theorem 4 for the General Interval

This gives the Gram–Schmidt coefficient matrix, its inverse and inverse transpose.

The matrix times its transpose equals the Hessian matrix of the normal equations.

Equivalently, is the lower-triangular and is the upper-triangular Cholesky factor of the Hessian matrix .

#### Special Case: The Interval for Hilbert Matrices

For , the Hessian matrix is identical to the Hilbert matrix of dimension :

Hilbert matrices are ill-conditioned already for dimensions (i.e. have condition numbers greater than about 10,000) and soon reach the limits of 64-bit IEEE arithmetic.
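The growth of the Hilbert matrix condition numbers is easy to observe numerically; this Python/NumPy sketch (an illustration, not the article's Mathematica code) shows that dimension 5 is already past the ill-conditioning threshold of about 10,000.

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix with entries 1 / (i + j + 1), 0-based indices."""
    i = np.arange(n)
    return 1.0 / (i[:, None] + i[None, :] + 1)

# The 2-condition numbers grow roughly geometrically with the dimension
# and soon exhaust 64-bit IEEE arithmetic.
conds = [np.linalg.cond(hilbert(n)) for n in range(2, 9)]
```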

#### Summary of Results for the Normal Equations Method

Here are the summary results. It takes some time for .

#### Characterization of the Normal Equations Solution

1. Continuous case: The Hessian matrix originates from the full-rank Vandermonde matrix of the orthogonalization solutions by taking .
2. Discrete case: The Hessian matrix comes from the full-rank Vandermonde matrix of the orthogonalization solutions by taking .
3. Consequently,
4. a. has full rank.
b. is positive definite.
c. The 2-condition number of is the square of the 2-condition number of the Vandermonde matrix . This is the root cause of the inherent ill-conditioning of the normal equations.
5. gives the exact Hilbert matrix of dimension .
6. For other intervals , the Hessian matrix is close to a Hilbert matrix (Hilbert-type matrix).
7. Hilbert matrices and Hilbert-type matrices are very ill-conditioned already for dimensions greater than four and return extremely inaccurate numerical solutions.
8. For intervals symmetric around zero, , has a chessboard pattern of zeros, since the columns of are alternating orthogonal.
9. The normal equations are solved by Cholesky decomposition in the continuous and discrete settings.

### Comparison of Solution Methods

#### Gram–Schmidt Orthonormalization versus Normal Equations

This performs Gram–Schmidt orthonormalization for a special case. For other cases, change the 6 in to an integer between 1 and 10.

Here are the results from the normal equations.

The two solutions agree both symbolically and in exact arithmetic.

But there are numeric differences in IEEE 754 machine arithmetic.

These differences come from the lower error amplification expressed in a lower 2-condition number,

The numerical solution via Gram–Schmidt orthonormalization is usually more accurate than the normal equations solution.

#### QR Factorization versus Normal Equations on an Equally Spaced Grid

Here is the QR factorization. Again, for other cases, change the 3 in to an integer between 1 and 10.

Normal equations.

The difference between the numerical solutions is due to the difference in how their rounding errors propagate.

Because of the lower error amplification expressed in a lower 2-condition number, the numerical solution via QR factorization is more accurate than the normal equations solution.
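This accuracy gap can be reproduced with a small Python/NumPy experiment (an illustrative sketch, not the article's Mathematica session): fitting a degree-7 polynomial with known coefficients on an equally spaced grid in [0, 1], the QR solution recovers the coefficients to near machine precision, while the normal equations lose roughly half the digits, in line with the squared condition number.

```python
import numpy as np

# Fit a degree-7 polynomial on [0, 1] by thin QR and by the normal
# equations; the exact coefficient vector is all ones, so the recovery
# errors directly expose the two methods' error amplification.
n, m = 7, 100
x = np.linspace(0.0, 1.0, m)
V = np.vander(x, n + 1, increasing=True)
c_true = np.ones(n + 1)
y = V @ c_true

Q, R = np.linalg.qr(V, mode="reduced")          # QR route
c_qr = np.linalg.solve(R, Q.T @ y)

G = np.linalg.cholesky(V.T @ V)                 # normal equations route
c_ne = np.linalg.solve(G.T, np.linalg.solve(G, V.T @ y))

qr_err = float(np.max(np.abs(c_qr - c_true)))
ne_err = float(np.max(np.abs(c_ne - c_true)))
```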

#### Convergence of the Right-Hand Sides of Normal Equations

This calculates the convergence order for grids of powers of 2.

Choose problem parameters. Yet again, for other cases, change the 7 in to an integer between 1 and 10.

Here are the approximation errors for grids of powers of 2.

Here is the convergence order. To see the result for another function, select another sample function in . Zero errors are suppressed.

For case 3, all but the first two elements of are zero. For case 5, all but the first element of are zero; therefore the logarithmic plots look incomplete.

Sample functions 3, 4, 5, 6 have discontinuities in the zeroth, first or second derivative; 7 and 8 have singularities in the first or second derivative. These sample functions and do not satisfy all the assumptions of Theorem 1. Therefore the convergence order of the right-hand side can be lower than 1 (respectively 2) as predicted by equation (47). Sample functions 1, 2, 9, 10 are infinitely often continuously differentiable; therefore they have maximum convergence order 1 (respectively 2) according to equation (47).

#### Convergence of the Normal Equations Solution

These are the approximation errors for grids of powers of 2.

This takes a few minutes.

This plots the convergence order.

### Relations between Solution Methods

Analyzing polynomial approximation, this article has systematically worked out the close relations between the solutions obtained by:

1. Gram–Schmidt orthonormalization and Fourier approximation
2. QR factorization
3. Normal equations

The interrelations are:

1. The Gram–Schmidt orthonormalization applied to the monomials on is reformulated as QR factorization of the Vandermonde matrix with the following one-to-one correspondences.
2. a. The columns of the matrix are the orthonormal polynomials on .
   b. The upper-triangular matrix is the transpose inverse of the coefficient matrix of the orthonormal polynomials .
   c. The numerical condition number of equals the numerical condition number of with respect to the 2-norm.
   d. The elements of are the Fourier coefficients , where is the transpose operator.
   e. The elements of are the coefficients of the Fourier polynomial expanded into monomials.
3. For , Gram–Schmidt orthonormalization returns scalar multiples of the classical Legendre polynomials.
4. The Hessian matrix in the normal equations originates from the full-rank Vandermonde matrix of the orthogonalization solutions by taking .
5. The 2-condition number of is the square of the 2-condition number of the Vandermonde matrix .
6. The upper Cholesky factor of is identical to the upper-triangular matrix of the QR factorization of in both the continuous and discrete cases.
7. is identical to of the QR factorization in both the continuous and discrete cases.
8. The QR and normal equations solutions on an equally spaced grid of points in the interval converge to the solution in with the same order as the error of the quadrature formulas.

### Summary

This article has analyzed the polynomial approximation of a real-valued function with respect to the least-squares norm in different settings:

• L^2-norm on the interval [a, b]
• 2-norm on an equally spaced grid of points in the interval [a, b]

Three different solution methods for this least-squares problem have been compared:

1. The orthogonalization solutions
2. a. The Gram–Schmidt orthonormalization in the continuous setting
b. The matrix QR factorization in the discrete setting
3. The normal equations solution in the continuous and discrete settings

All definitions and solution methods were implemented in Mathematica 11.1.

All solution characteristics and their relations were illustrated by symbolic or numeric examples and graphs.

### Acknowledgments

The author thanks Alexandra Herzog and Jonas Gienger for critical proofreading and the anonymous referees for valuable improvements of the paper. The paper evolved from a lecture series in numerical analysis, given by the author for ESOC staff and contractors from 2012 to 2016, using Mathematica:

• Lecture #4: Linear Least-Squares Systems
• Lecture #9: Interpolation and Approximation of Functions, Curve and Surface Fitting

### References

[1] E. Hewitt and K. Stromberg, Real and Abstract Analysis, New York: Springer-Verlag, 1975.
[2] K. Yosida, Functional Analysis, 5th ed., New York: Springer-Verlag, 1978.
[3] P. J. Davis and P. Rabinowitz, Methods of Numerical Integration, 2nd ed., London: Academic Press, 1984.
[4] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Baltimore: The Johns Hopkins University Press, 2013.
[5] E. W. Weisstein. “Vandermonde Determinant” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/VandermondeDeterminant.html.
[6] J. D. Faires and R. Burden, Numerical Methods, 3rd ed., Pacific Grove, CA: Thomson/Brooks/Cole, 2003.

G. Gienger, “Polynomial Approximation,” The Mathematica Journal, 2017. dx.doi.org/doi:10.3888/tmj.19-2.

2011–2016 ESA/ESOC research and technology management office