![]() Volume 9, Issue 4 Articles Tricks of the Trade In and Out Trott's Corner New Products New Publications Calendar News Bulletins New Resources Classifieds Download This Issue Editorial Policy Staff and Contributors Submissions Subscriptions Advertising Back Issues Contact Information |
On the Numerical Accuracy of Mathematica 5.0 for Doing Linear and Nonlinear Regression
3. Methods for Linear and Nonlinear Regression(a) Linear RegressionThe general statistical model assumed for the linear least-squares regression problems is
where
have certain optimal properties, and the estimated standard errors of
where
is an estimate of
are the calculated residuals from the regression. The measure
when an intercept (frequently represented by including a column of ones in the matrix
when no intercept is included (e.g., NoInt1 and NoInt2 [8]). Here is what the NIST says about the method employed for linear regression. "For all datasets, multiple precision calculations (accurate to 500 digits) were made using the preprocessor and FORTRAN subroutine package of Bailey (1995). Data were read in exactly as multiple precision numbers and all calculations were made with this very high precision. The results were output in multiple precision, and only then rounded to fifteen significant digits. These multiple precision results are an idealization. They represent what would be achieved if calculations were made without roundoff or other errors. Any typical numerical algorithm (i.e., not implemented in multiple precision) will introduce computational inaccuracies, and will produce results which differ slightly from these certified values." (See the certification methodology description for each dataset at www.itl.nist.gov/div898/strd/lls/lls.shtml, accessed 07/12/03.) Although not generally advisable, it is perfectly possible to simply write out these equations in Mathematica and solve a particular problem, since standard methods for solving for inverses of well-conditioned matrices may not give accurate answers for ill-conditioned matrices. This was what, in fact, went wrong with most of the extant computer packages for doing ordinary least-squares regression when Longley [10] did his famous study. Mathematica, however, is very sophisticated about the way it inverts matrices and, indeed, gets this problem right, albeit with a warning message in the program exhibited for the computation of invX1. Daniel Lichtblau comments on the difference between these two methods of doing OLS linear regression as follows: "What we [Mathematica] actually work with [in the add-on package, Statistics`LinearRegression`] are singular values. In effect, this is a method to compute a pseudo-inverse (which we do not actually do, unless necessitated by the various report option settings). In contrast, matrix inversion will be done using Gaussian elimination, with appropriate warnings if the matrix is found to be ill conditioned." Here is my "direct" program.
The reader, of course, will have to fill in the exact location of the Longley.txt file, which is copied as a .txt file in Appendix C. The Mathematica parameter estimates and standard errors compared with the benchmark results are:
And of
If I were to exhibit the results to 15 digits, the Mathematica and benchmark results would also coincide exactly. Section 3.1.4 of The Mathematica Book [31] gives a good discussion of the way in which Mathematica handles the problem of numerical precision in general and floating-point arithmetic in particular. So when doing linear least-squares regression it does not appear to matter whether I use my own code or Mathematica's Regress function. I have not been able to find documentation as to exactly how Mathematica performs a linear least-squares regression with its Regress function, other than the full code for LinearRegression in AddOns/StandardPackages/Statistics`LinearRegression`. (See Daniel Lichtblau's earlier comments.) Needless to say, it is not so easy for a nonprogrammer to read this code. It appears that Mathematica uses the same computationally efficient and accurate methods to solve the usual least-squares normal equations as it does to solve any system of linear equations or to invert a numerically specified square matrix. See Section A.9.4 of The Mathematica Book [31]. Comparisons of the results from Regress with the NIST benchmarks for each of the 11 NIST SdRD linear regression examples are presented in summary form in Appendix A. The results presented for Mathematica are simply those Mathematica prints out. The reader, however, should be aware that these are not the actual results of Mathematica's computation, which are all carried out to 16 digits of precision, but rather a rounded form for ease of visual presentation. The full 16 digits can easily be obtained by using NumberForm[expr, {n, f}], which prints approximate real numbers having n digits, with f digits to the right of the decimal point. (b) Nonlinear RegressionFollowing the NIST description, the statistical model assumed for nonlinear regression in the benchmark calculations is the so-called generic univariate case:
where "The certified values for the nonlinear least-squares regression problems were obtained using 128-bit precision with the reported results confirmed by at least two different algorithms and software packages using analytic derivatives. "The certified values for the estimates,
where
it follows that these are the maximum-likelihood estimators. "The certified values for the standard deviations of the estimates of the model parameters are the square roots of the diagonal elements of the asymptotic covariance matrix,
where
evaluated at the current values of the parameters "The certified value of the residual sum of squares is defined by
where The benchmark estimate of Under the assumption Before proceeding, it is worth noting a few limitations of the standard univariate nonlinear regression model,
In this case, the distribution of the There is a extensive literature on nonlinear regression, including standard texts, such as Gallant [36], Bates and Watts [37], and Seber and Wild [38], each with somewhat different emphasis. Most emphasis falls on computational issues and problems of parametric inference. A very good brief expository introduction is Gallant [39]. Most algorithms for computing nonlinear regression parameters are based on Hartley's [40] modified Gauss-Newton method (Hartley and Booker, [41]) or as extended and modified by Marquardt [42] following a suggestion by Levenberg [43]. There is an important difference between maximizing/minimizing functions in general and the problem of finding the minimum of an analytically specified sum of squares (see the previous SS). In general, we can evaluate only the function itself analytically and perhaps its gradient. Using only these two pieces is the basis for the method of steepest descent, sometimes also known as the Gauss-Newton method. But, in the case of nonlinear least squares, we know a great deal more: we know the Hessian matrix. Hartley's insight was to exploit this information explicitly, rather than build it up piecemeal during the iterative process of descent. His modified Gauss-Newton method is called the inverse Hessian method. Levenberg's insight was to recognize that simple steepest descent is much faster and more accurate when you are close to the minimum and to suggest separating the problem of getting into the neighborhood of the minimum from "zipping in." Marquardt's contribution was an elegant algorithm designed to vary smoothly between the two methods as the minimum point was approached. Mathematica's NonlinearRegress uses the Levenberg-Marquardt algorithm by default, but also allows you to choose among several other algorithms for function minimization. In [30, 44], Hiebert did a careful evaluation of existing programs for doing nonlinear least squares and for solving systems of nonlinear equations (a harder problem). One surprising conclusion he drew [30, 15] was that, "On the basis of this testing and these specific implementations of the Levenberg-Marquardt and the augmented Gauss-Newton methods, one method does not appear to be superior to the other." Perhaps, for this reason, the NIST uses a variety of methods, each considered appropriate for the problem at hand. However, the NIST is not more explicit about the methods used in specific cases than the statements quoted earlier. Standard-error calculations and inference for nonlinear least-squares estimates are all based on asymptotic results (Hartley and Booker, [41]; Jennrich, [45]) and deftly summarized by Gallant [39]. These are the basis for the NIST calculations detailed earlier. Mathematica does not say how its calculations are carried out in the description of the NonlinearRegress function; however, the code is available in the Statistics`NonlinearRegress`package, and it appears that the same formulae are used. Comparisons of the results from NonlinearRegress with the NIST benchmarks for each of the 27 NIST StRD nonlinear regression examples with their published starting values are presented in summary form in Appendix B. The NIST actually ran all computations using not only these two starting values but also the final benchmark answers as starting values because some nonlinear regression algorithms have a problem if you start from the right answer. However, in a few cases, I also tried starting at the benchmark answers: in the case of MGH10, as noted earlier, and for those few starting values for which Mathematica did not converge: Lanczos2; MGH17; MGH09; as well as MGH10. However, as noted by the NIST, MGH10 is a particularly difficult problem: "This problem was found to be difficult for some very good algorithms. There is a local minimum at (+inf, -14.07 ... , -inf, -inf) with final sum of squares 0.00102734." [46, 17] Lanczos3 is also a rather peculiar function, but Mathematica generally performs quite well for chosen starting values. The crucial problem of choosing good starting values is discussed extensively by Bates and Watts [37, 72-76], and, more briefly, by Seber and Wild [38, 665-666]. Bates and Watts suggest using a linearized form of the expectation function, while Seber and Wild suggest grid search or randomized starting values. Ratkowsky [26] gives a much more extensive discussion. When only one or two parameters are involved, a graphical approach works wonders. However, when three or more parameters are involved, special techniques are required and the difficulties are formidable. I have been unable to ascertain how the NIST chose their starting values, which are always one quite close to and another far away from the final answer, but Mathematica does offer an alternative. Before invoking NonlinearRegress, you could use NMinimize with no starting values (or some known to cause problems) assumed to find the minimum of the appropriate sum of squares. This is a very elaborate and well-documented method in Mathematica for finding global minima. Such more sophisticated methods may not be very efficient compared to the algorithm Mathematica employs in NonlinearRegress, but may work in such difficult cases. In computing the results reported here, I avoid the problem of choosing starting values by simply using those reported by the NIST.
|
||||||||
About Mathematica | Download Mathematica Player © 2005 Wolfram Media, Inc. All rights reserved. |