The Mathematica Journal
Volume 9, Issue 4

On the Numerical Accuracy of Mathematica 5.0 for Doing Linear and Nonlinear Regression
Marc Nerlove

2. The NIST StRD Datasets and Certified Results for Linear and Nonlinear Regression

The NIST describes the statistical benchmark program on their website as follows.

"Currently datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method.

"Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking the level of difficulty of a dataset depends on the algorithm. These levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software" (2).

(a) The NIST StRD Datasets for Linear Regression

The particular sets for linear regression analysis are summarized in Table 1. The problem of linear regression and the methods employed by the NIST, as well as Mathematica, are discussed in Section 3 (a).

Dataset        Level of     Model         Number of    Number of
Name           Difficulty   Class         Parameters   Observations   Source
-------------  -----------  ------------  ----------   ------------   ---------
Norris [6]     Lower        Linear         2            36            Observed
Pontius [7]    Lower        Quadratic      3            40            Observed
NoInt1 [8]     Average      Linear         1            11            Generated
NoInt2 [8]     Average      Linear         1             3            Generated
Filip [9]      Higher       Polynomial    11            82            Observed
Longley [10]   Higher       Multilinear    7            16            Observed
Wampler1 [11]  Higher       Polynomial     6            21            Generated
Wampler2 [11]  Higher       Polynomial     6            21            Generated
Wampler3 [11]  Higher       Polynomial     6            21            Generated
Wampler4 [11]  Higher       Polynomial     6            21            Generated
Wampler5 [11]  Higher       Polynomial     6            21            Generated

Source: www.itl.nist.gov/div898/strd/ils/ils.shtml

Table 1. StRD benchmark datasets for linear regression.

The following edited comments from the NIST website describe these datasets in more detail.

"Both generated and 'real-world' data are included. Generated datasets challenge specific computations and include the Wampler data developed at NIST (formerly NBS) in the early 1970s. Real-world data include the challenging Longley data, as well as more benign datasets from our statistical consulting work at NIST.

"... Two datasets are included for fitting a line through the origin. We have encountered codes that produce negative R-squared and incorrect F-statistics for these datasets. Therefore, we assign them an 'average' level of difficulty. Finally, several datasets of higher level of difficulty are provided. These datasets are multicollinear. They include the Longley data and several NIST datasets developed by Wampler.
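The negative-R-squared pitfall for fits through the origin is easy to reproduce. The sketch below (Python with NumPy, using made-up illustrative data rather than the NIST NoInt values) computes R-squared both ways: with the total sum of squares taken about the mean of y, as many codes do unconditionally, and about zero, which is the definition appropriate to a no-intercept model.

```python
import numpy as np

# Illustrative data for a fit through the origin (made-up values,
# not the NIST NoInt datasets).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.0, 1.9, 2.0, 2.1])

# No-intercept least squares: slope = sum(x*y) / sum(x*x).
slope = np.dot(x, y) / np.dot(x, x)
ss_res = np.sum((y - slope * x) ** 2)

# R-squared with the total sum of squares about the mean of y.  For a
# no-intercept model the residual sum of squares can exceed it,
# driving R-squared negative.
r2_about_mean = 1.0 - ss_res / np.sum((y - y.mean()) ** 2)

# The definition appropriate to a through-origin model takes the total
# sum of squares about zero instead, and always lies in [0, 1].
r2_about_zero = 1.0 - ss_res / np.sum(y ** 2)

print(r2_about_mean, r2_about_zero)
```

With these data the first definition is strongly negative while the second stays between 0 and 1, which is exactly the behavior NIST flags.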

"... Certified values are provided for the parameter estimates, their standard deviations, the residual standard deviation, R-squared, and the standard ANOVA table for linear regression. Certified values are quoted to 16 significant digits and are accurate up to the last digit, due to possible truncation errors.

"... If your code fails to produce correct results for a dataset of higher level of difficulty, one possible remedy is to center the data and rerun the code. Centering the data, that is, subtracting the mean for each predictor variable, reduces the degree of multicollinearity. The code may produce correct results for the centered data. You can judge this by comparing predicted values from the fit of centered data with those from the certified fit." (www.itl.nist.gov/div898/strd/ils/ils_info.shtml).
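The effect of centering can be seen directly in the condition number of the design matrix. A minimal NumPy sketch, using an assumed cubic model on predictor values clustered far from zero (not one of the NIST datasets):

```python
import numpy as np

# Assumed example: a cubic fit with predictor values clustered far from
# zero, which makes the columns of the design matrix nearly collinear.
x = np.linspace(1000.0, 1001.0, 21)
X_raw = np.vander(x, 4)             # columns x^3, x^2, x, 1
X_cen = np.vander(x - x.mean(), 4)  # the same model after centering x

# Centering subtracts the predictor mean, shrinking the condition
# number by many orders of magnitude; a code that fails on X_raw may
# well succeed on X_cen.
print(np.linalg.cond(X_raw))
print(np.linalg.cond(X_cen))
```

The raw matrix is ill-conditioned near the limits of double precision, while the centered one is benign, which is why NIST suggests centering as a diagnostic for the higher-difficulty linear problems.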

(b) The NIST StRD Datasets for Nonlinear Regression

The particular sets for nonlinear regression analysis are summarized in Table 2. The problem of nonlinear regression and the methods employed by the NIST, as well as Mathematica, are discussed in Section 3 (b).

Dataset        Level of     Model           Number of    Number of
Name           Difficulty   Classification  Parameters   Observations   Source
-------------  -----------  --------------  ----------   ------------   ---------
Misra1a [12]   Lower        Exponential      2            14            Observed
Chwirut2 [13]  Lower        Exponential      3            54            Observed
Chwirut1 [13]  Lower        Exponential      3           214            Observed
Lanczos3 [14]  Lower        Exponential      6            24            Generated
Gauss1 [15]    Lower        Exponential      8           250            Generated
Gauss2 [15]    Lower        Exponential      8           250            Generated
DanWood [16]   Lower        Miscellaneous    2             6            Observed
Misra1b [12]   Lower        Miscellaneous    2            14            Observed
Kirby2 [17]    Average      Rational         5           151            Observed
Hahn1 [18]     Average      Rational         7           236            Observed
Nelson [19]    Average      Exponential      3           128            Observed
MGH17 [20]     Average      Exponential      5            33            Generated
Lanczos1 [14]  Average      Exponential      6            24            Generated
Lanczos2 [14]  Average      Exponential      6            24            Generated
Gauss3 [15]    Average      Exponential      8           250            Generated
Misra1c [12]   Average      Miscellaneous    2            14            Observed
Misra1d [12]   Average      Miscellaneous    2            14            Observed
Roszman1 [21]  Average      Miscellaneous    4            25            Observed
ENSO [22]      Average      Miscellaneous    9           168            Observed
MGH09 [23]     Higher       Rational         4            11            Generated
Thurber [24]   Higher       Rational         7            37            Observed
BoxBOD [25]    Higher       Exponential      2             6            Observed
Rat42 [26]     Higher       Exponential      3             9            Observed
MGH10 [27]     Higher       Exponential      3            16            Generated
Eckerle4 [28]  Higher       Exponential      3            35            Observed
Rat43 [26]     Higher       Exponential      4            15            Observed
Bennett5 [29]  Higher       Miscellaneous    3           154            Observed

Source: www.itl.nist.gov/div898/strd/nls/nls_main.shtml

Table 2. StRD benchmark datasets for nonlinear regression.

The following edited comments from the NIST website describe these datasets in more detail.

"... Hiebert [30] notes that 'testing to find a "best" code is an all but impossible task and very dependent on the definition of "best."' Whatever other criteria are used, the test procedure should certainly attempt to measure the ability of the code to find solutions. But nonlinear least squares regression problems are intrinsically hard, and it is generally possible to find a dataset that will defeat even the most robust codes. So most evaluations of nonlinear least squares software should also include a measure of the reliability of the code, that is, whether the code correctly recognizes when it has (or has not) found a solution. The datasets provided here are particularly well suited for such testing of robustness and reliability. [Emphasis added.]

"... both generated and 'real-world' nonlinear least squares problems of varying levels of difficulty [are included]. The generated datasets are designed to challenge specific computations. Real-world data include challenging datasets such as the Thurber problem, and more benign datasets such as Misra1a. The certified values are 'best-available' solutions, obtained using 128-bit precision and confirmed by at least two different algorithms and software packages using analytic derivatives.

"... For some of these test problems, however, it is unreasonable to expect the correct solution from a nonlinear least squares procedure when finite difference derivatives are used.... These difficult problems are impossible to solve correctly when the matrix of predictor variables is only approximate because the user did not supply analytic derivatives.
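The difference between analytic and finite-difference derivatives can be sketched with SciPy's least_squares, which uses finite differences unless a Jacobian is supplied. The model below has the functional form of Misra1a, y = b1(1 - exp(-b2 x)), but the data are synthetic and the parameter values are assumed illustrative choices (rescaled to order one), not the NIST Misra1a observations:

```python
import numpy as np
from scipy.optimize import least_squares

# Misra1a-form model y = b1*(1 - exp(-b2*x)) fitted to synthetic,
# noise-free data (assumed values; NOT the NIST Misra1a file).
x = np.linspace(0.5, 7.0, 14)
b_true = np.array([2.4, 0.5])
y = b_true[0] * (1.0 - np.exp(-b_true[1] * x))

def resid(b):
    return b[0] * (1.0 - np.exp(-b[1] * x)) - y

def jac(b):
    # Analytic Jacobian of the residuals with respect to (b1, b2).
    e = np.exp(-b[1] * x)
    return np.column_stack([1.0 - e, b[0] * x * e])

b0 = [5.0, 0.1]                             # deliberately rough start
fit_fd = least_squares(resid, b0)           # finite-difference Jacobian
fit_an = least_squares(resid, b0, jac=jac)  # analytic Jacobian

print(fit_fd.x, fit_an.x)
```

On a benign, well-scaled problem like this one both versions recover the parameters; NIST's point is that on the difficult problems the finite-difference version can fail where the analytic-derivative version succeeds.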

"... the datasets have been ordered by level of difficulty (lower, average, and higher). This ordering is meant to provide rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will correctly solve all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your own particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software.

"The robustness and reliability of nonlinear least squares software depends on the algorithm used and how it is implemented, as well as on the characteristics of the actual problem being solved. Nonlinear least squares solvers are particularly sensitive to the starting values provided for a given problem. For this reason, we provide three sets of starting values for each problem: the first is relatively far from the final solution; the second relatively close; and the third is the actual certified solution.

"... sometimes good starting values are not available. For testing purposes, therefore, it is of interest to see how a code will perform when the starting values are not close to the solution, even though such starting values might be ridiculously bad from a practical standpoint. In general, it can be expected that a particular code will fail more often from the starting values far from the solution than from the starting values that are relatively close. How serious it is that a code fails when using starting values far from the solution will depend on the types of problems for which the code will be employed." (www.itl.nist.gov/div898/strd/nls/nls_info.shtml).
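This sensitivity to starting values can be demonstrated with a two-exponential model, similar in spirit to the Lanczos problems; the data below are synthetic and assumed, not NIST's:

```python
import numpy as np
from scipy.optimize import least_squares

# Assumed two-exponential model y = b1*exp(-b2*x) + b3*exp(-b4*x),
# with synthetic noise-free data (not one of the NIST datasets).
x = np.linspace(0.0, 2.0, 24)
y = 1.0 * np.exp(-1.0 * x) + 2.0 * np.exp(-5.0 * x)

def resid(b):
    return b[0] * np.exp(-b[1] * x) + b[2] * np.exp(-b[3] * x) - y

# A start far from the solution (and with both rates equal, so the
# Jacobian is initially rank-deficient) versus a nearby start.
fit_far  = least_squares(resid, [10.0, 10.0, 10.0, 10.0])
fit_near = least_squares(resid, [1.2, 0.9, 1.8, 5.5])

# Compare the objective values reached from the two starts; the far
# start may stall at a poor solution or fail to make progress.
print(fit_far.cost, fit_near.cost)
```

The near start reliably drives the residual sum of squares to essentially zero; whether the far start does is solver- and tolerance-dependent, which is precisely the behavior NIST's three sets of starting values are designed to probe.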

In my assessment of Mathematica 5.0, I use only the NIST starting values, which differ from the certified results, with one exception: the MGH10 dataset, for which Mathematica returns no results from either the first or the second set of starting values. When the certified results are used as the starting values, Mathematica crashes. The problem of choosing good starting values is central to nonlinear regression. As discussed later, I believe that this choice should not be made automatically, but should instead involve some prior investigation.
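One simple form of such prior investigation is a coarse grid scan over plausible parameter ranges, starting the solver from the grid point with the smallest residual sum of squares. The sketch below is my own illustration of the idea, on an assumed one-exponential model with synthetic data, not a procedure prescribed by NIST:

```python
import numpy as np
from itertools import product
from scipy.optimize import least_squares

# Synthetic data for an assumed model y = b1*exp(-b2*x)
# (illustrative values, not an NIST dataset).
x = np.linspace(0.1, 3.0, 30)
y = 2.5 * np.exp(-1.3 * x)

def rss(b1, b2):
    return np.sum((b1 * np.exp(-b2 * x) - y) ** 2)

# Coarse grid scan: evaluate the residual sum of squares on a small
# lattice of candidate (b1, b2) values and keep the best as the start.
grid = product(np.linspace(0.5, 5.0, 10), np.linspace(0.1, 3.0, 10))
b0 = min(grid, key=lambda b: rss(*b))

fit = least_squares(lambda b: b[0] * np.exp(-b[1] * x) - y, b0)
print(b0, fit.x)
```

The grid cost is 100 cheap function evaluations here; for models with more parameters a coarser lattice, or a scan over only the parameters the model is most sensitive to, keeps the idea practical.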



© Wolfram Media, Inc. All rights reserved.