Volume 9, Issue 4
On the Numerical Accuracy of Mathematica 5.0 for Doing Linear and Nonlinear Regression
2. The NIST StRD Datasets and Certified Results for Linear and Nonlinear Regression

The NIST describes the statistical benchmark program on their website as follows.

"Currently datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method.

"Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking the level of difficulty of a dataset depends on the algorithm. These levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software" (2).

(a) The NIST StRD Datasets for Linear Regression

The particular sets for linear regression analysis are summarized in Table 1. The problem of linear regression and the methods employed by the NIST, as well as Mathematica, are discussed in Section 3 (a).
Source: www.itl.nist.gov/div898/strd/ils/ils.shtml

Table 1. StRD benchmark datasets for linear regression.

The following edited comments from the NIST website describe these datasets in more detail.

"Both generated and 'real-world' data are included. Generated datasets challenge specific computations and include the Wampler data developed at NIST (formerly NBS) in the early 1970s. Real-world data include the challenging Longley data, as well as more benign datasets from our statistical consulting work at NIST.

"... Two datasets are included for fitting a line through the origin. We have encountered codes that produce negative R-squared and incorrect F-statistics for these datasets. Therefore, we assign them an 'average' level of difficulty. Finally, several datasets of higher level of difficulty are provided. These datasets are multicollinear. They include the Longley data and several NIST datasets developed by Wampler.

"... Certified values are provided for the parameter estimates, their standard deviations, the residual standard deviation, R-squared, and the standard ANOVA table for linear regression. Certified values are quoted to 16 significant digits and are accurate up to the last digit, due to possible truncation errors.

"... If your code fails to produce correct results for a dataset of higher level of difficulty, one possible remedy is to center the data and rerun the code. Centering the data, that is, subtracting the mean for each predictor variable, reduces the degree of multicollinearity. The code may produce correct results for the centered data. You can judge this by comparing predicted values from the fit of centered data with those from the certified fit." (www.itl.nist.gov/div898/strd/ils/ils_info.shtml).

(b) The NIST StRD Datasets for Nonlinear Regression

The particular sets for nonlinear regression analysis are summarized in Table 2.
The problem of nonlinear regression and the methods employed by the NIST, as well as Mathematica, are discussed in Section 3 (b).
Source: www.itl.nist.gov/div898/strd/nls/nls_main.shtml

Table 2. StRD benchmark datasets for nonlinear regression.

The following edited comments from the NIST website describe these datasets in more detail.

"... Hiebert [30] notes that 'testing to find a "best" code is an all but impossible task and very dependent on the definition of "best."' Whatever other criteria are used, the test procedure should certainly attempt to measure the ability of the code to find solutions. But nonlinear least squares regression problems are intrinsically hard, and it is generally possible to find a dataset that will defeat even the most robust codes. So most evaluations of nonlinear least squares software should also include a measure of the reliability of the code, that is, whether the code correctly recognizes when it has (or has not) found a solution. The datasets provided here are particularly well suited for such testing of robustness and reliability. [Emphasis added.]

"... both generated and 'real-world' nonlinear least squares problems of varying levels of difficulty [are included]. The generated datasets are designed to challenge specific computations. Real-world data include challenging datasets such as the Thurber problem, and more benign datasets such as Misra1a. The certified values are 'best-available' solutions, obtained using 128-bit precision and confirmed by at least two different algorithms and software packages using analytic derivatives.

"... For some of these test problems, however, it is unreasonable to expect the correct solution from a nonlinear least squares procedure when finite difference derivatives are used.... These difficult problems are impossible to solve correctly when the matrix of predictor variables is only approximate because the user did not supply analytic derivatives.

"... the datasets have been ordered by level of difficulty (lower, average, and higher). This ordering is meant to provide rough guidance for the user.
Producing correct results on all datasets of higher difficulty does not imply that your software will correctly solve all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your own particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software.

"The robustness and reliability of nonlinear least squares software depends on the algorithm used and how it is implemented, as well as on the characteristics of the actual problem being solved. Nonlinear least squares solvers are particularly sensitive to the starting values provided for a given problem. For this reason, we provide three sets of starting values for each problem: the first is relatively far from the final solution; the second relatively close; and the third is the actual certified solution.

"... sometimes good starting values are not available. For testing purposes, therefore, it is of interest to see how a code will perform when the starting values are not close to the solution, even though such starting values might be ridiculously bad from a practical standpoint. In general, it can be expected that a particular code will fail more often from the starting values far from the solution than from the starting values that are relatively close. How serious it is that a code fails when using starting values far from the solution will depend on the types of problems for which the code will be employed." (www.itl.nist.gov/div898/strd/nls/nls_info.shtml).

In my assessment of Mathematica 5.0, I use only those NIST starting values that differ from the certified results, with one exception: the MGH10 dataset, for which Mathematica gives no results for either the first or the second set of starting values. When the certified results are used as the starting values for that dataset, Mathematica crashes. The problem of choosing good starting values is central to nonlinear regression. As discussed later, I believe that such a choice should not be made automatically, but should rather involve some prior investigation.
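The sensitivity to starting values described in the NIST comments can be tried out on a small scale. The following Python sketch is only an illustration under stated assumptions. The data are synthetic and noise-free, not an StRD dataset. The two-parameter model b1 (1 - exp(-b2 x)), the damped Gauss-Newton iteration with step halving, and all function names are hypothetical choices made for this example. It is not the algorithm used by the NIST or by Mathematica. The solver takes analytic derivatives and returns a flag saying whether it believes it has found a solution, which is the kind of reliability report the NIST comments call for.

```python
import math


def safe_exp(t):
    """exp(t), returning inf instead of raising on overflow."""
    try:
        return math.exp(t)
    except OverflowError:
        return math.inf


def sse(xs, ys, b1, b2):
    """Sum of squared residuals for the model y = b1 * (1 - exp(-b2 * x))."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = y - b1 * (1.0 - safe_exp(-b2 * x))
        total += r * r
    return total


def gauss_newton(xs, ys, start, max_iter=100, tol=1e-10):
    """Damped Gauss-Newton with analytic derivatives (illustrative only).

    Returns (b1, b2, converged). The flag is False when the normal
    equations break down, no step reduces the SSE, or max_iter is reached.
    """
    b1, b2 = start
    for _ in range(max_iter):
        # Build the 2x2 normal equations (J^T J) d = J^T r.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = safe_exp(-b2 * x)
            r = y - b1 * (1.0 - e)   # residual
            j1 = 1.0 - e             # d(model)/d(b1)
            j2 = b1 * x * e          # d(model)/d(b2)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det == 0.0 or not math.isfinite(det):
            return b1, b2, False
        d1 = (a22 * g1 - a12 * g2) / det
        d2 = (a11 * g2 - a12 * g1) / det
        # Stop when the proposed step is negligible relative to the parameters.
        if max(abs(d1), abs(d2)) <= tol * (1.0 + max(abs(b1), abs(b2))):
            return b1, b2, True
        # Step halving: accept the first step length that lowers the SSE.
        current = sse(xs, ys, b1, b2)
        step, accepted = 1.0, False
        while step >= 1e-8:
            t1, t2 = b1 + step * d1, b2 + step * d2
            if sse(xs, ys, t1, t2) < current:
                b1, b2, accepted = t1, t2, True
                break
            step *= 0.5
        if not accepted:
            return b1, b2, False
    return b1, b2, False


def make_synthetic_data(b1=2.0, b2=0.5):
    """Noise-free synthetic data (an assumption of this sketch, not NIST data)."""
    xs = [float(k) for k in range(1, 9)]
    ys = [b1 * (1.0 - math.exp(-b2 * x)) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_synthetic_data()
    for label, start in [("close", (1.8, 0.45)), ("far", (50.0, 5.0))]:
        b1, b2, ok = gauss_newton(xs, ys, start)
        print(label, start, "->", (b1, b2), "converged:", ok)
```

From the close start the iteration recovers the generating parameters. What happens from the far start depends on the damping rule and the iteration limit, so it is best examined by running the script and reading the returned flag rather than assumed in advance.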
© 2005 Wolfram Media, Inc. All rights reserved.