This article shows how to use some of Mathematica’s built-in financial functions and define new functions useful for the practical analysis of real-world financial data. The main topics covered are linear programming and its application in bond portfolio management, conditional value-at-risk minimization, introductory time-series analysis, simulation, bootstrapping, robust equity portfolio optimization and artificial intelligence.

Our main objective is to apply the Wolfram Language to solve financial models. We do not explain financial concepts or any mathematical background related to the financial applications introduced in this article. Nor do we introduce Mathematica. Wellin [1] gives a good introduction to programming in Mathematica. Mathematical background related to the models discussed here can be found in standard textbooks, including the ones cited.

First, we define three supporting functions used in the rest of this article.

The first function downloads historical stock returns. It takes four arguments:

• a list of one or more ticker symbols

• the start date as a date object

• the end date as a date object

• the period as the frequency of data

The second function computes 11 basic descriptive statistics, given a list or a matrix of returns as its input argument.

The third function was adapted from an answer on Mathematica Stack Exchange (https://mathematica.stackexchange.com/questions/194234/plot-of-histogram-similar-to-output-from-risk). It takes a vector of numerical data and returns a histogram with a handle on each side. You can drag these handles (two thin vertical gray lines) to vary the percentage of data that lies between them.

For example, this downloads monthly historical stock returns for Boeing Company (BA), Apple Inc. (AAPL) and NVIDIA Corporation (NVDA) for the period May 1, 2000, to May 30, 2019, and computes descriptive statistics. We chose returns of the Boeing Company for the histogram. Drag the handles at the far ends of the histogram (the thin vertical lines) to see the percentage of values that lie within two values.

The article is organized as follows:

• Section 2 introduces linear programming and applies it to bond portfolio management.

• Section 3 discusses mean-conditional value-at-risk portfolio optimization.

• Section 4 shows how to use built-in functions for introductory financial time-series analysis.

• Section 5 describes how simulation can be used in capital budgeting.

• Section 6 applies bootstrapping to risk management and financial planning.

• Section 7 describes robust equity portfolio optimization.

• Section 8 introduces functions useful for machine learning.

• Finally, the last section concludes the discussion.

In this section, we illustrate a few applications of linear programming to financial problems similar to those in Cornuéjols, Peña and Tütüncü [2]. A linear program is an optimization problem whose purpose is to minimize or maximize a linear objective function subject to linear constraints. We first choose the decision variables, objective function and constraints. Then we find the values of the decision variables that optimize the objective function.

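For vectors $c$, $b$ and $d$ and matrices $A$ and $D$, we can specify a general linear program in the following form:

$$\min_{x} \; c^{\mathsf T} x \quad \text{subject to} \quad A x = b, \quad D x \ge d.$$

We can convert any inequality constraint $D x \ge d$ to the equality constraint $D x - s = d$ with $s \ge 0$ by adding auxiliary variables $s$.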

The Wolfram Language has built-in functions to solve linear optimization problems with real variables. They include:

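• `LinearProgramming`

• `FindMinimum`

• `FindMaximum`

• `NMinimize`

• `NMaximize`

• `Minimize`

• `Maximize`

For large-scale problems, the most flexible and efficient of these is `LinearProgramming`. The others are appropriate for solving linear programs written in terms of equations.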

We consider an example very similar to the one presented in [2]. Assume that we have obligations to pay cash flows in the next eight years as shown in the following table. The first row shows years and the second row shows the amount of cash to be paid each year.

Also assume that five government bonds are available for investment, with the following cash flows and current prices.

To build a portfolio that minimizes the overall cost and still meets all the expected future cash payments, we can decide how to allocate the funds by converting this problem into a linear program and solving it. Assume that $x_i$ is the number of units of bond $i$ to purchase.

We define four variables for convenience.

Then the problem can be stated as follows.
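In mathematical notation, a dedication problem of this kind is commonly written as the linear program below. This is only a sketch in the style of [2], not a transcription of the article's statement: the symbols $p_i$ (current price of bond $i$), $c_{it}$ (cash flow of bond $i$ in year $t$), $L_t$ (obligation in year $t$) and $z_t$ (surplus cash carried from year $t$ to year $t+1$ at zero interest) are assumed names, and the carry-forward term itself is an assumption.

```latex
\begin{aligned}
\min_{x,\,z}\quad & z_0 + \sum_{i=1}^{5} p_i\, x_i \\
\text{subject to}\quad & \sum_{i=1}^{5} c_{it}\, x_i + z_{t-1} - z_t = L_t, \qquad t = 1, \dots, 8, \\
& x_i \ge 0,\quad i = 1, \dots, 5, \qquad z_t \ge 0,\quad t = 0, \dots, 8 .
\end{aligned}
```

Here $z_0$ is cash set aside today, so the objective is the total amount spent now.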

This solves the problem.

The problem can also be solved using the `LinearProgramming` function. One of its most useful forms is `LinearProgramming[c, m, b, l]`, where:

• `c` is the vector of coefficients of the objective function.

• `m` is the matrix of coefficients in the constraints.

• `b` is a two-column matrix representing the constants on the right side of the constraints and the direction of inequality.

• `l` is a two-column matrix of lower and upper bounds for the decision variables.

The variables and were defined before; here they are displayed in matrix form.

We define from and .

This solves the problem.

Mathematica’s built-in capabilities to solve linear programming problems can be used in a wide variety of financial problems. We refer interested readers to [2].

In this section, we solve the mean-CVaR portfolio problem proposed by Rockafellar and Uryasev [3]. CVaR optimization does not depend on any assumption about how returns are distributed, although it also applies when returns are normally distributed. We summarize the linear programming formulation of the CVaR problem.

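Following [3], $p(y)$ is the joint density function of the underlying asset returns $y = (y_1, \dots, y_n)$, where $y_i$ is the return on asset $i$; $f(x, y)$ is the loss associated with the decision vector $x = (x_1, \dots, x_n)$, where $x_i$ is the proportion of money invested in asset $i$; and $\alpha$ is the $\beta$-quantile of the loss distribution.

Then the CVaR can be defined as:

$$F_\beta(x, \alpha) = \alpha + \frac{1}{1 - \beta}\, E\!\left[\left(f(x, y) - \alpha\right)^{+}\right],$$

where $E$ is expectation and $(t)^{+} = \max(t, 0)$.

For a sample size $q$, the CVaR is approximately:

$$\tilde{F}_\beta(x, \alpha) = \alpha + \frac{1}{q\,(1 - \beta)} \sum_{k=1}^{q} \left(f(x, y_k) - \alpha\right)^{+}.$$

The problem can be restated as a linear optimization problem by introducing auxiliary variables $u_k$, one for each observation in the sample:

Find $\min_{x,\,\alpha,\,u} \; \alpha + \frac{1}{q\,(1 - \beta)} \sum_{k=1}^{q} u_k$,

subject to $u_k \ge 0$ and $f(x, y_k) - \alpha - u_k \le 0$ for $k = 1, \dots, q$.

As a linear program, with the loss taken as $f(x, y) = -x^{\mathsf T} y$, the problem is:

Find $\min_{x,\,\alpha,\,u} \; \alpha + \frac{1}{q\,(1 - \beta)} \sum_{k=1}^{q} u_k$,

subject to

$x^{\mathsf T} y_k + \alpha + u_k \ge 0$ and $u_k \ge 0$ for $k = 1, \dots, q$,

$\sum_{i=1}^{n} x_i = 1$ and $x_i \ge 0$ for $i = 1, \dots, n$,

$\mu^{\mathsf T} x \ge R$,

where $\mu = (\mu_1, \dots, \mu_n)$ is the vector of mean asset returns, and $R$ is the target optimal portfolio return.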

We define a function to estimate the optimal weights that minimize CVaR. It takes three arguments:

• the returns matrix

• the target portfolio return

• the confidence level, between 0.9 and 0.99

This downloads monthly returns of three stocks over the period May 1, 2000, to May 30, 2019, and computes the CVaR-based optimal weights and associated values given the target portfolio return and confidence level.

The function computes optimal weights for a long-only portfolio. It can easily be modified to account for short-selling.
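The sample approximation of CVaR can be checked numerically for a fixed set of weights. The following is a minimal, language-neutral sketch in Python, not the article's Wolfram Language function: the names `portfolio_losses` and `var_cvar` and the scenario returns are made up for illustration, the loss is assumed to be minus the portfolio return, and the quantile is taken as an order statistic of the sample.

```python
import math


def portfolio_losses(weights, returns):
    """Loss of each scenario, taken as minus the portfolio return (an assumption)."""
    return [-sum(w * r for w, r in zip(weights, row)) for row in returns]


def var_cvar(losses, beta):
    """Empirical VaR and CVaR of a loss sample at confidence level beta."""
    q = len(losses)
    ordered = sorted(losses)
    # VaR: smallest loss with at least a fraction beta of the sample at or below it.
    # The small tolerance guards against floating-point noise in beta * q.
    k = max(math.ceil(beta * q - 1e-12), 1)
    alpha = ordered[k - 1]
    # CVaR: alpha plus the scaled sum of losses in excess of alpha.
    excess = sum(max(loss - alpha, 0.0) for loss in losses)
    return alpha, alpha + excess / (q * (1.0 - beta))


# Made-up scenario returns for two assets (each row is one scenario).
scenario_returns = [(0.02, 0.01), (-0.05, 0.00), (0.01, -0.03), (0.03, 0.02), (-0.10, -0.04)]
losses = portfolio_losses((0.5, 0.5), scenario_returns)
value_at_risk, cvar = var_cvar(losses, beta=0.8)
```

Minimizing this quantity over the weights, subject to the budget and target-return constraints, is what the linear program does; the sketch only evaluates it for one choice of weights.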

Data collected over time is common in finance. Mathematica has many built-in functions to model the stochastic nature of financial time series and to forecast the future value of a series. This section gives examples of functions that are useful for model specification, estimation, diagnostics and forecasting of univariate time-series data.

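The first step in any exploratory analysis is to construct and plot time series. You can use the built-in functions `TimeSeries` and `EventSeries` to construct financial time series as pairs $\{t_i, v_i\}$ of times and values. There are two formats for each:

• `TimeSeries[{{t1, v1}, {t2, v2}, …}]`

• `TimeSeries[{v1, v2, …}, tspec]`

• `EventSeries[{{t1, v1}, {t2, v2}, …}]`

• `EventSeries[{v1, v2, …}, tspec]`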

Time-series data can be manipulated using many built-in functions; see the documentation.

Once we create a time series, we can use functions like or to visualize it.

We illustrate with the historical global price of WTI crude oil (POILWTIUSDM). First, we download the historical price of WTI crude oil since January 1990 and then plot it. Use the API key 207071a5f2e90e7816259d3c32c1ab81 if needed.

The built-in function `TimeSeriesModelFit` supports both linear and nonlinear time-series models. It fits an automatically selected parametric model to a time series. We can customize the model fit specification by changing its options. The currently supported families of models are:

• autoregressive (AR)

• moving-average (MA)

• autoregressive moving-average (ARMA)

• autoregressive integrated moving-average (ARIMA)

• seasonal autoregressive moving-average (SARMA)

• seasonal autoregressive integrated moving-average (SARIMA)

• autoregressive conditionally heteroscedastic (ARCH)

• generalized autoregressive conditionally heteroscedastic (GARCH)

You can find descriptions of these models in any time-series textbook, including Tsay [4].

Although `TimeSeriesModelFit` selects the model automatically, there are many built-in functions for choosing appropriate parameter values for a given family and for checking the adequacy of the fitted models.

Next, we show some tools for model specification and for checking the adequacy of fitted models.

Use `AutocorrelationTest` to test whether the data is autocorrelated. (Use `PartialCorrelationFunction` to estimate the partial correlation function of the data.)

Use `UnitRootTest` to test whether data comes from an autoregressive time-series process with a unit root.
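To make the idea of an autocorrelation test concrete, here is a small Python sketch of the sample autocorrelation function and the Ljung–Box statistic, one common portmanteau statistic for such tests. It is a stand-in for illustration only: the article does not state which statistic the built-in test uses, and the function names here are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((value - mean) ** 2 for value in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def ljung_box(series, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k rho_k^2 / (n - k), k = 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
```

Under the null hypothesis of no autocorrelation, Q is compared with a chi-squared distribution whose degrees of freedom equal the number of lags (less the number of fitted parameters when applied to model residuals); large values suggest autocorrelation.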

A number of other tools are available for model specification:

• Akaike information criterion (AIC)

• Finite sample corrected AIC (AICc)

• Bayesian information criterion (BIC)

• Schwarz–Bayes information criterion (SBC)
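For reference, the textbook definitions of the first three criteria can be sketched in Python as follows, where `log_likelihood` is the maximized log-likelihood, `k` the number of estimated parameters and `n` the sample size. These are the standard formulas, stated here as an assumption; the normalization used internally by Mathematica may differ, and the function names are illustrative.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Finite-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood
```

In each case a smaller value indicates a better trade-off between fit and model complexity, so candidate models are ranked by the criterion and the minimizer is selected.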

To choose the appropriate model for the , we can do the following. (For information on , see the Mathematica help system.)

Once the model has been specified, you can estimate its parameters with and you can assess its goodness of fit through analysis of residuals.

This estimates the parameters of the model for .

Use , or of the residuals to assess the whiteness of the model residuals.

A primary objective of building a time-series model is to forecast its future values. Prediction limits are important to assess the potential accuracy of the forecast. We can use `TimeSeriesForecast` to forecast unobserved future values. The function takes the methods `"AR"`, `"Covariance"` and `"Kalman"`. We use mean-squared errors to assess the precision of our prediction.

This calculates and plots the forecast for the next 5 months of the series within 95% confidence limits.

The autoregressive conditional heteroskedasticity (ARCH) model and the generalized ARCH model (GARCH) are often used to get a volatility forecast of a time series. You can also use the built-in function to estimate parameters of these volatility models. Most volatility models are based on using returns that are obtained after subtracting unconditional mean returns. For our illustration, we de-mean our returns data. The parameters of the model are typically estimated with maximum likelihood.

To estimate a GARCH model for the WTI crude oil data, we de-mean the data and estimate the model as follows.
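The mechanics behind a GARCH(1,1) volatility forecast can be sketched in a few lines of Python. This is not the article's estimation code: the parameters `omega`, `alpha` and `beta` are taken as given rather than estimated by maximum likelihood, the recursion is started from the sample variance (an assumption), and all names are illustrative.

```python
def demean(returns):
    """Subtract the unconditional mean from each return."""
    mean = sum(returns) / len(returns)
    return [r - mean for r in returns]


def garch11_variances(residuals, omega, alpha, beta):
    """Conditional variances s2[t] = omega + alpha * e[t-1]**2 + beta * s2[t-1]."""
    # Assumption: start the recursion at the sample variance of the residuals.
    variances = [sum(e * e for e in residuals) / len(residuals)]
    for e in residuals[:-1]:
        variances.append(omega + alpha * e * e + beta * variances[-1])
    return variances


def garch11_forecast(residuals, omega, alpha, beta):
    """One-step-ahead variance forecast after the last observation."""
    variances = garch11_variances(residuals, omega, alpha, beta)
    return omega + alpha * residuals[-1] ** 2 + beta * variances[-1]
```

Maximum likelihood estimation then amounts to choosing `omega`, `alpha` and `beta` to maximize the likelihood implied by these conditional variances.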

You can use the built-in functions , , , , and to simulate time-series data and for risk management. is another important function to estimate the parameters of a process given its data and model specification.

A financial model consisting of fixed relations and variables may not be accurate because most relationships between financial variables are random. Therefore, we must be able to incorporate stochasticity. Monte Carlo simulation is widely used to represent the true features of random modeling. Simulation modeling is a computer-based modeling technique that mimics a real-life situation and helps to incorporate uncertainties in input variables. Such techniques give a distribution of a forecast variable, not just a single value. Therefore, it is very useful when we are uncertain about future outcomes. In this section, we give examples for simulating data using Mathematica and show an application in capital budgeting.

`RandomVariate` is a powerful function for drawing samples from built-in statistical distributions, including those that are:

• continuous or discrete

• univariate or multivariate

• parametric or derived

• defined by data

Consider a situation in which you have to evaluate an investment by forecasting the present value of its future cash flows.

We define a function to compute the present value of future cash flows. Here are its 11 arguments and values for an example. Assume that the revenue and terminal value both follow a triangular distribution and that gross margin follows a uniform distribution.

For the given values, we simulate the data 10,000 times. (This takes a minute or so.)

We can summarize the data as follows.

Here is a histogram of the distribution of the present value of the cash flows. Drag the handles (thin vertical lines) on either side of the red region to see the percentage of simulated present values that fall within the selected range.

This is just one example of simulation. We can define similar functions to compute the value of a firm using different valuation models:

• discounted cash flows

• residual operating income valuation

• abnormal growth in earnings valuation

There are many other areas of finance where simulation can be used.
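As an illustration of the capital budgeting simulation, the following Python sketch draws revenue and terminal value from triangular distributions and gross margin from a uniform distribution, then discounts the resulting cash flows. It is a simplified, hypothetical model: the argument list, the cash flow definition (revenue times gross margin) and the single discount rate are assumptions, not the article's 11-argument function.

```python
import random


def simulate_present_values(n_sims, years, discount_rate,
                            revenue_low, revenue_mode, revenue_high,
                            margin_low, margin_high,
                            terminal_low, terminal_mode, terminal_high,
                            seed=0):
    """Return n_sims simulated present values of the project's cash flows."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    results = []
    for _ in range(n_sims):
        present_value = 0.0
        for t in range(1, years + 1):
            # random.triangular takes (low, high, mode) in that order.
            revenue = rng.triangular(revenue_low, revenue_high, revenue_mode)
            margin = rng.uniform(margin_low, margin_high)
            present_value += revenue * margin / (1 + discount_rate) ** t
        terminal = rng.triangular(terminal_low, terminal_high, terminal_mode)
        present_value += terminal / (1 + discount_rate) ** years
        results.append(present_value)
    return results


# Made-up inputs: 5 years, 10% discount rate.
present_values = simulate_present_values(
    1000, 5, 0.10, 80.0, 100.0, 130.0, 0.30, 0.50, 200.0, 300.0, 450.0
)
```

The list of simulated values can then be summarized or plotted as a histogram, exactly as one would do with the output of the Mathematica simulation.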

We can apply the bootstrap approach in several contexts in finance. When the data is limited and the true distribution of the population is unknown, we can generate the sampling distribution of a statistic by generating many new samples from the original data and use the empirical distribution for statistical inference. This is called bootstrapping. Performing a bootstrap analysis in Mathematica is very straightforward using the `RandomChoice` function (sampling with replacement) or the `RandomSample` function (sampling without replacement).

Performing bootstrap analysis entails two steps. First, we define a function that computes the statistic of interest. Second, we estimate the statistic of interest by repeatedly sampling observations (usually 10,000 times or more) from the original sample with replacement. Then we can use the distribution of sample statistics to infer an appropriate decision.
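The two steps can be sketched generically in Python. The helper names `bootstrap_distribution` and `percentile_interval` are illustrative, the percentile interval is only one simple way to use the resulting distribution, and the data are made up.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_resamples=10_000, seed=0):
    """Resample the data with replacement and evaluate the statistic each time."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval from the bootstrap distribution."""
    ordered = sorted(values)
    lower = ordered[int((1 - level) / 2 * (len(ordered) - 1))]
    upper = ordered[int((1 + level) / 2 * (len(ordered) - 1))]
    return lower, upper


# Made-up weekly returns; the statistic of interest here is the mean.
weekly_returns = [0.012, -0.008, 0.004, 0.021, -0.015, 0.007, 0.001, -0.003]
means = bootstrap_distribution(weekly_returns, statistics.mean, n_resamples=2000)
interval = percentile_interval(means)
```

Any function of a sample, such as a CVaR estimator, can be passed in place of `statistics.mean`.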

In this section, we illustrate the use of bootstrapping through two examples.

We consider estimating the distribution of an equally weighted portfolio’s conditional value-at-risk (CVaR) using weekly returns of Walmart stock (WMT) and Procter & Gamble (PG) over the period January 1, 1982, to March 30, 2019.

We get the historical weekly returns data.

We define two functions.

The first function computes the conditional value-at-risk given the returns data.

The second function returns a distribution of conditional value-at-risk measures given the size of the sample.

We use these functions to get distributions of CVaR with 10,000 observations.

We summarize the data.

The next example focuses on retirement planning using the bootstrapping concept. Assume that we want to calculate the terminal value of the following retirement portfolio. The savings are invested equally in two market indices: the S&P 500 Index and the NASDAQ 100 Index. Assume that future returns are random draws from past returns. The initial deposit is $1,000. Monthly saving for the next 20 years is $1,500. The number of retirement years is 15. During the 15 years of retirement, $2,000 will be withdrawn monthly. Starting in the 10th year, $30,000 will be withdrawn annually for three years.

Define a function that calculates the terminal value of a retirement portfolio. It takes 10 arguments:

• returns of the stocks in which money is equally invested

• initial portfolio value, which must be positive

• years in service

• periodic saving

• frequency of contribution per year, coded as 12 for monthly data and 1 for annual data

• number of retirement years

• periodic income during retirement years

• large annual withdrawal amount during the planning period

• starting year of the large withdrawals, which are annual and occur in successive years with no gap

• number of large annual withdrawals

Using the values given, this computes the terminal value.
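One simulated path of this plan can be sketched in Python as follows. The sketch rests on several assumptions that the text does not settle: the starting year of the large withdrawals is counted from the start of retirement, large withdrawals are taken at the end of the year, contributions and income are paid at the end of each period, and each period's return is the equally weighted average of one randomly chosen historical row. The function name and argument order are hypothetical.

```python
import random


def terminal_value(returns, initial, years_service, saving, freq,
                   years_retired, income, big_amount, big_start, n_big, seed=0):
    """Terminal value of one bootstrapped path of the retirement plan."""
    rng = random.Random(seed)

    def draw_return():
        # Equally weighted return of one randomly drawn historical period.
        row = rng.choice(returns)
        return sum(row) / len(row)

    balance = float(initial)
    # Accumulation phase: grow the balance, then add the periodic saving.
    for _ in range(years_service * freq):
        balance = balance * (1 + draw_return()) + saving
    # Retirement phase: grow the balance, then withdraw the periodic income.
    for period in range(years_retired * freq):
        balance = balance * (1 + draw_return()) - income
        year = period // freq + 1
        end_of_year = (period + 1) % freq == 0
        # Assumption: large withdrawals occur at year end, in consecutive years.
        if end_of_year and big_start <= year < big_start + n_big:
            balance -= big_amount
    return balance
```

Calling the function repeatedly with different seeds gives a distribution of terminal values, from which quantities such as the probability of running out of money can be estimated.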

Optimization under uncertainty (or robust optimization) is another approach that helps to get solutions that are good for most realizations of data. Many financial problems fit into this framework; for example, the mean-variance portfolio optimization problem. Here uncertainty may arise due to many factors:

• uncertainty in the mean vector

• fluctuations in the covariance matrix

• variability of risk in the market over time

• imprecise model approximation

Feasibility depends on both the decision variable and the uncertain vector ; uncertainty can be introduced in the expected value, the variance or both.

This section considers a mean-variance problem that allows some degree of variation in returns and covariances.

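The standard form of a mean-variance problem is expressed in terms of the information about the expected returns and the covariance structure of the returns. For given asset returns $r = (r_1, \dots, r_n)$, where $r_i$ is the return on asset $i$, and the decision vector $x = (x_1, \dots, x_n)$, where $x_i$ is the proportion of money invested in asset $i$, the mean-variance problem is written as:

$$\max_{x} \; \mu^{\mathsf T} x - \lambda\, x^{\mathsf T} \Sigma\, x$$

subject to $x^{\mathsf T} \iota = 1$.

Here $\Sigma = (\sigma_{ij})$, where $\sigma_{ij}$ is the covariance between securities $i$ and $j$; $\lambda$ is the risk coefficient; $\mu = (\mu_1, \dots, \mu_n)^{\mathsf T}$, where $\mu_i$ is the average return on security $i$; and $\iota$ is an $n \times 1$ column vector of ones, $\iota = (1, \dots, 1)^{\mathsf T}$.

It is important to find the portfolio with the maximum Sharpe ratio, which can be obtained by solving the following problem, where $r_f$ denotes the risk-free rate:

$$\max_{x} \; \frac{\mu^{\mathsf T} x - r_f}{\sqrt{x^{\mathsf T} \Sigma\, x}}$$

subject to $x^{\mathsf T} \iota = 1$.

The analytical solution to this basic problem for the optimal portfolio weights is

$$x^{*} = \frac{\Sigma^{-1} (\mu - r_f\, \iota)}{\iota^{\mathsf T} \Sigma^{-1} (\mu - r_f\, \iota)}.$$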

We define a function to compute these weights. It takes two arguments, the returns data and the risk-free rate, and returns an optimal weight vector.
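The closed-form maximum Sharpe ratio (tangency) weights can be computed by solving the linear system "covariance matrix times z equals mean returns minus the risk-free rate" and rescaling z so that the weights sum to one. The Python sketch below does this from given moments rather than from raw returns data; the function names and the numerical inputs are made up for illustration, and it assumes unrestricted short selling and a positive sum of the unscaled solution.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def tangency_weights(mean_returns, covariance, risk_free):
    """Weights proportional to inverse(covariance) times excess mean returns."""
    excess = [mu - risk_free for mu in mean_returns]
    z = solve_linear(covariance, excess)
    total = sum(z)
    return [value / total for value in z]


# Made-up moments for two assets.
weights = tangency_weights([0.08, 0.12], [[0.04, 0.01], [0.01, 0.09]], 0.02)
```

Solving the linear system avoids forming the inverse covariance matrix explicitly, which is the usual choice for numerical stability.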

One way to get a robust solution to the mean-variance problem is to sample data in several scenarios to estimate parameters. Assuming that the returns have multivariate normal distribution, we define a function to compute tangency portfolios with simulated data. The function takes four arguments:

• returns data

• risk-free rate

• sample size

• number of iterations

It returns a distribution of optimal Sharpe ratios.

We apply those two functions to simulated data and get the distribution of portfolio means that maximize the Sharpe ratio.

We download historical monthly returns as before, compute the distribution of maximum Sharpe ratios and draw a histogram.

Another way to get robustness in the parameter estimate is to introduce some kinds of uncertainty; interval uncertainty sets and ellipsoidal uncertainty sets are the most commonly used. Following Kim, Kim and Fabozzi [5] (using and instead of their and ), the interval uncertainty set for the expected returns can be defined as:

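$$U_\delta(\hat{\mu}) = \left\{ \mu : |\mu_i - \hat{\mu}_i| \le \delta_i,\ i = 1, \dots, n \right\},$$

where $\mu = (\mu_1, \dots, \mu_n)$, the variable $\hat{\mu}_i$ is an estimate of the expected return and $\delta_i$ is a constant used to control the expected returns of stock $i$.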

The mean-variance problem with box uncertainty can be written as:

subject to and where is such that for .

This objective function can be modified as follows (see [5] for the derivation and explanation of the notation and ):

subject to and , where is the transformation matrix for a unit matrix of size ; for example, when , .

We define a function to find the solution to the mean-variance optimization problem with box uncertainty. This function takes three arguments:

• a matrix of stock returns

• the risk coefficient

• the confidence level for the uncertainty set

We compute optimal weights for the following portfolio.

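In [5], an ellipsoidal uncertainty set on expected returns is defined as:

$$U_\delta(\hat{\mu}) = \left\{ \mu : (\mu - \hat{\mu})^{\mathsf T}\, \Sigma_\mu^{-1}\, (\mu - \hat{\mu}) \le \delta^2 \right\},$$

where $\Sigma_\mu$ is the covariance matrix of estimation error of expected returns and $\delta$ controls the size of the ellipsoid. With this uncertainty set, the mean-variance problem can be written as:

$$\max_{x} \; \hat{\mu}^{\mathsf T} x - \lambda\, x^{\mathsf T} \Sigma\, x - \delta \sqrt{x^{\mathsf T} \Sigma_\mu\, x},$$

subject to $x^{\mathsf T} \iota = 1$.

The covariance matrix of estimation errors can be approximated in several ways using the sample covariance matrix of stock returns. Assuming that the samples of stock returns are independent and identically distributed, [5] defines $\Sigma_\mu = \frac{1}{T} \Sigma$, where $\Sigma$ is the covariance matrix of stock returns and $T$ is the sample size.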

Define a function to compute an optimal portfolio with an ellipsoidal uncertainty set. It takes three arguments:

• returns

• value of the risk coefficient

• confidence level for the uncertainty set

We compute optimal weights for our usual portfolio subject to an ellipsoidal uncertainty set.

We can use the built-in function to optimize the portfolio problem, introducing uncertainty in the mean returns or in the covariance.

Similarly, we can introduce uncertainty in risk and solve the optimization problem using this same function. See the documentation for an example.

With increasing computational resources and larger datasets, machine learning or artificial intelligence is a growing field in finance. A recent book by Dixon, Halperin and Bilokon [6] is a good reference on theory and applications.

Although the Wolfram Language includes a wide range of functions that work on many types of data, including numerical, categorical, time series, textual, image and audio, we focus on the `Predict` function and time-series data only. `Predict` takes input data and returns a predictor function that can be used to forecast the value of dependent variables given the values of independent variables. In this section, we show two examples using financial time series.

This example uses the Aruoba–Diebold–Scotti (ADS) business conditions index as one predictor of percentage change in the S&P 500 index. We define a function that, given the start and end dates as arguments, downloads the ADS index from the Federal Reserve Bank of Philadelphia and the S&P 500 index returns and then merges them for use in `Predict`.

Using this function, we download the data for the period January 30, 1970, to September 30, 2019, and generate a prediction function. (This takes several minutes.)

Once the prediction function is generated, we can use it to predict the percentage change in the stock index. For instance, here it predicts the percentage change in the S&P 500 index if ADS is 0.9, 2 or -1.2.
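The pattern of "fit on data, get back a function, call it on new inputs" can be illustrated in Python with ordinary least squares on a single predictor. This is a deliberately simple stand-in: the built-in function chooses among several regression methods automatically, whereas this sketch always fits a straight line, and the data below are made up rather than actual ADS or S&P 500 values.

```python
def fit_linear_predictor(xs, ys):
    """Fit y = intercept + slope * x by least squares and return a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(x):
        return intercept + slope * x

    return predict


# Made-up training pairs (index value, percentage change).
index_values = [-1.5, -0.5, 0.0, 0.5, 1.0, 2.0]
pct_changes = [-2.0, 0.0, 1.0, 2.0, 3.0, 5.0]
predictor = fit_linear_predictor(index_values, pct_changes)
predictions = [predictor(value) for value in (0.9, 2, -1.2)]
```

Returning a closure mirrors the way a fitted predictor is used in the article: the training step runs once, and the resulting function is then applied to any number of new inputs.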

In this example, we use five monthly macro variables to predict percentage change in the value of the S&P 500 index:

• USSLIND–the leading index for the United States

• UMCSENT–the University of Michigan consumer sentiment index

• CFNAIMA3–the Chicago Fed national activity index: three-month moving average

• MICH–the University of Michigan inflation expectation index

• T10Y2YM–10-year Treasury constant maturity minus 2-year Treasury constant maturity

The next function takes a list of macroeconomic series IDs, a start date and an end date as input arguments. It returns values for the specified macroeconomic variables and S&P 500 index returns in a format suitable for `Predict`. The Federal Reserve Bank of St. Louis may require an API key to download its data. An API key can be obtained for free by creating a user account at https://fred.stlouisfed.org (click “my account” and follow the instructions). Use the API key 207071a5f2e90e7816259d3c32c1ab81 if needed.

Using this function, we download five macro variables as well as the S&P 500 index returns over the period January 30, 1983, to May 30, 2019.

We can generate the prediction function using the data and predict the value. `Predict` can take the `Method` option to specify which regression method to use.

We have shown some applications of only the built-in function `Predict`. However, the Wolfram Language comes with many other built-in functions that are useful in classification, discriminant analysis and neural networks. You can use these tools to learn from the data and build models to extract useful information. We encourage you to explore more about machine learning in Mathematica.

As financial data becomes increasingly available, serious data analysis requires knowing software to manipulate large datasets. Aside from demonstrating many built-in functions, we introduced many custom functions especially designed for technical computation of financial data. Mathematica can do much more than what we have shown in this article. The Wolfram Language in general and Mathematica in particular are well-suited to implement sophisticated financial models, including pricing securities, trading strategies, simulation, optimization, risk management and time-series analysis [7, 8]. Mathematica’s built-in knowledge is also very useful for asset pricing models based on estimating the stochastic discount factor using the generalized method of moments.

[1] P. Wellin, Essentials of Programming in Mathematica, Cambridge, UK: Cambridge University Press, 2016.

[2] G. Cornuéjols, J. Peña and R. Tütüncü, Optimization Methods in Finance, 2nd ed., New York: Cambridge University Press, 2018.

[3] R. T. Rockafellar and S. Uryasev, “Optimization of Conditional Value-at-Risk,” The Journal of Risk, 2(3), 2000 pp. 21–41. https://doi.org/10.21314/JOR.2000.038.

[4] R. S. Tsay, An Introduction to Analysis of Financial Data with R, Hoboken, NJ: Wiley, 2013.

[5] W. C. Kim, J. H. Kim and F. J. Fabozzi, Robust Equity Portfolio Management, Hoboken, NJ: Wiley, 2016.

[6] M. Dixon, I. Halperin and P. Bilokon, Machine Learning in Finance from Theory to Practice, Cham, Switzerland: Springer, 2020.

[7] A. L. Lewis, Option Valuation under Stochastic Volatility: With Mathematica Code, Newport Beach, CA: Finance Press, 2000.

[8] A. L. Lewis, Option Valuation under Stochastic Volatility II: With Mathematica Code, Newport Beach, CA: Finance Press, 2016.

R. Adhikari, “Selected Financial Applications,” The Mathematica Journal, 2021. doi.org/10.3888/tmj.23-5.

Ramesh Adhikari is an Associate Professor of Finance at Humboldt State University. Prior to coming to HSU, he taught undergraduate and graduate students at Tribhuvan University and worked at the Central Bank of Nepal. He was also a research fellow at Osaka Sangyo University, Osaka, Japan. He earned a Ph.D. in Financial Economics from the University of New Orleans. He is interested in the areas of computational finance and high-dimensional statistics.

**Ramesh Adhikari**

*School of Business, Humboldt State University
1 Harpst Street
Arcata, CA 95521*

Mathematica has many built-in functions for doing research in graph theory. Formerly it was necessary to load the Combinatorica package to access these functions; most are now available within Mathematica itself. This article studies a problem concerning the vertex coloring of graphs using Mathematica by introducing some user-defined functions.

We only consider vertex colorings, so a “colored graph” always means a vertex-colored graph. An $n$-coloring of a graph $G$ is a partition of the vertices of $G$ into $n$ disjoint subsets. We start with 2-colorings; call the colors red and blue. Two vertices are *neighbors* if they are connected by an edge. We say that two vertices of the same color are *friends* and two vertices of opposite colors are *strangers*. If more than half the neighbors of a colored vertex $v$ are friends of $v$, we say that $v$ lives in a *friendly neighborhood*; otherwise, $v$ is said to live in an *unfriendly neighborhood*. If all the vertices of the graph have the same color, every vertex that has a neighbor lives in a friendly neighborhood. Is there a 2-coloring such that every vertex lives in an unfriendly neighborhood? The surprising answer to this question is yes, as we shall show.

A 2-coloring of a graph is *unfriendly* if each vertex lives in an unfriendly neighborhood, that is, at least half its neighbors are colored differently from itself. It is a theorem that every finite graph has an unfriendly coloring. (The situation is much more complicated for infinite graphs [1, 2].) The proof is clever but not very long, and we give it next. Define the *mixing number* of a colored graph to be the number of its edges whose vertices have different colors. Proceed by successively “flipping” vertices that live in friendly neighborhoods, that is, changing their color, one vertex at a time. When a vertex is flipped, it may change the neighborhood status of other vertices; however, each flip increases the mixing number of the graph, because more than half of the edges at the flipped vertex change from joining friends to joining strangers. Since the mixing number is bounded by the number of edges in the graph, this flipping process must eventually end with no more flippable vertices, that is, no more vertices living in friendly neighborhoods.
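The flipping argument translates directly into a short program. The following Python sketch is an independent illustration of the proof, not the article's Mathematica code: it assumes a simple graph given as a list of vertex pairs and a coloring given as a dictionary, and all function names are illustrative.

```python
def mixing_number(edges, color):
    """Number of edges whose endpoints have different colors."""
    return sum(1 for u, v in edges if color[u] != color[v])


def neighbors(edges, vertex):
    """Vertices that share an edge with the given vertex (simple graph assumed)."""
    return [v if u == vertex else u for u, v in edges if vertex in (u, v)]


def is_friendly(edges, color, vertex):
    """True if more than half of the vertex's neighbors share its color."""
    nbrs = neighbors(edges, vertex)
    friends = sum(1 for w in nbrs if color[w] == color[vertex])
    return 2 * friends > len(nbrs)


def unfriendly_coloring(vertices, edges, color):
    """Flip vertices in friendly neighborhoods until none remain."""
    color = dict(color)  # work on a copy
    while True:
        flippable = [v for v in vertices if is_friendly(edges, color, v)]
        if not flippable:
            return color
        v = flippable[0]
        # Each flip strictly increases the mixing number, so the loop terminates.
        color[v] = "blue" if color[v] == "red" else "red"


# Complete graph on 7 vertices, started with every vertex red.
vertices = list(range(7))
edges = [(u, v) for u in vertices for v in range(u + 1, 7)]
final = unfriendly_coloring(vertices, edges, {v: "red" for v in vertices})
```

Because the mixing number can never exceed the number of edges, the number of flips is at most the number of edges, whatever order the flippable vertices are taken in.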

We start with an easy example. We first consider the complete graph with 7 vertices.

We will color the vertices either red or blue. Assume that we start with four blue and three red vertices.

Here is how we show the colors.

To calculate the mixing number, we first generate the set of edges.

Next we introduce a test that determines whether the vertices of an edge have different colors.

Then we select those edges whose vertices have different colors.

Finally, we count the selected edges.

This is the mixing number of the colored graph.

It is easy to see by considering the other 2-colorings of $K_7$ that this coloring has the maximum mixing number for the graph and hence this coloring must be unfriendly. Four of one color and three of the other gives a mixing number of $4 \times 3 = 12$, whereas five of one color and two of the other gives a mixing number of $5 \times 2 = 10$, and so on. Of course, a direct inspection also shows the coloring is unfriendly.
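The claim about the complete graph on 7 vertices is small enough to confirm by brute force. This Python sketch (an illustration with made-up names, independent of the article's code) enumerates all 128 assignments of two colors to the vertices and records the mixing number by the number of vertices of the first color.

```python
from itertools import combinations, product

vertices = range(7)
edges = list(combinations(vertices, 2))  # the 21 edges of the complete graph


def mixing_number(coloring):
    """Number of edges whose endpoints received different colors."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])


# Mixing number for every 2-coloring, grouped by how many vertices get color 1.
by_size = {}
for coloring in product((0, 1), repeat=7):
    by_size.setdefault(sum(coloring), set()).add(mixing_number(coloring))

best = max(max(values) for values in by_size.values())
```

In a complete graph every pair of differently colored vertices is joined by an edge, so a coloring with k vertices of one color has mixing number k(7 - k), which is largest when the split is four and three.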

We used the `RandomGraph` function to construct a graph with 20 vertices and 100 edges. Here is the list of edges that was generated.

Arbitrarily color the first 10 vertices red and the remaining 10 blue.

Here is the image of the graph, colored as before.

The function changes the color of a vertex.

For example, this flips vertex 1.

We need to get the set of *neighbors* of a vertex $v$ in a graph $G$, that is, those vertices that share an edge with $v$. Since the built-in Mathematica function used for this includes the vertex itself, we remove it.

For example, here are the neighbors of vertex 3 in .

Next we obtain the edges whose vertices are colored differently.

The length of that list is the mixing number of the graph with edges and the color partition .

Call a vertex $v$ a *secure* vertex in a graph $G$ if $v$ lives in a friendly neighborhood, that is, $v$ has the same color as most of its neighbors in $G$.

The function determines whether is a secure vertex in with the color partition .

For example, these are the secure vertices of .

We define the corresponding function, .

We select the first element in the list of secure vertices and set the colors.

We flip the color of since it lives in a friendly neighborhood, or equivalently, it is a secure vertex.

We could now repeat this step using the generated color partition and stop when there are no more secure vertices. In the next section we write the function to carry out the process to the end, with output the final color partition.

The following short program produces an unfriendly coloring of any 2-colored graph starting with the color partition .

Here is the unfriendly color partition.

Here is the unfriendly coloring of the graph .

We can even start with all vertices having the same color, say red.

We run our program and use the new color partition.

The max-cut problem is to partition the vertices of a graph $G$ into two sets so as to maximize the number of edges that have one endpoint in each set. This is equivalent to finding a 2-coloring of the vertices of $G$ with the maximum mixing number, since the size of a cut (i.e. the number of edges joining the two sets) is the same as the mixing number of the corresponding coloring. The max-cut problem is known to be NP-hard [3, 4], which implies that methods of finding this maximum will not run in polynomial time unless P = NP, something most mathematicians consider unlikely.

In the context of the max-cut problem, our flipping procedure is known as local search and can easily be shown to always produce a cut with at least half the number of edges in the graph [3].

We have shown how to find unfriendly 2-colorings of finite graphs, which can easily be extended to more colors. We feel that treating a vertex coloring as a partition of the list of vertices and regarding this as an integral and dynamic part of the graph should be of use in investigating other coloring problems.

Finally, I wish to dedicate this work to Bill Emerson, who worked on this problem with me over 30 years ago (see reference 3 in [1] to our unpublished paper). Bill was a very talented mathematician and a good friend. He is missed by all who knew him.

I am grateful to George Beck for improving my programming throughout the paper and to Professor Lenore Cowen for pointing out the connection to the max-cut problem.

[1] R. Aharoni, E. C. Milner and K. Prikry, “Unfriendly Partitions of a Graph,” Journal of Combinatorial Theory, Series B, 50(1), 1990 pp. 1–10. https://doi.org/10.1016/0095-8956(90)90092-E.

[2] S. Shelah and E. C. Milner, “Graphs with No Unfriendly Partitions,” A Tribute to Paul Erdős (A. Baker, B. Bollobás and A. Hajnal, eds.), Cambridge, UK: Cambridge University Press, 1990 pp. 373–384.

[3] D. Panigrahi, “COMPSCI 638: Graph Algorithms, Lecture 22.” (Sep 10, 2021) https://courses.cs.duke.edu//fall19/compsci638/fall19_notes/lecture22.pdf.

[4] M. R. Garey, D. S. Johnson and L. Stockmeyer, “Some Simplified NP-Complete Problems,” in Proceedings of the Sixth Annual ACM Symposium on Theory of Computing, Seattle, WA, New York: ACM, 1974 pp. 47–63. https://dl.acm.org/doi/10.1145/800119.803884.

R. Cowen, “Mixing Numbers and Unfriendly Colorings of Graphs,” The Mathematica Journal, 2021. doi.org/10.3888/tmj.23-4.

Robert Cowen is a Professor Emeritus at Queens College, CUNY. His main research interests are logic and combinatorics. He has enjoyed teaching students how to use Mathematica to do research in mathematics for many years.

**Robert Cowen**

*16422 75th Avenue*
*Fresh Meadows, NY 11366*

We present a straightforward implementation of contour integration by setting options for Integrate and NIntegrate, taking advantage of powerful results in complex analysis. As such, this article can be viewed as documentation to perform numerical contour integration with the existing built-in tools. We provide examples of how this method can be used when integrating analytically and numerically some commonly used distributions, such as Wightman functions in quantum field theory. We also provide an approximating technique when time-ordering is involved, a commonly encountered scenario in quantum field theory for computing second-order terms in Dyson series expansion and Feynman propagators. We believe our implementation will be useful for more general calculations involving advanced or retarded Green’s functions, propagators, kernels and so on.

It is well known that a large class of functions with known antiderivatives can be integrated analytically with Integrate; otherwise, NIntegrate gives numerical results. There are various settings that one can use to evaluate integrals, depending on the task at hand. Crucially, this can be done even if the integrand is complex valued. In some cases, when the integrand is a distribution, the integration can also be done analytically by setting a suitable option or by imposing a regulator, which in physics is often called a UV cutoff.

In this article, we are interested in a contour integral of the form

$$\int_{\mathcal{C}} f(z)\,dz, \tag{1}$$

where $f$ is a complex-valued function or distribution and $\mathcal{C}$ is an integration contour, with possibly higher-dimensional generalizations. For a large class of functions $f$ and contours $\mathcal{C}$, this can be done in various ways, for example with powerful techniques in complex analysis like Cauchy’s integral formula or the residue theorem. For many of these functions, Integrate can provide answers immediately, sometimes involving special functions.

The motivation for this article is the observation that there is no good documentation on how to perform explicit contour integration in Mathematica. It turns out that the ingredients necessary to perform this task are already available within the software. We believe that some of these ingredients are already used elsewhere for different purposes. Our task is to synthesize these components in a coherent way to document how to perform numerical contour integration, using only built-in commands and their options. We will show by way of examples that in many cases this method proves superior to direct integration with the $i\epsilon$ prescription, especially when the integrand is complicated or when the integral is multidimensional.

Consider the integral

(2) |

where , , and is the entire real line. The first equality comes from Sokhotsky’s formula [1] and the second equality uses principal value integration. This is a commonly used “epsilon-regularization” in physics when evaluating certain apparently divergent integrals, which has to do with the distributional nature of the integrand in question (see Appendix A in [1] for examples).
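For reference, the Sokhotsky (Sokhotski–Plemelj) formula invoked here can be stated in its standard textbook form for a test function $f$ that is smooth near a real point $x_0$. The symbols $f$ and $x_0$ are generic placeholders introduced for this statement and are not necessarily the notation of equation (2).

```latex
\lim_{\epsilon \to 0^{+}} \int_{\mathbb{R}} \frac{f(x)}{x - x_0 \mp i\epsilon}\,dx
  \;=\; \mathrm{P.V.}\!\int_{\mathbb{R}} \frac{f(x)}{x - x_0}\,dx \;\pm\; i\pi f(x_0)
```

The first term on the right is the principal value integral and the second is the contribution of the delta function concentrated at the pole.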

This integral can be solved using complex analysis instead of Sokhotsky’s formula, based on the observation that can be computed using the following contour.

The contour is the horizontal blue segment and the semi-circular arc . The integral over the semi-circular arc vanishes as by Jordan’s lemma [2, 3]. The contribution from the closed loop can be calculated in various ways, such as the residue theorem. The residue theorem says that since the closed loop encloses the pole at (using the standard counterclockwise convention [2, 3]), . Taking the limit as , we get the same result as before.

There is indeed a built-in function for calculating residues.

Another way to evaluate the integral is based on the fact that the “$i\epsilon$ prescription” is an instruction to perform a specific contour integration (without an $\epsilon$-regulator) in the sense that

(3) |

where the contour is chosen to run along the real line but is deformed near the pole into the lower half of the complex plane.

The $i\epsilon$ prescription in (2) shifts the pole off the real axis into the upper half-plane, so that we can integrate over the real line and then take the limit as $\epsilon \to 0^+$. Consequently, if we were to remove the $\epsilon$-regulator, the deformation theorem [2] would require that we deform the contour into the lower half-plane near the pole in order to obtain the same value of the integral. (The deformation theorem [2] in complex analysis states that a contour integral remains constant under deformation of the contour that does not cross any poles or branch cuts.) For this contour, the residue is instead given by the following.

If we choose the opposite sign in the $i\epsilon$ prescription instead, the integral is zero. Here is an explicit computation for fixed $\epsilon$.

The zero values have a simple explanation in terms of contour integration: since the pole is now shifted to the lower half-plane, the contour does not enclose any poles, which by Cauchy’s integral theorem guarantees that the integral is zero.

This is a good place to introduce the numerical contour integration scheme. As a simple example, we can use a rectangular contour.

For any admissible choice of the rectangle’s dimensions, this calculation gives the correct answer, as expected from direct calculation using residues or textbook pen-and-paper calculations (see e.g. [3]). Varying the dimensions does not change the integral, as a consequence of the deformation theorem. Other contours also work, for example a triangular contour.
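The independence of the contour shape can be checked with a few lines of code. The following is a minimal Python sketch, not the article's NIntegrate command: the integrand $e^{iz}/z$ (a simple pole at the origin with residue 1), the corner coordinates, and the helper names `segment_integral` and `polygon_integral` are all assumptions made for this illustration. Each polygon is integrated piecewise along its straight edges with a composite Simpson rule.

```python
import cmath
import math


def segment_integral(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f along the straight
    segment from complex a to complex b (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def polygon_integral(f, corners, n=2000):
    """Integrate f along the closed polygon through the given corners."""
    closed = list(corners) + [corners[0]]
    return sum(segment_integral(f, p, q, n) for p, q in zip(closed, closed[1:]))


def f(z):
    # Illustrative integrand (an assumption): simple pole at z = 0, residue 1.
    return cmath.exp(1j * z) / z


# Counterclockwise rectangle and triangle, both enclosing the pole at 0.
rectangle = [-1 - 1j, 1 - 1j, 1 + 1j, -1 + 1j]
triangle = [-2 - 1j, 2 - 1j, 2j]
# A rectangle that does not enclose the pole.
outside = [0.5 - 1j, 2 - 1j, 2 + 1j, 0.5 + 1j]

I_rect = polygon_integral(f, rectangle)
I_tri = polygon_integral(f, triangle)
I_out = polygon_integral(f, outside)
print(I_rect, I_tri, I_out, 2j * math.pi)
```

Both enclosing contours should reproduce $2\pi i$ times the residue, while the contour that misses the pole should give a value that is numerically zero, in line with Cauchy's integral theorem.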

For completeness, let us show that in this simple case the $i\epsilon$ prescription (which is commonly used in physics) can also be carried out directly. We add the condition $\epsilon > 0$.

The final result involves a factor that tends to 1 as $\epsilon \to 0^+$. Therefore, in this simple example, all the methods yield good results. It is straightforward to show that many common textbook examples (see e.g. [3]) can be performed in this way.

Finally, our main goal is to show that NIntegrate can also perform the task reliably. First do the $i\epsilon$ prescription.

Here we see the first instance where the $i\epsilon$ prescription starts to fail: when performed numerically, the most basic setting does not work, and increasing the working precision does not help much.

Remark: The command Quiet silences all error messages and is used for readability. In practice, in order to discover the issues and possibly infer the sources of problems, it is useful not to use Quiet on a first try.

Now do the contour integral using a triangular contour.

The contour integration works as is without further modification. The size of the contour does not matter, as long as it is not so small that it leads to numerical instabilities associated with small numbers.

We refer to the numerical computation using NIntegrate that involves contour deformation in the complex plane as numerical contour integration. We will see recurring examples where $i\epsilon$ prescriptions, albeit the simplest to implement, almost never work when needed, despite optimization of various settings. Thus numerical contour integration is a highly competitive technique to use.

(Remark: Some of the discussion here was briefly outlined in [4, 5], which is based on what this article contains.)

The main example we consider in this article involves the integral over a distribution or bi-distribution of the form

(4) |

where , , and is a distribution or bi-distribution given by

(5) |

In physics, more specifically quantum field theory, this distribution is the two-point vacuum correlation function (also called the two-point Wightman function) of a massless scalar field in ()-dimensional Minkowski space [6].

(6) |

The arguments of the two-point function are the coordinates of two spacetime events and , where and in Cartesian coordinates and . The integral is evaluated at and is thus independent of the spatial coordinates. The prescription tells us that the Wightman function is a distribution or bi-distribution [6].

Our task is to evaluate the integral in (4). This is a two-dimensional complex integral over and with a continuum of poles at for any fixed . In the literature, gives the transition probability (divided by a small coupling constant ) of an Unruh–DeWitt (UDW) detector consisting of a two-level quantum system (qubit) with energy gap interacting with the massless scalar field for a finite duration prescribed by the Gaussian function with timescale set by (see e.g. [7] and references therein). The Gaussian functions in the integrand ensure that the interaction between the detector and the quantum field is adiabatically switched on and off smoothly; that is, the Fourier transform of the Wightman function has polynomial tails and is strongly suppressed by the exponential tails of the Fourier transform of the Gaussians. This guarantees that there is no spurious divergence in the transition probability due to switching on the detector very sharply or suddenly, such as with a switching function with discontinuity at an endpoint, like a rectangular function, which is unphysical. An analogy is switching on a lightbulb: in realistic scenarios the current in the circuit does not simply increase from zero to a constant value instantaneously, but rather smoothly over some short time interval.

The example in equation (4) is important for three reasons.

1. It is one of the simplest examples in quantum field theory; many other examples in physics involve distributions that take similar and sometimes more complicated forms (e.g. different power laws where or for some function ).

2. It is one of the simplest examples of a two-dimensional integral with a continuum of poles, thus going beyond standard textbook examples (e.g. in [3] all the examples are one dimensional).

3. Despite its somewhat complicated appearance, it admits a closed-form expression [7].

In [7] it is shown that the exact closed-form expression is given by

(7) |

where is the complementary error function. We want to show that numerically this result can be obtained in a satisfactory way. For this purpose, let us focus on a particular numerical value by setting , so that

(8) |

We can then use this value as a benchmark for our methods.

Direct integration for is given by the following command.

The most modest settings yield poor results.

Clearly, although the real parts are all positive (hence physically at least sensible for a transition probability), the results are very different from the benchmark value. The third value is wrong because the probabilities should be real (hence numerically the imaginary part should be much closer to zero).

Increasing as a first remedy does not help much, if at all.

Restricting the integration domain to (, ), where the Gaussian is strongly supported, improves the result, but not enough. (The negligible part of the integral associated to Gaussian tails was ignored.) To see that the behavior scales badly with , let us choose a particular setting.

The first two entries, while not accurate enough, are again at least physically valid results because the imaginary part is negligible and hence can represent transition probabilities. In contrast, the last three values have negative real parts and are thus invalid by default.

We leave it to the reader to verify that adjusting other settings does not help either. There may be some combination of settings that makes this integration work, but if it exists, finding it is not worth the time for our purposes. We will in fact show that the exact same setting that yields the bad results above does work with numerical contour integration.

A possible reason for these issues is that the $i\epsilon$ prescription changes the asymptotic behavior of the integrand at infinity. Numerically, NIntegrate has to deal with points at infinity that are shifted by a very small imaginary component.

Here the integral is transformed into a numerical contour integral. In contrast to the previous subsection, the $\epsilon$-regulator now serves as a contour deformation near a reasonably shaped contour. Hence the asymptotic behavior of the integral is not altered by varying $\epsilon$.

(9) |

where the contour runs along the real line but is deformed around the poles of the Wightman function. Here we have $\epsilon = 0$ in the Wightman function because we have converted the $i\epsilon$ prescription into an instruction to perform contour integration, so the $\epsilon$ is now part of the definition of the contour.

Here is the contour , where is the location of the continuum of poles for every fixed .

Since many contours work, let us first choose a particularly simple one: a rectangular deformation near the poles. If we integrate over first, then the continuum of poles is located in the complex plane at . Therefore, the contour should be deformed to the upper complex plane.

Based on the previous subsection, we see that using yields a better answer, since more than 99.99% of the area of the Gaussian in the integrand is contained in the interval . We will not set in what follows. Consequently, the parts of the contour from to and to have been truncated and do not yield significant error. This gives the following results.

This result may look undesirable, but the first value is numerically the same as the benchmark value up to very small imaginary part. This suggests that in the neighborhood of there is a good range that we can use to get a numerically accurate result; we will show that the result is invariant under smooth deformation of the contour. For example, if we zoom in between and , we get the following results.

That the results are unchanged as we vary (only the negligible error changes a little) is what we expect from the deformation theorem in complex analysis, since the deformation does not cross any singularities. This invariance under contour deformation is very useful because it is one of the consistency checks we can do (or perhaps need to do) in the absence of exact closed-form expressions and other cross-validation. Direct integration does not allow this, because every yields different values (because the asymptotics are also shifted by ). Furthermore, numerical contour integration is also remarkably fast.

Here is a different contour. The width is fixed but the height toward the imaginary axis varies.

This choice of contour highlights that while contour integration is much more powerful than the direct integration method, one should avoid numerical issues associated with values of that are either too large or too small; sometimes different contours yield different stability. Nonetheless, the main takeaway is that we have found that there is a large enough range of for which the contour can be varied but still yields the correct value (in this case, ). In this range, we can vary and the answer is invariant. The correct range of that provides reliable results must however be found empirically and depends strongly on the integrand. For completeness, we show the results for zooming into the range of (except that this time we need this to be ).

A significant advantage of numerical contour integration over direct integration is that it allows for a robust consistency check: if we can find a range of the contour parameter in which the integral is constant (up to small numerical errors like a small imaginary part), then the integral is likely to be correct. This is an extremely important necessary condition, especially when a closed-form expression is not available. Of course, if there are other ways to check the correctness of the value (as in this example where we know the exact closed-form expression or if there are physical arguments to back it up), we should always also perform these other checks.
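This consistency check can be tried on a reduced, one-dimensional toy version of the problem. The Python sketch below is not the article's two-dimensional NIntegrate computation, and every convention in it is an assumption made here: Gaussian switching $e^{-t^2/T^2}$, a phase $e^{-i\Omega(t-t')}$, and a Wightman function $-1/\big(4\pi^2 (u - i\epsilon)^2\big)$ of the time difference $u = t - t'$. Under these assumptions the integral over $t + t'$ can be done by hand, leaving a single integral over $u$ with a double pole at the origin, which is evaluated along a path that dips below the pole by a depth `depth`. The closed form used for comparison was derived for these same assumed conventions.

```python
import cmath
import math


def segment_integral(f, a, b, n=2000):
    """Composite Simpson rule along the straight segment from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def response_by_contour(T, Omega, depth, half_width=1.0, cutoff=8.0):
    """Toy detector response: integrate over u = t - t' along the real line,
    with a rectangular detour of the given depth below the pole at u = 0."""
    def integrand(u):
        gauss_phase = cmath.exp(-u * u / (2 * T * T) - 1j * Omega * u)
        return -gauss_phase / (4 * math.pi ** 2 * u * u)

    a, L = half_width * T, cutoff * T  # Gaussian tails beyond L are negligible
    path = [-L, -a, -a - 1j * depth, a - 1j * depth, a, L]
    total = sum(segment_integral(integrand, p, q) for p, q in zip(path, path[1:]))
    # Prefactor from the Gaussian integral over t + t' and the Jacobian 1/2.
    return T * math.sqrt(2 * math.pi) / 2 * total


def response_closed_form(T, Omega):
    """Closed form for the same assumed conventions."""
    x = T * Omega
    return (math.exp(-x * x / 2)
            - math.sqrt(math.pi / 2) * x * math.erfc(x / math.sqrt(2))) / (4 * math.pi)


exact = response_closed_form(1.0, 1.0)
results = [response_by_contour(1.0, 1.0, depth) for depth in (0.5, 1.0, 2.0)]
for value in results:
    print(value, exact)
```

If the sketch behaves as intended, the three depths give the same value up to rounding, with a negligible imaginary part, and that common value agrees with the closed form. Agreement across depths is exactly the kind of evidence one would rely on when no closed form is available.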

Next, we consider a more complicated integral,

(10) |

where , , , is the Heaviside step function and the two-point Wightman function is given in equation (5). This integral differs conceptually from Example 2 in two respects:

(1) the additional Heaviside step function in the integrand

(2) we also allow for and changed the sign on the phase

The Heaviside step function appearing in the integrand naturally arises in physics calculations where a Dyson series expansion is involved (i.e. where the notion of time ordering is necessary). In particular, this occurs in various time-dependent perturbation theory calculations within quantum mechanics and quantum field theory. The specific integral in (10) appears as part of the calculation of the nonlocal part of the joint two-detector density matrix of two qubits interacting with a massless scalar field in the so-called entanglement harvesting protocol [7, 8].

Perhaps somewhat surprisingly, this integral admits a closed-form solution [7], given by

(11) |

where . As before, we can assume that the coordinate system is aligned so that .

For benchmarking, assume and set . This gives

(12) |

Here is the calculation.

This is an alternative if one prefers arbitrary in the expression.

This number is complex, so we no longer have the benefit of verifying our numerical calculation by requiring that the imaginary part be small (which we used in Example 2).

Let us now evaluate the integral numerically. We will not bother with the $i\epsilon$ prescription anymore because it does not work; this is not surprising since the Heaviside step function only complicates the integral compared to Example 2. However, we will evaluate this integral in two ways, both using numerical contour integration, with the two methods differing in how they handle the Heaviside step function.

We rewrite the integral as:

(13) |

By suitable relabeling, the two integrals are equal (this property is used in [7, 8]), but we will not need this fact, since in more general situations the two integrals are different. (See e.g. [5, 9], where the upper limits differ slightly by a “relative redshift” factor.) Although equation (13) is written in the notation of an $i\epsilon$ prescription, this is merely a shorthand for performing the numerical contour integration, since we already know that direct integration does not work.

Pick a rectangular contour and set it so that . For this choice, the two integrals have a continuum of poles at and , respectively. The contour will be chosen so that it does not cross any poles as varies.

We can then write the following integral command.

The most modest settings yield good results.

The contour parameter is good in the range and starts to deviate at around . Remarkably, we can compute this result very well despite the extra complications, with nearly no additional settings beyond .

The integral in equation (10) is actually written in a more useful form than the upper limit form in equation (13), in the sense that it accommodates a more general scenario where the two-point functions may not be simple functions of and . In general, the poles of the Wightman functions for the problem at hand can be located at , where is some function that does not have a simple inverse. In such a situation, the upper limit of the integrals is not simply or and may have to be worked out numerically.

More specifically, in the context of quantum field theory in curved spacetimes, the Wightman function in equation (5) is given in flat spacetime in terms of the Minkowski coordinates, and the integrals we computed thus far correspond to two Unruh–DeWitt (UDW) detectors separated by proper distance , both of which are at rest relative to this coordinate system. When the detectors are in motion, we have to replace with , where is Minkowski coordinate time and and are the proper times of each detector, which take the roles of and in the previous sections. Similarly, we have to replace with , where now takes the role of Minkowski spatial coordinates. In this case, in general it is not true that is a simple function of , so it may not always be possible to find an upper limit version similar to equation (13). The static detectors (at rest relative to Minkowski coordinates) correspond to the special case where , which can then be rewritten as . This complexity occurs more generally when the spacetime is not flat, such as when we consider a two-dimensional truncation of Schwarzschild spacetime [5].

For this reason, it is useful to try to integrate equation (10) using the Heaviside step function directly. Since the built-in Heaviside step function is not defined for complex arguments, we approximate it with a smooth function. For example, the Heaviside step function is a limit of a logistic function:

(14) |

provided that we define . We can thus define an approximate Heaviside step function for any fixed by

(15) |

Any other function used in computing a cumulative probability distribution would work.

Our integral now becomes

(16) |

again with the rectangular contour of Figure 2. The task now is to manually find and that yield good results by trial and error. Again we set so that .

We define and .

The results are promising, since there is a range of that gives a result very close to the benchmark , but we can do better.

The key is to set so that . For example, for , a good choice of is 1/5 but not 1 or 1/12. The only difference in the next command from the last is in the fourth argument of , where is instead of 5. The results are slightly better.

We can study this relationship better for a smaller set of values of and judge the quality.

Thus we have obtained a good range of where the deformation theorem clearly works. We need to work with because the smooth approximation of the Heaviside step functions introduces infinitely many new poles in the complex plane, since has (countably) infinitely many poles along the imaginary axis. This can be easily seen by rewriting as:

(17) |

Therefore the choice of must depend on so that the contour does not cross extra poles, making the integral gain spurious contributions. Nonetheless, finding the right that works is an empirical process, especially when the Wightman function is very complicated. Our examples thus far are simple enough that they can be easily cross-validated and evaluated without much effort.
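The interplay between the steepness of the smooth step and the height of the contour can be made concrete. The Python sketch below assumes one common form of the logistic approximation, $1/(1 + e^{-2kz}) = \tfrac{1}{2}\big(1 + \tanh(kz)\big)$, which may differ from equations (14) to (17) by a rescaling of the steepness parameter; the names `smooth_step`, `nearest_pole_height` and `contour_is_safe` are hypothetical. For this form the poles sit on the imaginary axis at $z = i\pi(2n+1)/(2k)$, so a contour displaced vertically by less than $\pi/(2k)$ does not cross any of them.

```python
import cmath
import math


def smooth_step(z, k):
    """Logistic approximation to the Heaviside step for complex z,
    written as (1 + tanh(k z)) / 2 (assumed form)."""
    return 0.5 * (1 + cmath.tanh(k * z))


def nearest_pole_height(k):
    """Distance from the real axis to the closest poles, z = +-i*pi/(2k)."""
    return math.pi / (2 * k)


def contour_is_safe(height, k):
    """True if a contour shifted vertically by `height` stays clear of the poles."""
    return abs(height) < nearest_pole_height(k)


k = 5
print(smooth_step(0, k), smooth_step(1, 50), smooth_step(-1, 50))
print(nearest_pole_height(k), contour_is_safe(0.2, k), contour_is_safe(0.5, k))

# Just below the first pole the approximation blows up.
near_pole = 1j * (nearest_pole_height(k) - 1e-6)
print(abs(smooth_step(near_pole, k)))
```

A steeper step (larger $k$) approximates the Heaviside function better on the real line but pushes the poles closer to the real axis, which limits how far the contour can be deformed. This is one way to see why the contour parameter and the steepness have to be chosen together.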

The above examples have been used by the author to calculate the density matrix elements of Unruh–DeWitt detectors interacting with massless scalar fields for various background spacetimes, where the calculations require time-dependent perturbation theory involving Dyson series expansions. Starting from the calculations in [4] where a primitive version was first used to study entanglement harvesting in two-dimensional spacetimes with accelerating mirror (see the Appendix in [4]), the technique was subsequently improved (using a full rectangular contour that is not close to the poles) for the entanglement harvesting problem in Schwarzschild and collapsing shell spacetimes [5]. In [5], the derivative-coupling functions (strictly speaking they are not Wightman functions but the proper time derivatives of the Wightman functions) are so complicated that it is somewhat surprising the upper-limit technique outlined above worked very well across large parameter spaces. The author has verified that the approximation of the Heaviside step function works well for calculations in [5], although with the caveat: when the derivative-coupling Wightman functions can be expressed as a sum of distinct terms, the part of the derivative Wightman function that contains and requires relatively small but the part that contains or may need very large . This highlights the need for empirically testing the choice of before proceeding with physical calculations.

We extend the results of [5] for harvesting of correlations by two quantum-mechanical detectors interacting with a quantum field in [9], where we consider more complicated detector trajectories. In this context, the authors verified that the upper-limit form analogous to equation (13) proved numerically unstable in physically relevant regimes, thus making the approximate Heaviside step functions indispensable as a way to calculate the nonlocal correlation term in entanglement harvesting contexts. In that setting, the calculation of the nonlocal term involves an integral similar to the one in equation (10), but with much more complicated numerical calculations than in [5]; in particular the upper limits are not simply affine functions of the integration variables. We will explain the technicalities specific to the problem at hand when invoking the techniques outlined in [9].

In quantum field theory in generic spacetimes (flat or curved), there are many situations that require computing integrals over distributions. Our techniques should be general enough to use in such contexts, where correlation functions take a form similar to Wightman functions. For example, we have verified that our techniques work for computing field commutators [5] and in (2+1)-dimensional rotating BTZ black holes [10, 11]. We are also able to reproduce the computations of transition rates as done for example in [12] (also see Appendix of [4]), which are fundamentally simpler because they are one-dimensional integrals over the Wightman functions.

In this article we have presented a straightforward implementation of contour integration using the Integrate and NIntegrate commands, taking advantage of powerful results in complex analysis. The goal is to provide a systematic example-based implementation of contour integration that does not require any user-defined functions and only requires setting options. We provide explicit examples of how this can be used when integrating analytically and numerically some commonly used distributions, such as Wightman functions in quantum field theory. We also provide efficient approximation schemes to compute quantities involving time-ordering operations where Heaviside step functions are required. Our examples are geared toward research in relativistic quantum information, where finite-dimensional quantum systems interact with a quantum field on a possibly curved background spacetime. We believe our implementation will be useful for calculations involving various Green’s functions, propagators, integral kernels and so on for both students and researchers in fields requiring numerical evaluations of integrals over distributions without resorting to unnecessary or excessive approximation schemes.

The author thanks Kensuke Gallock-Yoshimura for help in testing the numerical schemes proposed here in [9], and Nicholas Funai for reading through the draft and providing useful feedback. The author acknowledges support from the Mike-Ophelia Lazaridis Fellowship from the Institute for Quantum Computing (IQC), University of Waterloo, Canada. The author also would like to thank both the reviewer and editor for very useful and clarifying comments and recommendations.

[1] | J. M. Howie, Complex Analysis, London: Springer Science & Business Media, 2003. |

[2] | J. W. Brown and R. V. Churchill, Complex Variables and Applications, 8th ed., New York: McGraw-Hill Education, 2009. |

[3] | V. Mukhanov and S. Winitzki, Introduction to Quantum Effects in Gravity, New York: Cambridge University Press, 2007. |

[4] | E. Tjoa, “Aspects of Quantum Field Theory with Boundary Conditions,” UWSpace, 2019. hdl.handle.net/10012/14843. |

[5] | E. Tjoa and R. B. Mann, “Harvesting Correlations in Schwarzschild and Collapsing Shell Spacetimes,” Journal of High Energy Physics, 08(155), 2020. doi.org/10.1007/JHEP08(2020)155. |

[6] | N. D. Birrell and P. C. W. Davies, Quantum Fields in Curved Space, Cambridge: Cambridge University Press, 1982. |

[7] | A. R. H. Smith, “Detectors, Reference Frames, and Time,” UWSpace, 2017. hdl.handle.net/10012/12618. |

[8] | G. Ver Steeg and N. C. Menicucci, “Entangling Power of an Expanding Universe,” Physical Review D, 79(4), 2009 044027. doi.org/10.1103/PhysRevD.79.044027. |

[9] | K. Gallock-Yoshimura, E. Tjoa and R. B. Mann, “Harvesting Entanglement with Detectors Freely Falling into a Black Hole.” arxiv.org/abs/2102.09573. |

[10] | M. P. G. Robbins, L. J. Henderson and R. B. Mann, “Entanglement Amplification from Rotating Black Holes.” arxiv.org/abs/2010.14517. |

[11] | L. J. Henderson, R. A. Hennigar, R. B. Mann, A. R. H. Smith and J. Zhang, “Harvesting Entanglement from the Black Hole Vacuum,” Classical and Quantum Gravity, 35(21) 21LT02, 2018. doi.org/10.1088/1361-6382/aae27e. |

[12] | J. Louko and A. Satz, “Transition Rate of the Unruh–DeWitt Detector in Curved Spacetime,” Classical and Quantum Gravity, 25(5) 055012, 2008. doi.org/10.1088/0264-9381/25/5/055012. |

Erickson Tjoa, “Numerical Contour Integration,” The Mathematica Journal, 2021. doi.org/10.3888/tmj.23–3. |

Erickson Tjoa was born in Jakarta, Indonesia. He moved to Singapore at the age of 15 and studied physics and mathematics as a double major at Nanyang Technological University from 2013 to 2017. He then completed an MSc at the University of Waterloo, where he is currently a second-year PhD student in relativistic quantum information.

**Erickson Tjoa**

Institute for Quantum Computing (IQC)

University of Waterloo, ON, Canada

200 University Avenue West

Waterloo, ON, Canada N2L 3G1

Lehmer defined a measure

$$\mu = \sum_{j=1}^{J} \frac{1}{\log_{10}\left|\beta_j\right|},$$

where the $\beta_j$ may be either integers or rational numbers in a Machin-like formula for $\pi$. When the $\beta_j$ are integers, Lehmer’s measure can be used to determine the computational efficiency of the given Machin-like formula for $\pi$. However, because the computations are complicated, it is unclear if Lehmer’s measure applies when one or more of the $\beta_j$ are rational. In this article, we develop a new algorithm for a two-term Machin-like formula for $\pi$ as an example of the unconditional applicability of Lehmer’s measure. This approach does not involve any irrational numbers and may allow calculating $\pi$ rapidly by the Newton–Raphson iteration method for the tangent function.

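In 1706, the English astronomer and mathematician John Machin discovered a two-term formula for $\pi$ [1–3]

$$\frac{\pi}{4} = 4\arctan\left(\frac{1}{5}\right) - \arctan\left(\frac{1}{239}\right) \tag{1}$$

that was later named in his honor. This formula for $\pi$ proved more efficient than any other known at that time. In particular, due to the relatively rapid convergence of the right-hand side of (1), he was able to calculate 100 decimal digits of $\pi$ [1]. Nowadays, identities of the form

$$\frac{\pi}{4} = \sum_{j=1}^{J} \alpha_j \arctan\left(\frac{1}{\beta_j}\right), \tag{2}$$

where $\alpha_j$ and $\beta_j$ are either integers or rational numbers, are called the Machin-like formulas for $\pi$. Consequently, a two-term Machin-like formula for $\pi$ is given by

$$\frac{\pi}{4} = \alpha_1 \arctan\left(\frac{1}{\beta_1}\right) + \alpha_2 \arctan\left(\frac{1}{\beta_2}\right). \tag{3}$$

If in equation (3) the constants $\alpha_1$ and $\beta_1$ are positive integers and $\alpha_2 = 1$, then the unknown value $\beta_2$ can be found as [4, 5]

$$\beta_2 = \frac{2}{\left[(\beta_1 + i)/(\beta_1 - i)\right]^{\alpha_1} - i} - i. \tag{4}$$

Furthermore, since we assumed that $\alpha_1$ and $\beta_1$ are positive integers, from equation (4) it immediately follows that $\beta_2$ must be either an integer or a rational number.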
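The claim that the second constant is always rational can be checked with exact arithmetic. The Python sketch below assumes the closed form $\beta_2 = 2/\big([(\beta_1+i)/(\beta_1-i)]^{\alpha_1} - i\big) - i$ with $\alpha_2 = 1$, taken here to be the content of equation (4); the function names are hypothetical. Complex numbers are stored as pairs of `Fraction` values, so no floating-point rounding enters.

```python
from fractions import Fraction


def complex_mul(a, b):
    """Multiply two exact complex numbers stored as (real, imag) Fractions."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def beta2(alpha1, beta1):
    """Solve pi/4 = alpha1*arctan(1/beta1) + arctan(1/beta2) for beta2."""
    b = Fraction(beta1)
    n = b * b + 1
    z = ((b * b - 1) / n, 2 * b / n)      # (beta1 + i)/(beta1 - i), modulus 1
    w = (Fraction(1), Fraction(0))
    for _ in range(alpha1):
        w = complex_mul(w, z)             # w = z**alpha1, still of modulus 1
    # With w = A + i*B and A^2 + B^2 = 1, the expression 2/(w - i) - i
    # simplifies to the real number A/(1 - B).
    return w[0] / (1 - w[1])


print(beta2(4, 5))    # Machin's formula
print(beta2(2, 3))
print(beta2(8, 10))
```

With $\alpha_1 = 4$ and $\beta_1 = 5$ this recovers $\beta_2 = -239$, that is, Machin's formula (the minus sign appears because $\arctan(1/\beta_2)$ is negative). The degenerate case $\alpha_1 = \beta_1 = 1$, where the first term alone already equals $\pi/4$, raises a `ZeroDivisionError` in this sketch.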

In 1938, Lehmer [6] introduced a measure (see also [7])

$$\mu = \sum_{j=1}^{J} \frac{1}{\log_{10}\left|\beta_j\right|}, \tag{5}$$

showing how much computational effort is required for a specific Machin-like formula for $\pi$. In particular, when $\mu$ is small, less computational effort is required and, consequently, the computational efficiency of the formula is higher. Lehmer’s measure is smaller if there are fewer summation terms and the constants $\beta_j$ are larger in magnitude. For more efficient computation, the constants $\beta_j$ should be larger in absolute value, since it is easier to approximate the arctangent function as its argument tends to zero (see [7] for more details).

It is also important to emphasize that in the same paper [6] Lehmer presented a few Machin-like formulas where some of the $\beta_j$ are not integers but rational numbers. This signifies that Lehmer assumed that his measure (5) remains valid whether the $\beta_j$ are integers or rational numbers.
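Lehmer's measure is simple to evaluate. The Python sketch below assumes the usual definition, the sum of $1/\log_{10}|\beta_j|$ over all arctangent terms, and uses a hypothetical helper name `lehmer_measure`. It compares Machin's formula with Euler's two-term formula $\pi/4 = \arctan(1/2) + \arctan(1/3)$.

```python
import math


def lehmer_measure(betas):
    """Sum of 1/log10(|beta_j|) over the arctangent terms of a formula."""
    return sum(1 / math.log10(abs(b)) for b in betas)


machin = lehmer_measure([5, 239])   # pi/4 = 4 arctan(1/5) - arctan(1/239)
euler = lehmer_measure([2, 3])      # pi/4 = arctan(1/2) + arctan(1/3)
print(machin, euler)
```

Machin's constants are larger in magnitude, so its measure is smaller than that of Euler's formula, consistent with its faster convergence. Very large rational constants may not convert to a float; in that case the logarithms of numerator and denominator would have to be taken separately.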

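In 2002, Kanada [8], using the following self-checking pair of Machin-like formulas for $\pi$,

$$\frac{\pi}{4} = 44\arctan\frac{1}{57} + 7\arctan\frac{1}{239} - 12\arctan\frac{1}{682} + 24\arctan\frac{1}{12943}$$

and

$$\frac{\pi}{4} = 12\arctan\frac{1}{49} + 32\arctan\frac{1}{57} - 5\arctan\frac{1}{239} + 12\arctan\frac{1}{110443},$$

computed more than 1 trillion digits of $\pi$. These two examples show that the Machin-like formulas have colossal potential in the computation of decimal digits of $\pi$.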

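In 1997, Chien-Lih showed a remarkable formula [9]

$$\frac{\pi}{4} = 183\arctan\frac{1}{239} + 32\arctan\frac{1}{1023} - 68\arctan\frac{1}{5832} + 12\arctan\frac{1}{110443} - 12\arctan\frac{1}{4841182} - 100\arctan\frac{1}{6826318}$$

with $\mu \approx 1.51244$. According to Weisstein [10], this Lehmer’s measure is the smallest known value for the $\beta_j$ consisting of integers only. Later Chien-Lih [11] showed how Lehmer’s measure can be reduced even further by using an Euler type of identity in an iteration for generating the two-term Machin-like formulas like (3) such that $\beta_1$ and $\beta_2$ are rational numbers.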

In [12], we derived the following simple identity (see also [13])

$$\frac{\pi}{4} = 2^{k-1}\arctan\frac{\sqrt{2 - a_{k-1}}}{a_k}, \tag{6}$$

where $a_k = \sqrt{2 + a_{k-1}}$ and $a_1 = \sqrt{2}$, and described how, using this identity, another efficient method for generating the two-term Machin-like formula for $\pi$

$$\frac{\pi}{4} = 2^{k-1}\arctan\frac{1}{\beta_1} + \arctan\frac{1}{\beta_2} \tag{7}$$

with small Lehmer’s measure can be developed. In this approach, the constant $\beta_1$ can be chosen as a positive integer such that

$$\beta_1 = \left\lfloor \frac{a_k}{\sqrt{2 - a_{k-1}}} \right\rfloor \tag{8}$$

and, in accordance with equation (4), the constant $\beta_2$ in equation (7) can be found from

$$\beta_2 = \frac{2}{\left(\dfrac{\beta_1 + i}{\beta_1 - i}\right)^{2^{k-1}} - i} - i. \tag{9}$$

It is not reasonable to solve equation (9) directly to determine the rational number $\beta_2$, as its solution becomes tremendously difficult with increasing integer $k$. However, this problem can be effectively resolved by using a very simple two-step iteration procedure discussed in the next section. Therefore, our approach in generating the two-term Machin-like formula (7) for $\pi$ with small Lehmer’s measure is much easier than Chien-Lih’s method [11].

Wetherfield [7] provides a detailed explanation clarifying the significance of Lehmer’s measure, which shows how much computation is required for a given Machin-like formula for $\pi$ when all the constants $\beta_j$ are integers. However, it is unclear if this paradigm is also applicable when at least one $\beta_j$ is rational. More specifically, the problem that occurs in computing the two-term Machin-like formula for $\pi$ in Chien-Lih’s [11] and our [4] iteration methods is related to the rapidly growing number of digits in the numerators and denominators of $\beta_1$ or $\beta_2$. This occurs simultaneously with an attempt to decrease Lehmer’s measure. As a result, the subsequent exponentiation in conventional algorithms makes computing the decimal digits of $\pi$ inefficient. Therefore, the applicability of Lehmer’s measure to a Machin-like formula for $\pi$ with a rational $\beta_j$ is questionable. For example, Lehmer’s measure may be small, say less than 1. This means that less computational work is needed to calculate $\pi$. However, due to the large number of digits in the numerators and denominators of $\beta_1$ or $\beta_2$, a computer performs more intense arithmetic operations that make the runtime significantly longer. Consequently, we ask: is Lehmer’s measure still applicable when at least one constant from the set of $\beta_j$ is not an integer but a rational number?

Motivated by an interesting paper in regard to equation (7) that was recently published [14], we further develop our previous work [4, 5]. In this article, we propose a new algorithm showing how unconditional applicability of Lehmer’s measure for the two-term Machin-like formula (7) for $\pi$ can be achieved. We also describe how linear and quadratic convergence to $\pi$ can be implemented.

As mentioned, the number of summation terms $J$ in equation (2) should be reduced in order to minimize Lehmer’s measure. Since at $J = 1$ there is only one Machin-like formula for $\pi$,

$$\frac{\pi}{4} = \arctan(1), \tag{10}$$

we consider the case $J = 2$. We attempted to find a method that can be used to generate a two-term Machin-like formula for $\pi$ with small Lehmer’s measure. Equation (7) provides an efficient way to do this.

In fact, the original Machin formula (1) for $\pi$ appears quite naturally from the equations (7), (8) and (9) at $k = 3$. Specifically, equation (8) provides

$$\beta_1 = \left\lfloor \frac{a_3}{\sqrt{2 - a_2}} \right\rfloor = 5.$$

Substituting $\beta_1 = 5$ into equation (9) results in

$$\beta_2 = \frac{2}{\left(\dfrac{5 + i}{5 - i}\right)^{4} - i} - i = -239.$$

Consequently, at $k = 3$ we get the constants $2^{k-1} = 4$, $\beta_1 = 5$ and $\beta_2 = -239$ in equation (7). Since the arctangent function is odd ($\arctan(-x) = -\arctan(x)$), the constants for equation (3) can be rearranged as $\alpha_1 = 4$, $\beta_1 = 5$, $\alpha_2 = -1$ and $\beta_2 = 239$. This corresponds to the original Machin formula (1) for $\pi$.

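**Theorem 2.1.**

There are only four possible cases for a two-term Machin-like formula (3) for $\pi$ when all four constants $\alpha_1$, $\beta_1$, $\alpha_2$ and $\beta_2$ are integers:

$$\frac{\pi}{4} = \arctan\frac{1}{2} + \arctan\frac{1}{3},$$

$$\frac{\pi}{4} = 2\arctan\frac{1}{2} - \arctan\frac{1}{7},$$

$$\frac{\pi}{4} = 2\arctan\frac{1}{3} + \arctan\frac{1}{7},$$

$$\frac{\pi}{4} = 4\arctan\frac{1}{5} - \arctan\frac{1}{239}.$$

The proof for Theorem 2.1 can be found in [10].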

**Lemma 2.2**

If in equation (7) , then at any integer .

**Proof**

As we can see from the four cases given in Theorem 2.1, the largest possible value for is and it occurs at (see the example above). Therefore, for any integer at integer , the constant in the equation (7) cannot be an integer. □

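**Theorem 2.3.**

$$\lim_{k \to \infty} a_k = \sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}} = 2.$$

**Proof**

This is the simplest kind of Ramanujan nested radical and its proof is straightforward. Let $x = \sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}$. Then

$$x = \sqrt{2 + x}.$$

Squaring both sides,

$$x^2 = 2 + x,$$

and solving results in two possible solutions, $x = 2$ and $x = -1$. Since for any positive index $k$ the value $a_k$ is always positive, we exclude the solution $x = -1$. □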

**Lemma 2.4.**

**Proof**

The proof follows immediately from Theorem 2.3 since

**Lemma 2.5.**

**Proof**

Using equation (8) that defines by the floor function, the limit can be rewritten as

(11) |

By definition, the fractional part given by the difference

is positive and cannot be greater than 1. Therefore, the limit (11) can be rewritten in the form

**Lemma 2.6**

We have that . | (12) |

**Proof**

Equation (9) is too hard to work with directly, so we start with the limit (11) from Lemma 2.5. Since the limit is 1, the limit of the ratio of the reciprocals of its numerator and denominator must also be 1,

(13) |

From Theorem 2.3 and Lemma 2.4,

and

(14) |

Since both the numerator and denominator in (13) tend to zero as tends to infinity and since as , we can rewrite (13) as

or

(15) |

Since equation (6) is valid for an arbitrarily large integer , (15) implies that

(16) |

However, we also have

(17) |

since equation (7) is also valid at an arbitrarily large integer . Comparing the limits (16) and (17), we get

(18) |

Equation (18) is valid if and only if

so , which is (12). □

**Lemma 2.7.**

**Proof**

The limit (14) implies that

(19) |

From (12) and (19) we conclude that as , both and tend to infinity. Thus, according to equation (5), Lehmer’s measure for the two-term Machin-like formula (7) for tends to zero with increasing . □

**Theorem 2.8**

If equation (8) holds, then the constant is always negative.

**Proof**

There is only one single-term Machin-like formula (10) for such that the constants and are both integers. Therefore, in equation (6), at any integer the argument of the arctangent function cannot be represented as the reciprocal of an integer; that is, is not an integer. Therefore,

and

This implies

or

or

(20) |

We make (20) into an equality by adding a negative error term such that

Defining the constant in accordance with equation (9), we find the error term to be

(21) |

Since is negative, the constant is also negative. □

Since in equation (7) the constant $\beta_1$ is an integer, the first arctangent function term can be computed by any existing method. For example, we can use Euler’s formula for the arctangent function

$$\arctan(x) = \sum_{n=0}^{\infty} \frac{2^{2n}(n!)^2}{(2n+1)!}\,\frac{x^{2n+1}}{\left(1 + x^2\right)^{n+1}}. \tag{22}$$

Chien-Lih used this formula to develop his iteration method for generating the two-term Machin-like formula for $\pi$ [11] and later he found an elegant derivation of this formula [15].
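A minimal Python sketch of Euler’s series follows, assuming the standard form $\arctan(x) = \sum_{n \ge 0} \frac{2^{2n}(n!)^2}{(2n+1)!}\frac{x^{2n+1}}{(1+x^2)^{n+1}}$. It uses exact rational arithmetic so that no irrational numbers are involved. The function name `arctan_euler` is hypothetical.

```python
from fractions import Fraction
import math

def arctan_euler(x, terms):
    """Partial sum of Euler's arctangent series for a rational argument x."""
    x = Fraction(x)
    ratio = x * x / (1 + x * x)          # x^2 / (1 + x^2)
    term = x / (1 + x * x)               # the n = 0 term
    total = Fraction(0)
    for n in range(terms):
        total += term
        # consecutive terms differ by the factor (2n+2)/(2n+3) * x^2/(1+x^2)
        term *= Fraction(2 * n + 2, 2 * n + 3) * ratio
    return total

# Machin's formula evaluated with 25 terms per arctangent
pi_approx = 4 * (4 * arctan_euler(Fraction(1, 5), 25)
                 - arctan_euler(Fraction(1, 239), 25))
print(float(pi_approx))
```

Each new term is obtained from the previous one by a single rational multiplication, which is why the series is convenient for integer arguments such as $1/5$.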

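We derived another series expansion of the arctangent function [5, 12]:

$$\arctan(x) = i \sum_{m=1}^{\infty} \frac{1}{2m-1}\left(\frac{1}{\left(1 + 2i/x\right)^{2m-1}} - \frac{1}{\left(1 - 2i/x\right)^{2m-1}}\right). \tag{23}$$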

It is interesting that, by generalizing the derivation method that was used to get equation (23), we can find by induction the identity

where is the order of arctan function expansion; that yields simple approximations like

and

since .

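The representation (23) of the arctangent function is not optimal for algorithmic implementation, since it deals with complex numbers. Fortunately, as we showed in [4], this series expansion can be significantly simplified to

$$\arctan(x) = 2\sum_{m=1}^{\infty} \frac{1}{2m-1}\,\frac{g_m(x)}{g_m^2(x) + h_m^2(x)}, \tag{24}$$

where the expansion coefficients are computed by iteration:

$$g_1(x) = \frac{2}{x}, \qquad h_1(x) = 1,$$

$$g_m(x) = g_{m-1}(x)\left(1 - \frac{4}{x^2}\right) + \frac{4 h_{m-1}(x)}{x}, \qquad h_m(x) = h_{m-1}(x)\left(1 - \frac{4}{x^2}\right) - \frac{4 g_{m-1}(x)}{x}.$$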

Both series expansions (22) and (24) converge rapidly and need no undesirable irrational numbers to compute . However, the computational test we performed shows that the series expansion (24) converges more rapidly by many orders of magnitude than Euler’s formula (22) (see Figures 2 and 3 in [4]). Therefore, the series expansion (24) is more advantageous and can be taken to compute the first arctangent function term from the two-term Machin-like formula (7) for .

The second arctangent function term in equation (7) should not be computed by straightforward substitution of the constant into equation (24). As mentioned, computing with a ratio of numbers with many digits should be avoided. Instead, the second arctangent function term in equation (7) can be computed by Newton–Raphson iteration.

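Once the value of the integer $k$ is chosen, it is not difficult to determine the integer $\beta_1$ by using equation (8) with the help of Mathematica. However, determining the second constant $\beta_2$ with (9) is very hard with increasing $k$, as already mentioned. To overcome this, we proposed a different method [4]. We define the very simple two-step iteration for $n = 2, 3, \ldots, k$,

$$\kappa_n = \kappa_{n-1}^2 - \lambda_{n-1}^2, \qquad \lambda_n = 2\kappa_{n-1}\lambda_{n-1},$$

where

$$\kappa_1 = \frac{\beta_1^2 - 1}{\beta_1^2 + 1}$$

and

$$\lambda_1 = \frac{2\beta_1}{\beta_1^2 + 1}.$$

Then

$$\beta_2 = \frac{\kappa_k}{1 - \lambda_k}. \tag{25}$$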

For , the second arctangent term in (7) deals only with a rational number . As increases, the number of digits in the numerator and denominator of the constant increases. For example, at , equation (8) yields

and using (25) we find

Consequently, the two-term Machin-like formula (7) for is generated as

Lehmer’s measure for this two-term Machin-like formula for is .

However, if we take , then

and using (25) we get

The corresponding two-term Machin-like formula for $\pi$ is

for which Lehmer’s measure is only . Such a large number of digits in the numerator and denominator in the second arctangent function may look unusual. However, some formulas for obtained from the Borwein integrals involving the sinc function can also result in ratios of integers with a large number of digits. For example, Bäsel and Baillie reported a formula for that uses a quotient with 453,130,145 digits in the numerator and 453,237,170 digits in the denominator [16]; you can download a file with all the digits of the constant from [17].

As we can see from these examples, Lehmer’s measure decreases with increasing . However, that occurs simultaneously with a rapid increase in the number of digits in the numerator and denominator of the constant . As a result, taking powers of such fractions becomes very slow. This raises the doubt that Lehmer’s measure (5) is indeed relevant for a given Machin-like formula for when at least one coefficient is rational.
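The trade-off described above can be explored with a short Python sketch. It assumes that equation (8) reads $\beta_1 = \lfloor a_k/\sqrt{2 - a_{k-1}} \rfloor$ with $a_0 = 0$, and that the two-step iteration amounts to repeatedly squaring $(\beta_1 + i)/(\beta_1 - i)$ in exact rational arithmetic. The names `beta1`, `beta2`, `measure`, `kappa` and `lam` are hypothetical, and floating-point square roots are used for $\beta_1$, which is adequate only for small $k$.

```python
from fractions import Fraction
import math

def beta1(k):
    """Integer part of a_k / sqrt(2 - a_{k-1}), with a_0 = 0 (assumed form of (8))."""
    a_prev = 0.0
    for _ in range(k - 1):
        a_prev = math.sqrt(2.0 + a_prev)      # a_{k-1}
    a_k = math.sqrt(2.0 + a_prev)
    return math.floor(a_k / math.sqrt(2.0 - a_prev))

def beta2(k, b1):
    """Square (b1 + i)/(b1 - i) exactly k - 1 times, then form kappa/(1 - lambda)."""
    kappa = Fraction(b1 * b1 - 1, b1 * b1 + 1)   # real part
    lam = Fraction(2 * b1, b1 * b1 + 1)          # imaginary part
    for _ in range(k - 1):
        kappa, lam = kappa * kappa - lam * lam, 2 * kappa * lam
    return kappa / (1 - lam)

def measure(b1, b2):
    """Lehmer's measure of the two-term formula."""
    return 1 / math.log10(b1) + 1 / math.log10(abs(b2))

for k in range(2, 7):                            # k >= 2 is required
    b1 = beta1(k)
    b2 = beta2(k, b1)
    digits = len(str(abs(b2.numerator))) + len(str(b2.denominator))
    print(k, b1, round(measure(b1, b2), 4), digits)
```

At $k = 3$ the sketch reproduces Machin’s constants, 5 and −239. For the values of $k$ printed here the measure falls while the digit count of the second constant grows quickly.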

To resolve this problem, we considered applying the Newton–Raphson iteration [18]. Specifically, we showed that each consecutive iteration doubles the number of correct digits in the second term of the arctangent function in equation (7). This method is based on the iteration formula:

(26) |

such that

The most important advantage of (26) is that the rational number is not involved in the computation of the trigonometric functions. As we can see from (26), is no longer problematic because it is not taken to a power, nor is it the argument of sine or tangent, which consumes most of the runtime. Instead, it is only applied in a single subtraction. This single subtraction (that can be implemented by changing the precision) takes a negligibly small amount of time as compared to the time to compute or .
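To make the role of the rational number concrete, here is a minimal Python sketch of a Newton–Raphson iteration for the arctangent. It assumes the standard Newton step for $f(y) = \tan(y) - x$, namely $y_{n+1} = y_n - (\tan(y_n) - x)/(1 + \tan^2(y_n))$, which may differ in form from the article’s equation (26). The name `arctan_newton` is hypothetical, and the built-in `math.tan` stands in for whatever tangent routine is used.

```python
import math

def arctan_newton(x, y0, steps):
    """Newton-Raphson iteration for y = arctan(x), starting from the guess y0."""
    y = y0
    for _ in range(steps):
        t = math.tan(y)                  # the only trigonometric evaluation
        y -= (t - x) / (1.0 + t * t)     # x enters through one subtraction only
    return y

x = 0.5
for steps in range(5):
    err = abs(arctan_newton(x, x, steps) - math.atan(x))
    print(steps, err)
```

The printed errors shrink roughly quadratically, and the argument `x` is never raised to a power or passed to a trigonometric function.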

To reduce the number of trigonometric functions from two to one, it is convenient to put equation (26) into the form

(27) |

using the elementary trigonometric identities

and

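The tangent function can be found, for example, by using the equation

$$\tan(x) = \frac{\sin(x)}{\cos(x)} = \frac{\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\,x^{2n+1}}{\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\,x^{2n}},$$

representing the ratio based on the Maclaurin series expansions for the sine and cosine functions. Alternatively, we can use the series expansion

$$\tan(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}\,2^{2n}\left(2^{2n} - 1\right) B_{2n}}{(2n)!}\,x^{2n-1}, \qquad |x| < \frac{\pi}{2},$$

where $B_{2n}$ is a Bernoulli number. There are several other ways to compute the tangent, like continued fractions [19, 20] or Newton–Raphson iteration again [21]. Perhaps the argument reduction method for the tangent function can also be used to improve accuracy, but we did not implement that, to keep the algorithm as simple as possible.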

Equation (27) is based on the Newton–Raphson iteration method. However, this iteration formula contains the tangent function. We showed [18] that once the first arctangent term is computed, the number of correct digits in can be doubled at each consecutive step of the iteration as with the Newton–Raphson iteration method. To approximate the tangent function with high accuracy, we can apply the Newton–Raphson iteration again [21]. The derivation of the iteration-based equation for the tangent function is not difficult. Let .

Using the Newton–Raphson iteration formula

where

we get

(28) |

such that

Substituting into (28) yields

(29) |

The iteration-based expansion of the series (24) for the arctangent function converges very rapidly. Therefore, we can apply it to compute the arctangent function in equation (29).

The series expansion (24) of the arctangent function can be computed as follows.
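The article’s Mathematica input is not reproduced here. As a stand-in, the following Python sketch evaluates the series under the assumption that (24) has the form $\arctan(x) = 2\sum_{m \ge 1} \frac{1}{2m-1}\frac{g_m}{g_m^2 + h_m^2}$ with $g_1 = 2/x$, $h_1 = 1$, $g_m = g_{m-1}(1 - 4/x^2) + 4h_{m-1}/x$ and $h_m = h_{m-1}(1 - 4/x^2) - 4g_{m-1}/x$. The function name `arctan_series` is hypothetical.

```python
from fractions import Fraction
import math

def arctan_series(x, terms):
    """Partial sum of the rational arctangent series for a nonzero rational x."""
    x = Fraction(x)
    g, h = 2 / x, Fraction(1)            # g_1 and h_1
    c = 1 - 4 / (x * x)                  # common factor 1 - 4/x^2
    total = Fraction(0)
    for m in range(1, terms + 1):
        total += g / ((2 * m - 1) * (g * g + h * h))
        # advance both coefficients together from step m to step m + 1
        g, h = g * c + 4 * h / x, h * c - 4 * g / x
    return 2 * total

print(float(arctan_series(Fraction(1, 5), 10)), math.atan(0.2))
```

Every quantity stays rational, so the partial sums are exact fractions and only the final conversion to a float rounds.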

Next we define the nested radicals consisting of square roots of 2.
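A floating-point Python sketch of the nested radicals is given below, assuming $a_k = \sqrt{2 + a_{k-1}}$ with the convention $a_0 = 0$, so that $a_1 = \sqrt{2}$. It also checks identity (6) numerically under the assumption that it reads $\pi/4 = 2^{k-1}\arctan\left(\sqrt{2 - a_{k-1}}/a_k\right)$. The function names are hypothetical.

```python
import math

def nested_radical(k):
    """a_k = sqrt(2 + sqrt(2 + ... )) with k square roots; a_0 = 0."""
    a = 0.0
    for _ in range(k):
        a = math.sqrt(2.0 + a)
    return a

def identity_6(k):
    """Right-hand side of the assumed identity (6); should equal pi/4."""
    return 2 ** (k - 1) * math.atan(
        math.sqrt(2.0 - nested_radical(k - 1)) / nested_radical(k))

for k in (1, 2, 5, 10):
    print(k, nested_radical(k), identity_6(k))
```

The difference $2 - a_{k-1}$ loses precision in double arithmetic as $k$ grows, so an exact or high-precision representation would be needed for large $k$.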

This computes the constants and for the two-term Machin-like formula (7) for at .

For the constant , we use the iteration-based formula (25) instead of equation (9).

Define a function for the Lehmer’s measure corresponding to a two-term Machin-like formula (7) for .

Here is Lehmer’s measure for the case .

The accuracy improves with each iteration, so we do not need to use the highest accuracy at each step. At , the Newton–Raphson iteration-based formula (29) gives 4 to 5 correct digits of the tangent function at each step. Therefore, it is reasonable to use the argument in , where is taken to minimize rounding or truncation errors.

Recall that . As an initial guess for the Newton–Raphson iteration, we choose , since this is close to the actual value of the tangent function .

This sets up the recurrence for as defined in (29).

This part of the program, which computes the tangent function, takes most of the runtime. It uses the Newton–Raphson iteration built on the basis of the series expansion (24) of the arctangent function (see (29)).
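The idea of computing the tangent from the arctangent can be sketched in Python as follows. The sketch assumes the standard Newton step for $f(t) = \arctan(t) - x$, that is, $t_{n+1} = t_n - (1 + t_n^2)(\arctan(t_n) - x)$. It uses the built-in `math.atan` in place of the series expansion (24), and the name `tan_newton` is hypothetical.

```python
import math

def tan_newton(x, t0, steps):
    """Newton-Raphson iteration for t = tan(x), built on an arctangent routine."""
    t = t0
    for _ in range(steps):
        t -= (1.0 + t * t) * (math.atan(t) - x)
    return t

x = 0.3
for steps in range(5):
    print(steps, abs(tan_newton(x, x, steps) - math.tan(x)))
```

Each pass costs one arctangent evaluation, which is why this stage dominates the runtime when the arctangent itself is computed from a series.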

This sets up the Newton–Raphson iteration formula (27).

The next part of the program invokes the value and performs just a few arithmetic operations. As mentioned, the numerator and denominator of contain many digits, but they are not involved in computing the tangent function. Only a minor part of the time is needed to subtract this number (see equation (27)). Consequently, Lehmer’s measure applies unconditionally.

The table shows how the values of the arctangent function gain digits at each iteration step.

This shows the linear rate of convergence to $\pi$ in terms of the number of accurate digits.

After the first two iterations, each of the following iterations adds five correct digits to the approximation of $\pi$.

The convergence rate increases as Lehmer’s measure decreases, which can be readily confirmed by increasing and readjusting the parameter in this algorithm. No undesirable irrational numbers are needed. Furthermore, since Lehmer’s measure can be made vanishingly small, there is no upper bound in the convergence rate per iteration.

Consider a variation of the algorithm based on the Newton–Raphson iterations for the tangent function that can be implemented to get quadratic convergence to . Assume that only decimal digits of are known at the beginning.

We determined experimentally that at , equation (24) provides four or five correct digits of at each successive step. Therefore, it is sufficient to take terms. However, to exclude truncation and rounding errors, we use terms of the arctangent function.

We define to determine the accuracy of computation and for the number of summation terms in approximation (24) of the arctangent function.

The multiplier was found experimentally. When , we restrict its rapid growth to , as we do not need extra accuracy at this stage. We add 10 to exclude truncation and rounding errors. We have taken the initial value .

We use equation (29).

For our choice of , here is the required index.

Once the tangent function is computed, we substitute it into equation (27). The value has double the accuracy of with only one iteration.

The approximate values of the arctangent function by iteration and by the built-in function match to 100 places.

Since we got with double the accuracy, it can now be used to compute with significantly improved accuracy.

This shows that the number of correct decimal digits of $\pi$ doubled from 50 to 101.

More directly, this shows the complete match between the computed approximation of $\pi$ and that provided by Mathematica.

The first arctangent function in the two-term Machin-like formula (7) for $\pi$ can also be found by using the same algorithm based on the Newton–Raphson iteration. Consequently, this method results in quadratic convergence to $\pi$. However, unlike the Brent–Salamin algorithm (also known as the Gauss–Brent–Salamin algorithm) with quadratic convergence to $\pi$ [2], our approach does not involve any irrational numbers. The number of summation terms in equation (24) and the number of iteration cycles for computation of the tangent function (29) decrease with increasing $k$. This can be confirmed by using the code given here. To the best of our knowledge, this is the first algorithm showing the feasibility of quadratic convergence to $\pi$ without using any irrational numbers.

In this article, we presented a new algorithm to compute the two-term Machin-like formula (7) for $\pi$ and showed an example where the condition that every $\beta_j$ be an integer was not necessary in order to validate Lehmer’s measure (5). Since this algorithmic implementation lets us avoid subsequent exponentiation of the second constant $\beta_2$, this approach may be promising for computing $\pi$ more rapidly without using irrational numbers.

This work is supported by National Research Council Canada, Thoth Technology, Inc., York University, Epic College of Technology and Epic Climate Green (ECG) Inc. The authors wish to thank the reviewer for useful comments and recommendations. Constructive suggestions from the editor that improved the content of this work are greatly appreciated.

[1] P. Beckmann, A History of Pi, Boulder, CO: Golem Press, 1971.

[2] L. Berggren, J. Borwein and P. Borwein, Pi: A Source Book, 3rd ed., New York: Springer, 2004.

[3] J. Borwein and D. Bailey, Mathematics by Experiment. Plausible Reasoning in the 21st Century, 2nd ed., Wellesley, MA: AK Peters, 2008.

[4] S. M. Abrarov and B. M. Quine, “An Iteration Procedure for a Two-Term Machin-like Formula for Pi with Small Lehmer’s Measure.” arxiv.org/abs/1706.08835.

[5] S. M. Abrarov and B. M. Quine, “The Two-Term Machin-like Formula for Pi with Small Arguments of the Arctangent Function.” arxiv.org/abs/1704.02875.

[6] D. H. Lehmer, “On Arccotangent Relations for $\pi$,” The American Mathematical Monthly, 45(10), 1938, pp. 657–664. doi.org/10.2307/2302434.

[7] M. Wetherfield, “The Enhancement of Machin’s Formula by Todd’s Process,” The Mathematical Gazette, 80(488), 1996, pp. 333–344. doi.org/10.2307/3619567.

[8] J. S. Calcut, “Gaussian Integers and Arctangent Identities for $\pi$,” The American Mathematical Monthly, 116(6), 2009, pp. 515–530. www.jstor.org/stable/40391144.

[9] H. Chien-Lih, “More Machin-type Identities,” The Mathematical Gazette, 81(490), 1997, pp. 120–121. doi.org/10.2307/3618793.

[10] E. W. Weisstein. “Machin-like Formulas” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/Machin-LikeFormulas.html.

[11] H. Chien-Lih, “Some Observations on the Method of Arctangents for the Calculation of $\pi$,” The Mathematical Gazette, 88(512), 2004, pp. 270–278. www.jstor.org/stable/3620848.

[12] S. M. Abrarov and B. M. Quine, “A Formula for Pi Involving Nested Radicals,” The Ramanujan Journal, 46, 2018, pp. 657–665. doi.org/10.1007/s11139-018-9996-8.

[13] OEIS. “Decimal Expansion of Pi,” A000796. oeis.org/A000796.

[14] Wolfram Cloud. “A Wolfram Notebook Playing with Machin-like Formulas.” develop.open.wolframcloud.com/objects/exploration/MachinLike.nb.

[15] H. Chien-Lih, “An Elementary Derivation of Euler’s Series for the Arctangent Function,” The Mathematical Gazette, 89(516), 2005, pp. 469–470. doi.org/10.1017/S0025557200178404.

[16] U. Bäsel and R. Baillie, “Sinc Integrals and Tiny Numbers.” arxiv.org/abs/1510.03200.

[17] S. M. Abrarov and B. M. Quine, “The Rational Number for the Two-Term Machin-like Formula for Pi Computed by Iteration.” yorkspace.library.yorku.ca/xmlui/handle/10315/33173.

[18] S. M. Abrarov and B. M. Quine, “Efficient Computation of Pi by the Newton–Raphson Iteration and a Two-Term Machin-like Formula,” International Journal of Mathematics and Computer Science, 13(2), 2018, pp. 157–169. ijmcs.future-in-tech.net/13.2/R-Abrarov.pdf.

[19] J. Havil, The Irrationals: A Story of the Numbers You Can’t Count On, Princeton, NJ: Princeton University Press, 2012.

[20] M. Trott. “Continued Fraction Approximations of the Tangent Function” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/ContinuedFractionApproximationsOfTheTangentFunction.

[21] J.-M. Muller, Elementary Functions. Algorithms and Implementation, 3rd ed., Boston: Birkhäuser, 2016.

S. M. Abrarov, R. Siddiqui, R. Jagpal and B. M. Quine, “Unconditional Applicability of Lehmer’s Measure to the Two-Term Machin-like Formula for Pi,” The Mathematica Journal, 2021. doi.org/10.3888/tmj.23-2.

Sanjar M. Abrarov received a BSc degree in Physics, and CSc and PhD degrees in Physics and Engineering. He is a research scientist at the Algonquin Radio Observatory at Thoth Technology, Inc., Canada, and a senior lecturer at Epic College of Technology, Canada. He is a recipient of the Canadian Astronautics and Space Institute Alouette (2010).

**Sanjar M. Abrarov**

Algonquin Radio Observatory

Achray Rd, RR6

Pembroke, Canada, K8A 6W7

Rehan Siddiqui received BSc (Honours) and MSc degrees in Physics, an MEng in Power Engineering, and MSc and PhD degrees in Physics and Astronomy. He is a recipient of the Queen Elizabeth II (2015), Vernon Oliver Stong (2015) and Dr. Ralph Nicholls (2016) graduate scholarships, and of the York University Postdoctoral Fellowship (2017). He is a contract faculty and research associate at York University and a dean at Epic College of Technology, Canada.

**Rehan Siddiqui**

*Dept. Earth and Space Science and Engineering
York University
4700 Keele St.
Toronto, Canada, M3J 1P3*

Rajinder K. Jagpal received BSc and MSc degrees in Physics, and MSc and PhD degrees in Physics and Astronomy. He is a contract faculty and research associate at York University, Canada, and a director of Epic College of Technology, Canada. He is a recipient of the Canadian Astronautics and Space Institute Alouette (2010).

**Rajinder Jagpal**

*Dept. Physics and Astronomy, York University
4700 Keele St., Toronto, Canada, M3J 1P3*

Brendan M. Quine received BSc, PhD and DPhil degrees in Physics. He is an associate professor at York University, Canada, and a director of the Algonquin Radio Observatory at Thoth Technology, Inc., Canada. He is a recipient of the Canadian Astronautics and Space Institute Alouette (2010).

**Brendan M. Quine**

*Dept. Physics and Astronomy
York University, 4700 Keele St.
Canada, M3J 1P3*

This article is intended to help students understand the concept of a coverage probability involving confidence intervals. Mathematica is used as a language for describing an algorithm to compute the coverage probability for a simple confidence interval based on the binomial distribution. Then, higher-level functions are used to compute probabilities of expressions in order to obtain coverage probabilities. Several examples are presented: two confidence intervals for a population proportion based on the binomial distribution, an asymptotic confidence interval for the mean of the Poisson distribution, and an asymptotic confidence interval for a population proportion based on the negative binomial distribution.

Introductory courses in mathematical statistics present the rudimentary concepts behind confidence intervals. The creation of confidence intervals often involves the use of maximum likelihood estimation and the central limit theorem along with estimated standard errors. This is described in Casella and Berger [1] p. 497. Consequently, the level of confidence is often only approximate. This is particularly the case when continuous probability models are used to approximate discrete probabilities. The probability that the interval surrounds the unknown parameter depends on the value of the unknown parameter. Such a probability is called a *coverage probability*. Confidence is defined as the infimum of the coverage probabilities. The following definitions can be found in Casella and Berger [1] p. 418.

**Definition (Coverage and Confidence)**

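Let $X = (X_1, \ldots, X_n)$, where the $X_i$ are all independent from a distribution with probability density (or discrete mass) function given by $f(x \mid \theta)$. The support of each $X_i$ is $\mathcal{X}$ and the parameter space is $\Theta$. Let $L(X)$ and $U(X)$ be the lower and upper limits of a confidence interval. Then the coverage probability of the interval evaluated at $\theta$ is $P_\theta\left(L(X) < \theta < U(X)\right)$. The level of confidence is $\inf_{\theta \in \Theta} P_\theta\left(L(X) < \theta < U(X)\right)$.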

Students are often confused about how to compute coverage probabilities. This tutorial is intended to help students understand them. We give a detailed explanation of calculating one particular coverage probability. This also allows one to perform the calculations with a minimum of distraction involving programming. We then compute coverage probabilities using higher-level functions that allow specifying a function of a random variable along with its distribution. In both cases these functions allow one to focus on the higher-level ideas rather than the low-level nuts and bolts of programming.

Coverage probabilities are best calculated by computer. This necessitates the choice of a programming language and programming environment. Statisticians are generally familiar with one or more statistical programming languages such as SAS, R and so on. Such languages are necessary productivity tools due to their significant data handling capabilities as well as their statistical methods. They are indispensable to the statistician. However, they are not as useful as a language for describing algorithms. Small “bookkeeping” matters often obscure the algorithm or method to be calculated. This tutorial uses Mathematica as a language to describe the computation of coverage probabilities. With a little additional effort, one can produce graphs of coverage probabilities as well as dynamic demonstrations that use a slider to illustrate the effect of the sample size on the graph. The Wolfram Demonstrations Project website contains numerous Demonstrations involving a wide variety of topics. One such Demonstration provided by Heiner and Wagon [2] involves coverage probabilities for a population proportion using a Wald approach as well as a Bayesian approach. This article takes a different approach than Heiner and Wagon.

We illustrate the idea of coverage (and hence confidence) with several examples.

Section 2 describes two asymptotically justified confidence intervals for estimating a population proportion based on the binomial distribution. The first confidence interval is a simple hand calculation interval contained in many textbooks. We present a step-by-step algorithm for computing the coverage probability for one specific value of the population parameter. We stress clarity of computation rather than efficiency. The approach is adequate for a population described by a discrete distribution with a finite number of possible values. We then compute the coverage probability using a much higher-level function that automatically computes the probability associated with an inequality, and we use the same approach for subsequent calculations. We produce a typical graph of coverage probabilities found in some textbooks. The second confidence interval for a population proportion (again based on the binomial distribution) is more complicated but has gained popularity. Naturally, it will be seen that coverage probabilities are generally higher than the level of confidence when approximations are used to create a confidence interval. This is illustrated in the examples below.

Section 3 presents an asymptotically justified confidence interval for the mean of a population described by a Poisson distribution. The Poisson distribution has infinitely many possible observable values. The function used to evaluate coverage probabilities automatically takes this into account.

Section 4 presents a graph of coverage probabilities based on an asymptotically justified confidence interval for estimating a population proportion based on the negative binomial distribution.

Section 5 presents a summary.

A population has a proportion $p$ of members with a given characteristic. In order to estimate $p$, one randomly selects $n$ members of the population with replacement, say $X_1, \ldots, X_n$, where the $X_i$ are independent and identically distributed random variables, each with a Bernoulli distribution with parameter $p$. If $X$ is the number of members in the random sample possessing the target characteristic, that is, $X = \sum_{i=1}^{n} X_i$, then $X$ has a binomial distribution with parameters $n$ and $p$. The sample proportion of members with the characteristic is $\hat{p} = X/n$. Two large sample confidence intervals for $p$ are typically given. We start with the simplest. A large sample confidence interval of size $100(1-\alpha)\%$ for $p$ is given by

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}, \tag{1}$$

where $z_{\alpha/2}$ cuts off the upper $\alpha/2$ part of the standard normal distribution.

Let $\widehat{se} = \sqrt{\hat{p}\,(1-\hat{p})/n}$, the standard error of $\hat{p}$. So, we may shorten (1) by writing it as

$$\hat{p} \pm z_{\alpha/2}\,\widehat{se}. \tag{2}$$

One can find the confidence interval in expression (2) in virtually any statistics book; in particular, see Devore and Berk [3] p. 396. Also, coverage probabilities for this confidence interval are described in Brown, Cai and DasGupta [4]. The derivation of the interval leads one to believe that the level of confidence is $1-\alpha$. However, two approximations are used to derive the interval in expression (2). One approximation uses the central limit theorem. A second approximation uses an estimated variance for the sampling distribution of the sample proportion $\hat{p}$. We want to compute the actual coverage probability for any possible value of the true population proportion $p$. The coverage probability is

$$P_p\left(\hat{p} - z_{\alpha/2}\,\widehat{se} < p < \hat{p} + z_{\alpha/2}\,\widehat{se}\right), \tag{3}$$

where $\hat{p} = X/n$. Books are sometimes vague about whether or not to include the endpoints in the inequality. We exclude the endpoints in order to be consistent with typical hypothesis testing methods.

The definition of coverage confuses many students. For a given value of $p$ with $0 < p < 1$, one must determine the set of values of $X$ satisfying the inequality in expression (3) and compute the probability of observing such values of $X$. We will describe how to determine the set of values and then compute their probability. Once we know what is actually being computed, we will move on to higher-level functions that perform the computations automatically.

We use an example with $n = 10$. A plot will show how bad the approximation can be; we also display the output of each step of the algorithm. We will compute the coverage probability for $p = 0.5$. The input and output are presented in a conversational style with some editorial comments along the way.

We wish to determine the upper 2.5% point of the standard normal distribution, which is often called a critical value for the standard normal distribution. The result is a floating-point number, which restricts the accuracy and precision of all calculations that use it.

Define .

Here is the confidence interval inequality for sample size 10 and general .

The support of the random variable $X$ is the set of values for which the probability mass function is positive. For a discrete random variable, these are also the observable values of $X$. We store the support of $X$ in a programming variable.

This tests whether the inequality is true for each value of $x$ in the support when $p = 0.5$.

These are the positions that yield `True`; flattening the result eliminates one level of nesting. We wish to compute the probabilities of $X$ at those positions and sum them.

These are the appropriate values of the variable $x$.

Now one computes the probabilities for the individual values of $x$ satisfying the inequality.

Finally, the individual probabilities are summed to give the actual coverage probability for $p = 0.5$.

The steps have been broken down so that students can easily understand what is needed. A large sample justification leads us to believe that this number should be about 0.95; the actual coverage probability is about 0.89.
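The same steps can be retraced outside Mathematica. The following is a minimal Python sketch, not the article's code: it assumes the interval in expression (2) is the usual large sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, and the names `covers`, `support` and `good_x` are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, alpha = 10, 0.5, 0.05

# Upper alpha/2 critical value of the standard normal distribution.
z = NormalDist().inv_cdf(1 - alpha / 2)

def covers(x, n, p, z):
    """True if the interval built from x successes strictly contains p."""
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width < p < p_hat + half_width

support = range(n + 1)                               # observable values of X
good_x = [x for x in support if covers(x, n, p, z)]  # values satisfying the inequality
probs = [comb(n, x) * p**x * (1 - p)**(n - x) for x in good_x]
coverage = sum(probs)

print(good_x)              # [3, 4, 5, 6, 7]
print(round(coverage, 4))  # 0.8906
```

Only the outcomes 3 through 7 produce an interval that contains 0.5, and their binomial probabilities add up to 912/1024.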

Here is a much more transparent manner in which to compute the coverage probability. We may use a system function for evaluating the probability of expressions of a random variable. Apparently, the system function automatically tests each possible value of the random variable to determine the ones that satisfy the inequality. (This works quite well for a discrete random variable with a finite number of observable values.) The relevant probabilities are then summed. This approach is not efficient in cases with infinitely many observable values of a random variable. However, it is straightforward and easy for a student to understand. We evaluate the probability of an expression involving the binomial random variable. The expression of the binomial random variable is the confidence interval inequality.

Let us define a function that constructs the inequality more explicitly.

Define the function that computes the coverage.

We now plot the coverage probabilities for a range of values of $p$ in Figure 1. We also draw a horizontal line at a level of 0.95 for comparison purposes. The graph is symmetric due to properties of the binomial distribution and the large sample approximation involved in the confidence interval justification.

**Figure 1.** Coverage plot for first binomial confidence interval, .

Examining Figure 1 indicates several points. First, the coverage probabilities are in general not equal to the nominal level of confidence, namely 0.95. Moreover, coverage probabilities near $p = 0$ and $p = 1$ are effectively zero. Finally, the coverage probability function is discontinuous. All this is obtained with a minimum of programming. In fact, the programming statements presented are simply a good description of the algorithm.
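To reproduce the shape of such a coverage curve numerically, one can wrap the computation in a function and evaluate it on a grid. This is a hedged Python sketch under the same assumption about expression (2); `wald_coverage` and the grid spacing are illustrative choices, not taken from the article.

```python
from math import comb, sqrt
from statistics import NormalDist

def wald_coverage(n, p, alpha=0.05):
    """Exact coverage of the simple large sample interval at the true proportion p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    total = 0.0
    for x in range(n + 1):
        p_hat = x / n
        half_width = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width < p < p_hat + half_width:
            total += comb(n, x) * p**x * (1 - p)**(n - x)
    return total

grid = [i / 100 for i in range(1, 100)]
curve = [wald_coverage(10, p) for p in grid]

print(min(curve), max(curve))   # range of coverage over the grid
print(curve[0], curve[-1])      # coverage at p = 0.01 and p = 0.99
```

Plotting `curve` against `grid` with any plotting tool should show the jumps and the collapse of coverage near the ends of the unit interval.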

More is available. We wish to be able to change the plot by varying the sample size with a slider. A dynamic demonstration can easily be created with the `Manipulate` function. The manipulate variable is the sample size $n$, which you can vary with a slider from 5 to 100.

The graph is in Figure 2. The computer processing time increases with the value of the sample size because the inequality must be tested for each possible value of . The initial sample size is .

**Figure 2.** Coverage plot as a function of sample size.

A larger sample size improves the coverage probabilities as one expects. After all, the confidence interval formula is justified by a large sample argument. However, it is very clear that the coverage probability is small when is close to either 0 or 1 even with . For some sample sizes it is even more obvious that this function contains discontinuities.

This subsection presents coverage probabilities for an improved confidence interval for a population proportion. The improvement makes coverage probabilities generally larger.

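Devore and Berk [3, p. 395] give a better large sample confidence interval for a population proportion. Based on the same assumptions as expression (1), a large sample confidence interval of size $100(1-\alpha)\%$ for a population proportion $p$ is given by

$$\frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}. \tag{4}$$

This confidence interval is based on solving the following inequality for $p$:

$$-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}. \tag{5}$$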

This defines the new inequality accordingly.

Just as with the previous kind of inequality, define .

Figure 3 is the corresponding plot, again with .

**Figure 3.** Coverage plot for the better confidence interval, .

The inequality in (5) is supposed to have a probability of approximately $1-\alpha$ before sampling the population. We can of course compute the true probability with respect to the correct binomial distribution. The Mathematica code follows along with a dynamic graph in Figure 4.

**Figure 4.** Coverage probabilities for the superior asymptotic confidence interval for a population proportion.

Figure 5 contains the code and plot for the dynamic version of the plot. This plot allows for an easy comparison of the coverage probabilities for the two types of confidence intervals.

**Figure 5.** A comparison of coverage probabilities for the two binomial intervals.

The coverage probabilities for this improved confidence interval for a population proportion are indeed superior to those of the simpler interval. In particular, the coverage probabilities are quite large when $p$ is close to 0 or 1. One can see this even with a small sample size, for which the large sample approximation is not appropriate. The difference in coverage probabilities between the simple interval (displayed in Figure 2) and this improved interval is striking.
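The comparison can be checked numerically. The sketch below is Python rather than the article's Mathematica code, and it assumes that inequality (5) is the standardized score inequality $|\hat{p} - p| < z_{\alpha/2}\sqrt{p(1-p)/n}$; the function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def wald_coverage(n, p, alpha=0.05):
    """Coverage of the simple interval (estimated standard error)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    total = 0.0
    for x in range(n + 1):
        p_hat = x / n
        half_width = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width < p < p_hat + half_width:
            total += binom_pmf(x, n, p)
    return total

def score_coverage(n, p, alpha=0.05):
    """Coverage of the improved interval (true standard error in the inequality)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = z * sqrt(p * (1 - p) / n)
    return sum(binom_pmf(x, n, p) for x in range(n + 1) if abs(x / n - p) < bound)

for p in (0.05, 0.25, 0.5):
    print(p, wald_coverage(10, p), score_coverage(10, p))
```

Testing the inequality directly avoids solving for the interval endpoints in expression (4); both routes describe the same set of outcomes.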

We now turn our attention to the Poisson distribution.

The book by Devore and Berk [3, p. 400] presents a homework exercise for determining a confidence interval of size $100(1-\alpha)\%$ for the mean of a population described by a Poisson distribution. Let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, where the $X_i$ are independent and identically distributed with a Poisson distribution with parameter (mean) $\lambda$. Ideally, we must solve the inequality

$$-z_{\alpha/2} < \frac{\bar{X} - \lambda}{\sqrt{\lambda/n}} < z_{\alpha/2} \tag{6}$$

for $\lambda$ to obtain the desired confidence interval. However, if we have a large enough sample, we may replace the true standard error in the denominator with its estimate. Again, this produces a less than ideal result.

The resulting simple confidence interval for the mean $\lambda$ is given by

$$\bar{x} \pm z_{\alpha/2}\sqrt{\frac{\bar{x}}{n}}, \tag{7}$$

which has an approximate level of confidence of $100(1-\alpha)\%$. The parameter $\lambda$ in the denominator was replaced by the sample mean. Figure 6 contains the code and graph for the coverage probabilities. We let $Y = \sum_{i=1}^{n} X_i$, where $Y$ has a Poisson distribution with a mean of $n\lambda$. In principle, the inequality must be tested for each of the infinitely many possible values of $Y$. Coverage probabilities are evaluated at a discrete set of points in order to save computational time.

**Figure 6.** Coverage probabilities for the confidence interval for the Poisson mean .

Unless $\lambda$ is close to zero, this large sample approximation is quite good for the sample size used, as is easily seen in Figure 6. Given the two approximations used, it is not surprising that the coverage probability is small when $\lambda$ is close to zero.
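A rough numerical check is possible with a truncated sum. The following Python sketch is not the article's code: the interval $\bar{x} \pm z_{\alpha/2}\sqrt{\bar{x}/n}$, the sample size `n = 30` and the truncation point are all assumptions made for illustration.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def poisson_pmf(y, mu):
    # Computed on the log scale to avoid overflow for large y.
    return exp(-mu + y * log(mu) - lgamma(y + 1))

def poisson_coverage(n, lam, alpha=0.05):
    """Coverage of x_bar +/- z*sqrt(x_bar/n), where Y = n*x_bar is Poisson(n*lam)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu = n * lam
    # Truncate the infinite support far in the upper tail (an assumption).
    y_max = int(mu + 12 * sqrt(mu) + 30)
    total = 0.0
    for y in range(y_max + 1):
        x_bar = y / n
        half_width = z * sqrt(x_bar / n)
        if x_bar - half_width < lam < x_bar + half_width:
            total += poisson_pmf(y, mu)
    return total

n = 30  # hypothetical sample size
for lam in (0.01, 0.1, 1.0, 5.0):
    print(lam, poisson_coverage(n, lam))
```

When $Y = 0$ the interval degenerates to the single point 0 and cannot contain a positive $\lambda$, which is why coverage collapses for small means.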

This section addresses the situation of estimating a population proportion when the negative binomial distribution is appropriate.

Let $Y = \sum_{i=1}^{n} X_i$, where the $X_i$ are independent and identically distributed with a geometric distribution with parameter $p$. It is well known that $Y$ has a negative binomial distribution with parameters $n$ and $p$ (see [5, p. 127]). Consequently, we use the negative binomial distribution for estimating a population proportion. There are many ways to define the negative binomial distribution. We use the version described in Kinney [5, p. 125]. Conduct independent success/failure trials, each with a probability of success $p$. Let $Y$ be the total number of trials needed to obtain $n$ successes. The probability mass function for $Y$ is given by

$$P(Y = y) = \binom{y-1}{n-1} p^{n} (1-p)^{y-n}, \tag{8}$$

where $y = n, n+1, n+2, \ldots$.

Some authors count the number of trials before the $n$th success. Other authors count the number of failures before the $n$th success. There are other possibilities still. Mathematica uses $W$, the number of failures before the $n$th success. Consequently, $Y = W + n$.

Casella and Berger [1, p. 496] describe large sample confidence intervals based on maximum likelihood. It is easily shown that the maximum likelihood estimator for $p$ is $\hat{p} = n/Y$. Moreover, the asymptotic variance of this estimator is the reciprocal of the Fisher information, $p^2(1-p)/n$. Fisher information is described in Casella and Berger [1, p. 388]. This variance expression is not useful for creating a confidence interval for $p$ since it depends on $p$. So, we estimate the large sample variance by replacing $p$ with $\hat{p}$. This leads to the large sample confidence interval:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}^{2}(1-\hat{p})}{n}}. \tag{9}$$

In order to conveniently perform the calculations, we note that . We evaluate the coverage probability for in steps of 0.01. We based the calculations on . The calculation can take some time depending on the computer. When is small, values of or are extremely unlikely. This makes the internal algorithm take quite a while. We can help speed up the calculations by using rather than the symbolic . The speedup occurs by reducing the required number of digits in calculations. Even so, this calculation takes some time (about four minutes on the author’s computer). A graph of the coverage probabilities is contained in Figure 7.

**Figure 7.** Coverage probabilities for the confidence interval for a population proportion on the negative binomial distribution, .

We see from Figure 7 that the approximation is quite good for values of $p$ close to 0.2. We infer that the approximation is also quite good if $p$ is close to 0. The approximation generally gets worse as $p$ increases (though not monotonically). This is not surprising, since both a large sample approximation and an approximate standard error were used. One sees that the coverage probability is essentially zero when $p$ is close to 1.
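The behavior near $p = 1$ can be checked with a truncated sum over the total number of trials. This Python sketch is not the article's code; it assumes the interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}^2(1-\hat{p})/n}$ with $\hat{p} = n/y$, and the value `n = 30` and the truncation rule are hypothetical choices.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def negbin_pmf(y, n, p):
    """P(Y = y), where Y is the total number of trials needed for n successes."""
    log_choose = lgamma(y) - lgamma(n) - lgamma(y - n + 1)  # log C(y-1, n-1)
    return exp(log_choose + n * log(p) + (y - n) * log(1 - p))

def negbin_coverage(n, p, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mean = n / p
    sd = sqrt(n * (1 - p)) / p
    y_max = int(mean + 15 * sd + 50)  # truncation of the infinite support (an assumption)
    total = 0.0
    for y in range(n, y_max + 1):
        p_hat = n / y
        half_width = z * sqrt(p_hat**2 * (1 - p_hat) / n)
        if p_hat - half_width < p < p_hat + half_width:
            total += negbin_pmf(y, n, p)
    return total

n = 30  # hypothetical number of successes
for p in (0.2, 0.5, 0.9, 0.999):
    print(p, negbin_coverage(n, p))
```

When $p$ is close to 1, almost all of the probability sits on $y = n$, where $\hat{p} = 1$ and the interval has zero width, so the interval almost never contains $p$.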

Large sample confidence intervals are often quite easy to derive. This is particularly true when using an estimate for the standard error of an estimator. However, the actual probability of surrounding the parameter value (coverage) can be quite different from the nominal value. It is helpful to graph the coverage probabilities to see this. Mathematica is particularly useful in performing these calculations and providing a language for describing the algorithms.

The author wishes to thank the anonymous reviewer and the editor for their help in improving this article.

[1] G. Casella and R. Berger, Statistical Inference, 2nd ed., United States: Brooks/Cole Cengage Learning, 2002.

[2] K. Heiner and S. Wagon, “Wald and Bayesian Confidence Intervals” from the Wolfram Demonstrations Project, A Wolfram Web Resource. www.demonstrations.wolfram.com/WaldAndBayesianConfidenceIntervals.

[3] J. Devore and K. Berk, Modern Mathematical Statistics with Applications, 2nd ed., New York: Springer, 2012.

[4] L. D. Brown, T. T. Cai and A. DasGupta, “Confidence Intervals for a Binomial Proportion and Asymptotic Expansions,” The Annals of Statistics, 30(1), 2002 pp. 160–201. www.jstor.org/stable/2700007.

[5] J. Kinney, Probability: An Introduction with Statistical Applications, New York: John Wiley and Sons, 1997.

P. Cook, “Coverage versus Confidence,” The Mathematica Journal, 2021. https://doi.org/10.3888/tmj.23-1.

Peyton Cook earned a B.A. in Psychology, B.S. in Mathematics, and an M.S. and Ph.D. in Statistics. He is an Associate Professor at The University of Tulsa.

**Peyton Cook**

*Department of Mathematics
The University of Tulsa
800 Tucker Drive
Tulsa, Oklahoma 74104
pcook@utulsa.edu*

Structural equation modeling is a very popular statistical technique in the social sciences, as it is very flexible and includes factor analysis, path analysis and others as special cases. While usually done with specialized programs, the same can be achieved in Mathematica, which has the benefit of allowing control of any aspect of the calculation. Moreover, a second, more flexible approach to calculating these models is described that is conceptually much easier yet potentially more powerful. This second approach is used to describe a solution of the attenuation problem of regression.

Linear structural equation modeling (SEM) is a technique that has found widespread use in many sciences in the last decades. An early foundational work is Bollen [1]; a more recent overview is provided by Hoyle [2]. The basic idea is to model the linear structure of observed variables of cases (observations, subjects) by linear equations that may involve latent variables. These variables are not measured directly but inferred from the observed variables by their linear relation to the observed variables.

Many commercial programs (including LISREL, Amos, Mplus) and free ones (including lavaan, sem, OpenMX) have been developed to carry out the estimation procedure. From my perspective, the R package lavaan [3, 4] by Yves Rosseel is the most reliable and convenient one among the free programs. I use it as the gold standard to judge results of my own code.

This article first gives a quick overview of the standard SEM theory, then shows how to perform the calculations in Mathematica. In the last section, a second approach is discussed.

There is a standard example due to Bollen that is also used in the lavaan manual. The dataset consists of observations of 11 manifest variables $x_1$, $x_2$, $x_3$, $y_1$, $y_2$, $y_3$, $y_4$, $y_5$, $y_6$, $y_7$, $y_8$. SEM models are usually depicted graphically. In the lavaan documentation, this is displayed as in Figure 1.

**Figure 1.** Bollen’s democracy model (image from lavaan documentation [4]).

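The variables $x_1$, $x_2$, $x_3$ are observed variables that measure the construct of industrialization in 1960, which is described by the latent variable $\mathrm{ind60}$. This means that the level of industrialization is assumed to be representable by one number for each country, but this number cannot be measured directly; it has to be inferred from its linear relation to gross national product $x_1$, energy consumption per capita $x_2$ and share of industrial workers $x_3$. Next, $\mathrm{dem60}$ and $\mathrm{dem65}$ are the democracy levels in 1960 and 1965, measured by $y_1$, $y_2$, $y_3$, $y_4$ and $y_5$, $y_6$, $y_7$, $y_8$ (these indicators are freedom of the press, etc.). The data matrix consists of these 11 numbers for each of 75 countries (cases). The data is delivered with the lavaan package for R. The aim of estimating the model is twofold. First, the weights of the linear connections (represented in the picture by arrows) are estimated. These arrows encode linear equations by the rule that all arrows that end in a variable indicate a linear combination that yields the value of this variable plus some error term variable. To bring this mysterious language down to earth, here are the equations represented in Figure 1:

$x_1 = \lambda_1\,\mathrm{ind60} + \delta_1$, $x_2 = \lambda_2\,\mathrm{ind60} + \delta_2$, $x_3 = \lambda_3\,\mathrm{ind60} + \delta_3$,

$y_1 = \lambda_4\,\mathrm{dem60} + \varepsilon_1$, $y_2 = \lambda_5\,\mathrm{dem60} + \varepsilon_2$, $y_3 = \lambda_6\,\mathrm{dem60} + \varepsilon_3$, $y_4 = \lambda_7\,\mathrm{dem60} + \varepsilon_4$,

$y_5 = \lambda_8\,\mathrm{dem65} + \varepsilon_5$, $y_6 = \lambda_9\,\mathrm{dem65} + \varepsilon_6$, $y_7 = \lambda_{10}\,\mathrm{dem65} + \varepsilon_7$, $y_8 = \lambda_{11}\,\mathrm{dem65} + \varepsilon_8$,

$\mathrm{dem60} = \gamma_1\,\mathrm{ind60} + \zeta_1$,

$\mathrm{dem65} = \gamma_2\,\mathrm{ind60} + \beta\,\mathrm{dem60} + \zeta_2$.

The variable $\mathrm{ind60}$ is called an exogenous latent variable because no arrow ends there. It has no associated error variable. However, its manifest (measured) indicator variables $x_1$, $x_2$, $x_3$ have associated error variables $\delta_1$, $\delta_2$, $\delta_3$ (they are called $\delta$ in [1]). The indicator variables $y_1$, $y_2$, $y_3$, $y_4$ and $y_5$, $y_6$, $y_7$, $y_8$ of the two endogenous latent variables (those latent variables where arrows end) have error variables $\varepsilon_1, \ldots, \varepsilon_8$ (called $\varepsilon$ in [1]). The equations that relate latent and manifest variables define the measurement part of the model. The two equations (coming from three arrows) between the latent variables are the structure model, usually of most interest. Fitting the model to the data gives estimates for the weights of the arrows, $\lambda_1$, $\lambda_2$, …, $\gamma_1$, $\gamma_2$, $\beta$. The second goal of SEM modeling is to check how well the structure of the model fits the data; that is, SEM is also a hypothesis-testing method.

The equations given do not yet identify all variables. Assume we have a solution with weight $\lambda_1$ and latent values $\mathrm{ind60}$; then for any nonzero number $c$, the numbers $c\,\lambda_1$ and $\mathrm{ind60}/c$ would be a solution, too. To avoid this problem, we either fix the variance of the latent variables to be 1 or we fix some of the weights to be 1. The latter is the default in lavaan and we adopt it here, hence $\lambda_1 = 1$, $\lambda_4 = 1$, $\lambda_8 = 1$.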

Ever since SEM’s invention, SEM models have been estimated by calculating the model’s covariance matrix. From the data, we get the empirical covariance matrix $S$. On the other hand, from the model, we can calculate a theoretical covariance matrix $\Sigma$ between the observed variables. ($\Sigma$ depends on the model and thus on the parameters.) Using linearity and other properties of the covariance, each entry boils down to a polynomial in the model parameters and the covariances and variances between latent variables and error variables. However, without further assumptions, this gives a lot of covariances between error variables that are not determined by the model and hence must be estimated. As this usually leads to too much freedom, the broad assumption is that most error variables are uncorrelated. Only some covariances between error variables are not assumed to be 0; those are marked in the diagram by two-headed arrows between the observed variables. For every pair of observed variables, we calculate the covariance by using the model equations given above as replacement rules and applying linearity and independence assumptions. In the end, we get a covariance matrix that depends on the model parameters (the path weights) and on the variances of the latent variables and the covariances of error variables that are not assumed to be 0. Details can be found in Bollen [1].

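To fit the theoretical covariance matrix to the empirical one, we have to choose these parameters to minimize some distance function. The three most common are unweighted least squares, $F_{\mathrm{ULS}} = \tfrac{1}{2}\operatorname{tr}\left[(S-\Sigma)^2\right]$, generalized least squares, $F_{\mathrm{GLS}} = \tfrac{1}{2}\operatorname{tr}\left[(I-\Sigma S^{-1})^2\right]$ ($I$ is the identity matrix), and maximum likelihood, $F_{\mathrm{ML}} = \log|\Sigma| + \operatorname{tr}(S\Sigma^{-1}) - \log|S| - p$ (here $p$ is the number of manifest variables).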
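To make these three distance functions concrete, here is a small Python sketch for $2\times 2$ covariance matrices. It is not the article's Mathematica code: it assumes the standard textbook forms $\tfrac12\operatorname{tr}[(S-\Sigma)^2]$, $\tfrac12\operatorname{tr}[(I-\Sigma S^{-1})^2]$ and $\log|\Sigma| + \operatorname{tr}(S\Sigma^{-1}) - \log|S| - p$, and the example matrices are made up.

```python
from math import log

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def f_uls(s, sigma):
    d = [[s[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]
    return 0.5 * trace(matmul(d, d))

def f_gls(s, sigma):
    m = matmul(sigma, inv2(s))
    d = [[(1.0 if i == j else 0.0) - m[i][j] for j in range(2)] for i in range(2)]
    return 0.5 * trace(matmul(d, d))

def f_ml(s, sigma):
    p = len(s)  # number of manifest variables
    return log(det2(sigma)) + trace(matmul(s, inv2(sigma))) - log(det2(s)) - p

S = [[2.0, 0.6], [0.6, 1.0]]      # made-up empirical covariance matrix
Sigma = [[2.0, 0.0], [0.0, 1.0]]  # made-up model-implied covariance matrix

print(f_uls(S, Sigma), f_gls(S, Sigma), f_ml(S, Sigma))
```

All three functions vanish when the model-implied matrix equals the empirical one, which is what makes them usable as fitting criteria.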

Now we are in the position to define a Mathematica function that performs SEM. First, we define the helper function that gets all variables contained in an expression in such a way that, for example, counts as one variable.

Here is an example.

The method will be explained with Bollen’s democracy dataset, so first, we need to load this dataset. The file bollen.csv contains headers (the names of the variables are saved in the list ) and a first column numbering the cases, which is dropped.

The data has 75 rows.

Here is the first row of 11 numbers.

The model itself has to be specified as a list of replacement rules that mirror the model equations discussed.

The code for the estimation function includes some utilities. For example, it defines its own covariance and variance functions that take into account which variables are assumed to be uncorrelated. The input of is the data matrix , a matrix of numerical values, one row per case. The structural equations are given in the format detailed in the previous section, “The Standard Example.” Moreover, the function needs:

• the lists of free parameters, (e.g. path weights)

• endogenous latent variables,

• exogenous latent variables,

• the list of error variables of latent variables,

• errors of exogenous manifest variables

• errors of endogenous manifest variables

• a list of pairs of error variables specifying which error variables are allowed to be correlated

The code after defining can be omitted on a first reading; it is only needed to calculate some fit indices (if required by the option , which asks to do the fit index (FI) calculation; similarly, asks to do the maximum likelihood estimation). The estimation is done at the end of the function.

The goal of the first half of the program is the definition of the covariance function that takes into account the SEM assumptions: that most error variables are uncorrelated (except those specified to be correlated), leaving variances of latent variables as symbolic entities to be estimated.

This function is then used to calculate the model implied covariance matrix . Applying the model equation rules repeatedly gives a matrix that depends only on parameters, variances of latent variables and error variables and some allowed covariances of error variables. The code from the line defining (the degree of freedom) onward is only important for getting fit indices. If we are only interested in estimating the model parameters, the next interesting lines are where is applied to estimate the model. As described in the introduction, there are several strategies to measure deviation of covariance matrices; for example, the definition of is a straightforward coding for minimizing .

Let us run the code on Bollen’s model in a simplified version where no correlation of error variables is assumed. This may take several minutes.

The result combines parameter, variance and covariance estimations according to the various estimating strategies. To judge how well the model fits the data, you can set the option to some fit indices:

• RMSEA is the root mean square error of approximation

• CFI is the comparative fit index

• TLI is the Tucker–Lewis fit index

• NFI is the normed fit index

RMSEA should be less than 0.1 or, better, less than 0.05, and the last three should all be greater than 0.9 or, better, 0.95 for good model fit.

The results of estimating using the three different methods differ somewhat. This is not a bug of our program; lavaan determines the same numbers up to several decimal places. There are results in the literature about which methods are equivalent under which conditions. For these fit indices to be interpretable, we need to assume that the data is multivariate normally distributed. If this assumption is violated, then we should judge model fit by other indices, which is beyond the scope of this article; however, they could be calculated based on the current approach as well. The book edited by Hoyle [2] gives some information on these methods.

For the original model that allows some covariances between error variables, the runtime gets worse, especially for maximum likelihood estimation. Hence, this is turned off in the following code.

The results of both models are exactly the same as calculated with lavaan.

When I first learned about SEM, I was puzzled by the many notions (e.g. exogenous, endogenous) and the assumptions needed. For example, I felt that correlation of error variables should be calculated by the estimation algorithm and not be set at will when specifying the model. However, these difficulties seem to play no large role in practice and there are thousands of research papers (mainly) in the social sciences that use these methods with great success. Yet, there are some reasons why the standard approach to SEM via covariance matrices can be criticized (a more detailed discussion is given in [5]). Traditional SEM:

• is well suited only for linear models (there are some nonlinear extensions, but they have not yet become mainstream)

• does not give estimates of the values of latent variables for each case (Bayesian variants can do this)

• requires the covariance matrix of observed data to be nonsingular; however, improving measurement methods in , , , for example, may result in highly correlated measures of (in the extreme case with identical vectors of measured values) and hence their covariance matrix will be almost singular

• has resulting estimations for parameters that depend a lot on the estimation method used

• forbids certain linear models that are not identified in this approach, even though the model itself is sensible and well defined (e.g. the number of covariances of error variables allowed to be nonzero is limited, although in practice there may be correlations)

You may then wonder why the covariance matrix–based approach is so popular. I suppose that more than 40 years ago, computers were not powerful enough to deal with a full dataset, so that the information reduction by calculating the correlation matrix was essential. Since then, many powerful programs have been developed and research has been carried out that gave a good understanding of conditions under which the method works well. Moreover, the psychometric community reached a consensus on how model fit should be judged and thus studies using this method faced no problem being published.

After this discussion of pros and cons, it is time to present the following case-based approach to SEM estimation that is very easy (one may even call it naive) to implement but is also very flexible and with today’s computing power, it is feasible in many real-world situations.

Hence, I propose to do SEM case-based by least-square optimization of the defects of the equations. Assume we have observations (cases) of variables , . A general equational model consists of equations , , which involve the data, latent variables , , and parameters . Then the latent variables and the parameters are estimated by minimizing .

Another twist is needed to get the best results, however. The above objective function gives all equations the same weight. However, it turned out (by working with simulated data where it is clear which parameters should be found) that we get better results by multiplying each term by a factor that gives the equations different weights. The factor can be modified by an option in the code that follows. Best results are obtained when the weight decreases with the number of latent variables in the equation. The idea behind this choice is that an equation that involves only one latent variable links this variable directly to the manifest data and thus should have a high weight. In contrast, equations with many latent variables are not so close to the manifest observations and are thus more hypothetical, so they should have a lower weight.

The model equations are not formulated as rules as for the first SEM, but as equations with the name of the error variable attached to each equation. Moreover, the dataset is not normalized, so there are nonzero intercepts in the linear equations. In the first approach this had no consequences, because such additive values are eliminated by calculating the covariance matrix, but in the SEM2 approach, intercepts must be modeled explicitly (and we have the benefit of getting estimates for them as well).

The function SEM2 that carries out the model estimation takes as input and the names of the manifest () and latent variables (). At the technical heart of the function is the subroutine . This function takes an equation involving latent variables (e.g. ) and adds to the objective function the appropriate term for each case (i.e. with values from the data replacing the names of manifest variables):

There is one option.

This code estimates Bollen’s model.

As mentioned, there is a version that weights equations according to the number of latent variables they have.

The results for the estimates differ from what is calculated in the traditional covariance matrix–based approach given for . A simulation study that compares the two approaches [5] showed that in many situations the case-based approach gives better results, especially when the assumption of independent errors is violated. Moreover, the case-based approach is easily applied to nonlinear equations. However, in certain situations it may be necessary to perform the minimization with higher accuracy than provided by standard hardware floating-point numbers.

In standard linear regression $y = a + b\,x + \varepsilon$, one assumes that the independent variables are measured exactly, while the dependent variable has an error that is ideally normally distributed. If the independent variables are measured with error too, standard linear regression underestimates the regression coefficient. This is the famous attenuation problem and I will show how to solve it. Let us first simulate a dataset with error on both variables.

Then linear regression underestimates the slope, which should be 0.5.
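The attenuation effect is easy to reproduce. The following Python sketch is independent of the article's Mathematica simulation: only the true slope of 0.5 comes from the text, while the sample size, the noise levels and the seed are assumptions chosen for illustration.

```python
import random
from statistics import fmean

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)  # fixed seed so the run is repeatable
true_x = [rng.gauss(0, 1) for _ in range(5000)]

# Both variables are observed with error (noise levels are assumptions).
obs_x = [t + rng.gauss(0, 1) for t in true_x]
obs_y = [0.5 * t + rng.gauss(0, 0.2) for t in true_x]

slope_noisy_x = ols_slope(obs_x, obs_y)  # attenuated estimate
slope_exact_x = ols_slope(true_x, obs_y)  # what we would get with exact x

print(slope_noisy_x, slope_exact_x)
```

With equal variances for the true values and the measurement error in $x$, the expected slope shrinks by the factor $\operatorname{Var}(x)/(\operatorname{Var}(x)+\operatorname{Var}(\text{error})) = 1/2$, so the estimate lands near 0.25 instead of 0.5.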

When using case-based modeling, several strategies are possible. We may use one or two latent variables for the true values. As the true dependent variable is just a linear function of the true independent variable, the following code uses just one latent variable. Another twist is that the equations are divided by the empirical standard deviations to put them on an equal footing.

This example shows both the power of this method and the responsibility of the modeler to set up sensible equations. If we are sure that the errors are uncorrelated, we may add as another constraint to further improve the estimate. This may also be done automatically with an extended version of SEM2, which will be published when its development is completed.

Two methods for the estimation of structural equational models are presented. One uses the traditional covariance matrix–based approach and is therefore restricted to linear equations, while the other approach is more general but not yet established in practice. Estimating the models is rather easy in Mathematica, but the numerical problems that arise can be demanding. The new case-based approach is very flexible and promising in certain situations where the standard approach shows limitations.

Case-based calculation of SEM looks very promising given the numerical power of today’s computers and might give insight in situations where the restrictions of the traditional approach urge researchers into making assumptions that may not be warranted.

It is my pleasure to thank Ed Merkle and Yves Rosseel for many explanations of SEM.

[1] K. A. Bollen, Structural Equations with Latent Variables, New York: Wiley, 1989.

[2] R. H. Hoyle (ed.), Handbook of Structural Equation Modeling, New York: Guilford Press, 2012.

[3] K. Gana and G. Broc, Structural Equation Modeling with lavaan, Hoboken: John Wiley & Sons, 2019.

[4] Y. Rosseel. “lavaan.” (Aug 25, 2019) https://lavaan.ugent.be.

[5] R. Oldenburg, “Case-based vs. Covariance-based SEM,” forthcoming.

R. Oldenburg, “Structural Equation Modeling,” The Mathematica Journal, 2020. https://doi.org/10.3888/tmj.22-5.

Reinhard Oldenburg has studied physics and mathematics and received a PhD in algebra. He has been a high-school teacher and now holds a professorship in Mathematics Education at Augsburg University. His research interests are computer algebra, the logic of elementary algebra and real-world applications.

**Reinhard Oldenburg**

*Augsburg University
Mathematics Department
Universitätsstraße 14
86159 Augsburg, Germany
*

A method of generating minimally unsatisfiable conjunctive normal forms is introduced. A conjunctive normal form (CNF) is minimally unsatisfiable if it is unsatisfiable and such that removing any one of its clauses results in a satisfiable CNF.

Ivor Spence [1] introduced a method for producing small unsatisfiable formulas of propositional logic that were difficult to solve by most SAT solvers at the time, which we believe was because they were usually minimally (i.e. just barely) unsatisfiable. Kullmann and Zhao [2] claim that minimally unsatisfiable formulas are “the hardest examples for proof systems.” We will generalize Spence’s construction and show that it can be used to generate minimally unsatisfiable propositional formulas in conjunctive normal form, that is, formulas that are unsatisfiable but such that the removal of even a single clause produces a satisfiable formula. In addition to increasing our understanding of the satisfiability problem, these formulas have important connections to other combinatorial problems [3].

We assume the reader has at least a minimal acquaintance with propositional logic and truth tables. An *interpretation* of a propositional formula is an assignment of truth values to its propositional variables. A propositional formula is *satisfiable* if there is an interpretation that makes it true when evaluated using the usual truth table rules. A *literal* is a propositional variable or a negated propositional variable. A *clause* is a disjunction of literals; if it contains exactly $k$ literals, we call it a *$k$-clause*. A *conjunctive normal form* (or *CNF*) is a conjunction of clauses. A $k$-CNF is a conjunction of $k$-clauses.

For example, is a 3-CNF. It is often convenient to think of CNFs as a list of lists of literals; in this format, the 3-CNF example would be written as . This way of writing CNFs is quite common in computer science and is the approach that we took in [4], where we showed how the famous Davis–Putnam algorithm for satisfiability testing could be easily programmed in Mathematica. , Mathematica’s built-in function for satisfiability testing, requires replacing “¬” by “!”, “∨” by “||” and “∧” by “&&”; so in Mathematica the 3-CNF example is written as . In this article, we adopt the “list of lists” approach for programming purposes and then convert to Mathematica’s format when testing for satisfiability with .
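The "list of lists" idea carries over to any language. The Python sketch below is an illustration, not the article's code: it encodes a literal as a nonzero integer (positive for a variable, negative for its negation, a convention assumed here), and `satisfiable` is a hypothetical brute-force stand-in for a real satisfiability test. The example formula is made up.

```python
from itertools import product

def satisfiable(cnf):
    """Brute-force test: try every interpretation of the variables in cnf."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        interp = dict(zip(variables, values))
        # A clause is true if at least one literal is true; a CNF needs all clauses true.
        if all(any(interp[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            return True
    return False

# A made-up 3-CNF over the variables 1, 2, 3, written as a list of lists.
example = [[1, 2, -3], [-1, 2, 3], [1, -2, 3]]

print(satisfiable(example))
print(satisfiable([[1], [-1]]))
```

The exhaustive search takes time exponential in the number of variables, so it is only suitable for small formulas like the ones used for illustration here.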

In this section, we generalize a construction of Ivor Spence [1] that produces unsatisfiable 3-CNFs that are relatively short but that standard computer programs take a relatively long time to verify as unsatisfiable, even though, as we shall show, it is relatively easy to demonstrate that they are unsatisfiable. (Perhaps humans are not replaceable by computers, after all!)

Given positive integers and , suppose the propositional variables , , …, are partitioned in order into sets of size and one set of size . For each cell of the partition, form all -clauses from the ’s in that cell and let be the conjunction of all these -clauses. If is to be true under an interpretation , no more than of the -variables from each partition cell can be false, since if were false, the -clause containing exactly these -variables would be false, as would their conjunction . Thus no more than of the -variables can be false under .

Next let, , …, be a random permutation of the ’s and partition these ’s just as the ’s were partitioned. However, this time, for each cell of the partition, form all -clauses from the __negated__ -variables in that cell and let be the conjunction of all these -clauses. Reasoning as before, no more than -variables can be true under .

Let . If is to be true under some interpretation , both and must be true under ; thus no more than -variables can be false and no more than -variables can be true under . Since the -variables are permuted -variables, it follows that no more than -variables can be true under . However, , the number of -variables in ! Thus is an unsatisfiable CNF, because there is no interpretation of all its -variables that makes true.
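The counting argument can be checked by brute force on a small instance. The following Python sketch is not the article’s code. It assumes cells of size $2(k-1)$ with one final cell of size $2k-1$, so that there are $2g(k-1)+1$ variables; this is the choice in Spence’s original construction for $k=3$, and under it $k=3$, $g=10$ gives 92 clauses, the count reported for the two-clause experiment later in the article. All function names are hypothetical.

```python
from itertools import combinations, product
import random


def cells(variables, k, g):
    """Split variables in order into g-1 cells of size 2(k-1) and one of size 2k-1.

    These cell sizes are an assumption of this sketch.
    """
    size = 2 * (k - 1)
    cut = size * (g - 1)
    return [variables[i:i + size] for i in range(0, cut, size)] + [variables[cut:]]


def spence_clauses(k, g, perm=None):
    """Clauses as lists of signed integers: v is a variable, -v its negation."""
    n = 2 * g * (k - 1) + 1
    variables = list(range(1, n + 1))
    if perm is None:
        perm = random.sample(variables, n)  # a random permutation
    positive = [list(c) for cell in cells(variables, k, g)
                for c in combinations(cell, k)]
    negative = [[-v for v in c] for cell in cells(list(perm), k, g)
                for c in combinations(cell, k)]
    return positive + negative


def satisfiable(clauses, n):
    """Brute force over all 2^n interpretations; only sensible for small n."""
    for values in product([False, True], repeat=n):
        if all(any(values[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return True
    return False


clauses = spence_clauses(3, 2)                 # 9 variables
print(len(clauses), satisfiable(clauses, 9))   # 28 False
```

Whatever permutation is drawn, the positive clauses allow at most $g(k-1)$ false variables and the negative clauses allow at most $g(k-1)$ true ones, which together fall one short of the number of variables.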

Suppose next that we drop one of the clauses in , say for example, ; let and let . Let be an interpretation that assigns false to , , …, and true to the remaining variables in the first -cell. As long as no more than -variables in each of the remaining cells of the partition of the ’s are assigned the value false, would be true under . Whether or not and hence are true under depends on whether also has the property that at most -variables in each cell of the partition of the randomly permuted variables (the ’s) are assigned the value true under . While this is unlikely for any given interpretation , there are so many interpretations satisfying that it is most likely that some such interpretation has this property and the reduced CNF will then be true under .

We will investigate this intuitive argument. For to be true, the -variables , , …, can now be false in the first cell, as long as each of the remaining cells in the -partition has at most false variables; thus -variables can be false and, as above, -variables can be true. However, , and the argument showing to be unsatisfiable cannot be applied to .

First we allow for different choices for the parameters and . Initially we set and . The next several steps serve to introduce the variables and partition them into cells.

Define the partition of the -variables.

Here is the -partition for our example.

Next we generate, negate and partition the -variables.

We join and and form all -sets from the result.

puts these steps together. The argument is a permutation of .

For the experiments that follow, leaving out the third argument uses a random permutation.

Because of , the negated pieces can change from one run to the next.

The function `SpenceCNF` transforms a list of clauses into an expression that allows for satisfiability testing, as described in the section Definitions.

Equivalently, here is a longer form.

To test that a CNF is minimally unsatisfiable, we must show both that it is unsatisfiable and that the removal of any one clause always results in a satisfiable CNF. The test function checks the satisfiability of a CNF in list form by converting it to a logical expression with `SpenceCNF` and applying the built-in function `SatisfiableQ`; it then tests whether every formula with one clause deleted is satisfiable.

This tests whether C3 is minimally unsatisfiable.
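The two-part test can be sketched independently of Mathematica. The Python version below replaces `SatisfiableQ` with a brute-force search, so it is only practical for small formulas; the function names and sample CNFs are hypothetical.

```python
from itertools import product


def satisfiable(clauses):
    """Brute-force satisfiability; literals are signed integers (v or -v)."""
    variables = sorted({abs(l) for clause in clauses for l in clause})
    for values in product([False, True], repeat=len(variables)):
        truth = dict(zip(variables, values))
        if all(any(truth[abs(l)] == (l > 0) for l in clause)
               for clause in clauses):
            return True
    return False


def minimally_unsatisfiable(clauses):
    """True if clauses are unsatisfiable and every one-clause deletion is satisfiable."""
    if satisfiable(clauses):
        return False
    return all(satisfiable(clauses[:i] + clauses[i + 1:])
               for i in range(len(clauses)))


# All four sign patterns on two variables: unsatisfiable, and minimally so.
print(minimally_unsatisfiable([[1, 2], [-1, 2], [1, -2], [-1, -2]]))  # True
# Unsatisfiable, but the clause [2] is superfluous.
print(minimally_unsatisfiable([[1], [-1], [2]]))                      # False
```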

Most, but not all, of the unsatisfiable formulas generated in this way are, as we shall see, minimally unsatisfiable. However, we conjecture that the larger we take $k$ and $g$, the more likely we are to get a minimally unsatisfiable formula. We have done some experiments on this conjecture and discuss them in the next section.

If we remove two different clauses instead of one from our unsatisfiable formulas, the result is almost always a satisfiable formula. We define the function and experiment with it in the next section.

We run some experiments to investigate the frequency of minimally unsatisfiable CNFs obtained with our Spence generalizations.

The table below summarizes some larger computer experiments we have conducted to test our conjecture that most of the formulas constructed by the above methods are minimally unsatisfiable. The third column gives the number of clauses in the CNF defined by `SpenceCNF@SpenceList[k,g]`. The fourth column gives the percentage of these CNFs that were minimally unsatisfiable.

Each line in the table represents results on 500 formulas. For example, the first line constructs 500 3-CNFs, based on variables and clauses in each. It turned out that 62% were minimally unsatisfiable.

Next we look at what happens if we remove two different clauses from our Spence formulas. With , , there are 92 clauses in the Spence CNF; hence there are $\binom{92}{2} = 4186$ distinct ways to remove two different clauses. We count the number of times the resulting CNF is satisfiable in 100 trials.

In this trial, we are very close to all Spence CNFs becoming satisfiable after removing two different clauses. We believe this is true in general.

In this section we modify our construction to generate only derangements of the -variables. A *derangement* is a permutation that leaves no number in its original position. There are three resource functions that deal with derangements.

It is known that derangements constitute slightly more than a third of the permutations (see [5]), as the following calculation illustrates.
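As a rough illustration in Python (not the article’s calculation), the number of derangements can be computed from the standard recurrence $D(n) = (n-1)\,(D(n-1) + D(n-2))$, and the ratio $D(n)/n!$ compared with its limit $1/e \approx 0.3679$. The function name is hypothetical.

```python
from math import e, factorial


def derangement_count(n):
    """Number of derangements of n items via D(n) = (n-1) * (D(n-1) + D(n-2))."""
    if n == 0:
        return 1
    previous, current = 1, 0  # D(0), D(1)
    for m in range(2, n + 1):
        previous, current = current, (m - 1) * (current + previous)
    return current


for n in (4, 10, 41):
    print(n, derangement_count(n) / factorial(n))
print("1/e =", 1 / e)
```

For $n = 4$ the ratio is $9/24 = 0.375$, and it settles quickly near $1/e$ as $n$ grows.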

We define the function that does “derangement” experiments.

On the basis of these experiments we conjecture that the derangements are about as likely to produce minimally unsatisfiable formulas as permutations in general.

We have adapted a method of I. Spence [1] to easily obtain large numbers of unsatisfiable CNFs that are usually but not always minimally unsatisfiable. We also ran some experiments to indicate what percentages would be minimally unsatisfiable. In addition, our experiments suggest that if two different clauses are removed rather than one, the resulting formula will almost always be satisfiable. Finally we restricted the random permutations in our construction by requiring them to be derangements and saw that this gave similar percentages of minimally unsatisfiable formulas.

I am grateful to the referee for his advice and to the editor, George Beck, for greatly improving my Mathematica coding throughout the paper.

[1] I. Spence, “sgen1: A Generator of Small but Difficult Satisfiability Benchmarks,” ACM Journal of Experimental Algorithmics, 15, 2010 pp. 1.1–1.15. doi:10.1145/1671970.1671972.

[2] O. Kullmann and X. Zhao, “On Davis–Putnam Reductions for Minimally Unsatisfiable Clause-Sets,” Theoretical Computer Science, 492, 2013 pp. 70–87. doi:10.1007/978-3-642-31612-8_21.

[3] R. Aharoni and N. Linial, “Minimal Non-Two-Colorable Hypergraphs and Minimal Unsatisfiable Formulas,” Journal of Combinatorial Theory, Series A 43(2), 1986 pp. 196–204. doi:10.1016/0097-3165(86)90060-9.

[4] R. Cowen, M. Huq and W. MacDonald, “Implementing the Davis–Putnam Algorithm in Mathematica,” Mathematica in Education and Research, 10, 2005 pp. 46–55. www.researchgate.net/publication/246429822_Implementing_the_Davis-Putnam_Algorithm_in_Mathematica.

[5] Wikipedia. “Derangement.” (Jul 10, 2010) en.wikipedia.org/wiki/Derangement#Limit_of_ratio_of_derangement_to_permutation_as_n_approaches_∞.

R. Cowen, “Generating Minimally Unsatisfiable Conjunctive Normal Forms,” The Mathematica Journal, 2020. https://doi.org/10.3888/tmj.22-4.

Robert Cowen is a Professor Emeritus at Queens College, CUNY. His main research interests are logic and combinatorics. He has enjoyed teaching students how to use Mathematica to do research in mathematics for many years.

**Robert Cowen**

*16422 75th Avenue*
*Fresh Meadows, NY 11366*

Given a rationally parameterized curve in or , where the and are polynomials, we find the dimension of the smallest linear subset of containing the curve. If all the and are of degree or less, then it is known abstractly that this dimension is or less and *rational normal curves* play a key role in the argument. We consider this from a computational point of view with `TransformationFunction` playing an essential part in the discussion.

The ancients were confused about the concepts of degree and dimension. As late as 1545 in his famous book *Ars Magna* [1], Cardano, who did not hesitate to invent imaginary numbers, in reference to his assistant Ferrari’s solution of the quartic gives the following disclaimer:

Although a long series of rules might be added and a long discourse given about them, we conclude our detailed consideration with the cubic, others being merely mentioned, even if generally, in passing. For as the first power refers to a line, the square to a surface, and the cube to a solid body, it would be very foolish to go beyond this point. Nature does not permit it.

The distinction between degree and dimension was later resolved by Descartes’s algebraic notation. But, in the context of parametric curves, I recently noticed a simple linear algebra proof of the following theorem:

**Theorem A**

Let , be a curve in or where the coordinate functions are polynomials of degree or less. Then for any , the curve lies in a linear subset of or of dimension .

This theorem, as well as many of the other facts in this article, is given in Joe Harris’s book [2] from a projective geometry point of view. He also considers the degree versus dimension issue in a number of other situations. We give the linear algebra proof in Section 2.

Unfortunately, projective geometry is not computationally friendly. Instead we can view these results from an affine point of view using the built-in function `TransformationFunction` [3], which we discuss in Section 3.

We then generalize and rephrase our result in Section 4 as Theorem B. The generalization is to rational curves and we can give the dimensions of the smallest linear space containing the curve. Theorem B does clarify that, while the degree bounds the size of a linear set, the curve may lie in a smaller dimensional linear set.

In Section 5 we observe that the *rational normal curve* in or , , is universal for rational curves. That is, every rational curve is a transform of a normal curve. This is very easily seen via the . This lets us rephrase Theorem B in another useful form, where the can be found directly from the expression of a rational function in the form

where and the common denominator are all polynomials of degree or less written in descending degree. To simplify notation we generally work with coefficients in the real numbers , but it should be understood that one could work in any subfield of the complex numbers as well. But, as immediately below, in some cases we must consider parameter values in the algebraic closure of the subfield.

Sections 5 and 6 give two applications.

The first discusses the recognition problem: given a point , is for some ? This is equivalent to the well-studied problem of finding a common solution of a family of univariate polynomials, which we do not consider here. We show that modulo the linear , the recognition problem can often be solved in a linear space of smaller dimension.

The second example is the implicitization problem for rational functions, which is to find an implicit system that describes the ideal of the rational curve. We only sketch this, as there is no room to carefully describe the routines in [4].

In fact, this article was motivated by the author’s work on implicitization of parametric curves. I noticed that an unexpectedly large number of linear equations appeared in the implicit systems.

In this article a *linear subset of * is a set defined by a system of linear equations, not necessarily homogeneous. A linear subset is distinguished from a *linear subspace*, which is a subspace of the vector space and defined with homogeneous equations. The big difference is that a subspace contains the origin . A linear subset is a coset of a linear subspace under the operation of vector addition.

A polynomial parametric curve is a function where each coordinate function is a polynomial that we write in descending degree:

where is the *degree* of the coordinate polynomial. The largest such degree is the degree of the parameterization. Note that . This constant acts merely as a basepoint; a different basepoint gives a curve that is a translation of the first. Thus the basepoint does not affect the geometry. We say our parameterization is *stripped* if (or alternatively if each ). Each polynomial parameterized curve is then a translate of a stripped curve, so we first consider those. We *strip* a polynomial parameterized curve by dropping all the constant terms.

We now create a stripped coefficient matrix from the stripped polynomial. If is the degree of the polynomial, is the matrix with rows . Consider the following equation where points are column vectors.

This shows that every point on the parameterized curve is in the vector space spanned by the columns of the coefficient matrix. So Theorem A is true for a stripped parameterization, but adding back the constant simply moves this subspace to a linear subset.

To describe the smallest linear set containing a finite set of points in terms of a system of equations, here is a short routine.

A longer version of this with error detecting is in [4].

Example 1:

We know a linear set containing this curve must be of dimension no greater than three, since this set is contained in , so it is generated as a linear set by four or fewer points. Therefore it is enough to take four random points on this curve and calculate the smallest linear set containing them.

Here are the four random points.

Here is the linear expression for the linear set.

A linear set defined by one linear equation in three variables is of dimension two. This curve lies in the linear set defined by setting the linear expression to zero.
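The same idea can be sketched in Python with exact rational arithmetic. The curve below is a hypothetical degree-two example, not the curve of Example 1, and the function names are assumptions of the sketch. The dimension of the smallest linear set through sampled points is the rank of their differences from a base point, and it agrees here with the rank of the stripped coefficient matrix.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linear_set_dimension(points):
    """Dimension of the smallest linear (affine) set containing the points."""
    base = points[0]
    return rank([[a - b for a, b in zip(p, base)] for p in points[1:]])


def curve(t):
    # Hypothetical curve in R^3: (t^2 + t, t^2 - t, 2 t^2 + 1).
    return (t**2 + t, t**2 - t, 2 * t**2 + 1)


# Rows hold the coefficients of t^2 and t for each coordinate (constants dropped).
stripped = [[1, 1], [1, -1], [2, 0]]

points = [curve(t) for t in (0, 1, 2, 3)]
print(linear_set_dimension(points), rank(stripped))  # 2 2
```

Four sample points suffice here because a linear set in three-space is generated by at most four points.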

The central concept in this article is the built-in Wolfram Language function **TransformationFunction**. When we say *transformation function* we mean a function given by `TransformationFunction`. Basically these are affine versions of *projective linear* transformations, which can include translations along with the usual transformations of linear algebra. They appeared in Lecture 2 of Abhyankar [5] and much of the author’s work [4, 6] as *fractional linear transformations*; they are also known in the literature as *linear fractional transformations*. Our major use of these transformations is to be able to access projective geometry where points are cosets of $(n+1)$-tuples, while working in affine geometry where points are merely $n$-tuples, which are easy to manipulate computationally.

A transformation function can be described by an matrix. The matrix of the associated projective linear transformation is called the *transformation matrix* in the Wolfram Language. Thus the of an matrix takes an affine -tuple, appends 1 to represent this in projective -space, applies the projective linear transformation defined by and then specializes by dividing by the component.

Here is an example.

These transformations in the special case are discussed in detail in Chapter 6 of my book [4].

A transformation function is *affine* if the last row is ; the denominators are always 1, the upper-left submatrix gives a linear transformation and the first entries of the last column describe a translation.

In particular, the domain of an affine transformation is all of . Otherwise we call the transformation function *projective*. If the last row of the transformation matrix is , then the hyperplane of given by is not in the domain of the transformation function. In the context of an affine transformation, it is understood that the equation defines the empty set.
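The action just described (append 1, multiply by the matrix, divide by the last component) can be imitated outside Mathematica. The Python sketch below is only an illustration of that recipe, with a hypothetical function name and hypothetical matrices; it is not the Wolfram Language implementation.

```python
from fractions import Fraction


def apply_transformation(matrix, point):
    """Append 1 to the point, multiply by the matrix, divide by the last component."""
    homogeneous = [Fraction(x) for x in point] + [Fraction(1)]
    image = [sum(a * x for a, x in zip(row, homogeneous)) for row in matrix]
    if image[-1] == 0:
        raise ValueError("point is outside the domain of the transformation")
    return tuple(c / image[-1] for c in image[:-1])


# Hypothetical projective example: the last row is not (0, 0, 1).
projective = [[1, 0, 2],
              [0, 1, -1],
              [1, 1, 1]]

# Hypothetical affine example: the last row is (0, 0, 1), so denominators are 1.
affine = [[1, 0, 2],
          [0, 1, -1],
          [0, 0, 1]]

print(apply_transformation(projective, (1, 2)))
print(apply_transformation(affine, (1, 2)))
```

For the projective matrix, the point $(1, 2)$ maps to $(3/4, 1/4)$, while any point on the line $x + y + 1 = 0$ makes the denominator vanish and is rejected. The affine matrix simply translates $(1, 2)$ to $(3, 1)$.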

In this article we assume that a rational parametric curve has coordinates that are quotients of two polynomials in . We insist that the parametric curve be given with a common denominator , so, for example, is of the form

(1) |

for polynomials . The degrees of may be greater than, equal to or less than the degree of . In particular, could be the constant polynomial 1, in which case is a polynomial curve that we can treat as a special case of a rational curve. The degree of is the largest degree of .

The advantage of writing polynomials in the parameter in descending degree is that writing a transformation matrix for a rational function is easy. Suppose in equation (1) that for , where we write . Then the transformation matrix for is

(2) |

Example 2.

Both [2] and [5] mention the fact that every rationally parameterized curve is a projective transformation applied to a polynomially parameterized curve. In particular, [2] notes that this polynomial curve can be the rational normal curve of degree
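To make this concrete, here is a small Python sketch under stated assumptions: the rational normal curve is taken in descending degree as $(t^d, \ldots, t)$, and the sample curve is the familiar rational parameterization of the unit circle, which is not one of the article’s examples. Each row of the matrix lists the coefficients of one numerator in descending degree, and the last row lists those of the common denominator.

```python
from fractions import Fraction


def normal_curve(t, d):
    """Point (t^d, ..., t^2, t) on the rational normal curve (descending order assumed)."""
    return [Fraction(t) ** e for e in range(d, 0, -1)]


def transform(matrix, point):
    """Append 1, multiply by the matrix, divide by the last component."""
    homogeneous = list(point) + [Fraction(1)]
    image = [sum(a * x for a, x in zip(row, homogeneous)) for row in matrix]
    return tuple(c / image[-1] for c in image[:-1])


# Hypothetical example: F(t) = ((t^2 - 1)/(t^2 + 1), 2t/(t^2 + 1)).
circle = [[1, 0, -1],   # t^2 - 1
          [0, 2, 0],    # 2t
          [1, 0, 1]]    # t^2 + 1 (common denominator)

x, y = transform(circle, normal_curve(2, 2))
print(x, y, x**2 + y**2)  # 3/5 4/5 1
```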

Before we state Theorem B, we note that every linear transformation can be factored into a projection on some coordinates followed by an embedding. This is accomplished in a special way using Mathematica by the following matrix reduction algorithm we call . This takes an matrix of rank and outputs an matrix and an matrix consisting of rows of such that . This implies that rows of are what the Wolfram Language calls ; that is, contains an identity matrix as a submatrix.

In the code, the functions and defined in the statements invert the lists and viewed as functions from their index sets. The tests whether is in the domain of .

We can now state and prove our main theorem; we write . It may seem counterintuitive that we can strip the constant off the denominator, in particular for polynomially parameterized curves (so stripping it gives ). But projectively the denominator is just another coordinate so we can still do that. So if is the matrix from the previous section and , where is the of , then the *projective stripped coefficient matrix* of is just the submatrix of with the last column removed.

**Theorem B**

Let be a parametric curve in of degree . Suppose the projective stripped coefficient matrix of has rank . Then there are components of defining a stripped polynomial parametric curve in and a transformation function taking to .

**Proof**

We apply the algorithm to the projective stripped coefficient matrix of , obtaining a list of rows forming a basis of the row space of and a matrix of size , where the rows corresponding to this basis are replaced by rows of the identity matrix. Multiplying by the vector gives the parametric function . Appending a last column to with the constant terms of the original gives a transformation matrix . By the above comments it is easy to see that the defined by takes to .

One can paraphrase this theorem as: *Given a parametric curve of degree , there is an , a stripped parametric polynomial curve in and a so that the following diagram commutes.*

We ask the reader not to take this diagram literally in the case of a rational parameterization, as the domains of , , may not be the full spaces indicated. But if is a polynomial parameterization, then the domains are the full spaces and is an embedding.

Example 3: We illustrate this proof by fully working out the following degree-two curve in .

The decomposition can be easily done by hand.

So we add the constant row; remember that the constant in the last row is 1.

Theorem B tells us the composition of and is .

So this curve is contained in a plane.

Example 2 (continued): We now consider the rational parameterization of Example 2.

We check that this lies in a two-dimensional plane in .

The step in the proof of Theorem B where we use to obtain the curve from the rational normal curve can also be done by using an affine transformation function obtained by adding the row and column to . In Example 2 we have the following.

This gives:

**Theorem C**

Let be a rational (or polynomial) curve parameterization of degree . Suppose the projective stripped coefficient matrix of has rank . Then the transformation function in Theorem B can be decomposed into transformation functions as in the following diagram.

Here is an affine transformation function of onto and is a possibly projective transformation of into . In particular, the parametric curve given by lies in a linear subset of of dimension less than or equal to the minimum of , , .

**Construction**

As in Theorem B, we let be the projective stripped matrix of and apply to to get of sizes and , respectively. Appending a row of zeros and then a column of zeros with last component 1 to make into an affine transformation matrix of size , let be the of . Appending the column of constants to , we get a transformation matrix of size . Then is the of . One can check that .

This recovers the known result [2] that every rational parameterization is a projective linear transformation of the rational normal curve, but here we have a constructive approach.

Example 4: For an easy but nontrivial (i.e. not conic) example we use the *piriform* [7].

Here , . Here is the stripped projective matrix.

A trivial application of in that is of full rank gives the following.

Notice here that , and In this case, the curve lies in , a two-dimensional space. The numbers , , are important values in describing a rational parameterized curve. Even though the transformation matrix for contains the identity matrix, it is not injective, which is typical in the case of a rational parameterization, even when , but this does not occur for a polynomial parameterization.

The recognition problem is: *given a parameterized curve and a point in , is in the curve*; that is, does there exist with ?

There are two obvious methods to solve this problem. The first is to directly solve the over-determined system using . This works surprisingly well, failing mostly with poorly conditioned systems for which the other methods following may not work well either. The biggest problem with this approach is that when it does not work, it gives a false negative to the recognition problem. One can, of course, solve component by component and see if any solutions are numerically close.

Example 2 continued.

So the first point is on the curve but the second point is not. In general, finding a common zero of a set of polynomial or rational equations is an interesting problem, but we do not consider that here.

The second method is to find a system of equations whose solution set is the Zariski closure of the point set . All that then needs to be done, in principle, is to evaluate this system at and check that the value is 0. We consider this issue in Section 5.

As we have seen, a parameterized curve in may lie in a linear subset of dimension less than Using Theorem C and the algorithm, we can get some additional information about the problem and perhaps reduce this to a problem in a smaller .

Example 5.

We would like to find out which, if any, of the following points are on this curve.

We first find the transformation functions. Here is the projective stripped coefficient matrix.

Apply .

Augment these matrices to get transformation matrices.

Generate some random points.

This says is not contained in any proper subspace, but the image of lies in a three-dimensional subspace of .

The points and do lie in the image of , so may be points on the curve, but we can eliminate . We find the fibers (preimage) of and in .

These conveniently are singleton points. Thus we have reduced this rational recognition problem in to a polynomial recognition problem in .

So but is not on the curve.

As mentioned, the motivation for this article is my work on implicitization of rational parametric space curves. In this section I only sketch my algorithms; details are in [4]. The key here is that by the material discussed, especially Theorem C, every such curve is simply a fractional linear transformation of the rational normal curve.

By *implicitization* I mean describing these parametric curves by way of algebraic equations. A problem that arises is that while one expects a curve in to be given by equations, this is often not enough to fully describe the curve pointwise or algebraically. The standard counterexample is the *twisted cubic*, which is just the rational normal curve of degree three, . A system of three equations in the variables , , describing the twisted cubic, given in [2], is . An exercise in [2] is to show that the zero set of any pair of these three equations contains not only the twisted cubic, but also a line, but note that the extra line in the last pair lies in the infinite plane of projective three-space.
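The behavior of pairs of equations can be checked numerically. The Python sketch below assumes the ordering $(x, y, z) = (t, t^2, t^3)$ and the usual three quadrics $y - x^2$, $z - xy$ and $xz - y^2$ for that ordering; the ordering and the form of the equations used in the article and in [2] may differ.

```python
def quadrics(x, y, z):
    """Values of the three assumed quadrics at the point (x, y, z)."""
    return (y - x**2, z - x * y, x * z - y**2)


def twisted_cubic(t):
    """Point on the twisted cubic in the assumed ordering (t, t^2, t^3)."""
    return (t, t**2, t**3)


# Every point of the curve satisfies all three equations.
print(all(quadrics(*twisted_cubic(t)) == (0, 0, 0) for t in range(-5, 6)))  # True

# (0, 0, 5) is on the line x = y = 0: it satisfies the first and third
# equations but is not on the curve, since the second one fails.
print(quadrics(0, 0, 5))  # (0, 5, 0)

# (5, 0, 0) is on the line y = z = 0: it satisfies the second and third
# equations but not the first.
print(quadrics(5, 0, 0))  # (-25, 0, 0)
```

In this affine picture the first and second equations together cut out only the curve, which is consistent with the remark that the extra line for one of the pairs lies at infinity.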

Any implicitization problem has infinitely many possible answers, but the best answers are systems of equations that form an H-basis. This idea goes back to F. S. Macaulay in 1916, who was studying homogeneous equations, hence the “H”; basically in our context it means that any equation of total degree containing the parametric curve in its zero set is a polynomial combination , where the are in the H-basis and the are polynomials so that each term has total degree at most Thus for an H-basis, the ideal membership problem reduces to linear algebra.

If one has a system with zero set describing the parametric curve , then the Gröbner basis with respect to a degree ordering is an H-basis, perhaps larger than necessary. In practical terms one can simply use the following format.

In the case of the rational normal curve of degree , Harris [2] claims that using quadratic equations is sufficient, so we can proceed as follows: we first give a procedure finding the total degree of a polynomial of several variables.
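A total-degree procedure is simple to sketch. The Python version below is an illustration only; it assumes a hypothetical representation in which a polynomial is a dictionary from exponent tuples to coefficients, which is not how the Wolfram Language stores polynomials.

```python
def total_degree(polynomial):
    """Largest sum of exponents among terms with a nonzero coefficient.

    The polynomial maps exponent tuples to coefficients, for example
    {(1, 0, 1): 1, (0, 2, 0): -1} for x*z - y^2 in the variables (x, y, z).
    The zero polynomial is given degree -infinity by convention.
    """
    degrees = [sum(exponents)
               for exponents, coefficient in polynomial.items()
               if coefficient != 0]
    return max(degrees) if degrees else float("-inf")


print(total_degree({(1, 0, 1): 1, (0, 2, 0): -1}))               # 2
print(total_degree({(2, 1, 0): 3, (0, 0, 4): -1, (0, 0, 0): 5}))  # 4
```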

Then we use the following code, say for .

This defines .

Likewise we get the following for .

The size of the H-basis is , which gets much larger than . The numbers are binomial coefficients and can be enumerated recursively; however, does not contain , so there is no obvious recursive construction of these bases.

In [4] I construct, for the fractional linear transformation given by an transformation matrix A, a transformation . This takes the system in with variables given by the list X to a system in with variable list Y such that for a solution of then is a solution of . Unfortunately, this works numerically and the user must provide a number that bounds the degrees of the polynomials used and a small tolerance , but for an appropriate choice of these parameters the system is often an H-basis if is.

Thus, a possible method for finding an implicit system describing the rational parameterized curve is to write it in the form , where is a , and use .

We use Example 4 to illustrate this.

Example 4 continued; define and so on.

Then we get the implicitization directly using a related function (fractional linear transformation, i.e. ) that takes not points to points but equation systems to equation systems. In [6] this is simple, because all transformation matrices used are invertible. In this context the 3×5 transformation matrix is not invertible, so finding the equation system for the image of a transformation function becomes quite involved. Essentially this is the subject of all of Chapter 2 in [4]. For instance, in this case we are compacting six equations into one.

The following non-executable code and result are copied from [4, Section 3.1]. For executable code, see the GlobalFunctionsMD.nb notebook of [4].

Once this is done we can check that this works.

We have shown how the Wolfram Language function `TransformationFunction` simplifies the study of rational parameterized curves.

[1] J. Cardano, The Great Art (T. R. Whitmer, trans.), Boston: MIT Press, 1968.

[2] J. Harris, Algebraic Geometry, A First Course, Springer Graduate Texts in Mathematics 133, New York: Springer, 1992.

[3] S. Wolfram, An Elementary Introduction to the Wolfram Language, Champaign, IL: Wolfram Media, 2015. See also reference.wolfram.com/language.

[4] B. H. Dayton. “Space Curve Book.” http://barryhdayton.space/SpaceCurves/spindex.html. (Sep 4, 2020). Code in barryhdayton.space/SpaceCurves/GlobalFunctionsMD.nb.

[5] S. Abhyankar, Algebraic Geometry for Scientists and Engineers, Providence, RI: American Mathematical Society, 1990.

[6] B. H. Dayton, A Numerical Approach to Real Algebraic Curves with the Wolfram Language, Champaign, IL: Wolfram Media, 2018.

[7] E. W. Weisstein. “Piriform Curve” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/PiriformCurve.html.

B. Dayton, “Degree versus Dimension for Rational Parametric Curves,” The Mathematica Journal, 2020. https://doi.org/10.3888/tmj.22-3.

Barry Dayton is the author of *A Numerical Approach to Real Algebraic Curves with the Wolfram Language* and is Professor Emeritus at Northeastern Illinois University in Chicago, IL. He lives in Ridgefield, CT.

**Barry H. Dayton**

Department of Mathematics

Northeastern Illinois University

Chicago, Illinois 60625-4699

*barryhdayton.space*

*barry@barryhdayton.us*

The Wolfram Language has numerous knowledge-based built-in functions to support financial computations. This article introduces many built-in and other financial functions that are based on concepts and models covered in undergraduate-level finance courses. Examples are taken from a wide range of finance areas. They emphasize importing and visualization of data from many sources, valuation, capital budgeting, analysis of stock returns, portfolio optimization and analysis of bonds and stock options. We hope that all the functions selected in this article are very useful for analyzing real-world financial data. All examples provide a unique set of tools for users to engage with real-world financial data and solve practical problems. The feature of automatic data retrieval from online sources and its analysis makes all results reproducible without any modifications in the code. We hope this feature will attract new users from the finance community.

Finance is computational in nature and often involves the analysis and visualization of complex data, optimization, simulation and use of data for risk management. Without the proper use of technology, it is almost impossible to analyze these functions of modern finance. Moreover, the field of finance has become far more driven by data and technology since the 1990s, which has made large-scale data analysis the norm. Data-driven decision-making and predictive modeling are now the heart of every strategic financial decision. Since the publication of Varian [1, 2], Shaw [3] and Stojanovic [4], there have been many updates and new functions, but no new articles or books have been written to cover wide areas of computational finance. This article provides a comprehensive overview of functions related to finance and introduces many functions that are useful for real-world financial data analysis using Mathematica 12.

We have provided all the custom functions in the text so that users can make changes as they learn how to program in the Wolfram Language. Furthermore, we keep explanations of financial concepts to a minimum, as our focus is on introducing financial applications of the Wolfram Language.

We begin by defining some symbols that are frequently used as input arguments of custom functions in the article. The article uses or , , and as arguments in many functions defined in this article. Most of these symbols are used as input in the built-in function . All these arguments must be specified in the format acceptable in the function. We use or to represent a company’s or companies’ stock ticker symbol or symbols. It could be a string or a list of strings. The format represents the start date of the sample period specified and represents the last date of the analysis period. Both must be specified as date objects in any date format supported by . Similarly, represents data frequency. It may include , , or . In the subsequent functions defined in this article, we will not describe them when they are used as arguments.

The article is organized into 13 sections:

1. this Introduction

2. importing and visualizing data from different sources

3. capital budgeting and business valuation

4. functions for the analysis of security returns

5. rolling-window performance analysis

6. financial application of optimization

7. decomposing the risk of a portfolio into its components

8. importing factor data and running factor models

9. computing different types of portfolio performance measures

10. technical analysis of stock prices

11. bond analysis

12. analyzing derivative products

13. concluding remarks

The most commonly used built-in functions for retrieving company-specific financial data are , and . For example,

imports Facebook’s financial statement data and

imports Facebook’s price-related data.

Similarly, the function can be used to get data about stocks and other financial instruments. or can be used to chart prices against time. or can be used to make interactive plots with additional features of adding different technical indicators. Other functions such as , , or can also be used to visualize financial data. In the remaining part of this section, we are going to show you how to import data from different sources and visualize it.

We download Apple’s return on assets (ROA), return on equity (ROE) and revenue growth over the period January 1, 2001, to January 1, 2019, and plot them.

Similarly, we define the function to compare any specified property of different companies. The function takes a list of stock symbols, beginning period, end period and a property to consider as its arguments.

We plot the revenue growth of Apple, Facebook, Walmart and Bank of America over the period January 1, 2000, to January 1, 2019.

Second, we import and visualize data from the Federal Reserve Bank of St. Louis, as it is one of the most important data sources when it comes to economic data. The built-in function can be used to request the data from the Federal Reserve Economic Data API. Its argument structure is:

where is a series ID or a list of IDs. It returns a time series containing data for the specified series. It is often of interest to plot the economic time series with the recession dates.

The function downloads and plots the selected series along with the shaded recession period. The function takes series ID, start date, end date and title as inputs and returns a graph. It uses recession indicators based on USREC (US recession) data from the National Bureau of Economic Research (NBER) for the United States from the period following the peak through the trough to indicate the recession period. The Federal Reserve Bank of St. Louis may require an API key to download its data. An API key can be obtained for free by creating a user account at https://fred.stlouisfed.org (click “my account” and follow the instructions).

Now we can download any series and plot it. For example, we download and plot the Leading Index for the United States (USSLIND) over the period January 30, 1990, to January 30, 2019. Please use the API key 207071a5f2e90e7816259d3c32c1ab81 if needed. The shaded regions indicate recession periods.

We download and plot the historical real S&P 500 prices by month (MULTPL/SP500_REAL_PRICE_MONTH) over the period March 31, 1975, to May 30, 2019.

Finally, we show how to create a dataset. The built-in function is very useful for organizing large or small sets of data.

The function can be used to get a company’s fundamental data. After the data is stored, we can organize and analyze the data.

For example, we download return on assets (), return on equity () and revenue growth () for Apple Inc. (AAPL) over the period January 1, 2000, to January 1, 2019, and make a dataset. After the dataset is constructed, we can pull data and do further analysis using a rich set of built-in Wolfram knowledge.

Basic concepts used in common financial decision making are important for learning and understanding the finance discipline. Many functions such as , , , and are directly related to finance. Other functions such as and can be used to find one of the unknowns when relevant information is given. All these functions are useful for solving time value of money, capital budgeting and business valuation problems. The Mathematica documentation provides numerous examples of how to use these functions. In this section, we are going to focus on a few examples concerning loan amortization, capital budgeting and business valuation.

A loan amortization table is often used to visualize periodic payments of the loan, loan balance and payment breakdown into principal payment and interest payment. The function returns an amortization table given its input arguments. It takes four required arguments and one optional argument:

: current value of loan amount

: loan term in years

: annual percentage interest rate

: frequency of loan payment per year: 12 for monthly payment, 1 for annual payment, and so on

: (an optional argument) future value of the loan amount; if no value for is provided, the future value of the loan is assumed to be zero

Using this function, we compute an amortization table for a loan of $40,000 with 1-year loan term, 5% APR paid monthly.
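The mechanics behind such a table can be sketched with the standard level-payment annuity formula. The helper below is hypothetical: its name, argument order and column layout are our assumptions, not the article's definition, and it uses plain arithmetic rather than the built-in time-value functions.

```wolfram
(* Hypothetical helper: pv = loan amount, years = term, apr = annual rate,
   freq = payments per year, fv = optional future value (default 0) *)
amortizationSketch[pv_, years_, apr_, freq_, fv_ : 0] :=
 Module[{r, n, pmt, balance},
  r = apr/freq;                       (* periodic interest rate *)
  n = years*freq;                     (* total number of payments *)
  (* level payment that takes the balance from pv to fv in n periods *)
  pmt = (pv - fv/(1 + r)^n)*r/(1 - (1 + r)^(-n));
  (* outstanding balance after k payments *)
  balance[k_] := pv*(1 + r)^k - pmt*((1 + r)^k - 1)/r;
  Prepend[
   Table[
    {k, pmt, balance[k - 1]*r, pmt - balance[k - 1]*r, balance[k]},
    {k, 1, n}],
   {"Period", "Payment", "Interest", "Principal", "Balance"}]]

(* The loan from the example: $40,000, 1 year, 5% APR, monthly payments *)
amortizationSketch[40000, 1, 0.05, 12] // Grid
```

Each row splits the constant payment into interest on the previous balance and the remaining principal repayment, so the final balance equals the future value (zero by default) up to rounding.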

The most commonly used decision tools in capital budgeting are net present value (NPV), internal rate of return (IRR), modified internal rate of return (MIRR) and profitability index (PI). These are defined in terms of the cash flows , , the discount rate and the reinvestment rate by:

We use the built-in Mathematica functions , and in the function to compute these measures. It takes cash flows (a list), discount rate and reinvestment rate as its arguments.

We illustrate the use of the function with an example. Say a project requires a $50,000 initial investment and is expected to produce, after tax, a cash flow of $15,000, $8,000, $10,000, $12,000, $14,000 and $16,000 over the next six years. The discount rate is 10% and the reinvestment rate is 11%. We compute the project’s NPV, IRR, MIRR and PI.
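As a rough cross-check of these four measures, the sketch below computes them from first principles. The helper name is hypothetical, and the conventions are assumptions on our part: MIRR compounds positive cash flows at the reinvestment rate and discounts negative ones at the discount rate, and PI is the present value of inflows divided by the present value of outflows. The article's own function may use the built-in time-value functions and slightly different conventions.

```wolfram
(* Hypothetical helper: cf = cash flows starting at time 0,
   r = discount rate, rr = reinvestment rate *)
capitalBudgetSketch[cf_List, r_, rr_] :=
 Module[{n = Length[cf] - 1, t, npv, irr, fvInflows, pvOutflows, x},
  t = Range[0, n];                                   (* time index of each cash flow *)
  npv = Total[cf/(1 + r)^t];                         (* net present value *)
  irr = x /. FindRoot[Total[cf/(1 + x)^t] == 0, {x, r}];   (* rate at which NPV is zero *)
  fvInflows = Total[Clip[cf, {0, Infinity}]*(1 + rr)^(n - t)];   (* inflows compounded to time n *)
  pvOutflows = -Total[Clip[cf, {-Infinity, 0}]/(1 + r)^t];       (* outflows discounted to time 0 *)
  <|"NPV" -> npv,
    "IRR" -> irr,
    "MIRR" -> (fvInflows/pvOutflows)^(1/n) - 1,
    "PI" -> (npv + pvOutflows)/pvOutflows|>]

(* The project from the example *)
capitalBudgetSketch[{-50000, 15000, 8000, 10000, 12000, 14000, 16000}, 0.10, 0.11]
```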

One of the most widely used business valuation models is the discounted cash flow model, in which the value of any asset is obtained by discounting the expected cash flows on that asset at a rate that reflects its riskiness. In its most general form, the value of a company is the present value of the expected free cash flows the company can generate in perpetuity. Because we cannot estimate free cash flows (FCF) in perpetuity, we generally allow for a period where FCF can grow at extraordinary rates, but we allow for closure in the model by assuming that the growth rate will decline to a stable rate that can be sustained forever at some point in the future. If we assume that the discount rate is the weighted average cost of capital (WACC), FCF grows at the rate of per year and that the last year’s free cash flow is , then the value of the firm can be defined as

If we assume that FCF grows at the rate for the next years and at the rate thereafter, then the value of the firm can be written as

(1) |

We implement formula (1) with that takes five arguments:

1. , last year’s free cash flows

2. , the annual growth rate of free cash flows in the first growth period

3. , the number of years in the first growth period

4. , the stable growth rate

5. , the weighted average cost of capital

Suppose that a company has $100 as its past year’s FCF, its FCF is expected to grow at 5% for the next five years and 2.5% thereafter and that its WACC is 9.5%. We compute its value.
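A two-stage valuation of this kind can be sketched as the present value of the FCF over the first growth period plus a discounted terminal value based on the stable growth rate. The function name and argument order below are hypothetical, and the formula is the standard textbook two-stage form, which we assume matches formula (1).

```wolfram
(* Hypothetical helper: fcf0 = last year's FCF, g1 = first-period growth rate,
   n = years of first-period growth, g2 = stable growth rate, wacc = discount rate *)
twoStageValueSketch[fcf0_, g1_, n_Integer, g2_, wacc_] :=
 Module[{stage1, terminal},
  (* present value of FCF during the first growth period *)
  stage1 = Sum[fcf0*(1 + g1)^t/(1 + wacc)^t, {t, 1, n}];
  (* growing perpetuity starting in year n + 1, discounted back n years *)
  terminal = fcf0*(1 + g1)^n*(1 + g2)/((wacc - g2)*(1 + wacc)^n);
  stage1 + terminal]

(* The company from the example *)
twoStageValueSketch[100, 0.05, 5, 0.025, 0.095]
```

The terminal value is only meaningful when the stable growth rate is below the WACC.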

The most practical approach is to assume that a company goes through three phases: growth, transition and maturity. First, a company’s growth increases. In the transition phase, the growth rate decreases. In the mature phase, a company grows at the same rate as that of the overall economy.

Assume that the FCF is positive and grew at the rate last year. Assume further that it grows at the higher rate each year for the next years and that after years, it declines at the rate each year for the next years. Finally, assume the company grows at the stable positive rate per year after years and that the cost of capital is represented by WACC. Then the value of the firm can be written as

(2) |

We implement formula (2) with that takes eight arguments:

1. , last year’s free cash flows

2. , last year’s FCF growth rate

3. , the incremental growth rate in the high growth period

4. , the number of years in the high growth period

5. , the declining growth rate in the transitional growth period

6. , the number of years in the transitional growth period

7. , the stable growth rate in the maturity growth period

8. , the weighted average cost of capital

To apply the function, consider a company in the early stage of its life cycle, assuming that the company experienced 10% growth in the past year. The company is expected to grow by 8% more each year for the next 7 years and its growth will start to decline by 5% each year after the seventh year for 5 years. After 12 years, the company is expected to grow at the same rate as that of the overall economy, which is 2.5% per year. Suppose the past year’s FCF was $100 million and the weighted average cost of capital is 9.5%. We compute the value of the company.

When using , we assume that the growth rate in the stable phase is never negative. Therefore, do not assume the declining growth rate to be too high. Otherwise, the growth rate of FCF in the maturity phase may be negative.

There are various methods for analyzing historical stock price and return data. We can use different kinds of charts and graphs as well as descriptive statistics. It is also important to understand whether the distribution of stock returns is normal. In the next two subsections, we explain some of the most common charts and descriptive measures.

Commonly used charts for historical performance analysis are time series plot of prices, normalized prices (historical prices divided by the price at the beginning), continuous draw-downs (cumulative continuous returns) and cumulative returns. The function l takes four arguments as defined in Section 1 and returns four different plots: historical prices, the normalized price, continuous draw-downs and cumulative returns.

We can apply l to any symbol and period. For example, we plot the historical stock prices and returns of Walmart Inc. (WMT) over the period October 10, 2018, to June 7, 2019.
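The transformations behind these plots are simple to write down. The sketch below uses a simulated price path in place of downloaded data, and the variable names are ours. Reading "continuous draw-downs" as cumulative continuously compounded returns follows the parenthetical above and is an assumption.

```wolfram
SeedRandom[1];
(* Simulated daily prices standing in for downloaded data *)
prices = 100*Exp[Accumulate[RandomVariate[NormalDistribution[0.0005, 0.01], 250]]];

normalized = prices/First[prices];                      (* price relative to the first observation *)
simpleReturns = Differences[prices]/Most[prices];       (* simple period returns *)
logReturns = Differences[Log[prices]];                  (* continuously compounded returns *)
cumulative = FoldList[Times, 1 + simpleReturns] - 1;    (* cumulative simple returns *)
cumulativeLog = Accumulate[logReturns];                 (* cumulative continuous returns *)

ListLinePlot[{cumulative, cumulativeLog},
 PlotLegends -> {"Cumulative return", "Cumulative continuous return"}]
```

Note that the normalized price after the first observation is equal to one plus the cumulative simple return.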

A histogram and an empirical plot of kernel density estimates are often used to describe the general shape of the data. The function takes four arguments as defined in Section 1 and returns density and histogram plots of returns.

Using the function, we download the daily closing price of the S&P 500 index over the period October 1, 2000, to October 1, 2019, and plot the histogram, the empirical density function and the density function of a normal distribution with the same mean and variance.

Many built-in functions can be used to compute different descriptive statistics. These descriptive statistics describe properties of distributions, such as location, dispersion and shape. The most common measures are computed by :

• holding period return

• average return

• geometric mean return

• cumulative returns

• standard deviation

• minimum return

• maximum return

• skewness

• kurtosis

• historical value at risk

• historical conditional value at risk

For example, we compute the descriptive statistics for Walmart Inc. (WMT) using monthly returns over the period October 10, 2010, to June 6, 2019.
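Most of the measures in this list can be computed directly from a vector of returns. The sketch below is not the article's function: the name, the 5% tail level and the sign convention for value at risk (reported as a return quantile rather than a positive loss) are assumptions, and the data is simulated.

```wolfram
(* Hypothetical helper: r = list of periodic returns, level = tail probability *)
descriptiveSketch[r_List, level_ : 0.05] :=
 Module[{var = Quantile[r, level]},
  <|"Cumulative return" -> Times @@ (1 + r) - 1,
    "Average return" -> Mean[r],
    "Geometric mean return" -> GeometricMean[1 + r] - 1,
    "Standard deviation" -> StandardDeviation[r],
    "Minimum return" -> Min[r],
    "Maximum return" -> Max[r],
    "Skewness" -> Skewness[r],
    "Kurtosis" -> Kurtosis[r],
    "Historical VaR" -> var,                                (* empirical quantile of returns *)
    "Historical CVaR" -> Mean[Select[r, # <= var &]]|>]     (* mean return at or below the VaR *)

SeedRandom[2];
(* Simulated monthly returns standing in for downloaded data *)
returns = RandomVariate[NormalDistribution[0.01, 0.05], 120];

descriptiveSketch[returns] // Dataset
```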

It is also informative to examine the historical performance of an individual stock. In such an analysis, we calculate monthly statistics using daily returns and report them on a monthly basis. We define the function to download historical stock prices, compute desired statistics and return a dataset. The function takes four arguments: stock ticker symbol, start date, end date and a statistical function such as , or , and returns a dataset. For the function to work, it requires more than two years of data.

For example, we compute the monthly cumulative returns for Walmart Inc. (WMT) using daily returns over the period October 1, 2010, to June 30, 2019.

Once we compute the statistics, we can take specific columns by specifying their names.

Whether stock returns follow a normal distribution matters a great deal in investment management. One way to check is to compare the empirical quantiles of the data with those of a normal distribution. The function can be used to produce quantile-quantile plots. Many other built-in functions can help to assess whether returns are normally distributed.

The function can be used to test whether data is normally distributed and can also be used to assess the goodness of fit of data to any distribution.
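A minimal sketch of both checks follows, assuming the built-in functions meant here are `QuantilePlot` and `DistributionFitTest`. The returns are simulated from a fat-tailed Student t distribution, so the data is illustrative only.

```wolfram
SeedRandom[3];
(* Simulated fat-tailed daily returns standing in for downloaded data *)
returns = RandomVariate[StudentTDistribution[0.0, 0.01, 4], 1000];

(* Sample quantiles against the quantiles of a normal distribution *)
QuantilePlot[returns]

(* p-value for the null hypothesis that the data is normally distributed *)
DistributionFitTest[returns]

(* Table with the test statistic and the p-value *)
DistributionFitTest[returns, Automatic, "TestDataTable"]
```

Points that bend away from the reference line in the tails of the quantile plot, together with a small p-value, are evidence against normality.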

As a majority of financial data is multivariate, it is advantageous to perform comparative analysis of multiple security returns. In some cases, one has to compare one series with another. In other cases, many variables might have to be simultaneously measured to capture the complex nature of the relationship among variables. Comparing the complexities of these factors gives the analyst a more detailed account of the relationships between selected returns, thus allowing for a better interpretation of their values and behaviors. In this section, we first compare the performance of one asset with another using graphs, then compute descriptive statistics as well as correlation matrices.

The two most commonly used graphs for comparing the historical performance of more than one stock/ETF are time series plots of normalized prices and cumulative returns. The next two functions take four arguments as defined in Section 1 and compute normalized prices and cumulative returns.

We get normalized prices and cumulative returns and plot them for three stocks (Facebook, Inc. (FB), Costco Wholesale Corporation (COST) and Walmart Inc. (WMT)) over the period May 1, 2000, to May 30, 2019.

Besides graphs, we can also compute descriptive statistics and compare their performance. We define a function for that purpose. It takes four arguments as defined in Section 1 and returns a table with different types of descriptive statistics.

For example, we download historical data and compute different descriptive statistics for three stocks (Walmart Inc. (WMT), Apple Inc. (AAPL) and Microsoft Corporation (MSFT)) using monthly data over the period January 1, 2010, to March 30, 2019.

Similarly to how we calculated an individual stock’s monthly statistics in Section 4.1, we define the function to compute monthly statistics for more than one stock given the arguments: stock ticker symbols, start date, end date and a statistical function such as , or .

For example, we compute the monthly cumulative returns for four stocks (Walmart Inc. (WMT), Apple Inc. (AAPL), Microsoft Corporation (MSFT) and Netflix, Inc. (NFLX)) over the period January 1, 2010, to June 30, 2019, and create a dataset. The first column represents year and month, the first four digits for the year and the last two digits for the month.

Similarly, box-and-whisker charts, paired histograms, paired smooth histograms and matrix scatterplots are often used to examine multivariate data. The function can be used to make a box plot that gives a glimpse of the distribution of the given dataset. You can see the statistical information by hovering over the boxes in the plot. The and functions are used to create paired histogram and smooth distribution plots. They can be used to compare how two datasets are distributed. The function from the Statistical Plots package can be used to make scatter plots of multivariate data. It creates scatter plots comparing the data in each column against other columns. More complex analysis of multivariate data can be done using functions from the Multivariate Statistics package. The package contains functions to compute descriptive statistics for multivariate data and distributions derived from the multivariate normal distribution. All these functions are well explained in the official documentation.

Rolling-window performance analysis is a simple technique to assess the variability of statistical performance measures. For example, if we want to assess the stability of the mean or standard deviation of returns on a stock over time, we can choose a rolling window (the number of consecutive observations per rolling window), estimate the mean or standard deviation in each window and plot the series of estimates. A little fluctuation is normal, but large fluctuations indicate a shift in the values of the estimate. Built-in functions such as , and are useful for rolling-window performance analysis. In this section, we are going to show a few examples of how to compute rolling-window-based performance statistics.

We define that can be used to plot rolling-window statistics given its five inputs: stock ticker symbol, start date, end date, size of window in days and function to apply, which can be any built-in or user-defined function.

For example, we compute and plot the 90-day rolling mean to standard deviation ratio on Walmart’s daily stock returns over the period January 1, 2001, to March 30, 2019.
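The rolling computation itself can be sketched without any data download. The version below swaps in `Partition` with an offset of 1 to build the overlapping windows, which is our choice and may differ from the built-in function used in the article. The helper name and the simulated returns are assumptions.

```wolfram
(* Hypothetical helper: apply f to every window of the given length *)
rollingSketch[data_List, window_Integer, f_] := f /@ Partition[data, window, 1];

SeedRandom[4];
(* Simulated daily returns standing in for downloaded data *)
returns = RandomVariate[NormalDistribution[0.0004, 0.012], 750];

(* 90-day rolling ratio of mean to standard deviation *)
ratio = rollingSketch[returns, 90, Mean[#]/StandardDeviation[#] &];

ListLinePlot[ratio, AxesLabel -> {"Window", "Mean/SD"}]
```

With 750 observations and a 90-day window, the result has 661 estimates, one for each complete window.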

Similarly, we define the function to compute the rolling correlation of two series and apply it together with to the desired data.

We use the function to plot the 90-day rolling correlation of daily returns on two stocks, WMT and COST, for the period from March 30, 2009, to March 30, 2019.
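A rolling correlation can be sketched by pairing the two series and computing the correlation within each overlapping window. The helper name is hypothetical, and the two series are simulated with a true correlation of 0.6 purely for illustration.

```wolfram
(* Hypothetical helper: correlation of a and b within each window *)
rollingCorrelationSketch[a_List, b_List, window_Integer] :=
  Correlation[#[[All, 1]], #[[All, 2]]] & /@
   Partition[Transpose[{a, b}], window, 1];

SeedRandom[5];
(* Two simulated daily return series with correlation 0.6 *)
{x, y} = Transpose[
   RandomVariate[BinormalDistribution[{0, 0}, {0.01, 0.012}, 0.6], 750]];

ListLinePlot[rollingCorrelationSketch[x, y, 90], PlotRange -> {-1, 1}]
```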

Sometimes, it is also useful to store these time-varying descriptive statistics as a dataset so that we can use them in the subsequent analysis. The function , given its input, computes the geometric mean, standard deviation and the ratio of the arithmetic mean to the standard deviation on a rolling-window basis.

For example, we compute the 90-day rolling-window geometric mean (GM), standard deviation (Std. Dev.) and arithmetic mean to standard deviation ratio (AM/Std. Dev.) of Walmart’s daily stock returns over the period July 1, 2018, to October 30, 2019. You can scroll through the dataset.

Mean-variance analysis is one of the foundations of financial economics. Portfolio optimization is essential, whether it be in professional or personal financial planning. In this section, we are going to show how to implement the most commonly used optimization techniques in finance using historical returns. We want to point out that future returns on investment depend on expected returns and other conditioning information, not on the past returns. Past returns are used only for illustration and do not guarantee future returns.

Define the following variables:

• is the risk-free rate

• is the proportion of wealth invested in security

• is the average return on security

• is the variance of security

• is the covariance between securities and

• is the correlation between securities and

Then we define the vectors of mean returns and weights and the covariance matrix:

The formulas for the portfolio mean and variance are and , respectively. The corresponding Mathematica code is and .

In order to compute portfolio statistics, we need returns data. We can use the function to download historical returns data. It takes four arguments as defined in Section 1 and gives a matrix of returns. Most functions in this section use the function, so run it before you run other functions.

To compute basic portfolio statistics such as portfolio mean, variance, standard deviation and Sharpe ratio, we can use , which takes six arguments. The first four arguments are as defined in Section 1 and the other two are a list of weights and the optional risk-free rate.

For example, we compute the portfolio mean, variance, standard deviation and Sharpe ratio for the portfolio that consists of the stock returns of five companies: Apple (AAPL), Walmart (WMT), Boeing (BA), 3M and Exxon Mobil (XOM), using monthly returns over the period January 1, 2009, to May 30, 2019.
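With a matrix of returns in hand, these statistics reduce to a few dot products. The sketch below uses simulated returns for three hypothetical assets; the weights, the risk-free rate and the variable names are illustrative assumptions rather than the article's inputs.

```wolfram
SeedRandom[6];
(* Simulated monthly returns for three hypothetical assets (rows = months) *)
returns = RandomVariate[
   MultinormalDistribution[{0.010, 0.008, 0.012},
    {{0.0040, 0.0006, 0.0002},
     {0.0006, 0.0090, 0.0003},
     {0.0002, 0.0003, 0.0025}}], 120];

mu = Mean[returns];             (* vector of average returns *)
sigma = Covariance[returns];    (* sample covariance matrix *)
w = {0.4, 0.3, 0.3};            (* portfolio weights, summing to 1 *)
rf = 0.001667;                  (* assumed periodic risk-free rate *)

portMean = w.mu;                        (* portfolio mean return *)
portVar = w.sigma.w;                    (* portfolio variance *)
portSD = Sqrt[portVar];                 (* portfolio standard deviation *)
sharpe = (portMean - rf)/portSD;        (* Sharpe ratio *)

{portMean, portVar, portSD, sharpe}
```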

The function plots the Markowitz portfolio frontier; it takes a matrix of returns obtained from as its only argument. The function uses the concept that any two efficient portfolios are enough to establish the whole portfolio frontier, as first proved by Black [5]. It accepts any option.

For example, we plot the portfolio frontier for the portfolio that consists of stock returns of five companies: Apple (AAPL), Walmart (WMT), Boeing (BA), 3M and Exxon Mobil (XOM), using monthly returns over the period January 1, 2009, to May 30, 2019.

Next, we solve two kinds of portfolio problems: global minimum variance portfolio and tangency portfolio. In terms of the notation defined earlier in this section, the global minimum variance portfolio can be obtained by minimizing subject to and solving for . Its solution can be obtained with the built-in Mathematica function .

The function computes weights, returning the portfolio allocation on stocks considered for a global minimum variance portfolio.

We compute the global minimum variance portfolio weights using the monthly stock returns of five companies: Apple (AAPL), Walmart (WMT), Boeing (BA), 3M and Exxon Mobil (XOM), over the period January 1, 2009, to May 30, 2019.
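When short sales are allowed and the only constraint is that the weights sum to one, the global minimum variance weights have a closed form proportional to the inverse covariance matrix times a vector of ones. The sketch below compares that closed form with a numerical minimizer. The covariance matrix is illustrative, and using `NMinimize` is our assumption about one workable solver, not necessarily the function used in the article.

```wolfram
(* Illustrative covariance matrix for three hypothetical assets *)
sigma = {{0.0040, 0.0006, 0.0002},
         {0.0006, 0.0090, 0.0003},
         {0.0002, 0.0003, 0.0025}};
ones = ConstantArray[1, Length[sigma]];

(* Closed form: solve sigma.z == ones, then rescale so the weights sum to 1 *)
z = LinearSolve[sigma, ones];
wClosed = z/Total[z];

(* Numerical check: minimize the variance subject to the budget constraint *)
vars = Array[x, Length[sigma]];
{minVar, sol} = NMinimize[{vars.sigma.vars, Total[vars] == 1}, vars];
wNumeric = vars /. sol;

{wClosed, wNumeric}
```

The two weight vectors should agree up to numerical tolerance. Adding constraints such as nonnegative weights removes the closed form, and the numerical route is then the natural one.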

Similarly, the tangency portfolio can be obtained by maximizing , where is the risk-free rate (a constant in this case), subject to and solving for . The solution uses the built-in Mathematica function . The function computes the tangency portfolio weights given its five inputs, four as defined in Section 1 and the risk-free rate.

Assuming a monthly risk-free rate of 0.1667 percent and using monthly data over the period January 1, 2009, to May 30, 2019, we calculate the tangency portfolio weights for our portfolio of five stocks: Apple (AAPL), Walmart (WMT), Boeing (BA), 3M and Exxon Mobil (XOM).
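For the unconstrained case (weights sum to one, short sales allowed), the tangency weights are proportional to the inverse covariance matrix times the vector of excess mean returns. The sketch below uses that closed form with illustrative inputs. It is a cross-check under stated assumptions, not the article's implementation, and it presumes the rescaling constant is positive.

```wolfram
(* Illustrative inputs for three hypothetical assets *)
mu = {0.010, 0.008, 0.012};            (* mean monthly returns *)
sigma = {{0.0040, 0.0006, 0.0002},
         {0.0006, 0.0090, 0.0003},
         {0.0002, 0.0003, 0.0025}};    (* covariance matrix *)
rf = 0.001667;                         (* monthly risk-free rate of 0.1667 percent *)

(* Solve sigma.z == mu - rf, then rescale so the weights sum to 1 *)
z = LinearSolve[sigma, mu - rf];
wTangency = z/Total[z];

(* Sharpe ratio attained by the tangency portfolio *)
sharpe = (wTangency.mu - rf)/Sqrt[wTangency.sigma.wTangency];

{wTangency, sharpe}
```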

Portfolio optimization using the Wolfram Language is very flexible. We can formulate any kind of portfolio and use built-in functions such as , or to get numerical solutions to the portfolio problem.

In this section, we concentrate on how to decompose a measure of portfolio risk (portfolio standard deviation) into risk contribution from individual assets included in the portfolio. It helps to see how individual assets influence portfolio risk. When risk is measured by standard deviation, we can use Euler’s theorem for decomposing risk into asset-specific risk contribution. Euler’s theorem provides an additive decomposition of a homogeneous function. For reference, see Campolieti and Makarov [6]. Using Euler’s theorem, we can define the percentage contribution to portfolio standard deviation of an asset as , where is the marginal contribution of the asset, is the weight of the asset, and is the portfolio standard deviation.

We define the function for portfolio risk decomposition. It takes five arguments, four arguments as defined in Section 1 and a list of portfolio weights; it returns a bar chart representing the individual asset’s risk contribution to the portfolio standard deviation.

We calculate the risk contribution of each asset in a portfolio that consists of five stocks using the historical monthly returns over the period January 30, 2010, to May 30, 2019 and make a bar chart.
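The Euler decomposition can be sketched in a few lines: the marginal contribution of each asset is the corresponding element of the covariance matrix times the weights, divided by the portfolio standard deviation. The covariance matrix, weights and labels below are illustrative assumptions.

```wolfram
(* Illustrative inputs for three hypothetical assets *)
sigma = {{0.0040, 0.0006, 0.0002},
         {0.0006, 0.0090, 0.0003},
         {0.0002, 0.0003, 0.0025}};
w = {0.4, 0.3, 0.3};

portSD = Sqrt[w.sigma.w];          (* portfolio standard deviation *)
marginal = (sigma.w)/portSD;       (* derivative of portSD with respect to each weight *)
contribution = w*marginal;         (* component contributions; they sum to portSD *)
pct = contribution/portSD;         (* percentage contributions; they sum to 1 *)

BarChart[pct, ChartLabels -> {"Asset 1", "Asset 2", "Asset 3"}]
```

Because the standard deviation is homogeneous of degree one in the weights, the component contributions add up exactly to the portfolio standard deviation.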

Currently, factor models are widely accepted and used in finance to construct portfolios, to evaluate portfolio performance and for risk analysis. Factor models are regression models. We can use the built-in function to estimate and evaluate the appropriateness of the regression models. In addition, we can download all factor data directly from Prof. Kenneth French’s data library. Before we apply factor models to real-world data, we need data in the form

where

is a matrix of values of independent variables and

is a vector of values of a dependent variable.

Commonly used factor models are summarized in Table 1. We can find more about the factor models in Fama and French [7]. We use the following notation: for excess return on the security or portfolio, MKT for excess return on the value-weighted market portfolio, SMB for return on a diversified portfolio of small-capitalization stocks minus the return on a diversified portfolio of large-capitalization stocks, HML for difference in the returns on diversified portfolios of high-book-to-market stocks and low-book-to-market stocks, MOM for difference in returns on diversified portfolios of the prior year’s winners and losers, RMW for difference between the returns on diversified portfolios of stocks with robust and weak profitability, CMA for difference between the returns on diversified portfolios of the stocks of low and high investment firms, for risk-adjusted return on security or portfolio and , , , , and for betas or factor loadings.

Before we estimate these models using real data, we need factors data. We download five factors and the momentum factor data from Kenneth French’s website (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and define variables to store them.

The function takes only two arguments, the start date and end date, and returns time series of all factors data. The start date and end date must be specified as date objects.

Similarly, the function can be used to get a stock’s monthly returns data. It takes three arguments, a ticker symbol of any publicly traded company, and the start and end dates for the analysis period.

Using and , we define to combine factors and returns data. The takes four arguments (the symbol of the stock for which we want to estimate the factor model, the start date, the end date and an integer that represents the number of factors) and returns a data matrix suitable for .

Next, we estimate different factor models for Apple’s stock using monthly data from October 1, 2008, to March 30, 2019. We estimate the capital asset pricing model (CAPM) with market factor (MKT).

We estimate the Fama–French three-factor model with market, size and value factors (MKT, SMB, HML).

Similarly, we estimate the Carhart four-factor model with market, size, value and momentum factors (MKT, SMB, HML, MOM).

Finally, the Fama–French five-factor model with market, size, value, profitability and investment factors (MKT, SMB, HML, RMW, CMA) is estimated as follows.

Once the model is estimated, we can access different properties related to the data and the fitted model. Many built-in functions help assess how well the model fits the data and how well it meets the regression assumptions. To learn more about obtaining diagnostic information, see the properties of .
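The estimation workflow can be sketched on simulated data, assuming the built-in function referred to is `LinearModelFit`. The factor series, the true loadings (1.1, 0.3 and -0.2) and the variable names are made up for illustration. Each data row holds the factor values followed by the dependent variable.

```wolfram
SeedRandom[7];
nObs = 120;
(* Simulated monthly factor returns *)
mkt = RandomVariate[NormalDistribution[0.006, 0.04], nObs];
smb = RandomVariate[NormalDistribution[0.002, 0.03], nObs];
hml = RandomVariate[NormalDistribution[0.003, 0.03], nObs];
(* Simulated excess returns with known loadings plus noise *)
excess = 0.001 + 1.1 mkt + 0.3 smb - 0.2 hml +
   RandomVariate[NormalDistribution[0, 0.02], nObs];

(* Data matrix with rows {MKT, SMB, HML, excess return} *)
data = Transpose[{mkt, smb, hml, excess}];

(* Three-factor regression; f1, f2, f3 are placeholder variable names *)
fit = LinearModelFit[data, {f1, f2, f3}, {f1, f2, f3}];

fit["ParameterTable"]     (* estimates, standard errors, t-statistics and p-values *)
fit["RSquared"]           (* coefficient of determination *)
ListPlot[fit["FitResiduals"]]   (* residuals for a visual check *)
```

The estimated intercept plays the role of the risk-adjusted return and the slopes are the factor loadings.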

There are various measures to evaluate the performance and risk of portfolios. Most of these measures are used to evaluate a portfolio of interest against a chosen benchmark, by taking a snapshot of the past or considering the entire historical picture. We will compute some common metrics often employed by investors while analyzing performance measures. All the computations are based on the formulas developed in Bacon [8]. Most common measures are summarized in the function . The function takes four arguments:

• ticker symbols for the test assets and the benchmark asset

• periodic risk-free rate

• time period

• frequency of data

We repeat the definition of the function defined earlier to make this section self-contained.

The next example uses stocks, although these measures are used to evaluate the performance of portfolios, mutual funds and exchange-traded funds. We evaluate the performance of three stocks (Walmart Inc. (WMT), Apple Inc. (AAPL) and Microsoft Corporation (MSFT)) against the S&P 500 index (^SPX) using 0.0016 as the monthly risk-free rate and month as the data frequency over the period January 1, 1995, to March 30, 2019.

computePortfolioPerformanceTable[{"WMT","AAPL","MSFT"},"^SPX",0.0016, {1995,01,01},{2019,03,30},"Month"]//Text

Short-term traders commonly use interactive graphics and technical indicators of stock prices to profit from stocks that may be overbought or oversold. Much of this is based on market sentiment, but also on market timing. When a stock is oversold, the price is low and people want to buy. In comparison, when a stock is overbought, the price is at or above the top of its normal range, and people do not want to buy, or may want to short sell. Many technical indicators are used to determine a given stock’s peak or bottom price and how to take advantage of that information. Three of the most useful functions for technical analysis of stocks are , and . The documentation provides a comprehensive set of examples on how to use them.

We show one example of how to use the function and one example of how to use .

You can choose the chart type and from over 100 technical indicators, which are divided into eight groups:

• basic

• moving average

• market strength

• momentum

• trend

• volatility

• band

• statistical

displays all the available technical indicators.

Here is the basic format:

Alternatively, use:

The time series data must be of the form:

The historical open, high, low, close and volume data retrieved from can also be used as data input. The function has many options that can be used to enhance graphics. We produce a chart using historical prices of Apple’s stock and volume over the period January 1, 2018, to March 30, 2019.

The top of the chart shows a plot of historical prices and 50-day and 200-day moving averages. The second part shows the historical volume. The last two parts show plots of two indicators, the commodity channel index () and the relative strength index ().

A good introduction to technical indicators can be found in standard references, including the Fidelity Learning Center [9].

The function provides a point-and-click interactive chart, with a similar setup:

Alternatively, use:

For example, we make a chart showing prices, volume and indicators for historical data of Apple’s stock over the period January 1, 2018, to March 30, 2019. The function provides a user-friendly environment where you can drag a slider to view different parts of the chart or you can choose different indicators with point-and-click.

A bond is a long-term debt instrument in which a borrower agrees to make payments of principal and interest, on specific dates, to the holders of the bond. When it comes to analysis and pricing of bonds and computing returns, convexity and duration are important concepts. When a bond is traded between coupon payment dates, its price has two parts: the quoted price and the accrued interest. The quoted price is net of accrued interest. Accrued interest is the proportion of the next coupon payment that has accrued so far. The full price is the price of a bond including any interest that has accrued since issue or the most recent coupon payment. Similarly, yield to maturity is the rate of return earned on a bond if it is held to maturity. Duration is a measure of the average length of time for which money is invested in a coupon bond. Convexity estimates the change in the bond price given a change in the yield to maturity, assuming a nonlinear relationship between the two. The built-in functions and can be used to compute various properties including value of the bond, accrued interest, yield, duration, modified duration, convexity, and so on. This section provides a few examples of how to use and how the concepts of bond convexity and duration can be used in bond portfolio management.

We discuss zero-coupon bonds first. The zero-coupon bond does not make coupon payments. The only cash payment is the face value of the bond on the maturity date. The yield to maturity () for a zero-coupon bond with periods to maturity, current price and face value can be obtained by solving .

For example, we compute the yield to maturity of a zero-coupon bond with a $10,000 face value, time to maturity 4 years and current price $9,662 using and .
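As a sketch of this calculation, and assuming annual compounding so that the price times one plus the yield raised to the number of periods equals the face value, the yield can be obtained in closed form or with a numerical root finder. The variable names are ours, and the article may use different built-in functions.

```wolfram
price = 9662;     (* current price *)
face = 10000;     (* face value paid at maturity *)
n = 4;            (* years to maturity *)

(* Closed form from price*(1 + y)^n == face *)
ytmClosed = N[(face/price)^(1/n) - 1];

(* The same yield from a numerical root search starting at 5% *)
ytmRoot = y /. FindRoot[price*(1 + y)^n == face, {y, 0.05}];

{ytmClosed, ytmRoot}
```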

Similarly, the same approach can be used to compute the yield to maturity of a coupon-paying bond.

For example, we compute the yield to maturity of a $1,000 par value 10-year bond with 5% semiannual coupons issued on June 20, 2013, with maturity date of June 20, 2023, selling for $920 on September 15, 2018.

FinancialBond can also be used to compute the price, duration, modified duration and convexity of a bond. For example, we compute those values for a bond with 8% yield, 8% annual coupons, 10-year maturity and $1,000 face value.
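The same four quantities can be checked from first principles by discounting the cash flows directly. The Python sketch below is an independent illustration rather than the built-in computation; the function name `bond_measures` is hypothetical. It assumes annual coupons, valuation on a coupon date, Macaulay duration and the common convexity definition (sum of t(t+1) times discounted cash flow, divided by price times (1+y)^2). Other convexity conventions exist, so a built-in result may be scaled differently.

```python
def bond_measures(face, coupon_rate, yield_rate, years):
    """Price, Macaulay duration, modified duration and convexity
    of an annual-pay bond valued on a coupon date."""
    # Present value of each cash flow, keyed by payment time in years.
    pv = []
    for t in range(1, years + 1):
        cash = face * coupon_rate + (face if t == years else 0.0)
        pv.append((t, cash / (1.0 + yield_rate) ** t))

    price = sum(v for _, v in pv)
    macaulay = sum(t * v for t, v in pv) / price
    modified = macaulay / (1.0 + yield_rate)
    convexity = sum(t * (t + 1) * v for t, v in pv) / (price * (1.0 + yield_rate) ** 2)
    return price, macaulay, modified, convexity


# Bond from the text: 8% yield, 8% annual coupons, 10 years, face value 1,000.
price, macaulay, modified, convexity = bond_measures(1000.0, 0.08, 0.08, 10)
print(f"price {price:.2f}, duration {macaulay:.4f}, "
      f"modified duration {modified:.4f}, convexity {convexity:.2f}")
```

Because the coupon rate equals the yield, the bond prices at par, which is a quick sanity check on any implementation.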

There are different approaches to bond portfolio management. We concentrate here on a liability-driven portfolio strategy, in which the characteristics of the bonds held in the portfolio are coordinated with those of the liabilities the investor is obligated to pay. The matching techniques range from attempting to match the levels and timing of the required cash payments exactly to more general approaches that focus on other investment characteristics, such as setting the average duration or convexity of the bond portfolio equal to that of the underlying liabilities. One specific example is to construct the portfolio so that the duration of the bond portfolio equals the duration of the cash obligation and the total money invested in the bond portfolio today equals the present value of the future cash obligations.

To illustrate the concept of bond portfolio management, assume that we have an obligation to pay $1,000,000 in 10 years and there are two bonds available for investment. The first bond matures in 30 years with $100 face value and annual coupon payment of 6%. The second bond matures in 10 years with $100 face value and annual coupon payment of 5%. The yield to maturity is 9% on both bonds. We want to decide how much to invest in each bond so that the overall portfolio is immunized against changes in the interest rate.

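We compute the duration of each bond using FinancialBond, which gives that the duration of bond 1 is 11.88 and that of bond 2 is 7.75. The obligation is a single payment due in 10 years, so its duration is 10. Assuming that the proportions of money invested in bonds 1 and 2 are $w_1$ and $w_2$, the immunized portfolio is found by solving the simultaneous equations:

$$w_1 + w_2 = 1, \qquad 11.88\,w_1 + 7.75\,w_2 = 10.$$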

These two linear equations can be solved with a built-in equation solver.
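The immunization conditions can also be worked through numerically. The Python sketch below is an illustration under stated assumptions, not the article's code: it assumes annual coupons, Macaulay durations computed at the common 9% yield, and a target duration of 10 years for the single liability payment. The names `macaulay_duration`, `w1` and `w2` are hypothetical.

```python
def macaulay_duration(coupon_rate, yield_rate, years, face=100.0):
    """Macaulay duration of an annual-pay bond valued on a coupon date."""
    pv = []
    for t in range(1, years + 1):
        cash = face * coupon_rate + (face if t == years else 0.0)
        pv.append((t, cash / (1.0 + yield_rate) ** t))
    price = sum(v for _, v in pv)
    return sum(t * v for t, v in pv) / price


d1 = macaulay_duration(0.06, 0.09, 30)   # 30-year bond, 6% coupon
d2 = macaulay_duration(0.05, 0.09, 10)   # 10-year bond, 5% coupon
target = 10.0                            # duration of the single liability

# Solve w1 + w2 = 1 and w1*d1 + w2*d2 = target.
w1 = (target - d2) / (d1 - d2)
w2 = 1.0 - w1

# Total invested today equals the present value of the obligation.
pv_liability = 1_000_000 / 1.09 ** 10
amount1, amount2 = w1 * pv_liability, w2 * pv_liability

print(f"durations: {d1:.2f}, {d2:.2f}")
print(f"weights:   {w1:.4f}, {w2:.4f}")
print(f"amounts:   {amount1:,.2f}, {amount2:,.2f}")
```

The weights are proportions of the present value of the obligation, so multiplying them by that present value gives the amount to allocate to each bond.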

The result shows how much money should be allocated to each bond. More examples can be found in Benninga [10]. A more general bond portfolio management problem can be formulated and solved as a linear program; introducing linear programming is beyond the scope of this article.

The most popular option pricing models are the binomial model and the Black–Scholes–Merton option pricing formulas for European options. In the next two subsections, we discuss these models and their implementation.

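Following the notation from Hull [11], define:

• $S_0$: current stock price

• $K$: strike price

• $\sigma$: annual volatility

• $r$: risk-free rate

• $T$: time to maturity

• $n$: number of periods

In terms of these variables, we can define:

• time per period: $\Delta t = T/n$

• up factor: $u = e^{\sigma\sqrt{\Delta t}}$

• down factor: $d = 1/u = e^{-\sigma\sqrt{\Delta t}}$

• probability of an up move: $p = (e^{r\,\Delta t} - d)/(u - d)$

• probability of a down move: $1 - p$

• stock price at the terminal node with $j$ up moves: $S_0 u^{j} d^{\,n-j}$

• payoff from a European call: $\max(S_0 u^{j} d^{\,n-j} - K,\ 0)$

• payoff from a European put: $\max(K - S_0 u^{j} d^{\,n-j},\ 0)$

In the risk-neutral world, the prices $C$ of the call and $P$ of the put using the $n$-period binomial options pricing model can be computed as:

$$C = e^{-rT}\sum_{j=0}^{n}\binom{n}{j}\,p^{j}(1-p)^{n-j}\max\!\left(S_0 u^{j} d^{\,n-j} - K,\ 0\right),$$

$$P = e^{-rT}\sum_{j=0}^{n}\binom{n}{j}\,p^{j}(1-p)^{n-j}\max\!\left(K - S_0 u^{j} d^{\,n-j},\ 0\right).$$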

Two functions calculate the prices of European call and put options, respectively; each outputs the option price and takes the six arguments defined at the beginning of this subsection.

We use these functions to find the prices of call and put options when the current stock price is $50, the strike price is $45, the annual volatility is 40%, the risk-free rate is 10%, the time to maturity is half a year and the total number of up and down moves is 500.
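For readers who want to verify the numbers independently, here is a minimal Python sketch of an n-period binomial price. It is not the article's Wolfram Language implementation. It assumes the Cox–Ross–Rubinstein parameterization from Hull (up factor exp(sigma*sqrt(dt)), down factor its reciprocal, risk-neutral up probability (exp(r*dt) - d)/(u - d)), and the function name `binomial_european` is hypothetical. Binomial probabilities are evaluated in log space so that 500 periods cause no overflow or underflow.

```python
import math


def binomial_european(s0, k, sigma, r, t, n, kind="call"):
    """European option price from an n-period binomial tree."""
    dt = t / n
    u = math.exp(sigma * math.sqrt(dt))       # up factor
    d = 1.0 / u                               # down factor
    p = (math.exp(r * dt) - d) / (u - d)      # risk-neutral up probability

    expected_payoff = 0.0
    for j in range(n + 1):                    # j = number of up moves
        log_prob = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log(1.0 - p))
        s_t = s0 * u ** j * d ** (n - j)      # terminal stock price
        payoff = max(s_t - k, 0.0) if kind == "call" else max(k - s_t, 0.0)
        expected_payoff += math.exp(log_prob) * payoff

    return math.exp(-r * t) * expected_payoff


# Inputs from the text: S0 = 50, K = 45, sigma = 40%, r = 10%, T = 0.5, n = 500.
call = binomial_european(50.0, 45.0, 0.40, 0.10, 0.5, 500, "call")
put = binomial_european(50.0, 45.0, 0.40, 0.10, 0.5, 500, "put")
print(f"call {call:.4f}, put {put:.4f}")

# Put-call parity: call - put = S0 - K * exp(-r * T).
print(f"parity gap {call - put - (50.0 - 45.0 * math.exp(-0.10 * 0.5)):.2e}")
```

With 500 periods both prices should land close to the corresponding Black–Scholes–Merton values (roughly 9.6 for the call and 2.4 for the put), and put-call parity holds up to rounding.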

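As with the binomial option pricing formula defined in the last subsection, we follow Hull [11] to explain the Black–Scholes–Merton option pricing formulas. Define the variables:

• $S_0$: current stock price

• $K$: strike price

• $r$: continuously compounded risk-free rate

• $\sigma$: annual volatility

• $T$: time to maturity in years

Furthermore, assume that $N'(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^{2}/2}$ is the standard normal density function (where $e$ is the base of the natural logarithms) and let $N(x)$ be the standard normal cumulative distribution function, so that $N(x)$ denotes the probability that a random variable drawn from a standard normal distribution is less than $x$. Then the call value $C$ and the put value $P$ for a stock that pays no dividends can be computed as

$$C = S_0\,N(d_1) - K e^{-rT} N(d_2),$$

$$P = K e^{-rT} N(-d_2) - S_0\,N(-d_1),$$

where

$$d_1 = \frac{\ln(S_0/K) + (r + \sigma^{2}/2)\,T}{\sigma\sqrt{T}},$$

$$d_2 = d_1 - \sigma\sqrt{T}.$$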

Table 2 summarizes the price sensitivity measures of call and put options (denoted by Greek symbols) with respect to their major price determinants; here stands for value of the option.

The built-in Mathematica function FinancialDerivative computes the values and other price sensitivity measures for common types of derivative contracts. It can compute the value of an option, its delta, gamma, theta and vega, and the implied volatility of the contract. The Mathematica documentation provides many examples of how to use FinancialDerivative. Here are the first 10 of a list of 101 available contracts.

For example, we compute the price and Greeks of a European-style put option with strike price $50, time to expiration 0.3846 years, interest rate 5%, annual volatility 20%, no annual dividend and current price $49.
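The closed-form Black–Scholes–Merton expressions make it possible to check this example by hand. The Python sketch below is an independent illustration rather than the built-in computation; the function name `european_put` is hypothetical. It assumes no dividends, a continuously compounded rate, theta per year and vega per unit of volatility, so a built-in function may report some Greeks on a different scale.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def european_put(s, k, r, sigma, t):
    """Value and Greeks of a European put on a non-dividend-paying stock."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    return {
        "value": k * disc * norm_cdf(-d2) - s * norm_cdf(-d1),
        "delta": norm_cdf(d1) - 1.0,
        "gamma": norm_pdf(d1) / (s * sigma * math.sqrt(t)),
        "vega": s * norm_pdf(d1) * math.sqrt(t),
        "theta": (-s * norm_pdf(d1) * sigma / (2.0 * math.sqrt(t))
                  + r * k * disc * norm_cdf(-d2)),
        "rho": -k * t * disc * norm_cdf(-d2),
    }


# Inputs from the text: S = 49, K = 50, r = 5%, sigma = 20%, T = 0.3846 years.
greeks = european_put(49.0, 50.0, 0.05, 0.20, 0.3846)
for name, value in greeks.items():
    print(f"{name:>5}: {value: .4f}")
```

The put delta is negative and equals the call delta minus one, which gives a quick consistency check against any call computed with the same inputs.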

Similarly, we compute the implied volatility of an American-style call option with the same values of the parameters.
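Implied volatility is the volatility at which a pricing model reproduces an observed option price. The text does not state the option price used here, so the Python sketch below first generates a price from a 20% volatility and then inverts it by bisection. It relies on the fact that an American call on a non-dividend-paying stock has the same value as the European call, so the closed-form call formula can be used. The function names `bsm_call` and `implied_volatility` are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(s, k, r, sigma, t):
    """Black-Scholes-Merton value of a European call, no dividends."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def implied_volatility(price, s, k, r, t, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; the call value is increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bsm_call(s, k, r, mid, t) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip with the parameters from the text (price generated at 20% volatility).
price = bsm_call(49.0, 50.0, 0.05, 0.20, 0.3846)
iv = implied_volatility(price, 49.0, 50.0, 0.05, 0.3846)
print(f"call price {price:.4f}, implied volatility {iv:.6f}")
```

Bisection is slower than Newton's method but cannot diverge, which makes it a reasonable default when the starting guess is poor.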

One interesting application of FinancialDerivative is to combine it with real-world data to compute option-related measures. We define a function that computes the theoretical value of options and their Greeks; it takes five arguments:

• ticker symbol

• strike price

• expiration date

• risk-free rate

• option type (one of two choices)

We compute the European-style option parameters for Boeing (BA), assuming that the option expires on December 28, 2020, with exercise price $145 and risk-free rate 0.0187. Make sure that the expiration date is later than the current date, since the function uses historical data up to the current date.

We strongly encourage you to explore the built-in or online documentation for the powerful function FinancialDerivative.

This article provides a brief overview of built-in functions and introduces many functions designed specifically for the analysis of financial data. In particular, we have focused on functions that are most relevant to introductory concepts in computational finance. We emphasize importing and visualizing company fundamental data, analysis of individual stocks and portfolio returns, factor models and the use of built-in functions for bond and financial derivative analysis. The functions we have provided are just a few examples; the Wolfram Language can do much more than what we have shown here. Interested readers can start exploring the Wolfram Language via Mathematica’s extensive documentation.

[1] | H. R. Varian, ed., Economic and Financial Modeling with Mathematica, New York: Springer-Verlag, 1993. |

[2] | H. R. Varian, ed., Computational Economics and Finance Modeling and Analysis with Mathematica, New York: Springer-Verlag, 1996. |

[3] | W. Shaw, Modelling Financial Derivatives with Mathematica, Cambridge, UK: Cambridge University Press, 1998. |

[4] | S. Stojanovic, Computational Financial Mathematics using MATHEMATICA Optimal Trading in Stocks and Options, Boston: Birkhäuser, 2003. |

[5] | F. Black, “Capital Market Equilibrium with Restricted Borrowing,” Journal of Business, 45(3), 1972 pp. 444–455. www.jstor.org/stable/2351499. |

[6] | G. Campolieti and R. Makarov, Financial Mathematics: A Comprehensive Treatment, London: Chapman and Hall/CRC Press, 2014. |

[7] | E. F. Fama and K. R. French, “A Five-Factor Asset Pricing Model,” Journal of Financial Economics, 116(1), 2015 pp. 1–22. doi:10.1016/j.jfineco.2014.10.010. |

[8] | C. R. Bacon, Practical Risk-Adjusted Performance Measurement, 2nd ed., Hoboken: Wiley, 2013. |

[9] | Fidelity Learning Center. “Technical Indicator Guide.” (Jul 29, 2020) www.fidelity.com/learning-center/trading-investing/technical-analysis/technical-indicator-guide/overview. |

[10] | S. Benninga, Financial Modeling, 4th ed., Cambridge, MA: The MIT Press, 2014. |

[11] | J. C. Hull, Options, Futures and Other Derivatives, 10th ed., New York: Pearson Education Limited, 2018. |

R. Adhikari, “Foundations of Computational Finance,” The Mathematica Journal, 2020. https://doi.org/10.3888/tmj.22-2. |

Ramesh Adhikari is an assistant professor of finance at Humboldt State University. Prior to coming to HSU, he taught undergraduate and graduate students at Tribhuvan University and worked at the Central Bank of Nepal. He was also a research fellow at Osaka Sangyo University, Osaka, Japan. He earned a Ph.D. in Financial Economics from the University of New Orleans. He is interested in the areas of computational finance and high-dimensional statistics.

**Ramesh Adhikari**

*School of Business, Humboldt State University
1 Harpst Street
Arcata, CA 95521*

The metric structure on a Riemannian or pseudo-Riemannian manifold is entirely determined by its metric tensor, which has a matrix representation in any given chart. Encoded in this metric is the sectional curvature, which is often of interest to mathematical physicists, differential geometers and geometric group theorists alike. In this article, we provide a function to compute the sectional curvature for a Riemannian manifold given its metric tensor. We also define a function to obtain the Ricci tensor, a closely related object.

A *Riemannian manifold* is a differentiable manifold together with a Riemannian metric tensor that takes any point in the manifold to a positive-definite inner product function on its *tangent space*, which is a vector space representing geodesic directions from that point [1]. Once a system of local coordinates has been chosen, we can treat this tensor as a symmetric matrix with entries denoted by $g_{ij}$, representing the relationship between tangent vectors at a point in the manifold [2, 3]. In the case of a parameterized surface, we can use the parameters to compute the full metric tensor.

A classical example of a parameterized surface is the standard parameterization of the sphere. We compute the metric tensor of the standard sphere below.
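The metric tensor of a parameterized surface is the matrix of inner products of the partial derivatives of the parameterization, that is, the Jacobian transposed times the Jacobian. The Python sketch below illustrates this numerically; it is not the article's symbolic Wolfram Language computation. The names `sphere` and `metric_from_parameterization` are hypothetical, the unit sphere with polar angle `u` and azimuth `v` is assumed, and derivatives are approximated by central differences.

```python
import math


def sphere(u, v):
    """Unit sphere: u is the polar angle, v is the azimuthal angle."""
    return (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))


def metric_from_parameterization(surface, u, v, h=1e-6):
    """2x2 metric g = J^T J, with the Jacobian J from central differences."""
    du = [(a - b) / (2 * h) for a, b in zip(surface(u + h, v), surface(u - h, v))]
    dv = [(a - b) / (2 * h) for a, b in zip(surface(u, v + h), surface(u, v - h))]

    def dot(p, q):
        return sum(a * b for a, b in zip(p, q))

    return [[dot(du, du), dot(du, dv)],
            [dot(dv, du), dot(dv, dv)]]


g = metric_from_parameterization(sphere, 1.0, 0.5)
print(g)  # numerically close to [[1, 0], [0, sin(1.0)**2]]
```

For the unit sphere the result approximates the familiar diagonal metric with entries 1 and the squared sine of the polar angle.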

This also works for more complicated surfaces. The following is an example taken from [4].

Denoting the coordinates by $x^1, \ldots, x^n$, we can then define $ds^2 = g_{ij}\,dx^i\,dx^j$, where the $g_{ij}$ are functions of the coordinates $x^1, \ldots, x^n$; this definition uses Einstein notation, which is also used wherever applicable in the following. From this surprisingly dense description of distance, we can extract many properties of a given Riemannian manifold, including *sectional curvature*, for which an explicit formula is given later. In particular, two-dimensional manifolds, also called *surfaces*, carry a value that measures at any given point how far they are from being flat. This value can be positive, negative or zero. For intuition, we give examples of each of these types of behavior.

The sphere is the prototypical example of a surface of positive curvature.

Any convex subspace of Euclidean space has zero curvature everywhere.

The monkey saddle is an example of a surface whose curvature is negative everywhere except at its central point, where it vanishes.

Sectional curvature is a locally defined value that gives the curvature of a special type of two-dimensional subspace at a point, where the two dimensions defining the surface are input as tangent vectors. Manifolds may have points that admit sections of both negative and positive curvature simultaneously, as is the case for the Schwarzschild metric discussed in the section “Applications in Physics.” An important property of sectional curvature is that on a Riemannian manifold it varies smoothly with respect to both the point in the manifold being considered and the choice of tangent vectors.

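Sectional curvature is given by

$$K(u, v) = \frac{R(u, v, u, v)}{\langle u, u\rangle\,\langle v, v\rangle - \langle u, v\rangle^{2}},$$

where $u$ and $v$ are linearly independent tangent vectors at the point under consideration and $\langle\cdot,\cdot\rangle$ is the inner product given by the metric; the ordering of the arguments of $R$ depends on the sign convention adopted for the curvature tensor.

In this formula, $R$ represents the purely covariant Riemannian curvature tensor, a function on tangent vectors that is completely determined by the $g_{ij}$. Both $R$ and the $g_{ij}$ are treated more thoroughly in the following section, as well as in [1]. Some immediate properties of the curvature formula are that $K$ is symmetric in its two entries, is undefined if the vectors $u$ and $v$ are linearly dependent, and does not change when either vector is scaled. Moreover, any two pairs of tangent vectors that span the same two-dimensional subspace of the tangent space give the same value. This is important because curvature should depend only on the embedded surface itself and not on how it was determined.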

While we are primarily concerned with Riemannian manifolds, it is worth noting that all calculations are valid for pseudo-Riemannian manifolds, in which the assumption that the metric tensor is positive-definite is dropped. This generalization is especially important in areas such as general relativity, where the metric tensors that represent spacetime have a different signature than that of traditional Riemannian manifolds. We explore this connection more in the section “Applications in Physics.”

For a differentiable manifold, an *atlas* is a collection of homeomorphisms, called *charts*, from open sets in Euclidean space to the manifold, such that overlapping charts can be made compatible by a differentiable transition map between them. Via these homeomorphisms, we can define coordinates in an open set around any point by adopting the coordinates in the corresponding Euclidean neighborhood. By convention, these coordinates are labelled $(x^1, \ldots, x^n)$, and unless it is important, we omit the point giving rise to the coordinates. In some cases of interest, it is possible to adopt a coordinate system that is valid over the whole manifold.

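From such a coordinate system, whether local or global, we can define a basis for the tangent space using a *coordinate frame* [5]. This is the basis consisting of the partial derivative operators in each of the coordinate directions, that is, $\{\partial/\partial x^1, \ldots, \partial/\partial x^n\}$. Considering the tangent space as a vector space, this set is sometimes referred to in mathematical physics as a *holonomic basis* for the manifold. We then use this basis to define the symmetric matrix $(g_{ij})$ by the following expression for $g_{ij}$:

$$g_{ij} = \left\langle \frac{\partial}{\partial x^i},\ \frac{\partial}{\partial x^j} \right\rangle.$$

From here, we define one more tensor of interest for the purposes of calculating curvature. Using Einstein notation, the Riemannian curvature tensor is

$$R^{\rho}{}_{\sigma\mu\nu} = \partial_\mu \Gamma^{\rho}{}_{\nu\sigma} - \partial_\nu \Gamma^{\rho}{}_{\mu\sigma} + \Gamma^{\rho}{}_{\mu\lambda}\,\Gamma^{\lambda}{}_{\nu\sigma} - \Gamma^{\rho}{}_{\nu\lambda}\,\Gamma^{\lambda}{}_{\mu\sigma}.$$

The various $\Gamma^{k}{}_{ij}$ are the *Christoffel symbols*, for which code is presented in the next section. In light of these definitions, we recall sectional curvature once again from the introduction as the following, now considering the special case of the tangent vectors being chosen in coordinate directions (no summation over $i$ and $j$ is implied here):

$$K(\partial_i, \partial_j) = \frac{R_{ijij}}{\|\partial_i\|^{2}\,\|\partial_j\|^{2} - (g_{ij})^{2}}.$$

Here $R_{ijkl} = g_{im}\,R^{m}{}_{jkl}$ is the purely covariant curvature tensor. The norm in the denominator is the norm of the tangent vector associated to that partial derivative in the holonomic basis, which is induced by the associated inner product from $g$, so that $\|\partial_i\|^{2} = g_{ii}$.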

We now create functions to compute these tensors and sectional curvature itself. These values depend on a set of coordinates and a Riemannian metric tensor, so these serve as the input for the functions. The coordinates should be given as a list of coordinate names, and the metric tensor should be a square symmetric matrix whose size matches the length of the coordinate list. Considerable inspiration for the first half of this code was taken from Professor Leonard Parker’s Mathematica notebook “Curvature and the Einstein Equation,” which is available online as a supplement to [6].

We can now define a function for the Christoffel symbols from the previous section. This calculation consists of taking partial derivatives of the metric tensor components and one tensor operation. In Mathematica, the dot product, typically used for vectors and matrices, is also able to take tensors and contract indices.
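For reference, the standard coordinate expression for the Christoffel symbols of the metric connection is shown below. The notation is an assumption made for this illustration: $g^{kl}$ denotes the entries of the inverse metric, $\partial_i$ is the partial derivative with respect to the $i$th coordinate, and the repeated index $l$ is summed.

```latex
\Gamma^{k}{}_{ij}
  = \tfrac{1}{2}\, g^{kl}
    \left( \partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij} \right)
```

The three partial-derivative terms correspond to the partial derivatives of the metric tensor components, and the contraction with the inverse metric over $l$ is the single tensor operation.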

We can now use the formulas stated in the second section to define both the covariant and contravariant forms of the Riemannian curvature tensor.

We perform one more tensor operation using the dot product to transform our partially contravariant tensor into one that is purely covariant. Both of these will be called at various points later.

The full function to return the sectional curvatures consists of computing a scaled version of the covariant Riemannian curvature tensor.

The output consists of a symmetric matrix with zero diagonal entries representing curvatures in the coordinate directions. These diagonal values should not be taken literally, as curvature is undefined given two linearly dependent directions. While this of course does not give all possible sectional curvatures, one may perform a linear transformation on the basis in order to obtain a new metric tensor with arbitrary (linearly independent) vectors as basis elements. From here, the new tensor may be used for computation.

Here is an example with diagonal entries that are functions of the last coordinate.

Any good computation in mathematics must stand up to scrutiny against known cases, so we evaluate our function on hyperbolic 3-space. The 2 in the exponent should be read as squaring the exponential function.

Checking with [7] verifies that this is indeed a global metric tensor for hyperbolic 3-space. As such, we know that it has constant sectional curvature of $-1$ (recall that the diagonal entries do not represent any curvature information).

Continuing with the hyperbolic space metric tensor, it is a well-known result in hyperbolic geometry that one is able to scale these first two dimensions to vary the curvature and produce a *pinched curvature* manifold.

If we allow two new positive real constants as coefficients in the exponents, then we should see explicit bounds on the curvatures.
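The hyperbolic and pinched-curvature examples can be cross-checked numerically without symbolic algebra. The Python sketch below is not the article's Wolfram Language code. It assumes the metric of hyperbolic 3-space in the form ds^2 = e^(2z)(dx^2 + dy^2) + dz^2 and the pinched variant with exponents 2az and 2bz. It approximates the Christoffel symbols and their derivatives by central differences and returns K(d_i, d_j) = R_ijij / (g_ii g_jj - g_ij^2). All function names are hypothetical.

```python
import math


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(map(float, m[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def shift(x, k, h):
    """Copy of the point x with coordinate k moved by h."""
    y = list(x)
    y[k] += h
    return y


def christoffel(metric, x, h=1e-3):
    """G[k][i][j] = Gamma^k_ij at x, metric derivatives by central differences."""
    n = len(x)
    ginv = inverse(metric(x))
    dg = []  # dg[l][i][j] = d g_ij / d x^l
    for l in range(n):
        gp, gm = metric(shift(x, l, h)), metric(shift(x, l, -h))
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)])
    return [[[0.5 * sum(ginv[k][l] * (dg[i][j][l] + dg[j][i][l] - dg[l][i][j])
                        for l in range(n))
              for j in range(n)] for i in range(n)] for k in range(n)]


def sectional_curvatures(metric, x, h=1e-3):
    """Matrix of K(d_i, d_j); diagonal entries are set to 0 and carry no meaning."""
    n = len(x)
    g, G = metric(x), christoffel(metric, x, h)
    dG = []  # dG[c][a][d][b] = d Gamma^a_db / d x^c
    for c in range(n):
        Gp = christoffel(metric, shift(x, c, h), h)
        Gm = christoffel(metric, shift(x, c, -h), h)
        dG.append([[[(Gp[a][d][b] - Gm[a][d][b]) / (2 * h) for b in range(n)]
                    for d in range(n)] for a in range(n)])

    def riemann_up(a, b, c, d):
        """Mixed curvature tensor R^a_bcd."""
        return (dG[c][a][d][b] - dG[d][a][c][b]
                + sum(G[a][c][e] * G[e][d][b] - G[a][d][e] * G[e][c][b]
                      for e in range(n)))

    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r_ijij = sum(g[i][e] * riemann_up(e, j, i, j) for e in range(n))
                K[i][j] = r_ijij / (g[i][i] * g[j][j] - g[i][j] ** 2)
    return K


def pinched(a, b):
    """Assumed metric diag(e^(2 a z), e^(2 b z), 1); a = b = 1 is hyperbolic 3-space."""
    def metric(x):
        return [[math.exp(2 * a * x[2]), 0, 0],
                [0, math.exp(2 * b * x[2]), 0],
                [0, 0, 1]]
    return metric


point = [0.1, -0.2, 0.3]
K_hyp = sectional_curvatures(pinched(1.0, 1.0), point)
K_pin = sectional_curvatures(pinched(1.0, 2.0), point)
print([[round(v, 4) for v in row] for row in K_hyp])
print([[round(v, 4) for v in row] for row in K_pin])
```

For a = b = 1 every off-diagonal entry is numerically close to -1. For a = 1 and b = 2 the coordinate-plane curvatures are close to -ab, -a^2 and -b^2, so they are pinched between -4 and -1.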

In this vein, the Riemannian structure for *complex hyperbolic space* is similar to the real case, except for a modification to allow for complex variables.

In this setting, a formula for the metric tensor valid over the entire manifold is available from [8], among other places.

One can verify that, although not constant, the entries in the upper-left block are always bounded between and for positive . This result agrees with sectional curvature in complex hyperbolic space, and so serves as an example of sectional curvature computation where the underlying tensor is not diagonal. A careful review of [8] reminds us that this metric is only well-defined up to rescaling, which can change the values of the sectional curvature. What does not change, however, is the ratio of the largest and smallest curvatures, which is always exactly 4. The introduction in [9] takes considerable care to remind us that definitions change between curvatures in , and even .

Perhaps the most interesting applications of differentiable manifolds and curvature to physics lie in the area of relativity. This discipline uses the idea of a *Lorentzian manifold*, which is defined as a manifold equipped with a Lorentzian metric, whose signature has one timelike and three spacelike directions, instead of the positive-definite signature of four-dimensional Riemannian manifolds. As noted in the introduction, however, this has no impact on the computations of sectional curvature. Examples of such Lorentzian metrics include the *Minkowski flat spacetime metric*, in which $c$ is the familiar constant speed of light.

Justifying the name of *flat* spacetime, our curvature calculation guarantees all sectional curvatures are identically zero.

More generic Lorentzian manifolds may have nonzero curvature. To this end, we examine the *Schwarzschild metric*, which describes spacetime outside a spherical mass such that the gravitational field outside the mass satisfies Einstein’s field equations. This most commonly is viewed in the context of a black hole and how spacetime behaves nearby. More details on the following tensor can be found in [10].

In the following, $r$, $\theta$ and $\phi$ are standard spherical coordinates for three-dimensional space and $t$ represents time. With this setup, we can calculate the sectional curvature of spacetime for areas outside such a spherical mass.

This result indicates that the sectional curvature is directly proportional to the mass and inversely proportional to the cube of the distance from the object. In particular, there is a singularity at $r = 0$, indicating that curvature “blows up” near the center of the mass. Indeed, these results are in line with Flamm’s paraboloid, the graphical representation of a constant-time equatorial slice of the Schwarzschild metric, whose details can be found in [11].

In fact, the calculations we have done already allow us to compute one further object of interest for a Riemannian or pseudo-Riemannian manifold: the Ricci curvature. The Ricci curvature is a tensor that contracts the curvature tensor and is computable when one has the contravariant Riemannian curvature tensor. Below we use a built-in function for tensors to contract the first and third indices of the contravariant Riemannian curvature tensor to obtain a matrix containing condensed curvature information (see [12] for more information).

The values 1 and 3 above refer to the dimensions we are contracting. In general, the corresponding indices must vary over sets of the same size; here all dimensions have indices that vary over a set whose size is the number of coordinates. We compute the Ricci curvature for some of the previous examples.
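In index notation, contracting the first and third indices of the mixed curvature tensor can be written as follows. The index placement is an assumption made for this illustration: the first slot of $R^{i}{}_{jkl}$ is the upper index, and $n$ is the number of coordinates.

```latex
R_{jl} \;=\; R^{i}{}_{jil} \;=\; \sum_{i=1}^{n} R^{i}{}_{jil}
```

The result has two free indices, $j$ and $l$, which is why the Ricci curvature can be displayed as an $n \times n$ matrix.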

The fact that the Ricci curvature vanishes for the above solution to the Einstein field equation is a consequence of its types of symmetries. In general, the Ricci curvature for other solutions is nonzero. Notice for the example (and the , trivially), all information from the Ricci tensor is contained in the diagonal elements. This is always the case for a diagonal metric tensor [12]. As such, we may sometimes be interested only in these values, so we take the diagonal in such a case.

The supervising author would like to thank Dr. Nicolas Robles for suggesting the submission of this article to *The Mathematica Journal*. We would also like to thank Leonard Parker, who authored the notebook file available at [6], which greatly illuminated some of the calculations. We are also very grateful to the referee and especially the editor, whose contributions have made this article much more accurate, legible and efficient.

[1] | M. do Carmo, Differential Geometry of Curves & Surfaces, Mineola, NY: Dover Publications, Inc., 2018. |

[2] | J. M. Lee, Introduction to Smooth Manifolds, Graduate Texts in Mathematics, 218, New York: Springer, 2003. |

[3] | C. Stover and E. W. Weisstein, “Metric Tensor” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/MetricTensor.html. |

[4] | “ParametricPlot3D,” ParametricPlot3D from Wolfram Language & System Documentation Center—A Wolfram Web Resource. reference.wolfram.com/language/ref/ParametricPlot3D.html. |

[5] | F. Catoni, D. Boccaletti, R. Cannata, V. Catoni, E. Nichelatti and P. Zampetti, The Mathematics of Minkowski Space-Time, Frontiers in Mathematics, Basel: Birkhäuser Verlag, 2008. |

[6] | J. B. Hartle, Gravity: An Introduction to Einstein’s General Relativity, San Francisco: Addison-Wesley, 2003. web.physics.ucsb.edu/~gravitybook/math/curvature.pdf. |

[7] | J. G. Ratcliffe, Foundations of Hyperbolic Manifolds, 2nd ed., Graduate Texts in Mathematics, 149, New York: Springer, 2006. |

[8] | J. Parker, “Notes on Complex Hyperbolic Geometry” (Jan 10, 2020). maths.dur.ac.uk/~dma0jrp/img/NCHG.pdf. |

[9] | W. M. Goldman, Complex Hyperbolic Geometry, Oxford Mathematical Monographs, Oxford Science Publications, New York: Oxford University Press, 1999. |

[10] | R. Adler, M. Bazin and M. Schiffer, Introduction to General Relativity, New York: McGraw-Hill, 1965. |

[11] | R. T. Eufrasio, N. A. Mecholsky and L. Resca, “Curved Space, Curved Time, and Curved Space-Time in Schwarzschild Geodetic Geometry,” General Relativity and Gravitation, 50(159), 2018. doi:10.1007/s10714-018-2481-2. |

[12] | L. A. Sidorov, “Ricci Tensor,” Encyclopedia of Mathematics (M. Hazewinkel, ed.), Netherlands: Springer, 1990. www.encyclopediaofmath.org/index.php/Ricci_tensor. |

E. Fairchild, F. Owen and B. Burns Healy, “Sectional Curvature in Riemannian Manifolds,” The Mathematica Journal, 2020. https://doi.org/10.3888/tmj.22-1. |

Elliott Fairchild is a high-school student at Cedarburg High School. He particularly enjoys problems in analysis, and is always looking for more research opportunities.

Francis Owen is an undergraduate student at the University of Wisconsin-Milwaukee. His major is Applied Mathematics and Computer Science, and he is eager to find new programming opportunities.

Brendan Burns Healy is a Visiting Assistant Professor at the University of Wisconsin-Milwaukee. Though a geometric group theorist and low-dimensional topologist by training, he also enjoys problems of computation and coding.

**Elliott Fairchild**

*Department of Mathematical Sciences
University of Wisconsin-Milwaukee
3200 N. Cramer St.
Milwaukee, WI 53211*

**Francis Owen**

*Department of Mathematical Sciences
University of Wisconsin-Milwaukee
3200 N. Cramer St.
Milwaukee, WI 53211*

**Brendan Burns Healy, PhD**

*Department of Mathematical Sciences
University of Wisconsin-Milwaukee
3200 N. Cramer St.
Milwaukee, WI 53211
www.burnshealy.com*