When my son brought home a paper from school with a grid of numbers on it, I was immediately interested. The goal: cover the puzzle with all the dominoes from the “bone pile,” making sure that each number of the puzzle is covered by the same number on a domino. Many similar puzzles can be found online and in puzzle collections: see [1, 2, 3, 4, 5] for several online resources, which are the source of some of the examples considered here.

**Figure 1.** A partially solved domino grid, with almost half of the 28 dominoes placed on the underlying puzzle grid.

Our first task is to represent the board.

Next, we need the bone pile, the list of available dominoes. In this case, the bone pile consists of all 28 dominoes from the double zero to double six, but the definition is generally valid for any non-negative number $n$, for a total of $(n+1)(n+2)/2$ dominoes.
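As a rough illustration, here is a minimal Python sketch of a bone pile (the article's own code is in *Mathematica* and is not reproduced here). The name `bone_pile` is an assumption made for this example; each domino is stored as a sorted pair.

```python
from itertools import combinations_with_replacement

def bone_pile(n=6):
    """All dominoes of a double-n set, as pairs (a, b) with a <= b."""
    return list(combinations_with_replacement(range(n + 1), 2))

bones = bone_pile(6)
print(len(bones))  # 28
```

For a double-$n$ set the length of the list is $(n+1)(n+2)/2$, which gives 28 for $n = 6$.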

Find the possible locations for a given piece.

This is the workhorse of the entire solution. It first divides the puzzle into adjacent pairs along each row and looks for matches to the given piece, then repeats the process on the transposed matrix (i.e. along the columns of the original grid), noting the locations of any matches found. The location of a pair in the partition gives the location of the first half of the domino in the original grid, and adding the appropriate offset gives the location of the second half; both are recorded together as one domino location in the list of locations found.
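The search just described can be sketched in Python as follows. This is not the article's implementation: instead of partitioning rows and transposing, it looks at the right-hand and lower neighbor of every cell, which should find the same placements. The function name `find` follows the text; the small grid is invented for illustration.

```python
def find(grid, piece):
    """Return every placement of `piece` on `grid`.

    A placement is a pair of (row, col) cells. Cells holding None are
    treated as blanked out and never match.
    """
    a, b = piece
    rows, cols = len(grid), len(grid[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            # Neighbor to the right (horizontal) and below (vertical).
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 < rows and c2 < cols:
                    pair = (grid[r][c], grid[r2][c2])
                    if pair == (a, b) or pair == (b, a):
                        found.append(((r, c), (r2, c2)))
    return found

# Hypothetical 3 x 4 grid built from a double-2 set.
grid = [[0, 0, 0, 1],
        [0, 1, 1, 2],
        [2, 1, 2, 2]]
print(find(grid, (2, 2)))  # [((1, 3), (2, 3)), ((2, 2), (2, 3))]
```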

Now for functions to highlight the dominoes within a puzzle. The function `frameDomino` generates the options to include in the `Frame` option of `Grid`.

The function `displayPuzzle` accepts a puzzle grid (a matrix) and a domino list (a list of location pairs) and displays the puzzle grid with frames around the dominoes indicated in the list.

For example, there are two possible locations for the domino in the `m9` puzzle.

A `puzzle` object takes three arguments.

- The matrix `m` contains the puzzle to be solved, a 2D array of integers.
- The filled locations list `filled` is a list of coordinate pairs, one pair of grid positions per placed domino; the two positions are adjacent either within a row or within a column.
- The bone pile `bones`, the list of unplayed dominoes, consists of a list of pairs of integers.

The `Format` command defines how to format a puzzle: the puzzle matrix has its filled list of dominoes framed, and a tooltip shows the bone pile, if any.

This section shows examples of various puzzles; mouse over a puzzle to see the bone pile. In this puzzle, no dominoes have been played yet.

Here two pieces have been removed from the bone pile and placed on the board.

To ensure that the squares filled by already placed pieces are no longer included, make a version of the board with the affected squares blanked out.

This function finds the forced locations; only one piece can possibly go into a forced location.

Find the forced locations after two particular dominoes have been played.

The forced locations are shown empty.

- Select the pieces that fit in forced locations.
- Use `find` to return a list of all possible locations for playable pieces, and select the pieces that have only one possible location:

In this artificial case, there are two forced locations: in each, only one piece can be placed.

The function `step` finds the forced locations and fills them in with the appropriate dominoes taken from the bone pile. Mouse over to see that these dominoes have been removed from the bone pile.

At the beginning, there are no forced locations, but there are four forced pieces, that is, pieces that can only be placed in one location in the puzzle. The `step` function plays all four at once.

We are ready to solve the whole puzzle. The next command prints the current state, takes one step, and repeats until the bone pile is empty.

Along the way, multiple partial solutions had to be considered when no forced locations or forced pieces were found, but in the end all but one solution were dropped because of inconsistency. The comments were left in to show the forced locations or forced pieces at each step, but now we turn them off.

There is no reason not to make a prettier display function to show the dominoes with their customary pips (or dots), rather than showing only the grid numbers. We can represent the pip positions by matrices, some of which can be easily created by built-in matrix commands. Since the pip positions of double-9 and double-6 domino sets are consistent, let us build the larger set here. (A double-12 set would require adjusting the pip positions.)

The other matrices could be built by hand or using `SparseArray` or `Table` with appropriate criteria, but it is easy to create them by addition and subtraction.

A pip will be placed on a half-domino square wherever the matrix had a 1.

The function `displayDottedPuzzle` creates a graphical display of the puzzle, optionally replacing numbers by half-domino faces for any locations listed in the “filled list,” outlining any placed dominoes in a way similar to `displayPuzzle`.

The method described here can be thought of as “human-type,” since it uses intelligently chosen criteria for deciding which step to perform and which option to try next. The criteria used can be summarized as follows:

- Seek forced locations: if any locations can take none of the available dominoes, abandon the partial solution currently being constructed; if any locations can take exactly one available domino (and not the same one), fill all of these “forced locations.”
- Else seek forced dominoes: if any of the available dominoes cannot be placed on the board, abandon the partial solution currently being constructed; if any of the available dominoes can only be placed in one location on the board, play all of these “forced dominoes.”
- Else for a minimal case, place one domino in all possible locations, making separate copies of the puzzle for each case.
- Repeat until no further changes occur.
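The criteria above can be approximated by a short backtracking search. The Python sketch below is written in the same spirit but is not the article's `step`-based code: rather than playing all forced moves at once, it always branches on the most constrained cell or piece, so a forced location or forced piece is simply a branch with one choice. The names `placements` and `solve`, and the small grid, are assumptions made for this example; the grid is assumed to be rectangular with no gaps.

```python
from itertools import combinations_with_replacement

def placements(grid, piece, used):
    """Placements of `piece` (a sorted pair) on cells not in `used`."""
    a, b = piece
    out = []
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 < len(grid) and c2 < len(row):
                    if (r, c) in used or (r2, c2) in used:
                        continue
                    if sorted((v, grid[r2][c2])) == [a, b]:
                        out.append(((r, c), (r2, c2)))
    return out

def solve(grid, bones, used=frozenset(), placed=()):
    """Yield every complete solution as a tuple of (piece, placement)."""
    if not bones:
        yield placed
        return
    options = {p: placements(grid, p, used) for p in bones}
    # For every free cell, collect the moves that could cover it.
    cover = {}
    for p, locs in options.items():
        for loc in locs:
            for cell in loc:
                cover.setdefault(cell, []).append((p, loc))
    free = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))
            if (r, c) not in used]
    # A cell nothing can cover, or a piece with nowhere to go: dead end.
    if any(cell not in cover for cell in free) or any(not v for v in options.values()):
        return
    # Branch on the tightest constraint (a count of 1 is a forced move).
    cell_choice = min((cover[cell] for cell in free), key=len)
    piece = min(options, key=lambda p: len(options[p]))
    piece_choice = [(piece, loc) for loc in options[piece]]
    choices = cell_choice if len(cell_choice) <= len(piece_choice) else piece_choice
    for p, loc in choices:
        rest = [q for q in bones if q != p]
        yield from solve(grid, rest, used | set(loc), placed + ((p, loc),))

# Hypothetical 3 x 4 puzzle for a double-2 set.
grid = [[0, 0, 0, 1],
        [0, 1, 1, 2],
        [2, 1, 2, 2]]
bones = list(combinations_with_replacement(range(3), 2))
solutions = list(solve(grid, bones))
```

Branching on a cell tries every move that could cover it, and branching on a piece tries every location it could take, so either choice keeps the search exhaustive.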

A human can make more complicated arguments eliminating some options; for examples, see the explanations at the sites [1, 2, 3, 4, 5]. (But not all suggested solving strategies turn out to be useful. One common idea, placing the “double” dominoes first, can easily be defeated by a clever puzzle designer.) The order is arbitrary and might be modified, but is far faster than the more simplistic, brute-force method presented in the following section.

Here is a list of all possible locations of all dominoes in our original puzzle.

The number of options for the pieces varies wildly.

(You can easily verify that in this puzzle, all the double dominoes have between four and eight possible placement options, making “place doubles first” a poor strategy in this case.) Taking all possible options for all the pieces gives a very large number.

Too many cases to consider! But this method would work, theoretically: Use `Outer` to get all combinations of choices of these options and then use `Select` on those that have no overlapping dominoes. Here do only the first three dominoes.

With only the first three dominoes, the possible combinations are reduced to 19 after elimination of conflicts. Placing the first 13 dominoes involves considering 653184 cases, of which only four have no conflicts.
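To make the brute-force idea concrete, here is a small Python sketch in which `itertools.product` plays the role of `Outer` and a filter plays the role of `Select`. The function name `brute_force` and the three option lists are invented for illustration; they are not the options of the article's puzzle.

```python
from itertools import product

def brute_force(option_lists):
    """Keep only the combinations whose dominoes share no cell."""
    keep = []
    for combo in product(*option_lists):
        cells = [cell for loc in combo for cell in loc]
        if len(cells) == len(set(cells)):
            keep.append(combo)
    return keep

# Hypothetical placement options for three dominoes.
option_lists = [
    [((0, 0), (0, 1)), ((0, 1), (0, 2))],
    [((0, 2), (0, 3))],
    [((0, 0), (1, 0)), ((1, 0), (2, 0))],
]
print(len(brute_force(option_lists)))  # 1 of the 2 * 1 * 2 = 4 combinations
```

The number of combinations examined is the product of the option counts, which is why this approach becomes hopeless for all 28 dominoes.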

So the following code should work, but will take an unreasonable amount of time and memory. It is beautifully simple and short, but do not run it, as it probably would not finish in our lifetimes!

“To a hammer, everything looks like a nail.” A few years ago, I worked out an exhaustive search-and-collision detection algorithm based on the idea of a generalized odometer, and since then I have seen applications for it everywhere. It works here, too.

Create a 28-digit generalized odometer, whose $i$th digit refers to which option we are trying for the $i$th domino. All digits start as 1; incrementing the odometer does not in general occur at the right end, but at the first digit (from the left) whose domino placement conflicts with that of any previous domino. A digit “rolls over” when it is incremented past its maximum value and must be reset to 1. Whenever a digit rolls over, also increment the digit to its left, just as in a real odometer. Each odometer digit has a separate maximum determined by the number of options available for that domino. When the first digit finally rolls over, all solutions have been found. We also accelerate the procedure by sorting the domino option lists by increasing length, so that the dominoes with the fewest options come first.

Notice that the first four odometer digits can only be 1; each starts at 1 and has a maximum of 1.
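The odometer procedure can be sketched in Python as below. This is an illustrative reading of the description, not the article's program: digits are 0-based here (1-based in the text), the digits to the right of the incremented one are reset explicitly, and the name `odometer_solve` and the sample option lists are assumptions.

```python
def odometer_solve(options):
    """Enumerate conflict-free readings, one placement per domino.

    options[i] lists the candidate placements of domino i; a reading d
    selects options[i][d[i]] for every i.
    """
    n = len(options)
    if n == 0 or any(not opts for opts in options):
        return []
    d = [0] * n
    solutions = []
    while True:
        # Find the first digit whose placement collides with an earlier one.
        seen, bad = set(), None
        for i in range(n):
            cells = options[i][d[i]]
            if seen.intersection(cells):
                bad = i
                break
            seen.update(cells)
        if bad is None:
            solutions.append(tuple(d))
            bad = n - 1            # no conflict: advance the last digit
        # Reset everything to the right, then increment with carry.
        for j in range(bad + 1, n):
            d[j] = 0
        i = bad
        d[i] += 1
        while d[i] == len(options[i]):
            d[i] = 0               # this digit rolls over
            i -= 1
            if i < 0:              # the first digit rolled over: finished
                return solutions
            d[i] += 1

# Hypothetical placement options for three dominoes.
options = [
    [((0, 0), (0, 1)), ((0, 1), (0, 2))],
    [((0, 2), (0, 3))],
    [((0, 0), (1, 0)), ((1, 0), (2, 0))],
]
print(odometer_solve(options))  # [(0, 0, 1)]
```

Incrementing at the first conflicting digit skips every reading that shares the same conflicting prefix, which is where the speedup over plain enumeration comes from.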

To see or use the parts of options specified by the odometer, we use the function `MapThread`.

Here is the program that more or less immediately returns the answer(s).

As expected, there is only one odometer reading that works; that is, only one choice of domino placements solves the puzzle. The generalized odometer method works best for situations with a large number of variables taking on values that can be calculated in advance, particularly if the possible values are the same for all variables or vary in a way that can be easily specified. Here the options have to be recomputed for each new puzzle, making it less efficient than the previous method.

A “quadrilles” puzzle [5], an idea credited to French mathematician Edouard Lucas, can be divided into $2 \times 2$ blocks, each filled with a single number. Since the following figure does not completely fill a rectangular array, we add empty strings.

This particular quadrille has only one solution. At each step there are a large number of forced locations or pieces, and all 28 dominoes are placed in only four iterations.

Now for a puzzle with so many different ways to solve it that one feels that almost anything will work [5]!

If a puzzle is nonrectangular or has intentional gaps in it, such as the one shown below [4], simply embed it in a larger rectangle, and indicate the gaps by empty strings.

It seems likely that the online or downloadable domino puzzle generators effectively lay out the dominoes to create a grid that is guaranteed to be solvable. But even if all puzzles presented can be solved, a number of questions spring to mind:

For given grid dimensions, how many different solutions are there? (The three methods derived above solve individual puzzles, but what if the numbers are rearranged in a given grid in all possible ways?)

For given grid dimensions, what fraction of the possible puzzles has only one solution, and in general, for all $n$, what fraction of the puzzles has $n$ solutions? What is the largest number of solutions possible?

Bear in mind that in the sense of the functions developed here, a “solution” is merely a list of domino locations, so different puzzles of the same dimensions can have the same solution just by permuting the underlying grid numbers or rearranging them in other valid ways. In the interest of increased clarity, define a *solution schema* as a layout of dominoes face-down on a board. Now we can talk about the number of possible distinct schemas for a given puzzle grid.

What about writing a program that generates all solution schemas for a given board, ignoring the numbers? This could be done by modifying either the function `solvePuzzle` or the function `odometerSolve`, neither of which can quite do the job as written. (Yes, I did try them on a board filled with 0 entries, but they would need to be tweaked to expect a bone pile of double-zero dominoes.)
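If the numbers are ignored, a solution schema for a full rectangular board is just a domino tiling of that board, so the schemas can at least be counted directly. The Python sketch below does this with a standard cell-by-cell dynamic program. It is a separate technique, not a modification of `solvePuzzle` or `odometerSolve`, and the name `count_schemas` is an assumption.

```python
from functools import lru_cache

def count_schemas(rows, cols):
    """Number of ways to tile a rows x cols board with 1 x 2 dominoes."""
    @lru_cache(maxsize=None)
    def go(pos, mask):
        # Bit j of mask is set when cell pos + j is already covered.
        if pos == rows * cols:
            return 1 if mask == 0 else 0
        if mask & 1:
            return go(pos + 1, mask >> 1)
        total = 0
        r, c = divmod(pos, cols)
        if c + 1 < cols and not mask & 2:       # horizontal domino
            total += go(pos + 1, (mask | 2) >> 1)
        if r + 1 < rows:                        # vertical domino
            total += go(pos + 1, (mask | (1 << cols)) >> 1)
        return total
    return go(0, 0)

print(count_schemas(2, 3))  # 3
print(count_schemas(4, 4))  # 36
```

The same call with the dimensions of a 56-cell board, for example `count_schemas(7, 8)`, gives the number of schemas available to a double-six puzzle of that shape.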

Finally, it is interesting that the first solution method worked so well, basically following how a human would decide which domino to play next. The code for the brute-force method is the simplest, but impractical without massive parallel processing. The odometer method works well, but here not as fast as the “human” method, and in any case may not be as transparent to the reader. There is more than one way to solve a puzzle! And if you spend much time thinking about a puzzle, other methods and other questions will probably occur to you.

I thank my colleagues at Southern Adventist University who have encouraged me, the folks at Wolfram Research who have occasionally helped me, and Claryce, who has put up with me in all my most puzzling moods.

[1] | E. W. Weisstein. “Domino Tiling” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/DominoTiling.html. |

[2] | Domino-Games.com. “Domino Puzzles.” (Sep 4, 2014) www.domino-games.com/domino-puzzles.html. |

[3] | “Dominosa.” (Sep 4, 2014) www.puzzle-dominosa.com. |

[4] | Yoogi Games. “Domino Puzzle Puzzles.” (Sep 4, 2014) syndicate.yoogi.com/domino. |

[5] | J. Köller. “Domino Puzzles.” (Sep 4, 2014) www.mathematische-basteleien.de/dominos.htm. |

K. E. Caviness, “Three Ways to Solve Domino Grids,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-10. |

Ken Caviness teaches at Southern Adventist University, a liberal arts university near Chattanooga. Since obtaining a Ph.D. in physics (relativity and nuclear physics) from the University of Massachusetts Lowell, he has taught math and physics in Rwanda, Texas, and Tennessee. His interests include both computer and human languages (including Esperanto), and he has used *Mathematica* since Version 1, both professionally and recreationally.

**Kenneth E. Caviness**

*Department of Physics & Engineering
Southern Adventist University
PO Box 370
Collegedale, TN 37315-0370*

Evaluating molecular integrals has been an active field since the middle of the last century. Efficient algorithms have been developed and implemented in various programs. Detailed accounts of molecular integrals can be found in the references of [1]. In this article, the third in a series describing algorithms for evaluating molecular integrals, we detail the evaluation of the nuclear-electron attraction energy integrals from a more didactic point of view, following the approach of Rys, Dupuis, and King [2] as implemented in the OpenMol program [3].

The energy caused by the attraction between an electron in the region described by the overlap of the orbitals , and a nucleus of charge located at is expressed by the nuclear-electron attraction integral

(1) |

in which is an unnormalized Cartesian Gaussian primitive.

Using the Gaussian product (see, for example [1]) and defining the angular part as :

(2) |

The pole problem can be solved by the Laplace transform

(3) |

which turns the integral into

(4) |

where for now, we have ignored the factor . In the following steps, we will make certain modifications, knowing in advance that they will help simplify the expressions later on. We first reduce the upper limit of to unity by making the changes of variable (recall from the Gaussian product that ):

(5) |

and

(6) |

Replace , , and in , to get

(7) |

We now multiply by the factor :

(8) |

Again, by inserting , we get

(9) |

Having arrived at the desired form, we reinsert the value of the angular part into the expression and separate the term enclosed by the curly brackets into three components , , and :

(10) |

Defining as the function of the component inside the bracket,

(11) |

and similarly for and , we rewrite the integral as

(12) |

We will show that the integrand in the expression for is in fact an overlap between two one-dimensional Gaussians, and we may use the results that have been developed in [1]. First, we expand the exponential parts of the integrand

(13) |

regrouping in terms of and , we have

(14) |

which becomes

(15) |

where and . These definitions let us compare this equation with the result of the Appendix, in which we see that equation (15) is simply

(16) |

where

(17) |

Substituting and ,

(18) |

into equation (13), we have

(19) |

Substitute this result into the definition of to get

(20) |

The integral has the same form as a one-dimensional overlap integral where the integrand is a Gaussian function centered at with an exponential coefficient .

From the observation above, we make use of the results developed for overlap integrals in [1]. For example, for ,

(21) |

In particular, we have the transfer equations

(22) |

The and functions take similar forms. The product is a polynomial in , and if we replace , then the integral in equation (12) is

(23) |

where is the said polynomial. The integral is a combination of the Boys function (see, for example, Reference 4 of [1])

(24) |

a strictly positive and decreasing function.

Aside from the obvious choice of using *Mathematica* to evaluate the Boys function, there are several ways of evaluating the integral. In practice, most programs store pretabulated values of the function at different intervals and interpolation is done as needed (e.g. by Chebyshev polynomials). Here we use the Gauss-Chebyshev quadrature numerical integration [4]. For simplicity, we have adopted almost verbatim the F77 code in [4, p. 46].
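For readers without the F77 routine at hand, here is a minimal Python sketch for the Boys function, taken in its usual form $F_\nu(x) = \int_0^1 t^{2\nu} e^{-x t^2}\,dt$. It swaps in composite Simpson integration rather than the Gauss-Chebyshev scheme of [4], so it illustrates the quantity being computed, not the article's method. The names `boys` and `boys0_exact` are assumptions.

```python
import math

def boys(nu, x, panels=2000):
    """F_nu(x) = integral_0^1 t^(2 nu) exp(-x t^2) dt by composite Simpson."""
    if panels % 2:
        panels += 1                      # Simpson needs an even panel count
    h = 1.0 / panels
    f = lambda t: t ** (2 * nu) * math.exp(-x * t * t)
    total = f(0.0) + f(1.0)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def boys0_exact(x):
    """Closed form of F_0(x) for x > 0, used only as a cross-check."""
    return 0.5 * math.sqrt(math.pi / x) * math.erf(math.sqrt(x))

print(abs(boys(0, 1.5) - boys0_exact(1.5)) < 1e-10)  # True
```

At $x = 0$ the integral reduces to $1/(2\nu + 1)$, which gives a second easy check.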

The function `Nea` evaluates the nuclear-electron attraction integral of two Gaussian primitives; here `alpha`, `beta`, `RA`, `RB`, `LA`, and `LB` are , , , , , and as defined earlier; `RR` is the nuclear position.
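As a point of comparison for the simplest case, the integral over two unnormalized $s$-type primitives (all angular values zero) has a well-known closed form, $\frac{2\pi}{p}\,e^{-\alpha\beta|\mathbf{R}_A-\mathbf{R}_B|^2/p}\,F_0\!\left(p\,|\mathbf{P}-\mathbf{R}_C|^2\right)$ with $p = \alpha + \beta$ and $\mathbf{P}$ the Gaussian product center. The Python sketch below evaluates only that special case. It is not the general `Nea` function, it leaves out the nuclear charge and the sign, and the names `boys0` and `nea_ss` are assumptions.

```python
import math

def boys0(x):
    """F_0(x) in closed form, with the x -> 0 limit handled separately."""
    if x < 1e-12:
        return 1.0
    return 0.5 * math.sqrt(math.pi / x) * math.erf(math.sqrt(x))

def nea_ss(alpha, beta, RA, RB, RC):
    """Integral of exp(-alpha|r-RA|^2) exp(-beta|r-RB|^2) / |r-RC| over r."""
    p = alpha + beta
    # Gaussian product center P and the two squared distances.
    P = [(alpha * a + beta * b) / p for a, b in zip(RA, RB)]
    ab2 = sum((a - b) ** 2 for a, b in zip(RA, RB))
    pc2 = sum((q - c) ** 2 for q, c in zip(P, RC))
    return 2.0 * math.pi / p * math.exp(-alpha * beta / p * ab2) * boys0(p * pc2)

origin = (0.0, 0.0, 0.0)
print(abs(nea_ss(1.0, 1.0, origin, origin, origin) - math.pi) < 1e-12)  # True
```

When all three centers coincide the integral can be done by hand and equals $2\pi/p$; far from the charge distribution it approaches $(\pi/p)^{3/2}/d$, the potential of a point charge at distance $d$.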

As in our two earlier articles [1, 5], we use the same data for the water molecule (, , the geometry optimized at the HF/STO-3G level). The molecule lies in the - plane with Cartesian coordinates in atomic units.

In the STO-3G basis set, each atomic orbital is approximated by a sum of three Gaussians; here are their unnormalized primitive contraction coefficients and orbital exponents.

Here are the basis function origins and Cartesian angular values of the orbitals, listed in the order , , , , , , and .

Specifically, for the nuclear-electron attraction energy integral between the first primitive of the orbital of hydrogen atom 1, , the first primitive of the orbital of the oxygen atom, , and atom 1 () is

(25) |

We have

From the Gauss-Chebyshev quadrature, the integral in equation (23) yields . The nuclear-electron integral (25) is . This is calculated as follows.

We first need the normalization factors before evaluating the nuclear-electron energy matrix.

We have provided a didactic introduction to the evaluation of nuclear-electron attraction-energy integrals involving Gaussian-type basis functions by use of recurrence relations and a numerical quadrature scheme. The results are sufficiently general so that no modification of the algorithm is needed when larger basis sets with more Gaussian primitives or primitives with larger angular momenta are employed.

Consider the Gaussian product: . Combine and expand the coefficients to get

Let and substitute in the exponent to get

The first three terms inside the second bracket factor to , and the last two can be reduced to . The original Gaussian product is thus

Here is a verification.
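A numerical spot check is also easy to write outside *Mathematica*. The Python sketch below assumes the appendix arrives at the usual one-dimensional product rule, $e^{-a(x-A)^2} e^{-b(x-B)^2} = e^{-\frac{ab}{a+b}(A-B)^2}\, e^{-(a+b)(x-P)^2}$ with $P = (aA + bB)/(a+b)$; the function names and sample values are invented for illustration.

```python
import math

def product_lhs(x, a, A, b, B):
    """Product of two one-dimensional Gaussians centered at A and B."""
    return math.exp(-a * (x - A) ** 2) * math.exp(-b * (x - B) ** 2)

def product_rhs(x, a, A, b, B):
    """Single Gaussian centered at P, times a constant prefactor."""
    p = a + b
    P = (a * A + b * B) / p
    return math.exp(-a * b / p * (A - B) ** 2) * math.exp(-p * (x - P) ** 2)

# Compare the two sides at a few sample points.
for x in (-1.0, 0.0, 0.3, 2.5):
    assert math.isclose(product_lhs(x, 0.8, -0.4, 1.7, 1.1),
                        product_rhs(x, 0.8, -0.4, 1.7, 1.1), rel_tol=1e-12)
```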

[1] | M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals I,” The Mathematica Journal, 14(3), 2012. doi:10.3888/tmj.14-3. |

[2] | J. Rys, M. Dupuis, and H. F. King, “Computation of Electron Repulsion Integrals Using the Rys Quadrature Method,” Journal of Computational Chemistry, 4(2), 1983 pp. 154-157. doi:10.1002/jcc.540040206. |

[3] | G. H. F. Diercksen and G. G. Hall, “Intelligent Software: The OpenMol Program,” Computers in Physics, 8(2), 1994 pp. 215-222. doi:10.1063/1.168520. |

[4] | J. Pérez-Jorda and E. San-Fabián, “A Simple, Efficient and More Reliable Scheme for Automatic Numerical Integration,” Computer Physics Communications, 77(1), 1993 pp. 46-56. doi:10.1016/0010-4655(93)90035-B. |

[5] | M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals II,” The Mathematica Journal, 15(1), 2013. doi:10.3888/tmj.15-1. |

M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-9. |

Minhhuy Hô received his Ph.D. in theoretical chemistry at Queen’s University, Kingston, Ontario, Canada, in 1998. He is currently a professor at the Centro de Investigaciones Químicas at the Universidad Autónoma del Estado de Morelos in Cuernavaca, Morelos, México.

Julio-Manuel Hernández-Pérez obtained his Ph.D. at the Universidad Autónoma del Estado de Morelos in 2008. He has been a professor of chemistry at the Facultad de Ciencias Químicas at the Benemérita Universidad Autónoma de Puebla since 2010.

**Minhhuy Hô**

*Universidad Autónoma del Estado de Morelos
Centro de Investigaciones Químicas
Ave. Universidad, No. 1001, Col. Chamilpa
Cuernavaca, Morelos, Mexico CP 92010*

**Julio-Manuel Hernández-Pérez**

*Facultad de Ciencias Químicas
Ciudad Universitaria, Col. San Manuel
Puebla, Puebla, Mexico CP 72570*

A cellular automaton (CA) is a dynamical system with arbitrarily complex global behavior, despite being governed by very simple local rules [1]. In order to better understand how that kind of complex behavior emerges, many explorations have been made in the context of the power implicit in CA rules. For instance, classical benchmark problems have been used for this, including the density classification task [2, 3] and the parity problem [4]. The density classification task tries to discover the most frequent bit in the initial configuration of the lattice; the parity problem tries to find the parity of the number of 1s in the initial configuration of the lattice. One of the approaches in these contexts is to evaluate every possible CA of a given family in terms of its capabilities to solve the target problem. This approach is possible in small CA families, like the elementary space (composed of 256 CAs), but is not feasible in larger families, like the one-dimensional binary CA family with radius 3, composed of $2^{128}$ rules.

As a strategy to search for CAs in large rule families, evolutionary computation has been extensively used, relying on measures of properties of the candidate rules, such as their degree of internal symmetry, so as to discard or keep candidates according to these property values. This was a key aspect, for instance, that led to finding WdO, currently the best one-dimensional radius-3 rule for the density classification task [5].

An alternative is to constrain the search space to only the CAs that are known to present specific properties. The challenge here is how to constrain the space without the need to enumerate the entire subspace of interest. Here, we introduce the concept of a CA template as a possible way to achieve this goal. A CA template is a data structure associated with the rule tables of the members of a CA family that relies on the use of variables. The introduction of these variables makes it possible for a CA template to represent a set of rules, unlike the standard $k$-ary rule table representation that can only represent one individual CA. By making use of *Mathematica*’s built-in equation-solving capabilities and algorithms that allow finding equality relations among CAs with a given property, we are able to create templates that represent number-conserving CAs (those that, in a sense, preserve the number of states of the initial configuration; more details below), as well as those with maximal internal symmetry (those displaying invariance under some transformations in their rule tables; also to be explained below). These two cases are given here as examples of the applicability of the template idea, but other properties can also be accounted for.

In the following section, basic notions about CAs are given, followed by a section that presents details about important properties related to the density classification task. Section 4 explains the notion of *template* and presents the implemented algorithms. Section 5 concludes the text, with a discussion on the advantages and limitations of using templates, and gives some ideas for future work.

Cellular automata constitute a class of decentralized dynamical systems, usually discrete in space, time, and states [1]. As systems governed by relatively simple rules, CAs represent a meaningful model for tackling the issue of how interaction among simple components can lead to the solution of global problems.

CAs are composed of a regular lattice of cells whose states change through time, according to a local rule. The lattice can be deployed in any number of dimensions (most commonly one, two, or three) and may have an infinite or fixed number of cells. Cells’ states are commonly represented by numbers or colors out of $k$ possibilities ranging from 0 to $k-1$. The local rule of the CA acts on the neighborhood of every cell, which is the set of neighboring cells meant to influence its subsequent states. The neighborhood is usually expressed by its radius (or range) $r$, meaning the range of cells on each side affecting the one in question. By defining values for these two parameters, a CA rule space or family is defined. The values $k = 2$ and $r = 1$ in the one-dimensional case (i.e. a neighborhood has three cells; a cell has two possible states) give rise to the elementary rule space, which is the most well-studied family, due to its small size of only 256 rules but extremely rich phenomenology [1].

For present purposes, whenever we refer to cellular automata, we mean one-dimensional, binary ($k = 2$) CAs, with a fixed number of cells in the lattice and periodic boundary conditions (i.e. the lattice is closed at its ends, like a ring).

Every CA is governed by a rule that relates the neighborhood of a cell to the state it takes on at the next time step. Its most common representation is the rule table, which is an explicit listing of every possible state configuration of the neighborhoods, lexicographically ordered, and a corresponding cell state for each. Here we use Wolfram’s lexicographical ordering, where the leftmost neighborhood is formed by the neighborhood configuration where all cells are in state $k-1$, all the way down to the rightmost neighborhood with all cells in the 0 state.

As an illustration, this is the rule table of the elementary CA for rule 184.

This is the ordered set of output cell states from that rule table, the $k$-ary form.

By converting the binary sequence that defines the $k$-ary form into a decimal representation, one obtains the CA rule number, which serves as a unique identifier of a CA in a given rule space [1].
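The relation between rule number, $k$-ary form, and rule table can be sketched in Python as follows. The function names are loose Python stand-ins chosen for this example; they are not the article's *Mathematica* definitions.

```python
from itertools import product

def all_neighbourhoods(k=2, r=1):
    """Neighbourhoods in Wolfram order: all (k-1)s first, all 0s last."""
    return [tuple(nb) for nb in product(range(k - 1, -1, -1), repeat=2 * r + 1)]

def kary_from_rule_number(number, k=2, r=1):
    """Base-k digits of the rule number, padded to the rule table length."""
    size = k ** (2 * r + 1)
    digits = []
    for _ in range(size):
        number, d = divmod(number, k)
        digits.append(d)
    return digits[::-1]

def rule_number_from_kary(kary, k=2):
    """Read the k-ary form as a base-k integer."""
    n = 0
    for d in kary:
        n = n * k + d
    return n

def rule_table(number, k=2, r=1):
    """Map each neighbourhood to its output state."""
    return dict(zip(all_neighbourhoods(k, r), kary_from_rule_number(number, k, r)))

print(kary_from_rule_number(184))  # [1, 0, 1, 1, 1, 0, 0, 0]
print(rule_table(184)[(1, 0, 0)])  # 1
```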

In order to handle operations concerning rule tables, various *Mathematica* functions are defined. So, given a rule table in its $k$-ary form, the function `RuleTableFromkAry` transforms it to its classical representation.

The function `kAryFromRuleTable` reverses the process.

Given a CA’s rule number, `RuleTableFromRuleNumber` determines its rule table.

The inverse function `RuleNumberFromRuleTable` yields the rule number from the rule table.

`WellFormedRuleTableQ` is a predicate that checks whether a rule table in $k$-ary form is valid according to its values of $k$ and $r$.

`RuleOutputFromNeighbourhood` is a utility function to get the output corresponding to a particular neighborhood in a rule table.

Finally, `AllNeighbourhoods` is a utility function giving all possible neighborhoods of a certain rule space.

All these functions are handy to perform rule table manipulation and are used throughout this article.

In the one-dimensional case, it is possible to visualize the system’s evolution using a space-time diagram, in which time goes from top to bottom, and cell states are represented by colors. For binary CAs, white cells are in the 0 state and black cells in the 1 state. In order to obtain and plot the space-time diagram resulting from a rule execution on a given lattice, one can use *Mathematica*’s built-in functions `CellularAutomaton` and `ArrayPlot`.
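Outside *Mathematica*, the same kind of evolution can be sketched in a few lines of Python for elementary rules on a ring. The names `step` and `evolve` are assumptions, and the text rendering at the end is only a crude substitute for `ArrayPlot`.

```python
def step(cells, rule_number):
    """One synchronous update of an elementary CA on a periodic lattice."""
    n = len(cells)
    out = []
    for i in range(n):
        # cells[i - 1] wraps around to the last cell when i == 0.
        idx = 4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n]
        out.append((rule_number >> idx) & 1)
    return out

def evolve(cells, rule_number, steps):
    """Space-time history: the initial row followed by `steps` updates."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1], rule_number))
    return history

for row in evolve([1, 1, 0, 0, 1, 0, 0, 0], 184, 4):
    print("".join("#" if c else "." for c in row))
```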

In order to better understand the computational power implicit in a CA rule, benchmark problems have been defined for it to tackle; among them, the most common is the density classification task (DCT). In the classical definition of DCT, a one-dimensional binary CA has to lead an arbitrary initial odd-sized configuration into a fixed-point state of all blacks, if the initial condition has a larger number of black cells, or into a fixed-point state of all whites otherwise.

It has been proved that in order to solve the DCT perfectly, a CA would need to be number conserving, that is, it should not change the number of cells in each state from any given initial condition [6]. This fact stands as a contradiction against the classical definition of the DCT, since in order for it to evolve to an all-black or all-white configuration, it would obviously need to change the number of cells in each state throughout time. This means that DCT is unsolvable when formulated according to its classical definition [2, 3].

Currently, the best imperfect DCT solver (known as WdO) was found in [5], by means of a sophisticated evolutionary algorithm that used, among other important properties, the internal symmetry of a rule in its fitness function. In tune with the fact that a perfect DCT solver would need to be number conserving, WdO and other good DCT solvers are known to have a very small Hamming distance from number-conserving rules of the same rule space [7].

All in all, number conservation and internal symmetry are two important properties when determining the ability of a CA to solve the DCT, and serve as good examples for the notion of CA templates. Both are described in detail in the following subsections. But notice, upfront, that these two properties are amenable to being addressed in templates, since they derive from well-established relations among state transitions.

Number conservation is a property presented by some CAs, in which the sum of the states of the individual cells in any initial configuration does not change during the space-time evolution; in particular, for binary CAs, this means that the number of 1s always remains the same. This kind of CA is useful, for instance, to model systems like car traffic, in which a car cannot appear or disappear as time goes by [7]. Elementary CA 184 is an example of a number-conserving CA.

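In order for a one-dimensional CA rule to be number conserving, it is established in [8] that the local rule $f$ with neighborhood size $n$ must respect the following necessary and sufficient condition for every state transition:

$$f(x_1, x_2, \ldots, x_n) = x_1 + \sum_{j=1}^{n-1} \left[ f(0_j, x_2, \ldots, x_{n-j+1}) - f(0_j, x_1, \ldots, x_{n-j}) \right],$$

where $0_j$ corresponds to a sequence of 0s of length $j$.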

A simplification of the original algorithm from [8] is provided in [9]. Basically, it was shown that for any given rule, it suffices to analyze the state transitions associated with the neighborhood made up of only 0s and the neighborhoods not starting with 0. For a rule with $k$ states and neighborhood size $n$, this is a total of $k^n - k^{n-1} + 1$ neighborhoods instead of the $k^n$ required in [8]. This is the condition we employ to obtain templates that represent number-conserving CAs, as will be shown below.

Apart from number conservation, a rule’s internal symmetry also plays an important role in solving the DCT. In order to fully understand how this property works, an explanation about rule transformations and dynamically equivalent rules is required; the presentation is restricted to binary rules, even though this notion extends to the arbitrary $k$-ary case.
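A number-conservation test is short to write down. The Python sketch below assumes that the condition meant is the one usually attributed to Boccara and Fukś: for every neighborhood $(x_1, \ldots, x_n)$, the output must equal $x_1$ plus the sum, over $j = 1, \ldots, n-1$, of the rule's output on $j$ zeros followed by $x_2, \ldots, x_{n-j+1}$ minus its output on $j$ zeros followed by $x_1, \ldots, x_{n-j}$. It checks all neighborhoods of a binary rule rather than the reduced set of [9], and the names `f` and `number_conserving` are assumptions.

```python
from itertools import product

def f(rule, nb):
    """Output of a binary rule (given by its number) on neighbourhood nb."""
    idx = 0
    for bit in nb:
        idx = 2 * idx + bit
    return (rule >> idx) & 1

def number_conserving(rule, n=3):
    """Check the assumed condition on every neighbourhood of size n."""
    for x in product((0, 1), repeat=n):
        rhs = x[0]
        for j in range(1, n):
            zeros = (0,) * j
            # j leading zeros followed by a window of n - j cells.
            rhs += f(rule, zeros + x[1:n - j + 1]) - f(rule, zeros + x[0:n - j])
        if f(rule, x) != rhs:
            return False
    return True

print([r for r in range(256) if number_conserving(r)])
# [170, 184, 204, 226, 240]
```

The five rules found are the identity (204), the two shifts (170 and 240), and the traffic rule 184 with its mirror image 226.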

Given the rule table of a CA, one can apply three types of transformations on it that will result in dynamically equivalent rules. For the binary case, `BlackWhiteTransform` is obtained by switching the state of all cells in a rule table. The second type of transformation, `LeftRightTransform`, is obtained by reversing the bits of the neighborhoods in a rule table and reordering the set of state transitions. The composition in either order of the latter two transformations (they commute) yields the third type, `LeftRightBlackWhiteTransform` or `BlackWhiteLeftRightTransform`.

Here is how they work on rule 110.

This checks the first one, `BlackWhiteTransform`.

With these transformations, it becomes straightforward to see which CAs in a given space have equivalent dynamical behavior. For instance, by applying the three transformations on a given CA, say elementary rule 110, elementary rules 137, 124, and 193 are obtained. These four rules are said to be in the same dynamical equivalence class. It is easy to see why, by looking at their space-time diagrams.

By comparing the rule table of a CA with the one that resulted from its equivalent rule obtained out of a given transform, it is possible to count the number of state transitions they share. In a sense, this provides a measure of the amount of *internal symmetry* of a CA with respect to that transformation, whichever it is. For instance, elementary CA 110 has an internal symmetry value of 2 with respect to the black-white transformation, since it shares two state transitions with its black-white symmetrical rule, which is elementary rule 137.

Repeating this process with rule 150, on the other hand, yields a different result. Rule 150 has an internal symmetry value of 8 according to the black-white transformation. This is the maximum possible value of this measure with elementary CAs. This is quite predictable, as the black-white transformation of rule 150 is rule 150 itself. In fact, any of the three transformations applied to rule 150 yields rule 150 itself, indicating it has the maximum internal symmetry value according to any of the three transformations.
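The internal symmetry count described above can be sketched in Python as follows. This is an illustration under the same assumptions as before (tables ordered from neighborhood 111 to 000, hypothetical function names), not the article's code.

```python
def rule_table(number):
    """Outputs for neighborhoods 111, 110, ..., 000 (Wolfram ordering)."""
    return [int(bit) for bit in format(number, "08b")]

def black_white(table):
    # Swap the roles of 0 and 1 in both the neighborhoods and the outputs
    return [1 - bit for bit in reversed(table)]

def internal_symmetry(number, transform):
    """Number of state transitions shared by a rule and its transformed rule."""
    table = rule_table(number)
    return sum(x == y for x, y in zip(table, transform(table)))

print(internal_symmetry(110, black_white))  # 2
print(internal_symmetry(150, black_white))  # 8
```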

The degree of internal symmetry of a rule can be a relevant measure in any context where a property is shared among all members of a class of dynamical equivalence. In [5] and [7], for instance, rules with maximal internal symmetry with respect to the composite transformation were key for their findings related to the DCT.

A CA template is an enhancement over the rule table representation, obtained by allowing it to have variables in the place of simple cell states as its results. As a consequence, a CA template has the power to represent whole subsets of CA rule spaces, instead of only a single rule.

As a simple example, consider the template . It represents the subset of the elementary CAs with fixed bits at positions 1, 3, 5, 6, and 8 in the list, free variables at positions 2 and 4, and a bit at position 7 that is the complement of the one at position 2.

Using *Mathematica*’s built-in transformation rules, one can obtain the four CAs represented by this template, as well as their corresponding rule numbers.

The function `RuleTemplateVars` lists the variables in a template.

Extracting the variables from a template and applying a value to each, the template is transformed into one of its represented rule tables. Every template has a number of possible substitutions equal to $k^m$, where $k$ is the number of states and $m$ is the number of variables in the template; however, as will be seen later, some of those may not be valid.

The function `ExpandTemplate` performs this operation by applying values to each variable of a given template. It may receive as an optional argument an integer called `ithSubstitution` in the range `0` to $k^m - 1$, representing which substitution should be made. If omitted, it performs all the possible substitutions for a given template.

After the expansion, one can obtain the list of valid rules represented by the template by using the function `RuleNumbersFromTemplate`.
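To make the idea of expansion concrete, here is a small Python sketch. It does not reproduce `ExpandTemplate` or `RuleNumbersFromTemplate`; the template below is a hypothetical one chosen to match the structure described earlier (fixed bits at positions 1, 3, 5, 6, and 8, free variables `x` and `y` at positions 2 and 4, and the complement of `x` at position 7), and template entries are assumed to be either integers or functions of the variable assignment.

```python
from itertools import product

def expand_template(template, variables, k=2):
    """Substitute every combination of states and keep only valid rule tables."""
    tables = []
    for values in product(range(k), repeat=len(variables)):
        env = dict(zip(variables, values))
        table = [entry(env) if callable(entry) else entry for entry in template]
        if all(0 <= state < k for state in table):  # discard out-of-range states
            tables.append(table)
    return tables

def rule_number(table, k=2):
    number = 0
    for state in table:
        number = number * k + state
    return number

# Hypothetical template, listed from neighborhood 111 down to 000
template = [1, lambda v: v["x"], 0, lambda v: v["y"], 1, 1, lambda v: 1 - v["x"], 0]

numbers = [rule_number(t) for t in expand_template(template, ["x", "y"])]
print(numbers)  # [142, 158, 204, 220]
```

With two binary variables there are four substitutions, and all four are valid for this particular template.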

With *Mathematica*’s built-in symbolic computation features, it is easy to create templates that represent a whole space. The space of elementary CAs would be represented by the following template.

In [4], the authors analytically found which transitions needed to be fixed, variable, or dependent on other transitions in a CA rule table, in order to have a chance to solve the parity problem perfectly. By fixing those transitions, they restrained the rule space of one-dimensional, binary, radius-2 CAs, composed of 4,294,967,296 rules, to only 16 candidates for perfect parity solvers. Although they used the de Bruijn graph as the primary structure to represent this rule space subset, it could have been easily represented with CA templates.

Empowered by *Mathematica*’s built-in equation-solving capabilities, algorithms can be developed that find the fixed, variable, and dependent state transitions on a rule table, thus leading to templates that are representatives of CAs that share the properties of number conservation and maximal internal symmetry; these are shown below.

In [8], Boccara and Fukś established necessary and sufficient conditions that a CA rule table must meet in order to be conservative (which is another way to say number conserving). These conditions can be translated into an algorithm `BFConservationTemplate` that finds a set of equations that, when solved by *Mathematica*, yields the equivalent of a template that represents all conservative CAs of a given space.

By running this function for the elementary space, the following template is obtained.

When expanded, the latter yields the following representations.

However, it is clear that not all of the representations above are valid, since some of them rely on state values outside the binary range $\{0, 1\}$, namely, the states 2 and $-1$. Hence, by discarding those three, we get the complete set of five number-conserving rules of the elementary space.
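The discarding step can be sketched in Python. The template used here was worked out by hand from the Boccara-Fukś condition for binary, radius-1 rules, with free variables `a`, `b`, and `c` standing for the outputs of neighborhoods 001, 010, and 011; it is an assumption of this sketch and is not copied from the output of `BFConservationTemplate`.

```python
from itertools import product

def conservation_template(a, b, c):
    """Hand-derived table for neighborhoods 111, 110, ..., 000."""
    return [1, 1 + b - c, 1 - b, 1 - a - b, c, b, a, 0]

valid, discarded = [], []
for a, b, c in product((0, 1), repeat=3):
    table = conservation_template(a, b, c)
    if all(state in (0, 1) for state in table):
        valid.append(table)
    else:
        discarded.append(table)

numbers = sorted(int("".join(map(str, t)), 2) for t in valid)
bad_states = {s for t in discarded for s in t if s not in (0, 1)}
print(numbers)            # [170, 184, 204, 226, 240]
print(len(discarded))     # 3
print(sorted(bad_states)) # [-1, 2]
```

Three of the eight substitutions produce the out-of-range states 2 or -1, leaving five valid rules.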

It is important to notice that this kind of strategy can only be employed on properties that derive directly from the CA rule table.

As the internal symmetry of a CA is also a property that derives directly from its rule table, it is a valid candidate to be generalized into a template. By listing a CA rule table along with its respective transformations, it is possible to establish equality relations between them that, when solved by *Mathematica*, yield a template that represents all CAs that have the maximal possible value of internal symmetry, according to any subset of the three transformations.

By establishing that all of the results of the rule tables have to be the same in both the CA and its transformed counterpart, the following function `MaxSymmTemplate` achieves the goal of finding a template that represents all CAs of a given space that present the *maximum* value of internal symmetry, according to a list of transformations received as arguments.

In order to find a template that represents all elementary CAs with maximum symmetry according to the black-white transformation, it suffices to run `MaxSymmTemplate`, then expand the template to generate the rule numbers.

The verification of this result can be achieved by guaranteeing that all these rule numbers yield the same rule tables when transformed.

We can analogously obtain a template representing all CAs with maximum symmetry according to all transformations, from which their expansions also lead to the corresponding rule numbers.

And again, their validity can be checked.
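As an independent cross-check, the rules with maximal internal symmetry can also be found by brute force. This Python sketch tests all 256 elementary rules for invariance instead of solving equations as `MaxSymmTemplate` does, so it names a different technique; the function names are hypothetical.

```python
def rule_table(number):
    """Outputs for neighborhoods 111, 110, ..., 000 (Wolfram ordering)."""
    return [int(bit) for bit in format(number, "08b")]

def black_white(table):
    return [1 - bit for bit in reversed(table)]

def left_right(table):
    result = []
    for v in range(7, -1, -1):
        a, b, c = (v >> 2) & 1, (v >> 1) & 1, v & 1
        result.append(table[7 - ((c << 2) | (b << 1) | a)])
    return result

# Maximum symmetry means the transformed table equals the original table
bw_symmetric = [r for r in range(256)
                if black_white(rule_table(r)) == rule_table(r)]
# Invariance under both basic transformations implies invariance under the composite
fully_symmetric = [r for r in bw_symmetric
                   if left_right(rule_table(r)) == rule_table(r)]

print(len(bw_symmetric), len(fully_symmetric))  # 16 8
```

Rule 150 appears in both lists, consistent with the earlier observation that all three transformations map it to itself.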

Both the `BFConservationTemplate` and the `MaxSymmTemplate` functions can take another template as an optional argument, which is meant to be used as the starting point of the algorithms. This is the current way to compose the intersection of templates that share a common structure. For instance, in order to generate all the elementary conservative CAs with maximum internal symmetry values according to the black-white transformation, it becomes straightforward to use the template for number-conserving rules of the elementary space as the starting point of `MaxSymmTemplate`. This leads to a template that, once again, can be expanded so as to yield the target rule numbers.

Alternatively, the template with maximal internal symmetry could be used as the starting point of the `BFConservationTemplate` algorithm to obtain the same result.
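The composition of the two properties can be imitated in Python by expanding a conservation template and keeping only the expansions that are invariant under the black-white transformation. The template is the same hand-derived one assumed above (variables `a`, `b`, `c` for neighborhoods 001, 010, 011), so this is a sketch of the idea rather than the article's procedure.

```python
from itertools import product

def conservation_template(a, b, c):
    """Hand-derived number-conserving table for neighborhoods 111, ..., 000."""
    return [1, 1 + b - c, 1 - b, 1 - a - b, c, b, a, 0]

def black_white(table):
    return [1 - state for state in reversed(table)]

selected = []
for a, b, c in product((0, 1), repeat=3):
    table = conservation_template(a, b, c)
    is_valid = all(state in (0, 1) for state in table)
    if is_valid and black_white(table) == table:
        selected.append(int("".join(map(str, table)), 2))

print(sorted(selected))  # [170, 204, 240]
```

Rules 184 and 226 are conservative but map to each other under the black-white transformation, so they drop out.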

We have introduced the concept of CA templates, a rule table enhancement capable of representing a subset of a CA rule space in which the rules can share a common property. Although the examples used for illustration only referred to one-dimensional, binary rules (the elementary space), the idea seems readily applicable to larger CAs, with more states and more dimensions.

We have shown some of the operations applicable to CA templates, as well as some cases of use, in the form of *Mathematica* functions that yield templates representing subsets of the elementary space of CAs with properties related to number conservation and maximum internal symmetry. With respect to the latter, templates can be derived for any subset of the three symmetry-related transformations.

Templates for the rules in the same dynamical class in the elementary space have appeared previously in the CA literature, such as in [10]. But in these cases, the notion was not at all couched in the conceptual framework we have put forward, which allows templates to be effectively defined for rules having maximal internal symmetry value, let alone the possibility of representing further CA properties.

The properties used as examples here can be couched in terms of well-established relations among the state transitions of the CA, which are a necessary condition for a property to be addressed in the form of templates. As a counterpoint, the notion of reversibility of one-dimensional rules does not seem to be, at least in principle, amenable to template representation, since it is currently not known how to characterize reversibility in terms of the rule table of a CA.

It stands as future work to find new algorithms that would allow template representations of other properties, as well as the enhancement of the current algorithm related to internal symmetry templates, so as to extend the current constraint of only generating maximal internal symmetry toward also allowing the generation of templates with specific values of internal symmetry, not necessarily maximal.

Currently, because of computational demands, template expansion does not scale up well to very big templates; this should also be addressed in a follow-up. In particular, it might be worth defining operations of union and intersection of templates, which might be used to preprocess a template before the operation of template expansion.

Pedro de Oliveira thanks FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo). Maurício Verardo is thankful for a fellowship provided by CAPES, the Brazilian agency of its Ministry of Education.

[1] S. Wolfram, A New Kind of Science, Champaign, IL: Wolfram Media Inc., 2002.

[2] P. P. B. de Oliveira, “Conceptual Connections around Density Determination in Cellular Automata,” Cellular Automata and Discrete Complex Systems (Lecture Notes in Computer Science), 8155, 2013 pp. 1-14. doi:10.1007/978-3-642-40867-0_1.

[3] P. P. B. de Oliveira, “On Density Determination with Cellular Automata: Results, Constructions and Directions,” Journal of Cellular Automata, forthcoming.

[4] H. Betel, P. P. B. de Oliveira, and P. Flocchini, “Solving the Parity Problem in One-Dimensional Cellular Automata,” Natural Computing, 12(3), 2013 pp. 323-337. doi:10.1007/s11047-013-9374-9.

[5] D. Wolz and P. P. B. de Oliveira, “Very Effective Evolutionary Techniques for Searching Cellular Automata Rule Spaces,” Journal of Cellular Automata, 3(4), 2008 pp. 289-312.

[6] H. Fukś, “A Class of Cellular Automata Equivalent to Deterministic Particle Systems,” in Hydrodynamic Limits and Related Topics (S. Feng, A. T. Lawniczak, and S. R. S. Varadhan, eds.), Providence, RI: American Mathematical Society, 2000 pp. 57-69.

[7] J. Kari and B. Le Gloannec, “Modified Traffic Cellular Automaton for the Density Classification Task,” Fundamenta Informaticae, 116(1-4), 2012 pp. 141-156. doi:10.3233/FI-2012-675.

[8] N. Boccara and H. Fukś, “Number-Conserving Cellular Automaton Rules,” Fundamenta Informaticae, 52(1-3), 2002 pp. 1-13.

[9] A. Schranko and P. P. B. de Oliveira, “Towards the Definition of Conservation Degree for One-Dimensional Cellular Automata Rules,” Journal of Cellular Automata, 5(4-5), 2010 pp. 383-401.

[10] W. Li and N. Packard, “The Structure of the Elementary Cellular Automata Rule Space,” Complex Systems, 4(3), 1990 pp. 281-297. www.complex-systems.com/pdf/04-3-3.pdf.

P. P. B. de Oliveira and M. Verardo, “Representing Families of Cellular Automata Rules,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-8. |

Pedro de Oliveira has been a faculty member since 2001 of the School of Computing and Informatics and of the Postgraduate Program in Electrical Engineering at Mackenzie Presbyterian University, São Paulo, Brazil. His research interests are cellular automata, evolutionary computation, and cellular multi-agent systems. Pedro is an alumnus of the 2003 NKS Summer School.

Maurício Verardo is a post-graduate student in Electrical Engineering at Mackenzie Presbyterian University, working with cellular automata ever since his undergraduate senior project for his computer science degree from Mackenzie. Maurício is an alumnus of the 2011 NKS Summer School.

**Pedro P. B. de Oliveira**

*Faculdade de Computação e Informática & Pós-Graduação em Engenharia Elétrica
Universidade Presbiteriana Mackenzie
Rua da Consolação, 930
São Paulo, 01302-907 – Brazil
*

**Maurício Verardo**

*Pós-Graduação em Engenharia Elétrica*

Universidade Presbiteriana Mackenzie

Rua da Consolação, 930

São Paulo, 01302-907 – Brazil

*mauricio.verardo@gmail.com*

Among its many interpretations, the term reliability most commonly refers to the ability of a device or system to perform a task successfully when required. More formally, it is described as the probability of functioning properly at a given time and under specified operating conditions [1]. Mathematically, the reliability function is defined by

$$R(t) = \Pr(T > t), \qquad t \ge 0,$$

where $T$ is a nonnegative random variable representing the device or system lifetime.

For a system composed of at least two components, the system reliability is determined by the reliability of the individual components and the relationships among them. These relationships can be depicted using a reliability block diagram (RBD).

Simple systems are usually represented by RBDs with components in either a series or parallel configuration. In a series system, all components must function satisfactorily in order for the system to operate. For a parallel system to operate, at least one component must function correctly. Systems can also contain components arranged in both series and parallel configurations. If an RBD cannot be reduced to a series, parallel, or series-parallel configuration, then it is considered a complex system.

This article deals with the generation of an exact analytical expression for the reliability of a complex system. The demonstrated method relies on finding all paths between the source and target vertices in a directed acyclic graph (i.e., RBD), as well as the inclusion-exclusion principle for probability.

**A Note on Timings**

The timings reported in this article were measured on a custom workstation PC using the built-in function `Timing`. The system consists of an Intel® Core i7 CPU 950 @ 4 GHz and 24 GB of DDR3 memory. It runs Microsoft® Windows 7 Professional (64-bit) and scores 1.32 on the *MathematicaMark9* benchmark.

We begin by considering a directed graph that consists of a finite set of vertices together with a finite set of ordered pairs of vertices called directed edges. The built-in function `Graph` can be used to construct a graph from explicit lists of vertices and edges.

This two-dimensional grid graph, labeled , can be constructed much more efficiently by using the built-in function `GridGraph`. Throughout this section, we utilize it to illustrate our functions.

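Now, for a vertex $v \in V$, we define the set of out-neighbors as

$$N^{+}(v) = \{\, w \in V : (v, w) \in E \,\},$$

where $V$ and $E$ are the vertex and edge sets of the graph and $(v, w)$ is taken to mean a directed edge from $v$ to $w$. This is implemented in the function `VertexOutNeighbors`.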

`VertexOutNeighbors` behaves similarly to the built-in function `VertexOutDegree`. That is, given a graph and one of its vertices, the function returns a list of out-neighbors for the specified vertex.

If, however, only the graph is specified, the function will give a list of vertex out-neighbors for all vertices in the graph.

The order in which the out-neighbors are displayed is determined by the order of vertices returned by `VertexList`.
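The behavior described above can be imitated in a few lines of Python. This is only a sketch: the graph is assumed to be given as a list of directed edges, the function name `vertex_out_neighbors` is hypothetical, and vertices are ordered by first appearance in the edge list rather than by `VertexList`.

```python
def vertex_out_neighbors(edges, vertex=None):
    """Out-neighbors of one vertex, or of every vertex when `vertex` is None."""
    # Collect vertices in order of first appearance in the edge list
    vertices = list(dict.fromkeys(v for edge in edges for v in edge))
    neighbors = {v: [] for v in vertices}
    for tail, head in edges:
        neighbors[tail].append(head)
    return neighbors if vertex is None else neighbors[vertex]

edges = [(1, 2), (1, 3), (2, 4), (3, 4)]  # a small directed "diamond"
print(vertex_out_neighbors(edges, 1))  # [2, 3]
print(vertex_out_neighbors(edges))     # {1: [2, 3], 2: [4], 3: [4], 4: []}
```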

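We can implement similar functions to obtain the set of in-neighbors by simply reversing the direction of the edge in the definition, that is, by collecting the vertices whose directed edges end at the given vertex.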

The next step toward our goal is to consider a method of traversing a graph. One common approach of systematically visiting all vertices of a graph is known as depth-first search (DFS). In its most basic form, a DFS algorithm involves visiting a vertex, marking it as “visited,” and then recursively visiting all of its neighbors [2]. The function `DepthFirstSearch` implements this algorithm for directed graphs.

Given a graph and a starting vertex, `DepthFirstSearch` returns a list of vertices in the order in which they are visited.
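A basic recursive depth-first search can be sketched in Python as follows. The adjacency-dictionary representation and the name `depth_first_search` are assumptions of this sketch, not the article's `DepthFirstSearch`.

```python
def depth_first_search(adjacency, start):
    """Return the vertices reachable from `start` in the order they are visited."""
    visited = []

    def visit(vertex):
        visited.append(vertex)  # mark as visited
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in visited:
                visit(neighbor)

    visit(start)
    return visited

adjacency = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(depth_first_search(adjacency, 1))  # [1, 2, 4, 3]
```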

We compare this with the result of the built-in function `DepthFirstScan`.

Next, let us define the function `DirectedAcyclicGraphQ`.

If the graph is both directed and acyclic, `DirectedAcyclicGraphQ` yields `True`. Otherwise, it yields `False`.
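One way to test acyclicity of a directed graph is Kahn's algorithm, which repeatedly removes vertices with no incoming edges. The Python sketch below uses that technique as a stand-in; the source does not say how `DirectedAcyclicGraphQ` is implemented, and the function name here is hypothetical.

```python
from collections import deque

def is_directed_acyclic(adjacency):
    """True if the directed graph given as {vertex: [out-neighbors]} has no cycle."""
    indegree = {v: 0 for v in adjacency}
    for heads in adjacency.values():
        for w in heads:
            indegree[w] = indegree.get(w, 0) + 1

    queue = deque(v for v, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in adjacency.get(v, []):
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Every vertex can be removed only if there is no directed cycle
    return removed == len(indegree)

print(is_directed_acyclic({1: [2, 3], 2: [4], 3: [4], 4: []}))  # True
print(is_directed_acyclic({1: [2], 2: [3], 3: [1]}))            # False
```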

Finally, we consider the problem of finding all paths in a directed acyclic graph $G$ between two arbitrary vertices $s$ and $t$. Typically, we refer to $s$ as the source and $t$ as the target. A path in $G$ is defined as a sequence of vertices $(v_1, v_2, \ldots, v_k)$ such that $(v_i, v_{i+1})$ is a directed edge of $G$ for $i = 1, \ldots, k-1$. Since we have constrained ourselves to a directed acyclic graph, all paths are simple. That is to say, all vertices in a path are distinct.

By modifying the depth-first search algorithm, we arrive at a solution.

Like the original DFS algorithm, we visit a vertex and then recursively visit all of its neighbors. However, instead of checking if a vertex has been marked “visited,” we compare the current vertex to the target. If they do not match, we continue to traverse the graph. Otherwise, the target has been reached and we store the path for later output.

For a given directed acyclic graph, a source vertex, and a target vertex, `FindPaths` returns a list of all paths connecting the source to the target.

In this particular instance, the function takes approximately 0.85 milliseconds to return the result.

`FindPaths` works for any pair of vertices.

If no path is found, the function returns an empty list.
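The modified depth-first search can be sketched in Python as below. It assumes the graph is acyclic and given as an adjacency dictionary; `find_paths` is a hypothetical name and not the article's `FindPaths`.

```python
def find_paths(adjacency, source, target):
    """All paths from `source` to `target` in a directed acyclic graph."""
    paths = []

    def visit(vertex, path):
        path = path + [vertex]      # extend a copy of the current path
        if vertex == target:
            paths.append(path)      # target reached: store the path
            return
        for neighbor in adjacency.get(vertex, []):
            visit(neighbor, path)   # otherwise keep traversing

    visit(source, [])
    return paths

adjacency = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(find_paths(adjacency, 1, 4))  # [[1, 2, 4], [1, 3, 4]]
print(find_paths(adjacency, 4, 1))  # []
```

No "visited" bookkeeping is needed because an acyclic graph cannot lead the recursion back to a vertex already on the current path.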

Up to this point, we have been working with graphs in an abstract, mathematical sense. We now make the transition from directed acyclic graph to reliability block diagram by associating vertices with components in a system and edges with relationships among them.

Consider a single component in an RBD. Let us imagine a “flow” moving from a source, through the component, to a target. The component is deemed to be functioning if the flow can pass through it unimpeded. However, if the component has failed, the flow is prevented from reaching the target.

The “flow” concept can be extended to an entire system. A system is considered to be functioning if there exists a set of functioning components that permits the flow to move from source to target. We define a path in an RBD as a set of functioning components that guarantees a functioning system. Since we have chosen to use a directed acyclic graph to represent a system’s RBD, all paths are minimal. That is to say, all components in a path are distinct.

Once the minimal paths of a system’s RBD have been obtained, the principle of inclusion-exclusion for probability can be employed to generate an exact analytical expression for reliability. Let $\{P_1, P_2, \ldots, P_m\}$ be the set of all minimal paths of a system. At least one minimal path must function in order for the system to function. We can write the reliability of the system as the probability of the union of all minimal paths:

$$R_S = \Pr\!\left(\bigcup_{i=1}^{m} P_i\right) = \sum_{k=1}^{m} (-1)^{k+1} \sum_{1 \le i_1 < \cdots < i_k \le m} \Pr\!\left(P_{i_1} \cap \cdots \cap P_{i_k}\right).$$

This is implemented in the function `SystemReliability`.

Given a system’s RBD (represented by a directed acyclic graph), a source vertex, and a target vertex, `SystemReliability` returns an exact analytical expression for the reliability.
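The inclusion-exclusion step can be illustrated numerically in Python. This sketch assumes statistically independent components, so the probability that several paths all function is the product of the reliabilities of the distinct components they contain. It returns a number rather than a symbolic expression, and `system_reliability` is a hypothetical name.

```python
from itertools import combinations
from math import prod

def system_reliability(paths, reliability):
    """Probability that at least one minimal path works (independent components)."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = (-1) ** (k + 1)
        for subset in combinations(paths, k):
            components = set().union(*subset)  # distinct components in the k paths
            total += sign * prod(reliability[c] for c in components)
    return total

r = {"start": 1.0, "end": 1.0, 1: 0.9, 2: 0.9, 3: 0.9, 4: 0.9}

series = system_reliability([[1, 2, 3, 4]], r)
parallel = system_reliability([["start", i, "end"] for i in (1, 2, 3, 4)], r)
print(round(series, 4), round(parallel, 4))  # 0.6561 0.9999
```

The series value is the product of the four reliabilities, and the parallel value equals one minus the product of the four failure probabilities, as expected.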

Consider the RBD of a simple system with four components in a series configuration.

The reliability of the system is given in terms of the reliability of its four components.

Consider the RBD of a simple system with four components in a parallel configuration.

The “start” and “end” components are not part of the actual system. They are added to ensure the RBD meets the criteria for a directed acyclic graph.

Furthermore, these nonphysical components are taken to have perfect reliability, that is, a reliability of 1. Since they have no effect on the system’s reliability, they can be safely removed from the resulting analytical expression. To do so, we simply define a list of replacement rules and apply it to the result of `SystemReliability`.

The reliability of the system is given in terms of the reliability of its four components.

Next, we examine the RBDs of two simple systems with components in a series-parallel configuration.

Component is in series with component , and both components are in parallel with component .

As in previous examples, we use `SystemReliability` to obtain an exact analytical expression for the reliability.

Finally, we examine the RBDs of two complex systems.

The reliability of the system is given in terms of the reliability of its six components.

The result is returned after approximately 0.59 milliseconds.

The reliability of the system is given in terms of the reliability of its fourteen components.

The result is returned after approximately 0.33 seconds.

We now turn our attention to the derivation of a time-dependent expression for the reliability of a complex system based on information contained within its reliability block diagram.

Let us imagine that we have a generic system composed of six subsystems and we know the reliability relationships among them. In addition, the underlying statistical distributions and parameters used to model the subsystems’ reliabilities are known.

We begin by creating the system’s RBD.

In defining the RBD, we have made use of the `Property` function to store information associated with each subsystem. For instance, the custom property `"Distribution"` is used to store a parametric statistical distribution. Labels, images, and other properties can also be specified.

Next, we use `SystemReliability` to generate an exact analytical expression for the reliability.

Now, the reliability function of the $i$th subsystem is given by

$$R_i(t) = 1 - F_i(t),$$

where $F_i(t)$ is the corresponding cumulative distribution function (CDF). For each subsystem, we use `PropertyValue` to extract the symbolic distribution stored in the RBD, and then use the built-in function `CDF` to construct its reliability function.

We extract additional information, for example, subsystem labels, from the RBD and combine it with the reliability functions to create plots for comparison.

In order to transform our static analytical expression into a time-dependent function, we first define a list of replacement rules.

Next, we apply the list of rules to the expression for system reliability.

The result is a time-dependent reliability function for the complex system described by the RBD.
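The substitution of time-dependent component reliabilities into a static expression can be sketched in Python. Everything specific here is hypothetical: the three-component system (A in series with B and C in parallel), the exponential and Weibull lifetime distributions, and their parameters are chosen only for illustration and are not the article's six-subsystem example.

```python
from math import exp

def exponential_cdf(rate):
    return lambda t: 1.0 - exp(-rate * t)

def weibull_cdf(shape, scale):
    return lambda t: 1.0 - exp(-((t / scale) ** shape))

# Hypothetical lifetime distributions for three components
cdfs = {
    "A": exponential_cdf(0.01),
    "B": weibull_cdf(1.5, 200.0),
    "C": weibull_cdf(2.0, 150.0),
}

def component_reliability(name, t):
    return 1.0 - cdfs[name](t)  # reliability is one minus the CDF

def system_reliability(t):
    # Static expression R_A*R_B + R_A*R_C - R_A*R_B*R_C with R_i replaced by R_i(t)
    ra, rb, rc = (component_reliability(name, t) for name in "ABC")
    return ra * rb + ra * rc - ra * rb * rc

for t in (0, 50, 100, 200):
    print(t, round(system_reliability(t), 4))
```

At time zero every component is certain to work, so the system reliability starts at 1 and decreases from there.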

Finally, we generate a plot of the system’s reliability over time.

We have demonstrated a method of generating an exact analytical expression for the reliability of a complex system using a directed acyclic graph to represent the system’s reliability block diagram. In addition, we have shown how to convert an analytical expression for system reliability into a time-dependent function based on statistical information stored in an RBD. While our focus has been on the analysis of complex systems, we have also shown that the combination of path finding and the inclusion-exclusion principle is equally applicable to simple systems in series, parallel, or series-parallel configurations.

Knowing the static analytical expression or time-dependent solution of a system allows us to perform a more advanced reliability analysis. For instance, we can easily calculate the Birnbaum importance

$$I_B(i) = \frac{\partial R_S}{\partial R_i}$$

of the $i$th component using the result of `SystemReliability`, where $R_S$ is the system reliability and $R_i$ is the reliability of the component. Similarly, we can derive the hazard function, or failure rate, from the system’s time-dependent reliability function.
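For independent components the system reliability is linear in each component reliability, so the partial derivative in the Birnbaum importance equals the difference between the system reliability with that component perfect and with it failed. The Python sketch below uses this identity; the function names and the example system are hypothetical.

```python
from itertools import combinations
from math import prod

def system_reliability(paths, reliability):
    """Inclusion-exclusion over minimal paths, assuming independent components."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        for subset in combinations(paths, k):
            components = set().union(*subset)
            total += (-1) ** (k + 1) * prod(reliability[c] for c in components)
    return total

def birnbaum_importance(paths, reliability, component):
    """dR_S/dR_i computed as R_S(R_i = 1) - R_S(R_i = 0)."""
    working = {**reliability, component: 1.0}
    failed = {**reliability, component: 0.0}
    return system_reliability(paths, working) - system_reliability(paths, failed)

paths = [["A", "B"], ["A", "C"]]  # A in series with (B parallel C)
r = {"A": 0.9, "B": 0.8, "C": 0.7}
for name in "ABC":
    print(name, round(birnbaum_importance(paths, r, name), 4))
# A 0.94
# B 0.27
# C 0.18
```

Component A has the largest importance because every path passes through it.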

There are several ways in which the functionality demonstrated in this article can be improved and expanded:

- Increase the efficiency of `SystemReliability` by implementing improvements to the classical inclusion-exclusion principle [3].
- Add functions related to common tasks in reliability analysis, for example, reliability importance, failure rate, and so on.
- Add support for $k$-out-of-$n$ structures, that is, redundancy.
- Add the ability to export and import complete RBDs.
- Add a mechanism, for example, a graphical user interface (GUI), to facilitate the construction and modification of RBDs.

Finally, the code can be combined into a user-friendly package with full documentation.

[1] W. Kuo and M. Zuo, Optimal Reliability Modeling: Principles and Applications, Hoboken, NJ: John Wiley & Sons, 2003.

[2] S. Skiena, The Algorithm Design Manual, 2nd ed., London, UK: Springer-Verlag, 2008.

[3] K. Dohmen, “Improved Inclusion-Exclusion Identities and Inequalities Based on a Particular Class of Abstract Tubes,” Electronic Journal of Probability, 4, 1999 pp. 1-12. doi:10.1214/EJP.v4-42.

T. Silvestri, “Complex System Reliability,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-7. |

Todd Silvestri received his undergraduate degrees in physics and mathematics from the University of Chicago in 2001. As a graduate student, he worked briefly at the Thomas Jefferson National Accelerator Facility (TJNAF) where he helped to construct and test a neutron detector used in experiments to measure the neutron electric form factor at high momentum transfer. From 2006 to 2011, he worked as a physicist at the US Army Armament Research, Development and Engineering Center (ARDEC). During his time there, he cofounded and served as principal investigator of a small laboratory focused on improving the reliability of military systems. He is currently working on several personal projects.

**Todd Silvestri**

*New Jersey, United States*

*todd.silvestri@optimum.net*

The aim of canonical correlation analysis is to find the best linear combination between two multivariate datasets that maximizes the correlation coefficient between them. This is particularly useful to determine the relationship between *criterion measures* and the set of their *explanatory factors*. This technique involves, first, the reduction of the dimensions of the two multivariate datasets by projection, and second, the calculation of the relationship (measured by the correlation coefficient) between the two projections of the datasets.

While the correlation coefficient measures the relationship between two simple variables, canonical correlation analysis measures the relationship between two *sets* of variables. Although the correlation measure employed for both techniques is the same, namely

$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}, \tag{1}$$

the distinction between the two techniques must be clear: while for the correlation coefficient $X$ and $Y$ must be $n$-dimensional vectors containing realizations of the random variables, for canonical correlation analysis (CCA) $X$ has to be an $n \times p$ and $Y$ an $n \times q$ matrix, with $p$ and $q$ at least 2. In the latter case, $n$ is the number of realizations for all random variables, where $p$ is the number of random variables contained in the set $X$ and $q$ is the number of random variables in the set $Y$.

This article calculates, through CCA, the relationship between stock markets of developed and developing countries and performs Bartlett’s test for the statistical significance of the canonical correlation found.

For an introduction to statistics in financial markets, see [1].

The data employed for the CCA in the present work was obtained directly from *Mathematica*’s `FinancialData` function. The variables are divided into two groups: the ETFs representing developed nations and the ETFs representing developing countries. The first group is treated as independent variables and the second group as dependent variables. The idea here is to analyze the relationship between stock markets in these two groups of countries through ETFs traded at the New York Stock Exchange (NYSE).

Although there are several country-specific ETFs traded on the NYSE, not all of them were chosen. The idea is to select, for each group, those ETFs representing countries with large stock markets according to a market capitalization criterion. The market capitalization of all stock markets was obtained from the website of the World Federation of Exchanges (www.world-exchanges.org/statistics). All countries with a stock market capitalization greater than 500 billion US dollars in December 2012 were chosen, and only one ETF per country was selected.

These six ETFs were included in the group of developed nations: EWA (Australia), EWC (Canada), EWG (Germany), EWJ (Japan), EWU (UK), and SPY (USA).

Eight ETFs were included in the group of developing countries: EWZ (Brazil), FXI (China), EPI (India), EWW (Mexico), RSX (Russia), EWS (Singapore), EWY (South Korea), and EWT (Taiwan).

These are the monthly returns for the five-year period between March 2008 and February 2013 (60 months).

This checks the number of observations for each variable. Evaluate the previous command again if the lengths are not all 60.

This plots the data for all the variables.

This plots the price behavior of the six ETFs representing developed countries for the 60-month period.

This plots the price behavior of the eight ETFs representing developing countries for the 60-month period.

According to [2], “to use canonical correlation analysis safely for descriptive purposes requires no distributional assumptions.” However, they still state that “to test the significance of the relationships between canonical variates, (…), the data should meet the requirements of multivariate normality and homogeneity of variance” ([2], p. 339). Is the data normally distributed in this sense?

As can be seen, the null hypothesis of normality cannot be rejected for any of the variables at the 5% significance level.

In order to perform the canonical correlation analysis, it is necessary to organize the data into two groups of variables: $X$ (representing the developed countries) and $Y$ (representing the developing countries);

$$X = (X_1, \ldots, X_6), \qquad Y = (Y_1, \ldots, Y_8),$$

where $X_1$ to $X_6$ represent the developed countries’ ETFs and $Y_1$ to $Y_8$ represent the developing countries’ ETFs.

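In canonical correlation analysis, $X \in \mathbb{R}^{p}$ and $Y \in \mathbb{R}^{q}$, and the problem is to find the “most interesting” linear combinations

$$a^{T} X \quad \text{and} \quad b^{T} Y$$

for the two sets of variables, that is, those values of $a$ and $b$ that maximize

$$\rho(a, b) = \rho_{a^{T} X,\; b^{T} Y}. \tag{2}$$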

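Let $Z$ be the concatenation of the matrices $X$ and $Y$,

$$Z = (X \;\; Y),$$

so

$$\operatorname{E}(Z) = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \qquad \operatorname{Var}(Z) = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix},$$

where $\Sigma_{XX}$ and $\Sigma_{YY}$ are the (empirical) variance-covariance matrices and $\mu_X$ and $\mu_Y$ are the mean vectors of $X$ and $Y$, respectively. $\Sigma_{XY}$ represents the covariance matrix of $X$ and $Y$, and $\Sigma_{YX}$ is its transpose.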

From equation (1) and from the properties

(3) |

(4) |

where and are conformable and is a constant,

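$$\rho(a, b) = \frac{a^{T} \Sigma_{XY}\, b}{\sqrt{a^{T} \Sigma_{XX}\, a}\; \sqrt{b^{T} \Sigma_{YY}\, b}}, \tag{5}$$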

where

CCA can be performed either on variance-covariance matrices or on correlation matrices. If the random variables and are standardized to have unit variance, the variance-covariance matrix becomes a correlation matrix.

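After partitioning the variance-covariance matrix, and given equation (5), the main objective is to solve

$$\max_{a,\, b} \; a^{T} \Sigma_{XY}\, b \tag{6}$$

subject to

$$a^{T} \Sigma_{XX}\, a = 1, \qquad b^{T} \Sigma_{YY}\, b = 1.$$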

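To solve this problem, define:

$$K = \Sigma_{XX}^{-1/2}\, \Sigma_{XY}\, \Sigma_{YY}^{-1/2}. \tag{7}$$

A singular value decomposition of $K$ gives

$$K = \Gamma \Lambda \Delta^{T}, \tag{8}$$

where

$$\Gamma = (\gamma_1, \ldots, \gamma_k), \tag{9}$$

$$\Delta = (\delta_1, \ldots, \delta_k), \tag{10}$$

$$\Lambda = \operatorname{diag}\!\left(\lambda_1^{1/2}, \ldots, \lambda_k^{1/2}\right). \tag{11}$$

$\Gamma$ and $\Delta$ are column orthonormal matrices ($\Gamma^{T} \Gamma = \Delta^{T} \Delta = I_k$), and $\Lambda$ is a diagonal matrix with positive elements, namely, the singular values of $K$, which are the square roots of the nonzero eigenvalues $\lambda_i$ of $K K^{T}$. (For detailed information about singular value decomposition, see [3].) From the property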

and from equation (7),

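For this solution procedure, the largest singular value of $K$ is the canonical correlation of our analysis. The canonical correlation vectors $a_i$ and $b_i$ can also be found through

$$a_i = \Sigma_{XX}^{-1/2}\, \gamma_i, \tag{12}$$

$$b_i = \Sigma_{YY}^{-1/2}\, \delta_i, \tag{13}$$

where $\gamma_i$ and $\delta_i$ are the $i$th left and right singular vectors of $K$.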

The problem in this case is to solve the following canonical equations [2, 4]:

(14) |

and

(15) |

where is the identity matrix and is the largest eigenvalue for the characteristic equations

(16) |

and

(17) |

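The largest eigenvalue of the product matrices

$$\Sigma_{XX}^{-1}\, \Sigma_{XY}\, \Sigma_{YY}^{-1}\, \Sigma_{YX} \quad \text{and} \quad \Sigma_{YY}^{-1}\, \Sigma_{YX}\, \Sigma_{XX}^{-1}\, \Sigma_{XY}$$

is the squared canonical correlation coefficient. Furthermore, it can be shown that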

(18) |

and

(19) |

which means that only one of the characteristic equations needs to be solved in order to find or .

This transposes the data.

This checks the dimensions of `Z`; it has 60 rows (months) and 14 columns (ETFs).

There are 14 random variables (six in the first set and eight in the second); the dimensions of the submatrices are 60×6 for $X$, 60×8 for $Y$, 6×6 for $\Sigma_{XX}$, 6×8 for $\Sigma_{XY}$, 8×6 for $\Sigma_{YX}$, and 8×8 for $\Sigma_{YY}$.

Define `M1` to be the variance-covariance matrix of `Z`. Here are the first seven columns of `M1`.

Partition `M1` into the four submatrices , , , and .

To better understand the relationship between the random variables, here is `M2`, the correlation matrix of `Z`.

This defines `K`.

This performs the singular value decomposition on `K`.

This is the largest eigenvalue of `K`.

This checks by computing the square root of the eigenvalues of

and

according to the second solution procedure. (`Chop` replaces numbers that are close to zero by the exact integer 0.)

Performing a spectral decomposition on and and calculating the square roots of their eigenvalues is another check of the canonical correlation coefficient.

The checks agree.

The last step in this analysis is to find the canonical correlation vectors, which maximize the correlation between the canonical variates. According to equations (12) and (13), this computes the canonical correlation vectors.

The canonical correlation matrix `B` is computed using , not , because

Given that

the canonical correlation vectors and are the columns of and .

In terms of the canonical correlation vectors, the canonical variates are

where, as before,

Given that

(20) |

only and are needed in order to find . Thus, the only canonical variates needed are and .

Interpreting the canonical correlation coefficients, canonical correlation vectors, and canonical variates is one of the most difficult tasks in the whole analysis. CCA is easier to understand by relating the original data matrix to the matrix computed using the canonical correlation vectors, which is simply a reduction of the data matrix through linear combinations of its elements. Seen this way, the canonical correlation coefficient is merely the ordinary Bravais-Pearson correlation between the two columns of the reduced matrix.

In principle, one can say that the highest canonical correlation coefficient that was found is the maximum possible correlation between the two columns of the reduced matrix. In this case, it is usual to say that this coefficient represents the relationship between the two datasets, and , in the sense of a correlation measure. Thus, if is the matrix containing the explanatory factors of , the matrix containing the criterion measures (or criterion variables), it is possible to say that the explanatory factors would perfectly explain the criterion variables if . If , the explanatory factors have no influence on the criterion variables, and any value between 1 and 0 is merely an interpolation of these extreme cases.

The next inputs compute and partially display the reduced data matrix. To check the CCA theory, we also compute the correlations for the remaining canonical variates, which are of less interest for our analysis. We start by defining and .

The first column of our reduced data matrix is .

The first value of , for instance, is the linear combination of EWA, EWC, EWG, EWJ, EWU, and SPY for March 2008, such that

We can also define .

Thus, after assigning the values to the canonical variates, , , , and , we have four vectors with the values of the linear combinations of and . Now we can simply compute the Bravais-Pearson correlation between all the canonical variables.

We also verify equation (20).

The correlation between the canonical variates can be better interpreted graphically. First we show the reduced matrix computed using the canonical correlation vectors and , whose canonical correlation coefficient is .

Now we show the reduced matrix computed using the canonical correlation vectors and , whose canonical correlation coefficient is .

Finally, we compute the *canonical loadings*, that is, the correlation between every single ETF and its respective canonical variate.

We can also compute the *canonical cross-loadings*, that is, the correlation between every single ETF and its opposite canonical variate.

It might be of interest to compute the canonical loadings for the *second canonical variate*, that is, the linear combination of variables with correlation coefficient .

Finally, we compute the canonical cross-loadings for the second canonical variate, that is, the linear combination of variables with correlation coefficient .

It is possible to compute canonical loadings and cross-loadings for all the six canonical variates. However, only the first two are shown here for descriptive purposes.

In this section we test the hypothesis of no correlation between the two sets and . An approximation for large was provided in [5]:

(21) |

where

We can also test the hypothesis that the individual canonical correlation coefficients are different from zero:

(22) |

where is a parameter to select the canonical correlation coefficient to be tested.
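As a minimal sketch of how such a test statistic is evaluated, the following Python function uses the standard form of Bartlett's approximation, −(n − (p + q + 3)/2) Σ ln(1 − λᵢ), where the λᵢ are the squared canonical correlations and the reference distribution is chi-square with (p − s)(q − s) degrees of freedom. The symbols `n`, `p`, `q`, and `s`, the function name, and the six input values are illustrative assumptions; the inputs are made up and are not the article's results.

```python
import math

def bartlett_statistic(sq_corrs, n, p, q, s=0):
    """Bartlett's approximate chi-square statistic and its degrees of freedom.

    sq_corrs: squared canonical correlations in decreasing order
    n: number of observations; p, q: sizes of the two variable sets
    s: number of leading canonical correlations excluded from the test
    """
    factor = n - (p + q + 3) / 2
    stat = -factor * sum(math.log(1 - lam) for lam in sq_corrs[s:])
    dof = (p - s) * (q - s)
    return stat, dof

# Hypothetical squared canonical correlations (not taken from the article).
sq_corrs = [0.81, 0.49, 0.25, 0.09, 0.04, 0.01]
stat_all, dof_all = bartlett_statistic(sq_corrs, n=60, p=6, q=8)
stat_rest, dof_rest = bartlett_statistic(sq_corrs, n=60, p=6, q=8, s=1)
```

With `s=0` the function tests whether the two sets are uncorrelated at all; with `s=1` it tests whether anything remains after the first canonical pair is set aside. The statistic is then compared with the chosen quantile of the chi-square distribution.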

This defines the Bartlett variable.

This assigns values to the .

We calculate Bartlett’s statistic (equation (21)) to test if the two sets of variables and are uncorrelated. Our hypotheses are:

This computes the 99% quantile of the chi-square distribution with 48 (6 × 8) degrees of freedom.

**Test Conclusion**: The hypothesis of no correlation between the two sets has to be rejected because the Bartlett statistic (here 249.415) is greater than the 99% quantile of the chi-square distribution with 48 degrees of freedom (here 73.6826).

This article analyzed the relationship between two sets of variables, namely financial assets represented by NYSE-traded country-specific ETFs. The ETFs were divided into two sets representing developed and developing countries. In the first set a total of six ETFs (representing developed countries) were included, while in the second set a total of eight ETFs were included (representing developing countries). Using monthly return data for a five-year period it was possible to show, through canonical correlation analysis (CCA), that there is a significant relationship between these two sets of ETFs. The highest correlation coefficient found in the present study was and, in an analogous manner to statistics in regression analysis, we could interpret its squared value as the explanatory power of the canonical correlation analysis. In other words, the squared canonical correlation coefficient indicates the proportion of variance a dependent variable linearly shares with the independent variable generated from the observed variable’s set (i.e., the canonical variates).

[1] | J. Franke, W. Härdle, and C. Hafner, Einführung in die Statistik der Finanzmärkte, Berlin: Springer Verlag, 2001. |

[2] | W. R. Dillon and M. Goldstein, Chap. 9 in Multivariate Analysis: Methods and Applications, New York: Wiley, 1984. |

[3] | K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, London: Academic Press, 1979. |

[4] | T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd ed., New York: Wiley, 2003. |

[5] | M. S. Bartlett, “A Note on Tests of Significance in Multivariate Analysis,” Proceedings of the Cambridge Philosophical Society, 35(2), pp. 180-185, 1939. doi:10.1017/S0305004100020880. |

R. L. Malacarne, “Canonical Correlation Analysis,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-6. |

Rodrigo Loureiro Malacarne is a professor of financial mathematics and financial management at the Faculdades Integradas Espirito Santenses (FAESA). His areas of research include statistics of financial markets and financial time series analysis.

**Rodrigo Loureiro Malacarne**

Faculdades Integradas Espirito Santenses (FAESA)

Av. Vitória, 2.220 – Monte Belo

Vitória, ES, Brazil – CEP 29.053-360

malacarne@gmail.com

Motivated by the computational advantages offered by *Mathematica,* I decided some time ago to embark on collecting and implementing properties of the fascinating geometric figure called the arbelos. I have since been impressed by the large number of surprising discoveries and computational challenges that have sprung out of the growing literature concerning this remarkable object. I recall its resemblance to the lower part of the iconic canopied penny-farthing bicycle of the 1960s TV series *The Prisoner*, Punch’s jester cap (of *Punch and Judy* fame), and a yin-yang symbol with one arc inverted; see Figure 1. There is now an online specialized catalog of Archimedean circles (circles contained in the arbelos) [1] and important applications outside the realm of mathematics and computer science [2] of arbelos-related properties.

Many famous names are involved in this fascinating theme, among them Archimedes (killed by a Roman soldier in 212 BC), Pappus (320 AD), Christian O. Mohr (1835-1918), Victor Thébault (1882-1960), Leon Bankoff (1908-1997), and Martin Gardner (1914-2010). Recently, they have been succeeded by Clayton Dodge, Peter Y. Woo, Thomas Schoch, Hiroshi Okumura, and Masayuki Watanabe, among others.

Leon Bankoff was the person who stimulated the extraordinary attention paid to the arbelos over the last 30 years. Schoch drew Bankoff’s attention to the arbelos in 1979 by discovering several new Archimedean circles. He sent a 20-page handwritten note to Martin Gardner, who forwarded it to Bankoff, who then gave a 10-chapter manuscript copy to Dodge in 1996. Due to Bankoff’s death, a planned joint work was interrupted until Dodge reported some discoveries [3]. In 1999 Dodge said that it would take him five to ten years to sort all the material in his possession, which then filled three suitcases. This work is still forthcoming. Not surprisingly, like Volume 4 of *The Art of Computer Programming*, it appears that important work needs substantial time to be developed.

**Figure 1.** *The Prisoner*’s penny-farthing bicycle, Punch and Judy, a physical arbelos.

The arbelos (“shoemaker’s knife” in Greek) is named for its resemblance to the blade of a knife used by cobblers (Figure 1). The arbelos is a plane region bounded by three semicircles sharing a common baseline (Figure 2). Archimedes appears to have been the first to study its mathematical properties, which he included in propositions 4 through 8 of his *Liber assumptorum* (or *Book of Lemmas*). This work might not be entirely by Archimedes, as was recently revealed through an Arabic translation of the *Book of Lemmas* that mentions Archimedes repeatedly without fully recognizing his authorship (some even believe this work to be spurious [4]). The *Book of Lemmas* also contained Archimedes’s famous *Problema Bovinum* [5].

This article aims at systematically enumerating selected properties of the arbelos, without attempting to be exhaustive. Our purpose is to develop a uniform computational methodology in order to tackle those properties in a pedagogical setting. A sequence of properties is arranged and subsequently verified by testing the computationally equivalent predicates. This work includes some discoveries and extensions contributed by the author.

We refer to the largest semicircle as the *top arc* and the two small ones as the left and right *side arcs*, or just the *side arcs* when there is no need to distinguish them. We use and to denote their respective radii (the top arc thus has radius ). A *segment* between two points is an undirected line segment going from one point to the other, while a *line* through two points is the infinite straight line through the two points. A traditional abuse of notation uses for both the line segment joining the points and and the length of the segment, depending on the context; modern usage is to write for the length of the segment.

This function displays the arbelos.

This draws the basic arbelos.

**Figure 2.** The arbelos.

**Property 1**

In other words, the total length of the side arcs equals the length of the top arc. This property is related to an intriguing paradox [6].

**Property 2**

This was lemma 4 of the *Book of Lemmas* (see Figure 3) [7, 8].

These two properties are easily verified by simultaneously testing two equalities.
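A numerical version of this check can be sketched in Python (not the article's Mathematica input). The side-arc radii are called `a` and `b` here, which is an assumption about notation, so the top arc has radius `a + b`. The sketch places the leftmost point of the arbelos at the origin and takes the diameter of the radical circle to be the vertical segment from the tangency point of the side arcs up to the top arc.

```python
import math

def arbelos_measures(a, b):
    """Arc lengths and areas for an arbelos with side-arc radii a and b."""
    top = a + b
    side_arc_length = math.pi * a + math.pi * b
    top_arc_length = math.pi * top
    # Area between the top semicircle and the two side semicircles.
    arbelos_area = 0.5 * math.pi * (top**2 - a**2 - b**2)
    # The side arcs touch at x = 2a; the top arc is centered at x = a + b.
    height = math.sqrt(top**2 - (2 * a - top) ** 2)
    radical_circle_area = math.pi * (height / 2) ** 2
    return side_arc_length, top_arc_length, arbelos_area, radical_circle_area

side_len, top_len, area, radical_area = arbelos_measures(0.3, 0.7)
both_hold = math.isclose(side_len, top_len) and math.isclose(area, radical_area)
```

The two equalities correspond to properties 1 and 2; `both_hold` tests them simultaneously for one choice of radii, whereas a symbolic check covers all radii at once.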

The function `drawpoints` is used to display specific points as red disks.

**Figure 3.** The area of the circle of diameter (the radical circle) is equal to the area of the arbelos.

The circle in Figure 3 is called the *radical circle* of the arbelos and the line is its *radical axis* (this terminology will be clarified in Generalizations). To illustrate properties 3-11 and 25, 26, we draw and label points and show some coordinates, lines, and circles in Figure 4.

**Figure 4.** Labels, coordinates, lines, and circles referred to in properties 3 through 11 and 25, 26.

**Property 3**

The lines and are orthogonal and intersect the side arcs at points and , joining a common tangent to the side arcs.

To verify the orthogonality of the lines and , we take the inner product of the vectors and .

We employ the following result to obtain the slopes at the points and .

**Theorem 1**

The function `PQ` finds the coordinates of the tangent points and by solving a system of four equations, which places them on the arcs and sets their tangent slopes according to theorem 1.

Besides `PQ`, other definitions in this article for points and quantities are: `VWS`, `HK`, `U`, `EF`, `IJr`, and `LM`.

The function `dSq` computes the square of the distance between two given points.

**Property 4**

As is a diameter of the radical circle, we only need to verify the equality of the distances of and to the center of the radical circle, namely the point .

**Property 5**

Let the line intersect the top arc at points and . Then and lie on a circle with center and radius .

We get the coordinates of the points and by solving a system of equations that places them on the top arc and on the line .

This verifies property 5 by checking that the distances of and to are the same as the distance from to .

**Property 6**

This is equivalent to the fact that the determinant (cross product) of the vectors and is zero.

**Property 7**

This is equivalent to the fact that the inner product of the vectors and is zero.

Let us use the notation for a circle with center and radius .

**Property 8**

The inversion of a point in the circle , is defined to be the unique point such that [9]. The function `inversion` implements this idea.
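For readers who want to experiment outside Mathematica, here is a small Python sketch of circle inversion. It assumes the usual definition: the inverse of a point P in the circle with center C and radius r is the point P′ on the ray from C through P with |CP| · |CP′| = r². The argument order and the tuple representation of points are assumptions and need not match the article's `inversion`.

```python
def inversion(center, radius, point):
    """Invert `point` in the circle with the given center and radius."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    d2 = dx * dx + dy * dy
    if d2 == 0:
        raise ValueError("the center of inversion has no inverse")
    k = radius**2 / d2          # scale factor along the ray from the center
    return (center[0] + k * dx, center[1] + k * dy)

image = inversion((0.0, 0.0), 2.0, (1.0, 0.0))
back = inversion((0.0, 0.0), 2.0, image)   # inversion is an involution
```

Points on the circle of inversion are fixed, and applying the map twice returns the original point, which is what makes the inverse-pair arguments in the following properties work.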

This verifies property 8, recalling the coordinates of are .

**Property 9**

Let be the circle of inversion. The points , , invert to themselves. The segment inverts to the arc and the segment inverts to the arc . The arcs and invert to themselves. The radical circle inverts to the line .

**Property 10**

This is the same as claiming that the corresponding arcs are orthogonal to the radical circle. By property 8, the arcs are orthogonal to the circle with diameter as they pass through inverse pairs [10, 11].

**Property 11**

This is one of Bankoff’s surprises [12, 13, 14]. As all four points are on the radical circle, we need to verify only that bisects .

The following `Manipulate` illustrates properties 3-11. The easiest way to define the points `P`, `Q`, `H`, `K` is to copy and paste the formulas for them.

Now consider the circle tangent to the side arcs and the top arc, the *incircle* with tangent points , , and as shown in Figure 5 [15, 16]. We also consider points and at the tops of the side arcs.

**Figure 5.** The incircle and coordinates, lines, and points referred to in properties 12 through 15.

Proposition 6 of the *Book of Lemmas* included the value of , the radius of the incircle. The function `U` calculates the coordinates of the center and the radius .
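The classical closed form for the incircle radius, ab(a + b)/(a² + ab + b²) for side-arc radii a and b, can be checked numerically. The Python sketch below is an illustration under assumed notation (`a`, `b`, origin at the leftmost point of the arbelos) and is not the article's function `U`. It finds the center from the two tangency conditions with the side arcs and leaves tangency with the top arc as an independent check.

```python
import math

def incircle(a, b):
    """Center and radius of the circle tangent to all three arcs."""
    rho = a * b * (a + b) / (a * a + a * b + b * b)
    # Tangent externally to the side arcs centered at (a, 0) and (2a + b, 0):
    # subtracting the two distance equations gives x, then y follows.
    x = (3 * a + b) / 2 + (a - b) * (a + b + 2 * rho) / (2 * (a + b))
    y = math.sqrt((a + rho) ** 2 - (x - a) ** 2)
    return (x, y), rho

center, rho = incircle(1.0, 2.0)
# Independent check: internal tangency with the top arc centered at (a + b, 0).
gap = math.hypot(center[0] - 3.0, center[1]) - (3.0 - rho)
```

In this example `gap` should vanish up to rounding, and the ordinate of the center equals the diameter of the incircle.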

The coordinates of the tangent points , , and are obtained as the intersections of the lines joining the centers of the three arcs of the arbelos and the incircle.

**Property 12**

The points , , and are collinear. The points , , and are collinear. The lines and intersect in a point lying on the incircle.

Using the criterion of the determinant to check for collinearity, we verify the first two claims.

Let be the point of intersection of the lines and . Confirming that its distance to is equal to verifies the third claim.

**Property 13**

The points , , , and are on a circle with center . Similarly, the points , , , and are on a circle with center .

The following `Manipulate` illustrates property 13 [17]. The option for showing the Bankoff circle as the incircle of the triangle joining the center of the arcs and the incircle corresponds to property 23.

**Property 14**

Let be the diameter of the incircle parallel to and let be the projection of onto . The rectangle between the segments and is a square.

This property is illustrated in the next `Manipulate` and is readily verified here.

**Property 15**

Let and be the intersections of the lines and with the side arcs. Then is a square of almost the same size as the one mentioned in property 14.

First we obtain points and as the intersections of their respective lines and their respective arcs, and keep the result in the variable `replaceEF`.

We verify property 15 by setting to be equal to the vector obtained by rotating around by 90° and setting to be equal to the vector obtained by translating by .

Assuming and, the following plot compares the sizes of the two squares.

This `Manipulate` illustrates properties 14 and 15.

Consider the two gray circles tangent to the radical axis, a side arc, and the top arc in Figure 6. They are called *the twins*, or the *Archimedean circles*. Due to the following remarkable property, they have been extensively studied. We collect many of their extraordinary occurrences in our list of properties [3, 18, 19].

**Figure 6.** The twins.

**Property 16**

The two circles tangent to the radical axis, the top arc, and one of the side arcs of an arbelos have the same radius.

This property appeared as proposition 5 in the *Book of Lemmas*. Solving the following system of six equations finds the values of the radii, verifies they are equal, and computes the centers , .

These four solutions give the centers in pairs: , , , , where and are the reflections of and in the diameter of the arbelos; only the last expression is valid. The result also shows that the twins are indeed of the same radius . Any circle with radius equal to the twins’ radius is called *Archimedean*. A nice interpretation of arises when considering and as resistances: then is the resistance resulting from connecting and in parallel; that is, . The function `IJr` computes the value of the centers and the common value of the radius of the twins.
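The parallel-resistance interpretation gives a quick way to test the twins numerically. In this Python sketch, `a` and `b` are assumed names for the side-arc radii, the origin is at the leftmost point of the arbelos, and the radical axis is the line x = 2a. The function is a hypothetical stand-in and is not the article's `IJr`.

```python
import math

def twins(a, b):
    """Centers (above the baseline) and common radius of the twin circles."""
    t = a * b / (a + b)                        # 1/t = 1/a + 1/b
    left = (2 * a - t, 2 * math.sqrt(a * t))   # touches x = 2a from the left
    right = (2 * a + t, 2 * math.sqrt(b * t))  # touches x = 2a from the right
    return left, right, t

a, b = 1.0, 3.0
left, right, t = twins(a, b)

# Tangency conditions: external to a side arc, internal to the top arc.
checks = [
    math.hypot(left[0] - a, left[1]) - (a + t),
    math.hypot(left[0] - (a + b), left[1]) - (a + b - t),
    math.hypot(right[0] - (2 * a + b), right[1]) - (b + t),
    math.hypot(right[0] - (a + b), right[1]) - (a + b - t),
]
```

All four entries of `checks` should be zero up to rounding, confirming that a single radius serves both circles.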

**Property 17**

Consider a circle tangent to both twins, with center at point and radius . Then there are two possible values of .

To find the extrema of , we set the derivative of each of the above expressions to zero and solve for .

So the centers of the smallest and largest circles tangent to the twins lie on the radical axis. Moreover, they are concentric, as this result confirms.

Thus, by using property 2, we confirm that the largest tangent circle, which is the smallest enclosing the twins, satisfies property 17. The following `Manipulate` shows the circles tangent to the twins as you vary the radius of the left side arc.

The following plot compares the radii of the two circles tangent to the twins with centers on the radical axis.

**Figure 7.** Labels and lines referred to in properties 18 through 24.

**Property 18**

The common tangent of the left arc and its tangent twin at passes through . Similarly, the common tangent of the right arc and its tangent twin at passes through (see Figure 7).

This computes the tangent points and .

By using theorem 1, we verify both claims.

**Property 19**

We verify both claims simultaneously.

However, the points , , and are not on a circle centered at , nor are the points , , and on a circle centered at ; otherwise, the following expression would be zero.

**Property 20**

As the length of the segment is the ordinate of and the length of the segment is the ordinate of , we only need to verify that the midpoints of those segments lie on the mentioned lines by checking slopes.

**Property 21**

Those circles are the fourth and fifth Archimedean circles discovered by Bankoff [20]. In order to verify this property, we use the following result [21]:

**Theorem 2**

This directed distance is positive if the triangle is traversed counterclockwise and negative otherwise. The function `dAB` implements this.
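A directed point-line distance with this sign convention can be sketched in Python as follows. The argument order (two points defining the line, then the test point) and the name `directed_distance` are assumptions; the article's `dAB` may order its arguments differently.

```python
import math

def directed_distance(a, b, p):
    """Signed distance from p to the line through a and b.

    Positive when the triangle a, b, p is traversed counterclockwise.
    """
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return cross / math.hypot(b[0] - a[0], b[1] - a[1])

d_ccw = directed_distance((0, 0), (1, 0), (0, 1))   # counterclockwise triangle
d_cw = directed_distance((1, 0), (0, 0), (0, 1))    # same line, reversed order
```

Swapping the two points that define the line flips the sign, which is why the order of the arguments matters when this function is used in tangency equations.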

Let and be the center and radius of the blue circle on the left side of point in Figure 7. Solving the following system finds the value of .

Similarly, this calculates the radius of the blue circle to the right of , which equals .

Thus, both circles are Archimedean as claimed. The following `Manipulate` shows the twins and these two other circles.

**Property 22**

Archimedes discovered the original twins; Bankoff improved on this by discovering this third circle in 1950 [22]. The coordinates of the center of the Bankoff circle are obtained by equating the distances of to the points , , and .

**Property 23**

The Bankoff circle is the incircle of the triangle formed by joining the centers of the side arcs and the center of the incircle of the arbelos.

Using theorem 2 to compute the distance of to the sides of the triangle, we verify this property (as `dAB` computes a directed distance, the order of the arguments describing the line is important).

**Property 24**

This computes the values of and .

The circle is the one where the ordinate of is positive. Note that is not on the radical axis.

**Property 25**

The circles and tangent to the radical axis, one passing through and the other passing through the point , are both Archimedean (see Figure 4).

**Property 26**

A circle with center and radius tangent to the line is such that the distance from to is

, so this equation holds:

Because the circle passes through ,

Because the circle is tangent to the top arc,

This input uses explicit expressions for , , and that satisfy these three equations.

**Property 27**

Consider the two (red) segments connecting the center of the top arc to the top points and of the left and right arcs of the arbelos. These segments have the same length and are orthogonal. The tangent circles and at and to those lines and the top arc are Archimedean (see Figure 8).

This property was discovered in the summer of 1998 [23].

**Figure 8.** The two pairs of Archimedean circles from property 27.

We have seen that there are some Archimedean circles other than the twins, namely the Bankoff circle and those mentioned in properties 21 through 27. There are also *non-Archimedean twins*, that is, pairs of circles of the same radius, different from that of the twins, appearing at significant places within the arbelos.

The discovery of the *slanted twins* arose from the initial assumption that, besides being tangent to either side arc and the top arc, the two circles-to-be-twins could be tangent to each other and not necessarily to the radical axis. Clearly there are infinitely many solutions if we do not require these circles to be of equal radius. The idea was that if we started by assuming they are of equal radius, we might end up discovering they are tangent to the radical axis. This turned out not to be the case. Let us consider circles with centers at the points and with common radius . The value of can be obtained by solving a system of five equations.

These expressions involve square roots differing in sign. The ones using the plus sign diverge at and are rejected.

The other one converges.

We conclude that the slanted twins are indeed congruent and that their common radius is

The following comparison between the radii of the twins and the slanted twins shows that their difference turns out to be very small.

This gives the coordinates of the centers of the slanted twins.

The following `Manipulate` shows the slanted twins and, optionally, the twins, as you vary .

In this section we generalize the shape of an arbelos by allowing the arcs to cross and by considering a 3D version. To set the context of the first of those generalizations, we need the concept of the *radical axis of two circles*.

Let be a point and be the circle . The *power* of with respect to is defined to be the real number . The power of is positive, zero, or negative depending on whether lies outside, on, or inside [12]. Let ; if the points of satisfy the equation , then an alternative way to define the power of is to evaluate . (A similar result applies if , when the circle degenerates to a line, in which case the sign of indicates whether is above, on, or below the line.)

Here is a very interesting property of the power of a point. Given a circle and a point , choose an arbitrary line through meeting the circle at points and . Then the product depends only on —it is independent of the choice of line through . This product is equal to the power of .
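This invariance is easy to test numerically. The Python sketch below (function names are illustrative assumptions) computes the power as the squared distance to the center minus the squared radius, then intersects a line through the point with the circle and multiplies the two signed distances along the line. The signed product is used so that the same identity covers points inside the circle, where the power is negative.

```python
import math

def power(point, center, radius):
    """Power of a point with respect to a circle."""
    return (point[0] - center[0]) ** 2 + (point[1] - center[1]) ** 2 - radius**2

def chord_product(point, center, radius, angle):
    """Signed product of distances from `point` to the two intersections of
    the circle with the line through `point` in direction `angle`."""
    ux, uy = math.cos(angle), math.sin(angle)
    wx, wy = point[0] - center[0], point[1] - center[1]
    half_b = ux * wx + uy * wy
    disc = half_b**2 - power(point, center, radius)
    if disc < 0:
        raise ValueError("the line misses the circle")
    t1 = -half_b + math.sqrt(disc)   # signed distances along the line
    t2 = -half_b - math.sqrt(disc)
    return t1 * t2

outside = [chord_product((3, 0), (0, 0), 1, th) for th in (0.0, 0.1, 0.3)]
inside = [chord_product((0.5, 0), (0, 0), 1, th) for th in (0.0, 1.0, 2.0)]
```

Every entry of `outside` should match `power((3, 0), (0, 0), 1)` and every entry of `inside` should match `power((0.5, 0), (0, 0), 1)`, whatever the direction of the line.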

In the following `Manipulate`, drag the four locators to vary the size of the circle, the position of , and the slope of the line through .

Given two circles with different centers, their *radical axis* is defined to be the line consisting of all points that have equal powers with respect to each of the two circles. Proofs of the following can be found in [10].

**Theorem 3**

If two circles intersect at two points and , then their radical axis is the common secant . If two circles are tangent at , then their radical axis is their common tangent at .
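Equating the two powers gives a linear equation, so the radical axis can be computed directly. The Python sketch below returns it as coefficients of A x + B y = C; the function name and this representation are assumptions made for illustration. It is applied to the two side arcs of an arbelos with assumed radii a = 1 and b = 2 (origin at the leftmost point), and to two intersecting unit circles.

```python
def radical_axis(c1, r1, c2, r2):
    """Coefficients (A, B, C) of the radical axis A*x + B*y = C."""
    A = 2 * (c2[0] - c1[0])
    B = 2 * (c2[1] - c1[1])
    C = (c2[0] ** 2 + c2[1] ** 2 - r2**2) - (c1[0] ** 2 + c1[1] ** 2 - r1**2)
    return A, B, C

# Side arcs of an arbelos: centers (a, 0) and (2a + b, 0), tangent at (2a, 0).
a, b = 1.0, 2.0
A, B, C = radical_axis((a, 0.0), a, (2 * a + b, 0.0), b)
axis_x = C / A   # the axis is vertical because B == 0

# Two intersecting unit circles centered at (0, 0) and (1, 0).
A2, B2, C2 = radical_axis((0.0, 0.0), 1.0, (1.0, 0.0), 1.0)
```

For the tangent side arcs the axis is the common tangent x = 2a, and for the intersecting circles it is the common secant x = 1/2, in line with theorem 3.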

**Corollary 1**

Given three circles with noncollinear centers, the three radical axes of the circles taken in pairs are distinct concurrent lines.

**Theorem 4**

The radical axis of two circles is the locus of points from which tangents drawn to both circles have the same length.

The following `Manipulate` shows two circles; one is fixed, and you can vary the center and size of the other one by dragging the locator or changing its radius with the slider. You can use the other slider to move the red point on the radical axis to illustrate theorem 4.

The following `Manipulate` illustrates two generalizations.

**Property 28**

The inscribed circles tangent to the radical axis of the side arcs and the top arc and either of the arcs of the generalized arbelos have the same radius.

Let be the length of the *gap* between the bases (so that the diameter of the top arc is ) and let be the abscissa of the intersection of the radical axis with the axis, assuming the origin is at the leftmost point of the arbelos [10].

**Theorem 5**

With the help of this theorem, we compute the value of .

We can assume without loss of generality that , , and ( can be negative). Let the inscribed circles be and . The values of these parameters are obtained as follows.

Then, although some centers can be disregarded, the radius is the same in all cases.

Finally, here are three more properties of the arbelos. See if you can guess what property is involved by experimenting with the controls [24, 25].

This first `Manipulate` lets you move the side arcs in a systematic way.

This second `Manipulate` lets you rotate a line around the point of tangency of the side arcs.

Finally, the third `Manipulate` shows an infinite family of twins.

[1] | F. van Lamoen. “Online Catalogue of Archimedean Circles.” (Jan 22, 2014) home.planet.nl/~lamoen/wiskunde/arbelos/Catalogue.htm. |

[2] | S. Garcia Diethelm. “Planar Stress Rotation” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/PlanarStressRotation. |

[3] | C. W. Dodge, T. Schoch, P. Y. Woo, and P. Yiu, “Those Ubiquitous Archimedean Circles,” Mathematics Magazine, 72(3), 1999 pp. 202-213. www.jstor.org/stable/2690883. |

[4] | H. P. Boas, “Reflection on the Arbelos,” American Mathematical Monthly, 113(3), 2006 pp. 236-249. |

[5] | H. D. Dörrie, 100 Great Problems of Elementary Mathematics: Their History and Solution (D. Antin, trans.), New York: Dover Publications, 1965. |

[6] | J. Rangel-Mondragón. “Recursive Exercises II: A Paradox” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/RecursiveExercisesIIAParadox. |

[7] | R. B. Nelsen, “Proof without Words: The Area of an Arbelos,” Mathematics Magazine, 75(2), 2002 p. 144. |

[8] | A. Gadalla. “Area of the Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/AreaOfTheArbelos. |

[9] | J. Rangel-Mondragón, “Selected Themes in Computational Non-Euclidean Geometry. Part 1. Basic Properties of Inversive Geometry,” The Mathematica Journal, 2013. www.mathematica-journal.com/2013/07/selected-themes-in-computational-non-euclidean-geometry-part-1. |

[10] | D. Pedoe, Geometry: A Comprehensive Course, New York: Dover, 1970. |

[11] | M. Schreiber. “Orthogonal Circle Inversion” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/OrthogonalCircleInversion. |

[12] | M. G. Welch, “The Arbelos,” Master’s thesis, Department of Mathematics, University of Kansas, 1949. |

[13] | L. Bankoff, “The Marvelous Arbelos,” The Lighter Side of Mathematics (R. K. Guy and R. E. Woodrow, eds.), Washington, DC: Mathematical Association of America, 1994. |

[14] | G. L. Alexanderson, “A Conversation with Leon Bankoff,” The College Mathematics Journal, 23(2), 1992 pp. 98-117. |

[15] | S. Kabai. “Tangent Circle and Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TangentCircleAndArbelos. |

[16] | G. Markowsky and C. Wolfram. “Theorem of the Owl’s Eyes” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TheoremOfTheOwlsEyes. |

[17] | P. Y. Woo, “Simple Constructions of the Incircle of an Arbelos,” Forum Geometricorum, 1, 2001 pp. 133-136. forumgeom.fau.edu/FG2001volume1/FG200119.pdf. |

[18] | B. Alpert. “Archimedes’ Twin Circles in an Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/ArchimedesTwinCirclesInAnArbelos. |

[19] | J. Rangel-Mondragón. “Twins of Arbelos and Circles of a Triangle” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TwinsOfArbelosAndCirclesOfATriangle. |

[20] | H. Okumura, “More on Twin Circles of the Skewed Arbelos,” Forum Geometricorum, 11, 2011 pp. 139-144. forumgeom.fau.edu/FG2011volume11/FG201114.pdf. |

[21] | E. W. Weisstein. “Point-Line Distance—2-Dimensional” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/Point-LineDistance2-Dimensional.html. |

[22] | L. Bankoff, “Are the Twin Circles of Archimedes Really Twins?,” Mathematics Magazine, 47(4), 1974 pp. 214-218. |

[23] | F. Power, “Some More Archimedean Circles in the Arbelos,” Forum Geometricorum, 5, 2005 pp. 133-134. forumgeom.fau.edu/FG2005volume5/FG200517.pdf. |

[24] | A. V. Akopyan, Geometry in Figures, CreateSpace Independent Publishing Platform, 2011. |

[25] | H. Okumura and M. Watanabe, “Characterizations of an Infinite Set of Archimedean Circles,” Forum Geometricorum, 7, 2007 pp. 121-123. forumgeom.fau.edu/FG2007volume7/FG200716.pdf. |

J. Rangel-Mondragón, “The Arbelos,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-5. |

Jaime Rangel-Mondragón received M.Sc. and Ph.D. degrees in applied mathematics and computation from the University College of North Wales in Bangor, UK. He has been a visiting scholar at Wolfram Research, Inc. and has held positions in the Faculty of Informatics at UCNW, the College of Mexico, the Center for Research and Advanced Studies, the Monterrey Institute of Technology, the Queretaro Institute of Technology, and the University of Queretaro in Mexico, where he is presently a member of the Faculty of Informatics. His current research includes combinatorics, the theory of computing, computational geometry, urban traffic, and recreational mathematics.

**Jaime Rangel-Mondragón**

*UAQ, Facultad de Informatica
Queretaro, Qro. Mexico*

Generally, to carry out a regression procedure one needs to have a model , an error definition , and the probability density function of the error . Considering the set as measurement points, the maximum likelihood approach aims at finding the parameter vector that maximizes the likelihood of the joint error distribution. Assuming that the measurement errors are independent, we should maximize (see, e.g., [1])

(1) |

Instead of maximizing this objective, we minimize

(2) |

Consider the Gaussian-type error distribution as ; then our estimator is

(3) |

In our case the model is a line,

(4) |

It can be seen that (in the case of Gaussian-type measurement noise) only the type of the error model determines the parameter values, since we should always minimize the least squares of the errors. There are different error models, which can be applied to fitting a line in a least-squares sense. The error model frequently employed, assuming an error-free independent variable , is the ordinary least squares model ()

(5) |

Similarly, one may also consider an error-free dependent variable . Then the error model () is

(6) |

Together, these two approaches are called the *algebraic approach*.

Another error model considers the geometrical distance between the data point and the line to be fitted. This type of fitting is also known as *orthogonal regression*, since the distances of the sample points from the line are evaluated by computing the orthogonal projection of the measurements on the line itself. The error in this case [2] is

(7) |

This *geometrical approach* or *total least squares* () approach can also be considered as an optimization problem with constraints; namely, one should minimize the errors in both variables [3]:

(8) |

under the conditions

(9) |

In addition, one can also combine and to construct an error model. The first possibility is to consider the geometric mean of these two types of errors,

(10) |

These error models are illustrated in Figure 1.

**Figure 1.** The different error models in the case of fitting a straight line.

This model is also called the *least geometric mean deviation* approach or model (see [4]). As a second possibility, one may consider and as competing functions of the parameters and find their Pareto front, which represents a set of optimal solutions for the parameters . Since this multi-objective problem is convex, the objective can be expressed as a linear combination of these error functions, namely

(11) |

where is a parameter, , and the set of optimal solutions of the parameters belonging to the Pareto-front is . You can choose the value of depending on your trade-off preference between and [5].

Symbolic computation can be used to avoid direct minimization and to get an explicit formula for the estimated parameters. We apply the *Mathematica* function `SuperLog` developed in [6], which uses pattern matching that enhances *Mathematica*’s ability to simplify expressions involving the natural logarithm of a product of algebraic terms.

Let us activate this function.

Then this is the ML estimator for Gaussian-type noise.

Now let us consider the problem.

Here are the necessary conditions for the optimum.

Let us introduce the following constants:

(12) |

(13) |

(14) |

(15) |

(16) |

In those terms, here are the necessary conditions for the optimum.

Then this is the optimal solution of the parameters.
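As a numerical cross-check on the symbolic result, the ordinary least squares line can also be computed directly from centered sums. The following Python sketch is illustrative only: the function name `fit_ols` and the convention that the line is written as y = a x + b are assumptions made here, not notation taken from the article.

```python
def fit_ols(xs, ys):
    """Fit y = a*x + b by minimizing the sum of squared vertical residuals."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered sums of squares and cross products.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = sxy / sxx               # slope from the linear normal equations
    b = y_mean - a * x_mean     # the fitted line passes through the centroid
    return a, b
```

Because the normal equations are linear in both parameters, no root selection is needed here, in contrast to the other error models.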

Although the equation system for the parameters of is linear, for other error models we get a multivariable algebraic system. Now consider the problem. Here is the maximum likelihood function.

Therefore here is the equation system to be solved.

Since , the conditions are as follows.

A Gröbner basis solves this system, eliminating .

Since the second equation is linear, it is reasonable to compute first, then .
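For the orthogonal (total least squares) error model of equation (7), eliminating the intercept leaves a quadratic in the slope. The Python sketch below evaluates the standard closed-form root of that quadratic. It is an illustration under stated assumptions: the names `fit_tls` and `orthogonal_sse` and the parametrization y = a x + b are introduced here and are not the article's notation.

```python
import math

def fit_tls(xs, ys):
    """Fit y = a*x + b by minimizing the sum of squared orthogonal distances."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("zero covariance: this formula does not determine the slope")
    d = syy - sxx
    # Root of sxy*a^2 - d*a - sxy = 0 that minimizes (not maximizes) the objective.
    a = (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    b = y_mean - a * x_mean     # linear in b once a is known
    return a, b

def orthogonal_sse(xs, ys, a, b):
    """Sum of squared point-to-line distances for the line y = a*x + b."""
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys)) / (1.0 + a * a)
```

Computing the slope first and the intercept second mirrors the order suggested by the linear second equation.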

The error model also leads to a second-order polynomial equation system. Now here is the ML estimator.

Consequently, here is the system to be solved for the parameters.

Assume .

Again a Gröbner basis gives a second-order system.

When is known, the other parameter can be computed.
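For the least geometric mean deviation model of equation (10), the product of the horizontal and vertical errors of a point equals its squared vertical residual divided by the absolute slope. Minimizing the sum of these products gives a slope whose square is the ratio of the centered sums of squares. The Python sketch below assumes that reading of the model; the names `fit_lgmd` and `geometric_mean_sse` and the form y = a x + b are hypothetical choices made for illustration.

```python
import math

def fit_lgmd(xs, ys):
    """Fit y = a*x + b by minimizing sum((y - a*x - b)^2) / |a|."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("degenerate data: x or y has zero variance")
    # a^2 = syy / sxx; the sign follows the sign of the covariance.
    sign = 1.0 if sxy >= 0 else -1.0
    a = sign * math.sqrt(syy / sxx)
    b = y_mean - a * x_mean
    return a, b

def geometric_mean_sse(xs, ys, a, b):
    """Sum over points of |horizontal error * vertical error|."""
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys)) / abs(a)
```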

In the case of the Pareto approach, the system is already fourth order.

Here is the system.

Here is the system in compact form.

Here is the Gröbner basis for the first parameter.

Assume that .

After solving this polynomial for , the other parameter can be solved from the second equation, which is linear in .

Consider some data on rainfall (in mm) and the resulting groundwater level changes (in cm) from a landslide along the Ohio River Valley near Cincinnati, Ohio [7].

There are 14 measurements.

This displays the measured data.

**Figure 2.** The measured data: rainfall versus water level change in dimensional form.

The constants , , , , and in equations (12) to (16) are needed.

This separates the data.

This transforms the data into dimensionless form.

**Figure 3.** The measured data: rainfall versus water level change in dimensionless form.

Now the constants can be computed.

Here are the estimated parameters employing the explicit solutions.

This checks the result.

Figure 4 shows the estimated line with the sample points.

**Figure 4.** The sample points with the line estimated with .

Here are the first and second parameters.

Here is a check of this result on the basis of the definition. Equation (8) gives the objective function.

The constraints are .

The unknown variables are not only the parameters, but the adjustments as well.

This uses a built-in global optimization method. (This takes a long time to compute.)

The estimation gives a result quite different from the model; see Figure 5.

**Figure 5.** The lines estimated with the (red) and (green) models.

Since the constraints are linear, the optimization can be written in unconstrained form, reducing the original number of variables to .

Now here is the first parameter.

This uses the result.

Here is a numerical check of the objective.

Figure 6 shows this result together with the and models.

**Figure 6.** The lines estimated with the (red), (green), and (blue) models.

The first parameter is a fourth-order polynomial.

The best trade-off between and is to let .

This is the real positive solution.

Using this value gives the second parameter.

We compute the solution using direct global minimization. Here is the objective.

This gives the result.

Figure 7 shows this solution with the results of the other models.

**Figure 7.** The lines estimated with the (red), (green), and (blue) models, and the Pareto approach with (magenta).

The numerical computations show that the formulas developed by an ML estimator via symbolic computation to determine the parameters of a straight line to be fitted provide correct results and require considerably less computation time than the direct methods based on global minimization of the residuals. Our examples also illustrate that the , , and Pareto approaches give more realistic solutions than the traditional , since Figure 7 shows there are at least two outliers in the sample set.

[1] | W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed., Cambridge: Cambridge University Press, 1992. |

[2] | M. Zuliani. “RANSAC for Dummies.” (Jan 10, 2014) vision.ece.ucsb.edu/~zuliani/Research/RANSAC/docs/RANSAC4Dummies.pdf. |

[3] | B. Schaffrin, “A Note on Constrained Total Least-Squares Estimation,” Linear Algebra and Its Applications, 417(1), 2006 pp. 245-258. doi:10.1016/j.laa.2006.03.044. |

[4] | C. Tofallis, “Model Fitting for Multiple Variables by Minimising the Geometric Mean Deviation,” in Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications (S. Van Huffel and P. Lemmerling, eds.), Dordrecht: Kluwer, 2002. |

[5] | B. Paláncz and J. L. Awange, “Application of Pareto Optimality to Linear Models with Errors-in-All-Variables,” Journal of Geodesy, 86(7), 2012 pp. 531-545. doi:10.1007/s00190-011-0536-1. |

[6] | C. Rose and M. D. Smith, “Symbolic Maximum Likelihood Estimation with Mathematica,” The Statistician, 49(2), 2000 pp. 229-240. www.jstor.org/stable/2680972. |

[7] | W. C. Haneberg, Computational Geosciences with Mathematica, Berlin: Springer, 2004. |

B. Paláncz, “Fitting Data with Different Error Models,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-4. |

Béla Paláncz received his D.Sc. degree in 1993 from the Hungarian Academy of Sciences and has wide-ranging experience in teaching and research (RWTH Aachen, Imperial College London, DLR Köln, and Wolfram Research). His main research fields are mathematical modeling and symbolic-numeric computation.

**Béla Paláncz**

*Department of Photogrammetry and Geoinformatics,
Budapest University of Technology and Economics
1521 Budapest, Hungary*

The Karush-Kuhn-Tucker equations (under suitable assumptions) provide necessary and sufficient conditions for the solution of the problem of maximizing (minimizing) a concave (convex) function.

For an excellent reference, see the tutorial in [2]. Here we modify the code of [1] by correcting minor typos, simplifying, and letting the user specify restrictions on the exogenous parameters of the model.

The inputs of `KT` are the objective function to be maximized, the list of constraints, and the list of choice variables. Here is an example from consumer choice theory: maximize a utility function, subject to a budget constraint.

Several of the solutions do not make economic sense, because they do not use the fact that the income, price of good , and price of good are all positive. However, `KT` lets the user specify restrictions on the exogenous parameters of the model.

An important advantage of `KT` over other optimization functions (such as `Maximize` or `Minimize`) is that `KT` returns the value of the Kuhn-Tucker multipliers. These multipliers have an important economic interpretation: they are shadow prices for the constrained resources. In the above example, for instance, the value of is the “infinitesimal” increment in the utility function of the consumer that is generated when the budget constraint is relaxed by increasing the consumer’s income by an “infinitesimal amount.”
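As a worked illustration of the shadow-price interpretation, consider a hypothetical consumer problem. The utility function $u(x,y)=xy$ and the symbols $m$ (income), $p_x$, $p_y$ (prices), and $\lambda$ (multiplier) are assumptions chosen here for illustration; they are not necessarily the example passed to `KT` above.

$$
\max_{x,\,y \ge 0}\; x\,y \quad \text{subject to} \quad p_x x + p_y y \le m .
$$

With $m, p_x, p_y > 0$, the Kuhn-Tucker conditions at an interior solution are

$$
y = \lambda p_x, \qquad x = \lambda p_y, \qquad \lambda \ge 0, \qquad \lambda\,(m - p_x x - p_y y) = 0 ,
$$

which give

$$
x^* = \frac{m}{2 p_x}, \qquad y^* = \frac{m}{2 p_y}, \qquad \lambda^* = \frac{m}{2 p_x p_y} .
$$

The maximized utility is $V(m) = x^* y^* = \dfrac{m^2}{4 p_x p_y}$, so that $\dfrac{dV}{dm} = \dfrac{m}{2 p_x p_y} = \lambda^*$. In this sketch the multiplier is exactly the marginal utility of income.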

Nash equilibrium is the main solution concept in game theory. It is a crucial tool for economics and political science models. Essentially, a Nash equilibrium is a profile of strategies (one strategy for each player), such that if a player takes the choices of the others as given (i.e. as parameters), then the player’s strategy must maximize his or her payoff.

The function `Nash` takes as input the payoff function of player 1, the payoff function of player 2, and the actions available to players 1 and 2. It returns the entire set of Nash equilibria.

There are many versions of Colonel Blotto’s game; this is a simple one taken from [3]. General A (row player) has three divisions to defend a city; she has to choose how many divisions to place at the north road and how many divisions at the south road. General B (column player) has two divisions to try to invade the city; he also has to choose how many divisions to be assigned to the north road and how many to the south road. If General A has at least as many divisions as General B at a given road, General A wins the battle there (defense is favored in the case of a tie). To win the game, however, A must defeat B on both battlefields. Thus, A has four possible strategies and B has three strategies. The table below summarizes the players’ strategies and payoffs (, for the whole campaign). For example, in the first row and first column the entry is , which means A won and B lost; A chose three divisions for the north road and none for the south road; B chose two for the north and none for the south. Because and , A won both battles.

A Nash equilibrium for this game is a probability distribution over strategies; use `P` for the probabilities chosen by General A and `Q` for the probabilities chosen by General B.

The game has many Nash equilibria, but we still can make predictions: General B is never going to spread his forces evenly (the probability of his second strategy is zero in any equilibrium, ); with probability , B’s two divisions are placed at the north road () and with probability , they are placed at the south road (). As for General A, the probability that she places all of her three divisions on one front is less than half (i.e. and ). Also, the probability that General A places two or more divisions at the north (or south) is always equal to half (i.e. and ).
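Claims of this kind can be checked numerically by testing whether a candidate pair of mixed strategies admits a profitable pure deviation. The Python sketch below is not the `Nash` function described above. It assumes that a campaign win is worth 1 and a loss -1 (the article's payoff values are not reproduced here), and the helper names `blotto_payoffs` and `is_nash` are hypothetical.

```python
from fractions import Fraction

def blotto_payoffs():
    """Payoff matrices (A, B) for the simple Blotto game described above."""
    a_moves = [(3, 0), (2, 1), (1, 2), (0, 3)]   # General A: (north, south)
    b_moves = [(2, 0), (1, 1), (0, 2)]           # General B: (north, south)
    # A wins the campaign only by holding both roads; ties favor the defender.
    A = [[1 if (a_n >= b_n and a_s >= b_s) else -1 for (b_n, b_s) in b_moves]
         for (a_n, a_s) in a_moves]
    B = [[-v for v in row] for row in A]         # zero-sum: B's payoff is -A's
    return A, B

def is_nash(A, B, p, q):
    """True if no pure-strategy deviation improves on the mixed profile (p, q)."""
    rows, cols = range(len(A)), range(len(A[0]))
    value_a = sum(p[i] * A[i][j] * q[j] for i in rows for j in cols)
    value_b = sum(p[i] * B[i][j] * q[j] for i in rows for j in cols)
    best_a = max(sum(A[i][j] * q[j] for j in cols) for i in rows)
    best_b = max(sum(p[i] * B[i][j] for i in rows) for j in cols)
    return best_a <= value_a and best_b <= value_b

A, B = blotto_payoffs()
half = Fraction(1, 2)
print(is_nash(A, B, [0, half, half, 0], [half, 0, half]))   # True
print(is_nash(A, B, [half, 0, 0, half], [half, 0, half]))   # False
```

Exact rational arithmetic avoids false negatives from rounding when a player is indifferent between several strategies. The second profile fails because General A concentrates too often on a single road, which makes an even split profitable for General B.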

This game is also borrowed from [3]. A deck has two cards, one high and one low. Each player places one dollar into the pot. Player 1 gets one card from the deck. Player 2 does not see Player 1’s card. Player 1 decides whether to raise (by placing another dollar in the pot) or not raise. Player 2 observes 1’s action and then has to decide whether to match the bet or fold. If Player 2 folds, then Player 1 wins the contents of the pot. However, if Player 2 matches, Player 2 places another dollar into the pot if Player 1 had previously raised. Player 1 reveals her card. If it is the high card, Player 1 wins the pot; otherwise, Player 2 wins it.

See Figure 1 for the corresponding game tree. We introduce a fictitious player, Nature, who randomly decides if the card is high or low. We depict the bimatrix representation of the game. Player 1 has four strategies: always raise (RR), always not raise (NN), raise if the card is high and not otherwise (RN), and not raise if the card is high and raise otherwise (NR). Player 2 also has four strategies: always match (MM), always fold (FF), match only if Player 1 raised (MF), and fold only if Player 1 raised (FM). For simplicity, in the bimatrix representation, we write the expected payoffs of Player 1 and omit Player 2’s payoffs (this is without loss of generality in zero-sum games).

**Figure 1.** Game tree of the card game.

In this case, the Nash equilibrium delivers a sharp prediction. When Player 1 has the high card, she always raises (), but when she has the low card, she bluffs with probability (the probability of RR is ). When Player 1 does not raise, Player 2 always matches (). If Player 1 raises, Player 2 still may match, but with probability (the probability of always matching MM is ).
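The expected payoffs of Player 1 in the bimatrix representation can be rebuilt from the rules stated above. The Python sketch below encodes one reading of those rules and is an illustration only: it assumes that each card is dealt with probability one half, that payoffs are net winnings in dollars, and that the first letter of a Player 1 strategy is the action with the high card while the first letter of a Player 2 strategy is the reply to a raise. All function and variable names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
P1_STRATEGIES = ["RR", "NN", "RN", "NR"]   # (action with high card, with low card)
P2_STRATEGIES = ["MM", "FF", "MF", "FM"]   # (reply to a raise, reply to no raise)

def payoff_p1(card_high, p1_action, p2_reply):
    """Net winnings of Player 1 for one deal."""
    if p2_reply == "F":
        return 1                            # Player 2 folds and forfeits the ante
    stake = 2 if p1_action == "R" else 1    # a matched raise doubles the stake
    return stake if card_high else -stake

def expected_payoff(s1, s2):
    """Average of Player 1's winnings over the two equally likely cards."""
    total = Fraction(0)
    for card_high, action in ((True, s1[0]), (False, s1[1])):
        reply = s2[0] if action == "R" else s2[1]
        total += HALF * payoff_p1(card_high, action, reply)
    return total

# Rows follow P1_STRATEGIES, columns follow P2_STRATEGIES.
MATRIX = [[expected_payoff(s1, s2) for s2 in P2_STRATEGIES]
          for s1 in P1_STRATEGIES]
```

Because the game is zero-sum, Player 2's expected payoffs are the negatives of these entries.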

We extended the code of [1] to solve for Kuhn-Tucker conditions with additional assumptions on parameters and, more importantly, using the Kuhn-Tucker equations we provide a program to compute all the Nash equilibria of finite bimatrix games.

We presented a program to compute the set of all Nash equilibria in finite bimatrix games. It is intended as a classroom tool for students and instructors. Needless to say, the code is not efficient. For larger inputs (say, bimatrix games with five or more actions per player), `Reduce` often fails to solve the system of Kuhn-Tucker equations. For optimized algorithms, we suggest [4]. Nevertheless, with continuous improvement of hardware and algorithms for solving semialgebraic systems (see [5]), these methods may become useful for research applications sooner than we think. Finally, as algorithmic game theory courses become more popular in computer science departments, it seems overdue to bring computational methods and algorithms to economics departments.

[1] | F. J. Kampas, “Tricks of Using Reduce to Solve Khun-Tucker Equations,” The Mathematica Journal, 9(4), 2005 pp. 686-689. www.mathematica-journal.com/issue/v9i4/contents/Tricks9-4/Tricks9-4_2.html. |

[2] | M. J. Osborne. “Optimization: The Kuhn-Tucker Conditions for Problems with Inequality Constraints,” from Mathematical Methods for Economic Theory: A Tutorial. (Jan 8, 2014) www.economics.utoronto.ca/osborne/MathTutorial/MOIF.HTM. |

[3] | M. Osborne, An Introduction to Game Theory, New York: Oxford University Press, 2004. |

[4] | R. D. McKelvey, A. M. McLennan, and T. L. Turocy. “Gambit: Software Tools for Game Theory.” (Jan 8, 2014) www.gambit-project.org. |

[5] | Wolfram Research, “Real Polynomial Systems” from Wolfram Mathematica Documentation Center—A Wolfram Web Resource.reference.wolfram.com/mathematica/tutorial/RealPolynomialSystems.html. |

S. O. Parreiras, “Using Reduce to Compute Nash Equilibria,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-3. |

Sérgio O. Parreiras is an associate professor at the UNC-Chapel Hill Department of Economics. His research focus is on game theory and its applications to auctions, mechanism design, and contests. He is also interested in computational economics, general equilibrium theory, algorithmic game theory, and evolutionary anthropology.

**Sérgio O. Parreiras**

*UNC, Department of Economics
Gardner Hall, 200B
Chapel Hill, N.C. 27599-3305*

The infinite Fibonacci word,

is certainly one of the most studied words in the field of combinatorics on words [1-4]. It is the archetype of a Sturmian word [5]. The word can be associated with a fractal curve with combinatorial properties [6-7].

This article implements *Mathematica* programs to generate curves from and a set of drawing rules. These rules are similar to those used in L-systems.

The outline of this article is as follows. Section 2 recalls some definitions and ideas of combinatorics on words. Section 3 introduces the Fibonacci word, its fractal curve, and a family of words whose limit is the Fibonacci word fractal. Finally, Section 4 generalizes the Fibonacci word and its Fibonacci word fractal.

The terminology and notation are mainly those of [5] and [8]. Let be a finite alphabet, whose elements are called symbols. A word over is a finite sequence of symbols from . The set of all words over , that is, the free monoid generated by , is denoted by . The identity element of is called the empty word. For any word , denotes its length, that is, the number of symbols occurring in . The length of is taken to be zero. If and , then denotes the number of occurrences of in .

For two words and in , denote by the concatenation of the two words, that is, . If , then ; moreover, by denote the word ( times). A word is a subword (or factor) of if there exist such that . If , then and is called a prefix of ; if , then and is called a suffix of .

The reversal of a word is the word and . A word is a palindrome if .

An infinite word over is a map , written as . The set of all infinite words over is denoted by .

**Example 1**

The word , where if is a prime number and otherwise, is an example of an infinite word. The word is called the characteristic sequence of the prime numbers. Here are the first 50 terms of .
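A prefix of this characteristic sequence is easy to generate. The following Python sketch assumes that positions are numbered from 1, so the first term corresponds to the integer 1; the function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_characteristic(count):
    """First `count` terms: 1 where the (1-based) position is prime, else 0."""
    return [1 if is_prime(n) else 0 for n in range(1, count + 1)]

terms = prime_characteristic(50)
print("".join(str(t) for t in terms))
```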

**Definition 1**

There is a special class of words with many remarkable properties, the so-called Sturmian words. These words admit several equivalent definitions (see, e.g. [5], [8]).

**Definition 2**

Let . Let , the complexity function of , be the map that counts, for all integer , the number of subwords of length in . An infinite word is a Sturmian word if for all integer .

For example, .

Since for any Sturmian word, , Sturmian words have to be over two symbols. The word in example 1 is not a Sturmian word because .
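The complexity function can be estimated on a finite prefix by collecting the distinct subwords of a given length. This Python sketch is a minimal illustration; the name `factor_complexity` is hypothetical, and the two string literals are short prefixes written out by hand (the prime characteristic sequence indexed from 1, and a word built from the blocks 01 and 0 with no factor 11).

```python
def factor_complexity(word, n):
    """Number of distinct subwords (factors) of length n in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

# Prefix of the prime characteristic sequence for positions 1..13.
prime_prefix = "0110101000101"
# A binary word with only three factors of length 2 (00, 01, 10).
balanced_prefix = "0100101001001"

print(factor_complexity(prime_prefix, 2))      # 4, more than the 3 a Sturmian word allows
print(factor_complexity(balanced_prefix, 2))   # 3
```

A count on a prefix is only a lower bound for the complexity of the infinite word, which is enough to rule out the Sturmian property when the bound already exceeds n + 1.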

Given two real numbers , with irrational and , , define the infinite word as . The numbers and are the slope and the intercept, respectively. This word is called mechanical. The mechanical words are equivalent to Sturmian words [5]. As a special case, gives the characteristic words.

**Definition 3**

On the other hand, note that every irrational has a unique continued fraction expansion

where each is a positive integer. Let be an irrational number with and for . To the directive sequence , associate a sequence of words defined by , , , .

Such a sequence of words is called a standard sequence. This sequence is related to characteristic words in the following way. Observe that, for any , is a prefix of , which gives meaning to as an infinite word. In fact, one can prove that each is a prefix of for all and [5].

**Definition 4**

Fibonacci words are words over defined inductively as follows: , , and , for . The words are referred to as the finite Fibonacci words. The limit

(1) |

It is clear that , where is the Fibonacci number, recalling that the Fibonacci number is defined by the recurrence relation for all integer and with initial values . The infinite Fibonacci word is a Sturmian word [5]; exactly, , where is the golden ratio.

Here are the first 50 terms of .

**Definition 5**

The Fibonacci word satisfies and for all .

Here are the first nine finite Fibonacci words.
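The inductive definition translates directly into code. The Python sketch below assumes the common convention in which the first two words are 1 and 0 and each later word is the concatenation of the previous two; the indexing and the function name are assumptions made for illustration.

```python
def fibonacci_words(count):
    """Return the first `count` finite Fibonacci words as strings."""
    words = ["1", "0"]                       # assumed initial words
    while len(words) < count:
        words.append(words[-1] + words[-2])  # next word = previous two, concatenated
    return words[:count]

for word in fibonacci_words(9):
    print(word)
```

The lengths of these words follow the Fibonacci numbers, and the listed properties (for instance, that 11 and 000 never occur) can be checked on any of them.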

**Definition 6**

The following proposition summarizes some basic properties about the Fibonacci word.

**Proposition 1**

- The words 11 and 000 are not subwords of the Fibonacci word.
- Let be the last two symbols of . For , if is even and if is odd.
- The concatenation of two successive Fibonacci words is almost commutative; that is, and have a common prefix of length , for all .
- is a palindrome for all .
- For all , , where ; that is, exchanges the two last symbols of .

The Fibonacci word can be associated with a curve using a drawing rule: each symbol read triggers a particular drawing action (the same idea as in L-systems [9]). Here the rule is called the *odd-even drawing rule* [7].

**Definition 7**

The Fibonacci curve, denoted by , is the result of applying the odd-even drawing rule to the word . The Fibonacci word fractal is defined as

The program `LShow` is adapted from [10] to generate L-systems.

Figure 1 shows an L-system interpretation of the odd-even drawing rule.

**Figure 1.** Interpretation of the odd-even drawing rule.
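The drawing rule can also be traced without an L-system engine. The Python sketch below is not the `LShow` program. It assumes the form of the odd-even rule given in [7]: draw a unit segment for every symbol, and after a 0 turn left if the symbol's position is even and right if it is odd, with positions counted from 1. The function names and the starting direction are hypothetical.

```python
def odd_even_curve(word):
    """Vertices of the curve obtained by applying the odd-even rule to `word`."""
    x, y = 0, 0
    dx, dy = 0, 1                       # assumed initial heading: up
    points = [(x, y)]
    for position, symbol in enumerate(word, start=1):
        x, y = x + dx, y + dy           # every symbol draws one unit segment
        points.append((x, y))
        if symbol == "0":
            if position % 2 == 0:
                dx, dy = -dy, dx        # turn left by 90 degrees
            else:
                dx, dy = dy, -dx        # turn right by 90 degrees
    return points

def longest_straight_run(points):
    """Length of the longest run of consecutive segments with the same direction."""
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    best = run = 0
    previous = None
    for step in steps:
        run = run + 1 if step == previous else 1
        best = max(best, run)
        previous = step
    return best

# A finite Fibonacci word of length 21, written out by hand.
curve = odd_even_curve("010010100100101001010")
print(longest_straight_run(curve))      # 2
```

Since a turn follows every 0 and the factor 11 never occurs, no straight stretch is longer than two unit segments.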

Here are the curves for .

The next proposition about properties of the curves and comes directly from the properties of the Fibonacci word from Proposition 1. More properties can be found in [7].

**Proposition 2**

- is composed only of segments of length 1 or 2.
- The number of turns in the curve is the Fibonacci number .
- The curve is similar to the curve .
- The curve is symmetric.
- The curve is composed of five curves: , where is the result of applying the odd-even drawing rule to the word .

The next figure shows the curve and the five curves; here .

The Fibonacci word and other words can be derived from the dense Fibonacci word, which was introduced in [7].

**Definition 8**

(2) |

Given a drawing rule, the global angle is the sum of the successive angles generated by the word through the rule. With the natural drawing rule, , , , then .

For a drawing rule, the resulting angle of a word is the function that gives the global angle. A morphism preserves the resulting angle if for any word , ; moreover, a morphism inverts the resulting angle if for any word , .

The dense Fibonacci word is strongly linked to the Fibonacci word fractal because can generate a whole family of curves whose limit is the Fibonacci word fractal [7]. All that is needed is to apply a morphism to that preserves or inverts the resulting angle.

Here are some examples.

Here are some examples with other angles.

This section introduces a generalization of the Fibonacci word and the Fibonacci word fractal [11].

**Definition 9**

The 2-Fibonacci word is the classical Fibonacci word. Here are the first six -Fibonacci words.

The following proposition relates the Fibonacci word to .

**Proposition 3**

(3) |

**Definition 10**

The -Fibonacci numbers are the Fibonacci numbers and the -Fibonacci numbers are the Fibonacci numbers shifted by one. The following table shows the first terms in the sequences and their reference numbers in the On-Line Encyclopedia of Integer Sequences (OEIS) [12].

**Proposition 4**

- The word 11 is not a subword of the -Fibonacci word, .
- Let be the last two symbols of . For , if is even and if is odd, .
- The concatenation of two successive -Fibonacci words is almost commutative; that is, and have a common prefix of length for all and .
- is a palindrome for all .
- For all , , where .

**Theorem 1**

For the proof, see [11]. This theorem implies that -Fibonacci words are Sturmian words.

Note that

where is the golden ratio.

**Definition 11**

The Fibonacci curve, denoted by , is the result of applying the odd-even drawing rule to the word . The -Fibonacci word fractal is defined as

Here are the curves for .

**Proposition 5**

- The Fibonacci fractal is composed only of segments of length 1 or 2.
- The curve is similar to the curve .
- The curve is composed of five curves: .
- The curve is symmetric.
- The scale factor between and is .

This section applies the above ideas to generate new curves from characteristic words (see Definition 3).

**Conjecture 1**

Here are seven examples.

The first author was partially supported by Universidad Sergio Arboleda under grant number USA-II-2012-14. The authors would like to thank Borut Jurčič-Zlobec from Ljubljana University for his help during the development of this article.

[1] | J. Cassaigne, “On Extremal Properties of the Fibonacci Word,” RAIRO—Theoretical Informatics and Applications, 42(4), 2008 pp. 701-715. doi:10.1051/ita:2008003. |

[2] | W. Chuan, “Fibonacci Words,” Fibonacci Quarterly, 30(1), 1992 pp. 68-76. www.fq.math.ca/Scanned/30-1/chuan.pdf. |

[3] | W. Chuan, “Generating Fibonacci Words,” Fibonacci Quarterly, 33(2), 1995 pp. 104-112. www.fq.math.ca/Scanned/33-2/chuan1.pdf. |

[4] | F. Mignosi and G. Pirillo, “Repetitions in the Fibonacci Infinite Word,” RAIRO—Theoretical Informatics and Applications, 26(3), 1992 pp. 199-204. |

[5] | M. Lothaire, Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications), Cambridge: Cambridge University Press, 2005. |

[6] | A. Blondin Massé, S. Brlek, A. Garon, and S. Labbé, “Two Infinite Families of Polyominoes That Tile the Plane by Translation in Two Distinct Ways,” Theoretical Computer Science, 412(36), 2011 pp. 4778-4786. doi:10.1016/j.tcs.2010.12.034. |

[7] | A. Monnerot-Dumaine, “The Fibonacci Word Fractal,” preprint, 2009. hal.archives-ouvertes.fr/hal-00367972/fr. |

[8] | J.-P. Allouche and J. Shallit, Automatic Sequences: Theory, Applications, Generalizations, Cambridge: Cambridge University Press, 2003. |

[9] | P. Prusinkiewicz and A. Lindenmayer, The Algorithmic Beauty of Plants, New York: Springer-Verlag, 1990. |

[10] | E. Weisstein. “Lindenmayer System” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/LindenmayerSystem.html. |

[11] | J. Ramírez, G. Rubiano, and R. de Castro, “A Generalization of the Fibonacci Word Fractal and the Fibonacci Snowflake,” 2013. arxiv:1212.1368v2. |

[12] | OEIS Foundation, Inc. “The On-Line Encyclopedia of Integer Sequences.” (Aug 9, 2013) oeis.org. |

J. L. Ramírez and G. N. Rubiano, “Properties and Generalizations of the Fibonacci Word Fractal,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-2. |

**José L. Ramírez**

*Instituto de Matemáticas y sus Aplicaciones
Universidad Sergio Arboleda
Calle 74 no. 14 – 14 Bogotá, Colombia*

**Gustavo N. Rubiano**

*Departamento de Matemáticas
Universidad Nacional de Colombia
AA 14490, Bogotá, Colombia*

Bayesian statistics are an orderly way of finding the likelihood of a model from data, using the likelihood of the data given the model. From spam detection to medical diagnosis, spelling correction to forecasting economic and demographic trends, Bayesian statistics have found many applications, and even praise as mental heuristics to avoid overconfidence. However, at first glance Bayesian statistics suffer from an apparent limit: they can only make inferences about known factors, bounded to conditions seen within the data, and have nothing to say about the likelihood of new phenomena [3]. In short, Bayesian statistics appear to be restricted to inferences about the parameters of the model they are provided.

Instead of taking priors over factors of the model itself, we can say that we are taking priors over factors in the process involving how the data was generated. These stochastic process priors give the modeler a way to talk about factors that have not been directly observed. These nonobservable factors include the likely rate at which further factors might be seen, given further observation and underlying categories or structures that might generate the data being observed. For example, in statistics problems we are often presented with drawing marbles of different colors from a bag, and given randomly drawn samples, we might talk about the most likely composition of the bag and the range of likely compositions. However, suppose we had a number of bags, and we drew two marbles each from three of them, discovering two red marbles, two green marbles, and two yellow marbles [4]. If we were to draw marbles from yet another bag, we might expect two marbles identical in color, of a color we have not previously observed. We do not know what this color is, and in this sense we have made a nonparametric inference about the process that arranged the marbles between bags.

The ability to talk about nonobserved parameters is a leap in expressiveness, as instead of explicitly specifying a model for all parameters, a model utilizing infinite processes expands to fit the given data. This should be regarded similarly to the advantages afforded by linked data structures in representing ordinary data. A linked list has a potentially infinite capacity; its advantage is not that we have an infinite memory, but an abstract flexibility to not worry too much about maintaining its size appropriately. Similarly, an infinite prior models the growth we expect to discover [5].

Here are two specific processes that are useful for a number of different problems. These two processes are good for modeling unknown discrete categories and sets of features, respectively. In both of these processes, suppose that we can take samples so that there are no dependencies in the order that we took them, or in other words that the samples are exchangeable. Both of these processes also make use of a concentration parameter, . As we look at more samples, we expect the number of new elements we discover to diminish, but not disappear, as our observations establish a lower frequency of occurrence for unobserved elements. The concentration parameter establishes the degree to which the proportions are concentrated, with low indicating a distribution concentrated on a few elements, and high indicating a more dispersed concentration.

First, let us look into learning an underlying system of categories. In a fixed set of categories of particular likelihood, the probability of a given sample in a particular category corresponds to the multinomial distribution, the multiparameter extension of the Bernoulli distribution. The conjugate prior, or the distribution that gives a Bayesian estimate of which multinomial distribution produced a given sample, is the Dirichlet distribution, itself the multivariable extension of the beta distribution. To create an infinite Dirichlet distribution, or rather a Dirichlet process, one can simply have a recursive form of the beta where the likelihood of a given category is . To use a Dirichlet process as a prior, it is easier to manipulate in the form of a Chinese restaurant process (CRP) [6]. Suppose we want to know the likelihood that the sample is a member of category . If the category is new, then that probability corresponds to the size of the concentration parameter in ratio to the count of the samples taken:

The implementation of this function is straightforward. The use of a parameterized random number function allows for the use of the algorithm in common random number comparison between simulation scenarios [7], as well as for estimation through Markov chain Monte Carlo, about which more will be said later.
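A sequential sampler makes the process concrete. The Python sketch below is not the article's implementation. It assumes the usual Chinese restaurant process probabilities: after n samples have been assigned, an existing category with n_k members is chosen with probability n_k / (n + alpha) and a new category with probability alpha / (n + alpha). The random number function is passed as a parameter, as described above; all names are hypothetical.

```python
import random

def crp_next(counts, alpha, rand=random.random):
    """Category index for the next sample; len(counts) means a new category."""
    n = sum(counts)
    u = rand() * (n + alpha)            # uniform on [0, n + alpha)
    cumulative = 0.0
    for k, count in enumerate(counts):
        cumulative += count
        if u < cumulative:
            return k                    # existing category, weight count
    return len(counts)                  # remaining weight alpha: new category

def crp_partition(n_samples, alpha, rand=random.random):
    """Assign n_samples sequentially; return (labels, category sizes)."""
    counts, labels = [], []
    for _ in range(n_samples):
        k = crp_next(counts, alpha, rand)
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels, counts

# A seeded generator makes runs reproducible across simulation scenarios.
labels, sizes = crp_partition(20, 1.5, random.Random(3).random)
```

Passing `rand` explicitly is what allows two scenarios to share the same random stream for a common random number comparison.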

In the second process, suppose we are interested in the sets of features observed in sets of examples. For example, suppose we go to an Indian food buffet and are unfamiliar with the dishes, so we observe the selected items that our fellow patrons have chosen. Supposing one overall taste preference, we might say that the likelihood of a dish’s being worth selecting is proportional to the number of times it was observed, but if there are not many examples we should also try some additional dishes that were not tried previously. This process, called the Indian buffet process [8], turns out to be equivalent to a beta process prior [9]. Suppose we want to know the likelihood of whether a given feature is going to be found in the sample. Then, the likelihoods can be calculated directly from other well-understood distributions:
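The buffet story can be simulated in the same style. The Python sketch below assumes the standard formulation of the Indian buffet process: the i-th customer takes each previously sampled dish k with probability m_k / i, where m_k counts earlier customers who took it, and then tries a Poisson(alpha / i) number of new dishes. The names `poisson` and `indian_buffet` are hypothetical, and the Poisson sampler uses a simple multiplication method suited to small rates.

```python
import math
import random

def poisson(rate, rand=random.random):
    """Sample a Poisson variate by multiplying uniforms (fine for small rates)."""
    limit = math.exp(-rate)
    k, product = 0, 1.0
    while True:
        product *= rand()
        if product <= limit:
            return k
        k += 1

def indian_buffet(n_customers, alpha, rand=random.random):
    """Return (rows, dish_counts); rows[i] lists the dishes chosen by customer i."""
    dish_counts = []
    rows = []
    for i in range(1, n_customers + 1):
        # Existing dishes are taken in proportion to their popularity.
        chosen = [k for k, m in enumerate(dish_counts) if rand() < m / i]
        # New dishes: a Poisson number with rate alpha / i.
        for _ in range(poisson(alpha / i, rand)):
            dish_counts.append(0)
            chosen.append(len(dish_counts) - 1)
        for k in chosen:
            dish_counts[k] += 1
        rows.append(chosen)
    return rows, dish_counts

rows, dish_counts = indian_buffet(6, 2.0, random.Random(1).random)
```

Each row is a feature set, so the output can be read as a binary feature matrix with an unbounded number of columns.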

Both of these processes are suitable as components in mixture models. Suppose we are conducting a phone poll of a city and ask the citizens we talk to about their concerns. Each person will report their various civic travails. We expect each person to have their own varying issues, but we also expect particular groups of concerns to emerge for different neighborhoods and professional groups. In other words, we expect to see an unknown set of features emerge from an unknown set of categories. Then, we might use a CRP-IBP mixture distribution to help learn those categories from the discovered feature sets.

Nonparametric inference tasks are particularly suited for computational support. What we would like to do is describe a space of potential mixture models that may describe the underlying data-generation processes and allow the inference of their likelihood without explicitly generating the potential structures of that space. Probabilistic programming is the use of language-specific support to aid in the process of statistical inference. This article shows that *Mathematica* has features that readily enable the sort of probabilistic programming that supports nonparametric inference.

Unlike statistical libraries, probabilistic programming uses the structure of the programming language itself in the inference process [10]. Although *Mathematica* increasingly has the kinds of structures that support probabilistic programming, we are not going to focus on those features here. Instead, we will see how *Mathematica*’s natural capacity for memoization allows it to be very easily extended to write probabilistic programs that use stochastic memoization as a key abstraction. In particular, we are going to look at Church, a Lisp variant with probabilistic query and stochastic memoization constructs [11]. Let us now explain stochastic memoization and then look at how to implement Metropolis-Hastings querying, which uses memoization to help implement Markov chain Monte Carlo-driven inference.

Stochastic memoization simply means remembering probabilistic events that have already occurred. Suppose we ask for the first flip of coin `c`. The first call may return `Heads` or `Tails`, depending on the likelihood assigned to coin `c`, but in either case later calls are constrained to return the same value. Once undertaken, the value of a particular random event is determined.

In Church, this memoization is undertaken explicitly through its `mem` operator. Church’s `flip` function designates a Bernoulli trial with the given odds, returning true or false. Here is an example of a memoized fair coin flip in Church.

(define coinflip (mem (lambda (coin trial) (flip 0.5))))

*Mathematica* allows for a similar memoization by incorporating a `Set` within a `SetDelayed`.
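For comparison, the same behavior can be sketched in Python with an explicit cache keyed on the arguments. This is an analog of Church's `mem`, not the `Set`-within-`SetDelayed` idiom itself, and the names `stochastic_mem` and `coinflip` are hypothetical.

```python
import random

def stochastic_mem(sampler):
    """Wrap a random function so each distinct argument tuple is sampled once."""
    cache = {}
    def memoized(*args):
        if args not in cache:
            cache[args] = sampler(*args)   # first call: draw and remember
        return cache[args]                 # later calls: replay the draw
    return memoized

# A fair coin whose outcome is fixed per (coin, trial) pair.
coinflip = stochastic_mem(lambda coin, trial: random.choice(["Heads", "Tails"]))

first = coinflip("c", 1)
again = coinflip("c", 1)   # always equal to first
```

A different trial number, such as `coinflip("c", 2)`, is a different random event and gets its own independent draw.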

Let us now look to a more complicated case. Earlier, we discussed the Dirichlet process. Church supports a `DPmem` operator for creating functions that, when given a new example, either return a previously obtained sample according to the CRP or take a new sample, depending upon the category assignment, and that return the memoized result for any previously seen argument. Here is a similar function in *Mathematica*, called `GenerateMemCRP`. Given a random function, we first create a memoized version of that function based on the category index of the CRP. Then, we create an empty initial CRP result, for which a new sample is created and memoized every time a new input is provided, potentially also resampling the provided function if a previously discovered category is provided.
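The following Python sketch shows one way such a CRP-memoized function could be organized. It is a hypothetical analog written for illustration, not a transcription of `GenerateMemCRP`: the name `generate_mem_crp`, its signature, and the choice to key the memo on an arbitrary hashable input are all assumptions.

```python
import random

def generate_mem_crp(sampler, alpha, rand=random.random):
    """Return a function of a hashable key with CRP-shared random values.

    sampler: zero-argument function drawing a fresh value for a new category.
    Each key is assigned to a CRP category the first time it is seen, and
    every key in a category returns that category's memoized value.
    """
    counts = []        # number of keys per category
    values = []        # memoized draw per category
    assignment = {}    # key -> category index

    def draw(key):
        if key not in assignment:
            threshold = rand() * (sum(counts) + alpha)
            running, k = 0.0, len(counts)
            for j, n_j in enumerate(counts):
                running += n_j
                if threshold < running:
                    k = j
                    break
            if k == len(counts):          # new category: take a new sample
                counts.append(0)
                values.append(sampler())
            counts[k] += 1
            assignment[key] = k
        return values[assignment[key]]

    return draw
```

With `sampler` set to a standard normal draw, calling `draw` on many keys yields a handful of distinct values, repeated in proportion to how popular each category has become.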

For example, let us now take a sampling from categories that have a parameter distributed according to the standard normal distribution. Here we see outputs in a typical range for a standard normal, but with counts favoring resampling particular results according to the sampled frequency of the corresponding category.

Memoization implies that if we provide the same inputs, we get the same results.

Inference is the central operation of probabilistic programming. Conditional inference is implemented in Church through its various query operations. These queries uniformly take four sets of arguments: query algorithm-specific parameters, a description of the inference problem, a condition to be satisfied, and the expression we want to know the distribution of given that condition. Let us motivate the need for a *Mathematica* equivalent to the Church query operator by explaining other queries that are trivial to implement in *Mathematica* but that are not up to certain inference tasks.

Direct calculation is the most straightforward approach to conditional inference. However, sometimes we cannot directly compute the conditional likelihood, but instead have to sample the space. The easiest way to do so is rejection sampling, in which we generate a random sample for all random parameters to see if it meets the condition to be satisfied. If it does, its value is worth keeping as a sample of the distribution, and if it does not, we discard it entirely, proceeding until we are satisfied that we have found the distribution we intend.
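A minimal Python sketch of rejection sampling, under the assumption of a toy model chosen for illustration: two fair coin flips, conditioned on at least one being heads, querying whether both are heads. The function and parameter names are hypothetical.

```python
import random

def rejection_query(model, condition, query, n_samples, rng=random):
    """Collect n_samples values of query(world) from worlds that satisfy condition."""
    kept = []
    while len(kept) < n_samples:
        world = model(rng)             # sample every random parameter afresh
        if condition(world):           # keep only worlds meeting the condition
            kept.append(query(world))
    return kept

def two_flips(rng):
    """Toy model: a pair of independent fair coin flips (True means heads)."""
    return (rng.random() < 0.5, rng.random() < 0.5)

samples = rejection_query(
    two_flips,
    condition=lambda w: w[0] or w[1],   # at least one flip came up heads
    query=lambda w: w[0] and w[1],      # are both heads?
    n_samples=5000,
    rng=random.Random(0),
)
estimate = sum(samples) / len(samples)  # the exact conditional probability is 1/3
```

Here a quarter of the sampled worlds are discarded; when the condition is rare, nearly all of them are, which is the weakness discussed next.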

There is a problem with rejection sampling: much of the potential model space might be highly unlikely, so we end up throwing away most of the samples. Instead, we can start at a random place and then, at each step, use the current sample to find a good next sample for the underlying distribution [12]. So, for a sample $x$, we are interested in constructing a transition operator $T$ yielding a new sample $x'$, and constructing that operator such that the underlying distribution $p$ is invariant under $T$; in other words, the transition operator defines a Markov chain whose stationary distribution is $p$. For our transition operator, we first generate a random proposal $x^*$, where a simple choice is a normally distributed variation of width $\sigma$ along all parameters, and then accept that proposal with likelihood $\min(1, p(x^*)/p(x))$, so that we are incorporating less-likely samples at exactly the rate the underlying distribution would provide. After some initial samples of random value, we will have found the region for which the invariance property holds. Because it applies random numbers to a Markov chain, this algorithm is called Markov chain Monte Carlo, or MCMC.
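Written out, one step of this transition operator looks as follows. The notation is an assumption made for this summary: $x$ is the current sample, $x^*$ the proposal, $\sigma$ the proposal width, and $p$ the (possibly unnormalized) target density. Because the normal proposal is symmetric, the Hastings correction cancels and only the density ratio remains.

```latex
\begin{align}
x^* &= x + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \\
A(x \to x^*) &= \min\!\left(1, \frac{p(x^*)}{p(x)}\right), \\
x' &= \begin{cases}
  x^* & \text{with probability } A(x \to x^*), \\
  x   & \text{otherwise.}
\end{cases}
\end{align}
```

Only the ratio $p(x^*)/p(x)$ is needed, so the normalizing constant of $p$ never has to be computed.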

The following procedure is intended to be the simplest possible implementation of MCMC using memoization (for further considerations see [13, 14]). There is a trade-off in the selection of the proposal width $\sigma$ (the standard deviation of the normal perturbation): if it is too large, we rarely accept anything and would effectively be undertaking rejection sampling, but if it is too small, we tend to stay in a very local area of the model space. One way to manage this trade-off is to control $\sigma$ by aiming for a given rejection rate, which is undertaken here.
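To make the trade-off concrete, here is a small Python sketch of random-walk Metropolis sampling with a crude width adaptation. It does not use memoization and is not the article's procedure; the function name, the `target_accept` parameter, and the multiplicative update rule are assumptions chosen for brevity.

```python
import math
import random

def metropolis(log_p, x0, n_steps, sigma=1.0, target_accept=0.3, rng=random):
    """Random-walk Metropolis over one real parameter.

    log_p: unnormalized log density of the target distribution.
    sigma: initial proposal standard deviation; it is nudged up after an
           acceptance and down after a rejection, so the long-run acceptance
           rate drifts toward target_accept.
    Returns (samples, acceptance_rate, final_sigma).
    """
    x, lp = x0, log_p(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, sigma)
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
            accepted += 1
            sigma *= math.exp(0.01 * (1.0 - target_accept))
        else:
            sigma *= math.exp(-0.01 * target_accept)
        samples.append(x)
    return samples, accepted / n_steps, sigma

# Usage: sample a standard normal, starting far from the mode.
samples, rate, sigma = metropolis(lambda x: -0.5 * x * x, x0=5.0,
                                  n_steps=20000, rng=random.Random(1))
kept = samples[2000:]                      # discard the initial samples
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

Adapting the width on every step slightly perturbs the invariance property, so a more careful implementation would freeze `sigma` after the initial samples.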

We now see why we constructed the CRP functions to accept random number functions: it lets us create evaluation functions suitable for MCMC.

Let us look to see how we might apply these examples. First, we are going to look at the infinite relational model, which demonstrates how to use the CRP to learn underlying categories from relations. Then, we will look at learning arithmetic expressions based upon particular inputs and outputs, which demonstrates using probabilistic programming in a recursive setting.

Suppose we are given some set of relations in the form of predicates, and we want to infer category memberships based on those relations. The infinite relational model (IRM) can construct infinite category models for processing arbitrary systems of relational data [15]. Suppose now we have some specific instances of objects, $o_1, \dots, o_n$, and a few specific statements about whether a given $m$-ary relation, $R$, holds between them or not. Given a prior of how concentrated categories are, $\gamma$, and a prior for the sharpness of a relation holding between members of various types, $\beta$, we would like to learn categories $z$ and relational likelihoods $\eta$, such that we can infer category memberships for objects and the likelihood of unobserved relationships holding between them, which corresponds to the following model structure.
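For a binary relation, the generative structure of the IRM as given in [15] can be summarized as below. The symbol names are notational assumptions: $z_i$ is the category of object $i$, $\eta(a, b)$ is the probability that the relation holds between a member of category $a$ and a member of category $b$, $\gamma$ is the CRP concentration, and $\beta$ is the symmetric beta parameter.

```latex
\begin{align}
z \mid \gamma &\sim \mathrm{CRP}(\gamma), \\
\eta(a, b) \mid \beta &\sim \mathrm{Beta}(\beta, \beta), \\
R(i, j) \mid z, \eta &\sim \mathrm{Bernoulli}\bigl(\eta(z_i, z_j)\bigr).
\end{align}
```

Small values of $\beta$ push each $\eta(a, b)$ toward 0 or 1, which is the sense in which $\beta$ sets the sharpness of a relation.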

Below we provide a sampler to calculate this. It first sets up memoization for category assignment, then memoization for relational likelihood, and then a function that evaluates first the object-to-category sampling and then the predicate/category sampling. The function accumulates the likelihood along the way and returns the object-to-category memberships together with that likelihood.
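The same sequence of steps can be sketched in Python for a single binary relation. This is a simplified illustration under stated assumptions, not the article's sampler: the name `sample_irm`, the dictionary encoding of observations, and drawing directly from a `random.Random`-style generator (rather than from memoized random-number functions) are all choices made here.

```python
import random

def sample_irm(objects, observations, gamma, beta, rng=random):
    """Draw one IRM world and score it against observed binary relations.

    observations: dict mapping (a, b) pairs to True/False, e.g. "a knows b".
    Returns (assignment, likelihood), where assignment maps object -> category.
    """
    counts, assignment = [], {}
    for obj in objects:                       # CRP prior over categories
        threshold = rng.random() * (sum(counts) + gamma)
        running, k = 0.0, len(counts)
        for j, n_j in enumerate(counts):
            running += n_j
            if threshold < running:
                k = j
                break
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        assignment[obj] = k

    eta = {}                                  # memoized relation strengths
    likelihood = 1.0
    for (a, b), holds in observations.items():
        pair = (assignment[a], assignment[b])
        if pair not in eta:                   # one Beta draw per category pair
            eta[pair] = rng.betavariate(beta, beta)
        likelihood *= eta[pair] if holds else 1.0 - eta[pair]
    return assignment, likelihood
```

Repeating the draw many times and weighting each assignment by its likelihood gives a crude importance-sampling estimate; an MCMC query explores the same space more efficiently.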

To understand how this works, let us look at an example. Suppose that there are two elementary schools, one for girls and one for boys, and that they both join together for high school. However, there is a new teacher who does not know this about the composition of incoming classes. This teacher finds another teacher and asks if they know who knows whom. This more experienced teacher says yes, but not why, and the younger teacher asks a series of questions about who knows whom, to the confusion of the older teacher, who does not understand why the younger teacher does not know (we have all had conversations like this). One potential set of such questions might yield the following answers. Notice that there is a deficiency in these questions; namely, the new teacher never asks if a boy knows a girl.

Given these answers, let us now perform a Metropolis-Hastings query to see if we can recover these categories. The result of the query is a list of rules, one per sample, where the left side of each rule is the likelihood of the given predicates to hold given the sampled model, and the right side is the sampled model in a two-element list. In this two-element list characterizing the sampled model, the first element is the result as provided by the evaluation method, and the second element contains the random values parameterizing the model.

Given these samples, let us now find a list of categories that fit them. Normalize the weight of each example by its likelihood, filter out the sampling information, and gather the common results together.

Removing the specific category assignments and determining for each person whether they are in the same category as each other, we see that we have a complete and accurate estimate for who knows whom.

There is no more idiomatic example of probabilistic programming than probabilistically generated programs. Here, we show how to implement learning simple arithmetic expressions. A Church program for undertaking this is as follows [16]. First, it defines a function for equality that is slightly noisy, creating a gradient that is easier to learn than strict satisfaction. Next, it creates a random arithmetic expression of nested addition and subtraction with a single variable and integers from 0 to 10 as terminals. Then, it provides a utility for evaluating symbolically constructed expressions. Finally, it demonstrates sampling a program with two results that are consistent with adding 2 to the input.

Let us now construct an equivalent in *Mathematica*, starting with the noisy equality function.

Next, here is an equivalent program for generating random programs. By recursively indexing each potential branch of the program, we can assure that common random number and MCMC algorithms will correctly assign a random number corresponding to that exact part of the potential program. We also explicitly limit the size of the tree.
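The branch-indexing idea can be illustrated with a short Python sketch. Every random choice is requested by a site name built from the path to that node, so a memoized random source returns the same number for the same part of the tree. All names here are hypothetical, and the leaf probability of 0.6 and the exponential form of the noisy equality weight are assumptions for illustration.

```python
import math
import random

def make_site_rand(rng):
    """Memoized uniform draws keyed by site, so each site keeps its number."""
    cache = {}
    def rand(site):
        if site not in cache:
            cache[site] = rng.random()
        return cache[site]
    return rand

def random_expr(rand, path=(), max_depth=3):
    """Random tree of + and - over the variable 'x' and integers 0..10."""
    if len(path) >= max_depth or rand(path + ("leaf",)) < 0.6:
        if rand(path + ("var",)) < 0.5:
            return "x"
        return int(rand(path + ("const",)) * 11)       # integer in 0..10
    op = "+" if rand(path + ("op",)) < 0.5 else "-"
    return (op, random_expr(rand, path + (0,), max_depth),
                random_expr(rand, path + (1,), max_depth))

def evaluate(expr, x):
    """Evaluate an expression tree at the given value of the variable."""
    if isinstance(expr, tuple):
        op, left, right = expr
        a, b = evaluate(left, x), evaluate(right, x)
        return a + b if op == "+" else a - b
    return x if expr == "x" else expr

def noisy_equal_weight(a, b, sharpness=1.0):
    """Soft equality: weight 1 when a == b, decaying with |a - b|."""
    return math.exp(-sharpness * abs(a - b))
```

For example, `noisy_equal_weight(evaluate(expr, 1), 3)` scores a candidate `expr` against the observation that input 1 should give output 3, and an MCMC driver can perturb the number at a single site to change just one part of the program.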

Now we make a Metropolis-Hastings query and process the results down to the found expression and its calculated likelihood.

Given only one example, we cannot tell very much, but are pleased that the simplest, yet correct, function is the one rated most likely. Interestingly, the first six expressions are valid despite the noisy success condition.

With two inputs, the only viable expression is the one found.

We have now seen how to implement nonparametric Bayesian inference with *Mathematica*’s memoization features. Nonparametric Bayesian inference extends Bayesian inference to processes, allowing for the consideration of factors that are not directly observable, creating flexible mixture models with similar advantages to flexible data structures. We see that *Mathematica*’s capacity for memoization allows for the implementation of nonparametric sample generation and for Markov chain sampling. This capacity was then demonstrated with two examples, one for discovering the categories underlying particular observed relations and the other for generating functions that matched given results.

Probabilistic programming is a great way to undertake nonparametric Bayesian inference, but one should not confuse language-specific constructs with the language features that allow one to undertake it profitably. Through *Mathematica*’s memoization capabilities, it is readily possible to make inferences over flexible probabilistic models.

[1] DARPA. “Probabilistic Programming for Advancing Machine Learning (PPAML).” Solicitation Number: DARPA-BAA-13-31. (Aug 8, 2013) www.fbo.gov/utils/view?id=a7bdf07d124ac2b1dda079de6de2eb78.

[2] B. Cronin. “What Is Probabilistic Programming?” O’Reilly Radar (blog). (Aug 8, 2013) radar.oreilly.com/2013/04/probabilistic-programming.html.

[3] D. Fidler, “Foresight Defined as a Component of Strategic Management,” Futures, 43(5), 2011 pp. 540-544. doi:10.1016/j.futures.2011.02.005.

[4] C. Kemp, A. Perfors, and J. B. Tenenbaum, “Learning Overhypotheses with Hierarchical Bayesian Models,” Developmental Science, 10(3), 2007 pp. 307-321. doi:10.1111/j.1467-7687.2007.00585.x.

[5] M. I. Jordan, “Bayesian Nonparametric Learning: Expressive Priors for Intelligent Systems,” in Heuristics, Probability, and Causality: A Tribute to Judea Pearl (R. Dechter, H. Geffner, and J. Y. Halpern, eds.), London: College Publications, 2010.

[6] J. Pitman, Combinatorial Stochastic Processes (Lecture Notes in Mathematics 1875), Berlin: Springer-Verlag, 2006.

[7] A. Law and D. Kelton, Simulation Modeling and Analysis, 3rd ed., Boston: McGraw-Hill, 2000.

[8] T. Griffiths and Z. Ghahramani, “Infinite Latent Feature Models and the Indian Buffet Process,” in Proceedings of the Eighteenth Annual Conference on Neural Information Processing Systems (NIPS 18), Whistler, Canada, 2004. books.nips.cc/papers/files/nips18/NIPS2005_0130.pdf.

[9] R. Thibaux and M. I. Jordan, “Hierarchical Beta Processes and the Indian Buffet Process,” in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007), San Juan, Puerto Rico, 2007. jmlr.org/proceedings/papers/v2/thibaux07a/thibaux07a.pdf.

[10] N. D. Goodman, “The Principles and Practice of Probabilistic Programming,” in Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2013), Rome, Italy, 2013 pp. 399-402. doi:10.1145/2429069.2429117.

[11] N. D. Goodman, V. K. Mansinghka, D. Roy, K. Bonawitz, and J. B. Tenenbaum, “Church: A Language for Generative Models,” in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008), Helsinki, Finland, 2008. www.auai.org/uai2008/UAI_camera_ready/goodman.pdf.

[12] I. Murray. Markov Chain Monte Carlo [video]. (Aug 8, 2013) videolectures.net/mlss09uk_murray_mcmc.

[13] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge, UK: Cambridge University Press, 2003. www.inference.phy.cam.ac.uk/itila/book.html.

[14] C. Robert and G. Casella, Monte Carlo Statistical Methods, 2nd ed., New York: Springer, 2004.

[15] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda, “Learning Systems of Concepts with an Infinite Relational Model,” in Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), Boston, MA, 2006. www.aaai.org/Papers/AAAI/2006/AAAI06-061.pdf.

[16] N. D. Goodman, J. B. Tenenbaum, T. J. O’Donnell, and the Church Working Group. “Probabilistic Models of Cognition.” (Aug 8, 2013) projects.csail.mit.edu/church/wiki/Probabilistic_Models_of_Cognition.

J. Cassel, “Probabilistic Programming with Stochastic Memoization,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-1. |

John Cassel works with Wolfram|Alpha, where his primary focus is knowledge representation problems. He maintains interests in real-time discovery, planning, and knowledge-representation problems in risk governance and engineering design. Cassel holds a Master of Design in Strategic Foresight and Innovation from OCADU, where he developed a novel research methodology for the risk governance of emerging technologies.

**John Cassel**

Wolfram|Alpha LLC

100 Trade Center Drive

Champaign, IL 61820-7237

*jcassel@wolfram.com*