The Karush-Kuhn-Tucker equations provide, under suitable assumptions, necessary and sufficient conditions for a solution of the problem of maximizing (minimizing) a concave (convex) function subject to constraints.

For an excellent reference, see the tutorial in [2]. Here we modify the code of [1] by correcting minor typos, simplifying, and letting the user specify restrictions on the exogenous parameters of the model.

The inputs of `KT` are the objective function to be maximized, the list of constraints, and the list of choice variables. Here is an example from consumer choice theory: maximize a utility function, subject to a budget constraint.

Several of the solutions do not make economic sense, because they ignore the fact that the income and the prices of the two goods are all positive. However, `KT` lets the user specify restrictions on the exogenous parameters of the model.

An important advantage of `KT` over other optimization functions (such as `Maximize` or `Minimize`) is that `KT` returns the value of the Kuhn-Tucker multipliers. These multipliers have an important economic interpretation: they are shadow prices for the constrained resources. In the above example, for instance, the value of the multiplier is the “infinitesimal” increment in the utility function of the consumer that is generated when the budget constraint is relaxed by increasing the consumer’s income by an “infinitesimal” amount.
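To make the shadow-price interpretation concrete, here is a small sketch in Python (not the article's `KT` code) for a hypothetical Cobb-Douglas consumer; the closed-form demands and multiplier follow from the Kuhn-Tucker first-order conditions.

```python
# Sketch (assumed example, not the article's code): for the problem
#   max x^a * y^(1-a)  s.t.  p1*x + p2*y <= m,
# the KKT conditions give demands x* = a*m/p1, y* = (1-a)*m/p2 and a
# multiplier equal to the marginal utility of income, dV/dm.

def solve_consumer(a, p1, p2, m):
    """Closed-form KKT solution for a Cobb-Douglas utility function."""
    x = a * m / p1
    y = (1 - a) * m / p2
    u = x**a * y**(1 - a)
    lam = u / m          # from the FOC: lam = a*u/(p1*x) = u/m here
    return x, y, u, lam
```

Perturbing income `m` by a small amount changes the optimal utility at exactly the rate `lam`, which is the "infinitesimal increment" interpretation described above.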

Nash equilibrium is the main solution concept in game theory. It is a crucial tool for economics and political science models. Essentially, a Nash equilibrium is a profile of strategies (one strategy for each player), such that if a player takes the choices of the others as given (i.e. as parameters), then the player’s strategy must maximize his or her payoff.

The function `Nash` takes as input the payoff function of player 1, the payoff function of player 2, and the actions available to players 1 and 2. It returns the entire set of Nash equilibria.

There are many versions of Colonel Blotto’s game; this is a simple one taken from [3]. General A (row player) has three divisions to defend a city; she has to choose how many divisions to place at the north road and how many divisions at the south road. General B (column player) has two divisions to try to invade the city; he also has to choose how many divisions to assign to the north road and how many to the south road. If General A has at least as many divisions as General B at a given road, General A wins the battle there (defense is favored in the case of a tie). To win the game, however, A must defeat B on both battlefields. Thus, A has four possible strategies and B has three strategies. The table below summarizes the players’ strategies and payoffs (win or lose, for the whole campaign). For example, in the first row and first column A won and B lost: A chose three divisions for the north road and none for the south road, while B chose two for the north and none for the south, so A had at least as many divisions as B on both roads and won both battles.
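The payoff table can be built mechanically from the rules above; here is a Python sketch (a zero-sum encoding with +1 for an A win and −1 for an A loss is assumed).

```python
# Blotto payoff table: A allocates 3 divisions between north and south,
# B allocates 2.  A wins a battle when she has at least as many
# divisions there (ties favor defense) and wins the campaign only if
# she wins on BOTH roads.

A_strats = [(3, 0), (2, 1), (1, 2), (0, 3)]   # (north, south) for A
B_strats = [(2, 0), (1, 1), (0, 2)]           # (north, south) for B

def payoff_A(a, b):
    wins_both = a[0] >= b[0] and a[1] >= b[1]
    return 1 if wins_both else -1

table = [[payoff_A(a, b) for b in B_strats] for a in A_strats]

# B mixing (1/2, 0, 1/2) equalizes A's payoff across all four rows
q = [0.5, 0.0, 0.5]
exp_payoffs = [sum(p * r for p, r in zip(q, row)) for row in table]
```

The equalized expected payoffs illustrate why B's even split (1, 1) gets probability zero in equilibrium: mixing only between (2, 0) and (0, 2) already makes every strategy of A yield the same payoff.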

A Nash equilibrium for this game is a probability distribution over strategies; use `P` for the probabilities chosen by General A and `Q` for the probabilities chosen by General B.

The game has many Nash equilibria, but we still can make predictions: General B never spreads his forces evenly (the probability of his second strategy is zero in any equilibrium); with probability one-half, B’s two divisions are placed at the north road, and with probability one-half, they are placed at the south road. As for General A, the probability that she places all three of her divisions on one front is less than one-half. Also, the probability that General A places two or more divisions at the north (or south) road is always equal to one-half.

This game is also borrowed from [3]. A deck has two cards, one high and one low. Each player places one dollar into the pot. Player 1 gets one card from the deck. Player 2 does not see Player 1’s card. Player 1 decides whether to raise (by placing another dollar in the pot) or not raise. Player 2 observes Player 1’s action and then has to decide whether to match the bet or fold. If Player 2 folds, then Player 1 wins the contents of the pot. If Player 2 matches and Player 1 had previously raised, Player 2 places another dollar into the pot. Player 1 then reveals her card: if it is the high card, Player 1 wins the pot; otherwise, Player 2 wins it.

See Figure 1 for the corresponding game tree. We introduce a fictitious player, Nature, who randomly decides if the card is high or low. We depict the bimatrix representation of the game. Player 1 has four strategies: always raise (RR), always not raise (NN), raise if the card is high and not otherwise (RN), and not raise if the card is high and raise otherwise (NR). Player 2 also has four strategies: always match (MM), always fold (FF), match only if Player 1 raised (MF), and fold only if Player 1 raised (FM). For simplicity, in the bimatrix representation, we write the expected payoffs of Player 1 and omit Player 2’s payoffs (this is without loss of generality in zero-sum games).

**Figure 1.** Game tree of the card game.

In this case, the Nash equilibrium delivers a sharp prediction. When Player 1 has the high card, she always raises, but when she has the low card, she bluffs with probability 1/3 (the probability of the strategy RR is 1/3). When Player 1 does not raise, Player 2 always matches. If Player 1 raises, Player 2 still may match, but only with probability 2/3 (the probability of always matching, MM, is 2/3).
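The equilibrium probabilities can be verified numerically from the stated rules (each player antes $1, a raise adds $1, matching a raise adds $1); the following is a Python sketch, not the article's code.

```python
# Expected net payoff to Player 1 when she always raises with the high
# card, bluffs (raises with the low card) with probability `bluff`, and
# Player 2 matches a raise with probability `call` (Player 2 always
# matches when Player 1 does not raise).

def p1_value(bluff, call):
    # High card (prob 1/2): raise; P2 folds -> +1, matches -> +2.
    v_high = (1 - call) * 1 + call * 2
    # Low card (prob 1/2): bluff -> +1 on fold / -2 on match;
    # check -> always matched, lose 1.
    v_low = bluff * ((1 - call) * 1 + call * (-2)) + (1 - bluff) * (-1)
    return 0.5 * v_high + 0.5 * v_low
```

At `bluff = 1/3` and `call = 2/3` each player is indifferent over their mixing probabilities, which is exactly the equilibrium condition; the value of the game to Player 1 is then 1/3.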

We extended the code of [1] to solve Kuhn-Tucker conditions with additional assumptions on the parameters and, more importantly, used the Kuhn-Tucker equations to provide a program that computes all the Nash equilibria of finite bimatrix games.

We presented a program to compute the set of all Nash equilibria in finite bimatrix games. It is intended as a classroom tool for students and instructors. Needless to say, the code is not efficient. For larger inputs (say, bimatrix games with five or more actions per player), `Reduce` often fails to solve the system of Kuhn-Tucker equations. For optimized algorithms, we suggest [4]. Nevertheless, with continuous improvement of hardware and algorithms for solving semialgebraic systems (see [5]), these methods may become useful for research applications sooner than we think. Finally, as algorithmic game theory courses become more popular in computer science departments, it seems that the time to bring computational methods and algorithms to economics departments is overdue.

[1] | F. J. Kampas, “Tricks of Using Reduce to Solve Kuhn-Tucker Equations,” The Mathematica Journal, 9(4), 2005 pp. 686-689. www.mathematica-journal.com/issue/v9i4/contents/Tricks9-4/Tricks9-4_2.html. |

[2] | M. J. Osborne. “Optimization: The Kuhn-Tucker Conditions for Problems with Inequality Constraints,” from Mathematical Methods for Economic Theory: A Tutorial. (Jan 8, 2014) www.economics.utoronto.ca/osborne/MathTutorial/MOIF.HTM. |

[3] | M. Osborne, An Introduction to Game Theory, New York: Oxford University Press, 2004. |

[4] | R. D. McKelvey, A. M. McLennan, and T. L. Turocy. “Gambit: Software Tools for Game Theory.” (Jan 8, 2014) www.gambit-project.org. |

[5] | Wolfram Research, “Real Polynomial Systems” from Wolfram Mathematica Documentation Center—A Wolfram Web Resource. reference.wolfram.com/mathematica/tutorial/RealPolynomialSystems.html. |

S. O. Parreiras, “Using Reduce to Compute Nash Equilibria,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-3.

Sérgio O. Parreiras is an associate professor at the UNC-Chapel Hill Department of Economics. His research focus is on game theory and its applications to auctions, mechanism design, and contests. He is also interested in computational economics, general equilibrium theory, algorithmic game theory, and evolutionary anthropology.

**Sérgio O. Parreiras**

*UNC, Department of Economics
Gardner Hall, 200B
Chapel Hill, N.C. 27599-3305*

The infinite Fibonacci word,

f = 0100101001001010010100100101…,

is certainly one of the most studied words in the field of combinatorics on words [1-4]. It is the archetype of a Sturmian word [5]. The word f can be associated with a fractal curve with combinatorial properties [6, 7].

This article implements *Mathematica* programs to generate curves from f and a set of drawing rules. These rules are similar to those used in L-systems.

The outline of this article is as follows. Section 2 recalls some definitions and ideas of combinatorics on words. Section 3 introduces the Fibonacci word, its fractal curve, and a family of words whose limit is the Fibonacci word fractal. Finally, Section 4 generalizes the Fibonacci word and its Fibonacci word fractal.

The terminology and notation are mainly those of [5] and [8]. Let Σ be a finite alphabet, whose elements are called symbols. A word over Σ is a finite sequence of symbols from Σ. The set of all words over Σ, that is, the free monoid generated by Σ, is denoted by Σ*. The identity element ε of Σ* is called the empty word. For any word w ∈ Σ*, |w| denotes its length, that is, the number of symbols occurring in w. The length of ε is taken to be zero. If a ∈ Σ and w ∈ Σ*, then |w|_a denotes the number of occurrences of a in w.

For two words u = u_1u_2⋯u_n and v = v_1v_2⋯v_m in Σ*, denote by uv the concatenation of the two words, that is, uv = u_1u_2⋯u_n v_1v_2⋯v_m. If v = ε, then uε = u; moreover, by u^n denote the word uu⋯u (n times). A word v is a subword (or factor) of u if there exist x, y ∈ Σ* such that u = xvy. If x = ε, then u = vy and v is called a prefix of u; if y = ε, then u = xv and v is called a suffix of u.

The reversal of a word u = u_1u_2⋯u_n is the word ū = u_n⋯u_2u_1, and ε̄ = ε. A word u is a palindrome if ū = u.

An infinite word over Σ is a map u: ℕ → Σ, written as u = u_1u_2u_3⋯. The set of all infinite words over Σ is denoted by Σ^ω.

**Example 1**

The word p = p_1p_2p_3⋯, where p_n = 1 if n is a prime number and p_n = 0 otherwise, is an example of an infinite word. The word p is called the characteristic sequence of the prime numbers. Here are the first 50 terms of p.

**Definition 1**

There is a special class of words with many remarkable properties, the so-called Sturmian words. These words admit several equivalent definitions (see, e.g. [5], [8]).

**Definition 2**

Let u ∈ Σ^ω. Let P(u, n), the complexity function of u, be the map that counts, for each integer n ≥ 0, the number of subwords of length n in u. An infinite word u is a Sturmian word if P(u, n) = n + 1 for all integers n ≥ 0.

Since P(u, 1) = 2 for any Sturmian word u, Sturmian words are necessarily over two symbols. The word p in Example 1 is not a Sturmian word, because P(p, n) > n + 1 for some n.

Given two real numbers α and θ with α irrational, 0 < α < 1, and 0 ≤ θ < 1, define the infinite word s_{α,θ} by s_{α,θ}(n) = ⌊(n + 1)α + θ⌋ − ⌊nα + θ⌋ for n ≥ 0. The numbers α and θ are the slope and the intercept, respectively. This word is called mechanical. The mechanical words are equivalent to the Sturmian words [5]. As a special case, θ = α gives the characteristic words c_α.
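The mechanical-word formula is easy to evaluate numerically. The following Python sketch takes α = θ = 1/φ² (a choice that produces the infinite Fibonacci word) and checks the Sturmian complexity P(u, n) = n + 1 on a long prefix.

```python
# Sketch: s_{alpha,theta}(n) = floor((n+1)*alpha + theta) - floor(n*alpha + theta).
# With alpha = theta = 1/phi^2 this produces the infinite Fibonacci word.
from math import floor, sqrt

def mechanical(alpha, theta, length):
    return "".join(
        str(floor((n + 1) * alpha + theta) - floor(n * alpha + theta))
        for n in range(length))

def complexity(word, n):
    """Number of distinct subwords of length n in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

phi = (1 + sqrt(5)) / 2
w = mechanical(1 / phi**2, 1 / phi**2, 2000)
```

On a prefix of length 2000, every subword of each length up to 10 already occurs, so the count n + 1 can be verified directly.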

**Definition 3**

On the other hand, note that every irrational α ∈ (0, 1) has a unique continued fraction expansion

α = [0; a_1, a_2, a_3, …], where each a_i is a positive integer. Let α = [0; 1 + d_1, d_2, d_3, …] be an irrational number with d_1 ≥ 0 and d_n > 0 for n > 1. To the directive sequence (d_1, d_2, d_3, …), associate a sequence of words (s_n)_{n ≥ 0} defined by s_0 = 1, s_1 = 0, and s_{n+1} = s_n^{d_n} s_{n−1} for n ≥ 1.

Such a sequence of words is called a standard sequence. This sequence is related to characteristic words in the following way. Observe that, for any n ≥ 1, s_n is a prefix of s_{n+1}, which gives meaning to lim_{n→∞} s_n as an infinite word. In fact, one can prove that each s_n is a prefix of the characteristic word c_α, and c_α = lim_{n→∞} s_n [5].

**Definition 4**

Fibonacci words are words over {0, 1} defined inductively as follows: f_0 = 1, f_1 = 0, and f_n = f_{n−1} f_{n−2} for n ≥ 2. The words f_n are referred to as the finite Fibonacci words. The limit

f = lim_{n→∞} f_n = 0100101001001010010100100101… (1)

is the infinite Fibonacci word. It is clear that |f_n| = F_n, where F_n is the nth Fibonacci number, recalling that the Fibonacci numbers are defined by the recurrence relation F_n = F_{n−1} + F_{n−2} for all integers n ≥ 2, with initial values F_0 = F_1 = 1. The infinite Fibonacci word is a Sturmian word [5]; exactly, f = s_{α,α}, where α = 1/φ² and φ = (1 + √5)/2 is the golden ratio.

Here are the first 50 terms of f.
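The same prefix is easy to produce outside *Mathematica*; here is a Python sketch assuming the convention f_0 = "1", f_1 = "0", f_n = f_{n−1} f_{n−2}.

```python
# Finite Fibonacci words by direct concatenation.

def fib_words(n):
    """Return [f_0, f_1, ..., f_n] with f_0 = "1", f_1 = "0"."""
    words = ["1", "0"]
    while len(words) <= n:
        words.append(words[-1] + words[-2])
    return words

words = fib_words(10)
prefix50 = words[-1][:50]   # first 50 symbols of the infinite word
```

The lengths of the words reproduce the Fibonacci numbers, and the forbidden factors 11 and 000 never appear.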

**Definition 5**

The Fibonacci word satisfies σ(f_n) = f_{n+1} for all n ≥ 1, where σ is the morphism σ(0) = 01, σ(1) = 0; in particular, f is the fixed point of σ.

Here are the first nine finite Fibonacci words.

**Definition 6**

The following proposition summarizes some basic properties about the Fibonacci word.

**Proposition 1**

- The words 11 and 000 are not subwords of the Fibonacci word.
- Let ab be the last two symbols of f_n. For n ≥ 2, ab = 01 if n is even and ab = 10 if n is odd.
- The concatenation of two successive Fibonacci words is almost commutative; that is, f_n f_{n−1} and f_{n−1} f_n have a common prefix of length F_{n+1} − 2, for all n ≥ 1.
- The word obtained from f_n by removing its last two symbols is a palindrome for all n ≥ 2.
- For all n ≥ 2, f_{n−2} f_{n−1} = Π(f_n), where Π exchanges the two last symbols of f_n.
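These properties are easy to check numerically on small cases; the following Python sketch assumes the convention f_0 = "1", f_1 = "0", f_n = f_{n−1} f_{n−2}.

```python
# Verify the combinatorial properties of the finite Fibonacci words.

def fib(n):
    a, b = "1", "0"
    for _ in range(n):
        a, b = b, b + a
    return a                       # f_n

for n in range(4, 12):
    fn, fn1 = fib(n), fib(n - 1)
    u, v = fn + fn1, fn1 + fn
    k = len(u) - 2
    # almost commutative: agree except on the last two symbols
    assert u[:k] == v[:k] and u[k:] == v[k:][::-1]
    # removing the last two symbols of f_n leaves a palindrome
    p = fn[:-2]
    assert p == p[::-1]
    # exchanging the two last symbols of f_n yields f_{n-2} f_{n-1}
    assert fn[:-2] + fn[-1] + fn[-2] == fib(n - 2) + fib(n - 1)
```

Each assertion mirrors one bullet of the proposition, checked for n from 4 through 11.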

The Fibonacci word can be associated with a curve using a drawing rule: the word is read symbol by symbol, and a particular drawing action follows each symbol read (this is the same idea as that used in L-systems [9]). In this case, the drawing rule is called “*the odd-even drawing rule*” [7].

**Definition 7**

The Fibonacci curve, denoted by 𝓕_n, is the result of applying the odd-even drawing rule to the word f_n. The Fibonacci word fractal 𝓕 is defined as the limit 𝓕 = lim_{n→∞} 𝓕_n.

The program `LShow` is adapted from [10] to generate L-systems.

Figure 1 shows an L-system interpretation of the odd-even drawing rule.

**Figure 1. **Interpretation of the odd-even drawing rule.

Here are the curves 𝓕_n for several values of n.
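The odd-even drawing rule can also be sketched outside *Mathematica*. The following Python sketch assumes one common statement of the rule (draw one unit segment per symbol; after a 0, turn left if its 1-based position is even and right if odd; after a 1, do not turn) and checks the segment-length property numerically.

```python
# Sketch of the odd-even drawing rule applied to a Fibonacci word.

def draw_curve(word):
    x, y, dx, dy = 0, 0, 1, 0
    points = [(0, 0)]
    for k, c in enumerate(word, start=1):
        x, y = x + dx, y + dy
        points.append((x, y))
        if c == "0":
            dx, dy = (-dy, dx) if k % 2 == 0 else (dy, -dx)
    return points

# build f_12 (f_0 = "1", f_1 = "0", f_n = f_{n-1} + f_{n-2})
a, b = "1", "0"
for _ in range(12):
    a, b = b, b + a
pts = draw_curve(a)

# collinear unit steps never chain more than twice, so the merged
# segments have length 1 or 2 only
run, max_run = 1, 1
for i in range(2, len(pts)):
    d1 = (pts[i - 1][0] - pts[i - 2][0], pts[i - 1][1] - pts[i - 2][1])
    d2 = (pts[i][0] - pts[i - 1][0], pts[i][1] - pts[i - 1][1])
    run = run + 1 if d1 == d2 else 1
    max_run = max(max_run, run)
```

Because a 1 never turns and 11 is not a factor of the Fibonacci word, runs of collinear unit steps have length at most 2, matching the first item of Proposition 2.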

The next proposition, about properties of the curves 𝓕_n and the fractal 𝓕, comes directly from the properties of the Fibonacci word given in Proposition 1. More properties can be found in [7].

**Proposition 2**

- 𝓕_n is composed only of segments of length 1 or 2.
- The number of turns in the curve 𝓕_n is a Fibonacci number.
- The curve 𝓕_n is similar to the curve 𝓕_{n−3}.
- The curve 𝓕_n is symmetric.
- The curve 𝓕_n is composed of five smaller curves, each the result of applying the odd-even drawing rule to a subword of the Fibonacci word.

The next figure shows the curve 𝓕_n and its decomposition into the five smaller curves.

The Fibonacci word and other words can be derived from the dense Fibonacci word, which was introduced in [7].

**Definition 8**


Given a drawing rule, the global angle of a word is the sum of the successive angles generated by its symbols through the rule.

For a drawing rule, the resulting angle of a word w, written Δ(w), is the function that gives its global angle. A morphism σ preserves the resulting angle if Δ(σ(w)) = Δ(w) for any word w; moreover, a morphism σ inverts the resulting angle if Δ(σ(w)) = −Δ(w) for any word w.

The dense Fibonacci word is strongly linked to the Fibonacci word fractal because it can generate a whole family of curves whose limit is the Fibonacci word fractal [7]. All that is needed is to apply to it a morphism that preserves or inverts the resulting angle.

Here are some examples.

Here are some examples with other angles.

This section introduces a generalization of the Fibonacci word and the Fibonacci word fractal [11].

**Definition 9**

The 2-Fibonacci word is the classical Fibonacci word. Here are the first six i-Fibonacci words.

The following proposition relates the i-Fibonacci words to the classical Fibonacci word.

**Proposition 3**


**Definition 10**

The 2-Fibonacci numbers are the classical Fibonacci numbers; for other values of i, the i-Fibonacci numbers are defined by an analogous recurrence. The following table shows the first terms in the sequences and their reference numbers in the On-Line Encyclopedia of Integer Sequences (OEIS) [12].

**Proposition 4**

- The word 11 is not a subword of the i-Fibonacci word, i ≥ 2.
- Let ab be the last two symbols of the nth i-Fibonacci word. For n ≥ 2 and i ≥ 2, ab = 01 if n is even and ab = 10 if n is odd.
- The concatenation of two successive i-Fibonacci words is almost commutative; they have a common prefix whose length is two less than their combined length, for all n ≥ 1 and i ≥ 2.
- The word obtained from the nth i-Fibonacci word by removing its last two symbols is a palindrome for all n ≥ 2.
- For all n ≥ 2, exchanging the two last symbols of the nth i-Fibonacci word interchanges the order of concatenation of its two predecessors.

**Theorem 1**

For the proof, see [11]. This theorem implies that the i-Fibonacci words are Sturmian words.

Note that the slope of each i-Fibonacci word can be expressed in terms of φ, the golden ratio.

**Definition 11**

The i-Fibonacci curve of order n is the result of applying the odd-even drawing rule to the nth i-Fibonacci word. The i-Fibonacci word fractal is defined as the limit of these curves.

Here are the curves for several values of i and n.

**Proposition 5**

- The i-Fibonacci fractal is composed only of segments of length 1 or 2.
- The curve of order n is similar to the curve of order n − 3.
- The curve of order n is composed of five smaller curves.
- The curve is symmetric.
- The scale factor between the curve of order n and the curve of order n + 3 is constant.

This section applies the above ideas to generate new curves from characteristic words (see Definition 3).

**Conjecture 1**

Here are seven examples.

The first author was partially supported by Universidad Sergio Arboleda under grant number USA-II-2012-14. The authors would like to thank Borut Jurčič-Zlobec from Ljubljana University for his help during the development of this article.

[1] | J. Cassaigne, “On Extremal Properties of the Fibonacci Word,” RAIRO—Theoretical Informatics and Applications, 42(4), 2008 pp. 701-715. doi:10.1051/ita:2008003. |

[2] | W. Chuan, “Fibonacci Words,” Fibonacci Quarterly, 30(1), 1992 pp. 68-76. www.fq.math.ca/Scanned/30-1/chuan.pdf. |

[3] | W. Chuan, “Generating Fibonacci Words,” Fibonacci Quarterly, 33(2), 1995 pp. 104-112. www.fq.math.ca/Scanned/33-2/chuan1.pdf. |

[4] | F. Mignosi and G. Pirillo, “Repetitions in the Fibonacci Infinite Word,” RAIRO—Theoretical Informatics and Applications, 26(3), 1992 pp. 199-204. |

[5] | M. Lothaire, Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications), Cambridge: Cambridge University Press, 2005. |

[6] | A. Blondin Massé, S. Brlek, A. Garon, and S. Labbé, “Two Infinite Families of Polyominoes That Tile the Plane by Translation in Two Distinct Ways,” Theoretical Computer Science, 412(36), 2011 pp. 4778-4786. doi:10.1016/j.tcs.2010.12.034. |

[7] | A. Monnerot-Dumaine, “The Fibonacci Word Fractal,” preprint, 2009. hal.archives-ouvertes.fr/hal-00367972/fr. |

[8] | J.-P. Allouche and J. Shallit, Automatic Sequences: Theory, Applications, Generalizations, Cambridge: Cambridge University Press, 2003. |

[9] | P. Prusinkiewicz and A. Lindenmayer, The Algorithmic Beauty of Plants, New York: Springer-Verlag, 1990. |

[10] | E. Weisstein. “Lindenmayer System” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/LindenmayerSystem.html. |

[11] | J. Ramírez, G. Rubiano, and R. de Castro, “A Generalization of the Fibonacci Word Fractal and the Fibonacci Snowflake,” 2013. arXiv:1212.1368v2. |

[12] | OEIS Foundation, Inc. “The On-Line Encyclopedia of Integer Sequences.” (Aug 9, 2013) oeis.org. |

J. L. Ramírez and G. N. Rubiano, “Properties and Generalizations of the Fibonacci Word Fractal,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-2.

**José L. Ramírez**

*Instituto de Matemáticas y sus Aplicaciones
Universidad Sergio Arboleda
Calle 74 no. 14 – 14 Bogotá, Colombia*

**Gustavo N. Rubiano**

*Departamento de Matemáticas
Universidad Nacional de Colombia
AA 14490, Bogotá, Colombia*

Bayesian statistics are an orderly way of finding the likelihood of a model from data, using the likelihood of the data given the model. From spam detection to medical diagnosis, spelling correction to forecasting economic and demographic trends, Bayesian statistics have found many applications, and have even been praised as mental heuristics to avoid overconfidence. However, at first glance Bayesian statistics suffer from an apparent limit: they can only make inferences about known factors, bounded to conditions seen within the data, and have nothing to say about the likelihood of new phenomena [3]. In short, Bayesian statistics are apparently restricted to inferences about the parameters of the model they are provided.

Instead of taking priors over factors of the model itself, we can take priors over factors of the process by which the data was generated. These stochastic process priors give the modeler a way to talk about factors that have not been directly observed. These nonobservable factors include the likely rate at which further factors might be seen, given further observation, and underlying categories or structures that might generate the data being observed. For example, in statistics problems we are often presented with drawing marbles of different colors from a bag, and given randomly drawn samples, we might talk about the most likely composition of the bag and the range of likely compositions. However, suppose we had a number of bags, and we drew two marbles each from three of them, discovering two red marbles, two green marbles, and two yellow marbles [4]. If we were to draw marbles from yet another bag, we might expect two marbles identical in color, of a color we have not previously observed. We do not know what this color is, and in this sense we have made a nonparametric inference about the process that arranged the marbles between bags.

The ability to talk about nonobserved parameters is a leap in expressiveness, as instead of explicitly specifying a model for all parameters, a model utilizing infinite processes expands to fit the given data. This should be regarded similarly to the advantages afforded by linked data structures in representing ordinary data. A linked list has a potentially infinite capacity; its advantage is not that we have an infinite memory, but an abstract flexibility to not worry too much about maintaining its size appropriately. Similarly, an infinite prior models the growth we expect to discover [5].

Here are two specific processes that are useful for a number of different problems. These two processes are good for modeling unknown discrete categories and sets of features, respectively. In both of these processes, suppose that we can take samples so that there are no dependencies on the order in which we took them; in other words, the samples are exchangeable. Both of these processes also make use of a concentration parameter, α. As we look at more samples, we expect the number of new elements we discover to diminish, but not disappear, as our observations establish a lower frequency of occurrence for unobserved elements. The concentration parameter establishes the degree to which the proportions are concentrated, with low α indicating a distribution concentrated on a few elements, and high α indicating a more dispersed distribution.

First, let us look into learning an underlying system of categories. For a fixed set of categories of particular likelihood, the probability of a given sample falling in a particular category corresponds to the multinomial distribution, the multiparameter extension of the Bernoulli distribution. The conjugate prior, or the distribution that gives a Bayesian estimate of which multinomial distribution produced a given sample, is the Dirichlet distribution, itself the multivariable extension of the beta distribution. To create an infinite Dirichlet distribution, or rather a Dirichlet process, one can simply have a recursive, stick-breaking form of the beta distribution, where the likelihood of category k is β_k ∏_{j<k} (1 − β_j), with each β_j drawn from Beta(1, α). To use a Dirichlet process as a prior, it is easier to manipulate it in the form of a Chinese restaurant process (CRP) [6]. Suppose we want to know the likelihood that the next sample, after n samples, is a member of category k. If the category is new, then that probability corresponds to the size of the concentration parameter in ratio to the count of the samples taken, α/(n + α); a category already seen n_k times is chosen with probability n_k/(n + α).
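As a language-neutral illustration (a Python sketch, not the article's *Mathematica* code), here is a CRP sampler that takes an explicit random number generator so that runs can be reproduced:

```python
# Chinese restaurant process: customer i joins table k with probability
# counts[k] / (i + alpha), or opens a new table with probability
# alpha / (i + alpha).
import random

def crp_assignments(n, alpha, rng):
    counts = []                       # customers per table (category)
    assignment = []
    for i in range(n):
        r = rng.random() * (i + alpha)
        for k, c in enumerate(counts):
            if r < c:
                assignment.append(k)
                counts[k] += 1
                break
            r -= c
        else:
            assignment.append(len(counts))   # new category
            counts.append(1)
    return assignment, counts

assign, counts = crp_assignments(200, 1.0, random.Random(0))
```

Passing the generator in mirrors the parameterized random number function discussed below, which is what later makes Metropolis-Hastings estimation possible.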

The implementation of this function is straightforward. The use of a parameterized random number function allows the algorithm to be used for common random number comparison between simulation scenarios [7], as well as for estimation through Markov chain Monte Carlo, about which more will be said later.

In the second process, suppose we are interested in the sets of features observed in sets of examples. For example, suppose we go to an Indian food buffet and are unfamiliar with the dishes, so we observe the selected items that our fellow patrons have chosen. Supposing one overall taste preference, we might say that the likelihood of a dish’s being worth selecting is proportional to the number of times it was observed, but if there are not many examples we should also try some additional dishes that were not tried previously. This process, called the Indian buffet process [8], turns out to be equivalent to a beta process prior [9]. Suppose we want to know the likelihood that a given feature is going to be found in a sample. The likelihoods can then be calculated directly from other well-understood distributions: the ith patron samples each previously chosen dish with probability proportional to the number of patrons who chose it, and then tries a Poisson-distributed number of new dishes, with mean α/i.
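A Python sketch of the same process (again an illustration, not the article's code) builds the binary patron-by-dish matrix directly:

```python
# Indian buffet process: customer i samples each previously tried dish
# k with probability m_k / i, then tries Poisson(alpha / i) new dishes.
import math, random

def ibp(n, alpha, rng):
    dish_counts = []                  # times each dish has been chosen
    rows = []
    for i in range(1, n + 1):
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        # draw Poisson(alpha/i) new dishes by inversion
        lam = alpha / i
        k, p, cum, u = 0, math.exp(-lam), math.exp(-lam), rng.random()
        while u > cum:
            k += 1
            p *= lam / k
            cum += p
        row.extend([1] * k)
        for j, v in enumerate(row):
            if j < len(dish_counts):
                dish_counts[j] += v
            else:
                dish_counts.append(1)
        rows.append(row)
    width = len(dish_counts)
    return [r + [0] * (width - len(r)) for r in rows]

Z = ibp(50, 2.0, random.Random(1))
```

Each row is one patron's feature set; unlike the CRP, a row may switch on several features at once, which is what makes the IBP a prior over sets of features rather than single category labels.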

Both of these processes are suitable as components in mixture models. Suppose we are conducting a phone poll of a city and ask the citizens we talk to about their concerns. Each person will report their various civic travails. We expect each person to have their own varying issues, but also for there to be particular groups of concerns for different neighborhoods and professional groups. In other words, we expect to see an unknown set of features emerge from an unknown set of categories. Then, we might use a CRP-IBP mixture distribution to help learn those categories from the discovered feature sets.

Nonparametric inference tasks are particularly suited for computational support. What we would like to do is describe a space of potential mixture models that may capture the underlying data-generation processes and infer their likelihood without explicitly generating the potential structures of that space. This article shows that *Mathematica* has features that readily enable the sort of probabilistic programming that supports nonparametric inference.

Probabilistic programming is the use of language-specific support to aid in the process of statistical inference. Unlike statistical libraries, the structure of the programming language itself is used in the inference process [10]. Although *Mathematica* increasingly has the kinds of structures that support probabilistic programming, we are not going to focus on those features here. Instead, we will see how *Mathematica*’s natural capacity for memoization allows it to be easily extended to write probabilistic programs that use stochastic memoization as a key abstraction. In particular, we are going to look at Church, a Lisp variant with probabilistic query and stochastic memoization constructs [11]. Let us now explain stochastic memoization and then look at how to implement Metropolis-Hastings querying, which uses memoization to help implement Markov chain Monte Carlo-driven inference.

Stochastic memoization simply means remembering probabilistic events that have already occurred. Suppose we define a function whose value is the first flip of coin `c`. On the first call, it may return `Heads` or `Tails`, depending on a likelihood assigned to coin `c`, but in either case it is constrained in later calls to return the same value. Once undertaken, the value of a particular random event is fixed.

In Church, this memoization is undertaken explicitly through its `mem` operator. Church’s flip function designates a Bernoulli trial with the given odds, with return values 0 and 1. Here is an example of a memoized fair coin flip in Church.

(define coinflip (mem (lambda (coin n) (flip 0.5))))

*Mathematica* allows for a similar memoization by incorporating a `Set` within a `SetDelayed`.
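The same idiom can be sketched in Python with an explicit memo table; `coinflip` here is a hypothetical name mirroring the Church example:

```python
# Stochastic memoization: the first call samples a value, and every
# later call with the same arguments replays the stored value.
import random

_memo = {}
_rng = random.Random(42)

def coinflip(coin, n):
    if (coin, n) not in _memo:
        _memo[(coin, n)] = "Heads" if _rng.random() < 0.5 else "Tails"
    return _memo[(coin, n)]
```

The dictionary plays the role that the `coinflip[c_, n_] := coinflip[c, n] = …` `Set`-within-`SetDelayed` idiom plays in *Mathematica*.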

Let us now look at a more complicated case. Earlier, we discussed the Dirichlet process. Church supports a `DPmem` operator for creating functions that, for each input, either return a previously obtained sample or draw a new one, according to the category assignment made by the CRP. Here is a similar function in *Mathematica*, called `GenerateMemCRP`. Given a random function, we first create a memoized version of that function based on the category index of the CRP. Then, we create an empty initial CRP result, for which a new sample is created and memoized every time a new input is provided, potentially also resampling the provided function if a prediscovered category is provided.

For example, let us now take a sampling from categories that have a parameter distributed according to the standard normal distribution. Here we see outputs in a typical range for a standard normal, but with counts favoring resampling particular results according to the sampled frequency of the corresponding category.

Memoization implies that if we provide the same inputs, we get the same results.

Inference is the central operation of probabilistic programming. Conditional inference is implemented in Church through its various query operations. These queries uniformly take four sets of arguments: query algorithm-specific parameters, a description of the inference problem, a condition to be satisfied, and the expression we want to know the distribution of given that condition. Let us motivate the need for a *Mathematica* equivalent to the Church query operator by explaining other queries that are trivial to implement in *Mathematica* but that are not up to certain inference tasks.

Direct calculation is the most straightforward approach to conditional inference. However, sometimes we cannot directly compute the conditional likelihood, but instead have to sample the space. The easiest way to do so is rejection sampling, in which we generate a random sample for all random parameters to see if it meets the condition to be satisfied. If it does, its value is worth keeping as a sample of the distribution, and if it does not, we discard it entirely, proceeding until we are satisfied that we have found the distribution we intend.
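For instance, here is a rejection-sampling estimate of a simple conditional probability (a Python sketch with a made-up example; the true value of P(die₁ = 6 | die₁ + die₂ ≥ 10) is 1/2):

```python
# Rejection sampling: draw both dice, keep the sample only when the
# conditioning event holds, and estimate from the kept samples.
import random

def rejection_sample(n, rng):
    kept = []
    for _ in range(n):
        d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
        if d1 + d2 >= 10:             # condition to be satisfied
            kept.append(d1)
    return kept

kept = rejection_sample(100000, random.Random(0))
p6 = sum(1 for d in kept if d == 6) / len(kept)
```

Note how wasteful this is: only 6 of the 36 outcomes satisfy the condition, so roughly five-sixths of the work is discarded, which motivates the Markov chain approach described next.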

There is a problem with rejection sampling: much of the potential model space might be highly unlikely, so we end up throwing away most of the samples. Instead, we can start at a random place and then, at each step, use the current sample to find a good next sample from the underlying distribution [12]. So, for a sample x, we are interested in constructing a transition operator yielding a new sample x′, and constructing that operator such that the underlying distribution p is invariant with respect to it; in other words, the transition operator forms a Markov chain whose stationary distribution is p. For our transition operator, we first generate a random proposal x′, where a simple choice is a normally distributed variation of x along all parameters, and then accept that proposal with likelihood min(1, p(x′)/p(x)), so that we incorporate less-likely samples at exactly the rate the underlying distribution would provide. After some initial samples of random value, we will have found the region for which the invariance property holds. Because it applies Monte Carlo sampling along a Markov chain, this algorithm is called Markov chain Monte Carlo, or MCMC.

The following procedure is intended to be the simplest possible implementation of MCMC using memoization (for further considerations see [13, 14]). There is a trade-off in the selection of the proposal scale σ: if it is too large, we rarely accept anything and would effectively be undertaking rejection sampling, but if it is too small, we tend to stay in a very local region of the space. One way to manage this trade-off is to adapt σ by aiming for a given rejection rate, which is undertaken here.
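Stripped of the adaptive tuning of σ, the core Metropolis-Hastings loop can be sketched as follows (Python, fixed proposal scale):

```python
# Minimal Metropolis-Hastings: normally distributed proposals accepted
# with probability min(1, p(x') / p(x)), computed in log space.
import math, random

def metropolis(log_p, x0, steps, sigma, rng):
    samples, x, lp = [], x0, log_p(x0)
    for _ in range(steps):
        prop = x + rng.gauss(0, sigma)
        lp_prop = log_p(prop)
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop     # accept the proposal
        samples.append(x)             # otherwise keep the old sample
    return samples

# target: a standard normal, up to its normalizing constant
samples = metropolis(lambda x: -x * x / 2, 0.0, 20000, 1.0, random.Random(0))
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Working with log densities avoids underflow, and the seeded generator is the same common-random-numbers device used with the CRP earlier.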

We now see why we constructed the CRP functions to accept random number functions: it lets us create evaluation functions suitable for MCMC.

Let us look to see how we might apply these examples. First, we are going to look at the infinite relational model, which demonstrates how to use the CRP to learn underlying categories from relations. Then, we will look at learning arithmetic expressions based upon particular inputs and outputs, which demonstrates using probabilistic programming in a recursive setting.

Suppose we are given some set of relations in the form of predicates, and we want to infer category memberships based on those relations. The infinite relational model (IRM) can construct infinite category models for processing arbitrary systems of relational data [15]. Suppose now we have some specific instances of objects and a few specific statements about whether a given relation holds between them or not. Given a prior for how concentrated the categories are and a prior for the sharpness with which a relation holds between members of various types, we would like to learn category assignments and relational likelihoods such that we can infer category memberships for objects and the likelihood of unobserved relationships holding between them, which corresponds to the following model structure.

Below we provide a sampler to calculate this, which first sets up memoization for category assignment, next memoization for relational likelihood, and then a function for first evaluating the object to category sampling and then the predicate/category sampling. Then, the function merely calculates the likelihood sampling along the way, returning the object to category memberships and the likelihood.
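The shape of such a sampler can be sketched in a few lines. This is a deliberately simplified Python illustration of the structure just described, not the article's *Mathematica* sampler: category assignment is memoized through a CRP, the per-category-pair relation probability is memoized with a uniform draw standing in for the relation prior, and the likelihood is accumulated along the way. All names are ours:

```python
import random

def irm_sample(observations, alpha=1.0, seed=0):
    """One forward sample from a simplified IRM. `observations` is a
    list of (object, object, holds?) triples; returns the sampled
    object-to-category map and the likelihood of the observations."""
    rng = random.Random(seed)
    counts = []        # customers per category (CRP state)
    category = {}      # memo: object -> category
    strength = {}      # memo: (cat, cat) -> P(relation holds)

    def get_category(obj):
        if obj not in category:
            u = rng.random() * (sum(counts) + alpha)
            t = 0
            while t < len(counts) and u >= counts[t]:
                u -= counts[t]
                t += 1
            if t == len(counts):
                counts.append(0)   # new category
            counts[t] += 1
            category[obj] = t
        return category[obj]

    def get_strength(c1, c2):
        if (c1, c2) not in strength:
            strength[(c1, c2)] = rng.random()   # stand-in relation prior
        return strength[(c1, c2)]

    likelihood = 1.0
    for a, b, holds in observations:
        p = get_strength(get_category(a), get_category(b))
        likelihood *= p if holds else 1.0 - p
    return category, likelihood

obs = [("ann", "amy", True), ("bob", "ben", True), ("ann", "bob", False)]
cats, lik = irm_sample(obs)
```

Scoring many such samples, weighted by likelihood, is what the Metropolis-Hastings query below does for the full model.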

To understand how this works, let us look at an example. Suppose that there are two elementary schools, one for girls and one for boys, and that they both join together for high school. However, there is a new teacher who does not know about the composition of the incoming classes. This teacher finds another teacher and asks if they know who knows whom. The more experienced teacher says yes, but not why, and the younger teacher asks a series of questions about who knows whom, to the confusion of the older teacher, who does not understand why the younger teacher does not already know (we have all had conversations like this). One potential set of such questions might yield the following answers. Notice that there is a deficiency in these questions: the new teacher never asks whether a boy knows a girl.

Given these samples, let us now perform a Metropolis-Hastings query to see if we can recover these categories. The result of a particular sample is a list of rules, where the left side of each rule is the likelihood of the given predicates holding under the sampled model, and the right side is the sampled model as a two-element list. In this two-element list, the first element is the result provided by the evaluation method, and the second element contains the random values parameterizing the model.

Given these samples, let us now find a list of categories that fit them. Normalize the weight of each example by its likelihood, filter out the sampling information, and gather the common results together.

Removing the specific category assignments and determining for each person whether they are in the same category as each other, we see that we have a complete and accurate estimate for who knows whom.

There is no more idiomatic example of probabilistic programming than probabilistically generated programs. Here we show how to learn simple arithmetic expressions. A Church program for this is as follows [16]. First, it defines a slightly noisy equality function, creating a gradient that is easier to learn than strict satisfaction. Next, it creates a random arithmetic expression of nested addition and subtraction, with a single variable and the integers 0 through 10 as terminals. Then, it provides a utility for evaluating symbolically constructed expressions. Finally, it demonstrates sampling a program consistent with two example results of adding 2 to the input.
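To make the four pieces concrete, here is an illustrative Python sketch of the same structure (noisy equality, a random expression generator, an evaluator, and a search for an expression consistent with adding 2). It is not the Church or *Mathematica* code; all names, the depth limit, and the branching probabilities are our own choices, and for simplicity the search below uses strict rather than noisy equality:

```python
import random

def noisy_equal(a, b, noise=0.1):
    """Noisy equality: matches are likely but not certain, giving a
    gradient that is easier to learn than strict satisfaction."""
    return 1.0 - noise if a == b else noise

def random_expr(rng, depth=0, max_depth=3):
    """Random nested addition/subtraction over 'x' and integers 0..10,
    with an explicit depth limit on the tree."""
    if depth >= max_depth or rng.random() < 0.5:
        return "x" if rng.random() < 0.3 else rng.randint(0, 10)
    op = rng.choice(["+", "-"])
    return (op, random_expr(rng, depth + 1, max_depth),
                random_expr(rng, depth + 1, max_depth))

def evaluate(expr, x):
    """Utility for evaluating symbolically constructed expressions."""
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    l, r = evaluate(left, x), evaluate(right, x)
    return l + r if op == "+" else l - r

# Search for an expression consistent with two examples of adding 2.
rng = random.Random(4)
found = None
for _ in range(10000):
    e = random_expr(rng)
    if evaluate(e, 1) == 3 and evaluate(e, 5) == 7:
        found = e
        break
```

Because the expressions are built only from addition and subtraction, any expression matching both examples must compute x + 2 for every input.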

Let us now construct an equivalent program in *Mathematica*.

Next, here is an equivalent program for generating random programs. By recursively indexing each potential branch of the program, we ensure that common random numbers and the MCMC algorithm will correctly assign a random number to that exact part of the potential program. We also explicitly limit the size of the tree.

Now we make a Metropolis-Hastings query and process the results down to the found expression and its calculated likelihood.

Given only one example, we cannot tell very much, but are pleased that the simplest, yet correct, function is the one rated most likely. Interestingly, the first six expressions are valid despite the noisy success condition.

With two inputs, the only viable expression is the one found.

We have now seen how to implement nonparametric Bayesian inference with *Mathematica*’s memoization features. Nonparametric Bayesian inference extends Bayesian inference to processes, allowing for the consideration of factors that are not directly observable and creating flexible mixture models with advantages similar to those of flexible data structures. We saw that *Mathematica*’s capacity for memoization allows for the implementation both of nonparametric sample generation and of Markov chain sampling. This capacity was then demonstrated with two examples, one for discovering the categories underlying particular observed relations and the other for generating functions that matched given results.

Probabilistic programming is a great way to undertake nonparametric Bayesian inference, but one should not confuse language-specific constructs with the language features that allow one to undertake it profitably. Through *Mathematica*’s memoization capabilities, it is readily possible to make inferences over flexible probabilistic models.

[1] | DARPA. “Probabilistic Programming for Advancing Machine Learning (PPAML).” Solicitation Number: DARPA-BAA-13-31. (Aug 8, 2013) www.fbo.gov/utils/view?id=a7bdf07d124ac2b1dda079de6de2eb78. |

[2] | B. Cronin. “What Is Probabilistic Programming?” O’Reilly Radar (blog). (Aug 8, 2013) radar.oreilly.com/2013/04/probabilistic-programming.html. |

[3] | D. Fidler, “Foresight Defined as a Component of Strategic Management,” Futures, 43(5), 2011 pp. 540-544. doi:10.1016/j.futures.2011.02.005. |

[4] | C. Kemp, A. Perfors, and J. B. Tenenbaum, “Learning Overhypotheses with Hierarchical Bayesian Models,” Developmental Science, 10(3), 2007 pp. 307-321. doi:10.1111/j.1467-7687.2007.00585.x. |

[5] | M. I. Jordan, “Bayesian Nonparametric Learning: Expressive Priors for Intelligent Systems,” in Heuristics, Probability, and Causality: A Tribute to Judea Pearl, (R. Dechter, H. Geffner, and J. Y. Halpern, eds.) London: College Publications, 2010. |

[6] | J. Pitman, Combinatorial Stochastic Processes (Lecture Notes in Mathematics 1875), Berlin: Springer-Verlag, 2006. |

[7] | A. Law and D. Kelton, Simulation Modeling and Analysis, 3rd ed., Boston: McGraw-Hill, 2000. |

[8] | T. Griffiths and Z. Ghahramani, “Infinite Latent Feature Models and the Indian Buffet Process,” in Proceedings of the Eighteenth Annual Conference on Neural Information Processing Systems (NIPS 18), Whistler, Canada, 2004. books.nips.cc/papers/files/nips18/NIPS2005_0130.pdf. |

[9] | R. Thibaux and M. I. Jordan, “Hierarchical Beta Processes and the Indian Buffet Process,” in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007), San Juan, Puerto Rico, 2007. jmlr.org/proceedings/papers/v2/thibaux07a/thibaux07a.pdf. |

[10] | N. D. Goodman. “The Principles and Practice of Probabilistic Programming,” in Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, (POPL 2013) Rome, Italy, 2013 pp. 399-402. doi:10.1145/2429069.2429117. |

[11] | N. D. Goodman, V. K. Mansinghka, D. Roy, K. Bonawitz, and J. B. Tenenbaum, “Church: A Language for Generative Models,” in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008), Helsinki, Finland, 2008. www.auai.org/uai2008/UAI_camera_ready/goodman.pdf. |

[12] | I. Murray. Markov Chain Monte Carlo [video]. (Aug 8, 2013) videolectures.net/mlss09uk_murray_mcmc. |

[13] | D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge, UK: Cambridge University Press, 2003. www.inference.phy.cam.ac.uk/itila/book.html. |

[14] | C. Robert and G. Casella, Monte Carlo Statistical Methods, 2nd ed, New York: Springer, 2004. |

[15] | C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda, “Learning Systems of Concepts with an Infinite Relational Model,” in Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), Boston, MA, 2006. www.aaai.org/Papers/AAAI/2006/AAAI06-061.pdf. |

[16] | N. D. Goodman, J. B. Tenenbaum, T. J. O’Donnell, and the Church Working Group. “Probabilistic Models of Cognition.” (Aug 8, 2013) projects.csail.mit.edu/church/wiki/Probabilistic_Models_of_Cognition. |

J. Cassel, “Probabilistic Programming with Stochastic Memoization,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-1.

John Cassel works with Wolfram|Alpha, where his primary focus is knowledge representation problems. He maintains interests in real-time discovery, planning, and knowledge-representation problems in risk governance and engineering design. Cassel holds a Master of Design in Strategic Foresight and Innovation from OCADU, where he developed a novel research methodology for the risk governance of emerging technologies.

**John Cassel**

Wolfram|Alpha LLC

100 Trade Center Drive

Champaign, IL 61820-7237

*jcassel@wolfram.com*

We study well-known chocolate games and new ones that we have invented. Most of the games presented here have simple formulas for the set of losing states, but some do not, and it is interesting that those games produce very beautiful graphs of the set of losing states.

The setup for Nim is a set of heaps that contain objects. For example, three heaps might contain three, four, and five objects. Two players alternate in taking any positive number of objects away from a single heap. The player who takes the last object loses.

Chocolate games generalize Nim to two dimensions. In this section we study well-known chocolate games and variations created by the authors. The chocolate games are simple to play, but they are interesting mathematically.

**Definition 1**

In a rectangular piece of chocolate made of squares, the brown parts are sweet and a single blue square is very bitter. Two players play alternately. To move, a player breaks the chocolate in a straight line along one of the grooves and eats the rectangular piece broken off. The player who takes the bitter square loses. The same game can be played with a rectangular box made of cubes.

There are other types of chocolate games; one of the most well-known is Chomp [1]. Again the chocolate is a rectangle made of squares, with the top-left square being very bitter. To move, a player removes a square and all the squares below it and to its right. Many people have studied this game and many interesting theories about it have been developed.

Each chocolate game treated in this article satisfies an inequality, so they are very different from Chomp, and for certain types of inequalities the mathematical structure of the winning strategy is very simple.

Play the chocolate game in Figure 1 with “nim-sum” unchecked. Choose the values of the coordinates and then click “new game.” Each time you click a line (not a square), you break the chocolate into two pieces along that line and eat the piece without the bitter square [2]. The game coordinates are the numbers of squares to the left of, above, and to the right of the blue square, respectively.

In chocolate games there are two kinds of states.

**Definition 2**

A winning state is a state from which we can force a win as long as we play correctly at every stage. A losing state is a state from which we lose no matter how well we play, if our opponent plays correctly.

The mathematical structure of chocolate game 1 is well-known, and there is a formula to calculate losing states.

**Proposition 1**

A state is a losing state of chocolate game 1 if and only if the nim-sum of its coordinates is zero, where the nim-sum is the bitwise exclusive-or operation (the same as `BitXor` in *Mathematica*).

For a proof, see [2].
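Proposition 1 is easy to check in code. The following is a sketch in Python rather than the article's *Mathematica* (the function names are ours), assuming the losing condition is the classic nim-sum-zero test; `BitXor` corresponds to Python's `^` operator:

```python
from functools import reduce

def is_losing_state(coords):
    """A state loses exactly when the bitwise XOR (nim-sum) of its
    coordinates is zero, per Proposition 1."""
    return reduce(lambda a, b: a ^ b, coords) == 0

def winning_move(coords):
    """From a winning state, find a move to a losing state by reducing
    one coordinate, which is the strategy described in the text."""
    for i, c in enumerate(coords):
        for smaller in range(c):
            trial = list(coords)
            trial[i] = smaller
            if is_losing_state(trial):
                return tuple(trial)
    return None          # no such move exists: we are in a losing state

# (1, 2, 3) has nim-sum 0, so it loses; from (1, 2, 2) the first
# coordinate can be reduced to reach the losing state (0, 2, 2).
```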

Once we know the formula for losing states, the strategy to win is clear. Suppose that you have a winning state. Then you can choose a move that gives your opponent a losing state. Any move by your opponent yields a winning state for you, and you can always move to a losing state again. Finally, your opponent is left in the last losing state and you win the game.

You can use this strategy in chocolate game 1 by selecting “nim-sum.” For each state you see its nim-sum, so you know whether the state is a losing state or a winning state.

In the box chocolate game, the bitter (blue) cube is at the bottom of the 3D chocolate; to see it, drag to rotate.

Chocolate game 2 is a little more complicated than chocolate game 1, but the mathematical structure is almost the same.

In chocolate game 2, there are five controls. (You cannot cut the chocolate by clicking a face.)

**Proposition 2**

For a proof, see [2].

Here are some new chocolate games.

In chocolate game 3, you can cut the chocolate in three ways: to the left or right of the blue piece, or above it. The coordinates , , and are the maximum number of times you can cut the chocolate in those directions.

The geometry is such that clicking lines with low values also deletes lines, so that the coordinates always satisfy the game’s inequality.

**Proposition 3**

In fact this proposition is valid for chocolate games with the inequality for even . Although Proposition 3 is very similar to Proposition 1, their proofs are very different. For proofs, see [3] and [4].

Chocolate game 4 is the same as chocolate game 3, but you can vary the parameter of the inequality, which holds for any state.

The authors are the first researchers who studied chocolate games whose coordinates satisfy inequalities.

For certain types of inequalities, the mathematical structure of the set of losing states is very similar to that of the well-known games of Nim and rectangular chocolate, but for most inequalities the mathematical structure of the set of losing states is very different.

The authors could not find any simple formula for the losing states of triangular chocolate games whose coordinates satisfy the inequality . They present a conjecture for the formulas of losing states in Section 3 of [5]. It seems that chocolate games whose coordinates satisfy for odd do not have simple formulas for the set of losing states, because the set of losing states has very complicated graphs. In Figure 9 you can see graphs made by the set of losing states; for even the graph is a Sierpinski gasket and for odd the graph is complicated. As for the graphs made by losing states, see Figure 6.

**Conjecture 1**

For a triangular chocolate game satisfying , is a losing state if and only if when (mod 4), and is a losing state if and only if when (mod 4).

This conjecture is Proposition 4 when and is proved for in [6].

For other types of chocolate games that the authors studied, see [7], [6], [8], and [9].

**Proposition 4**

For the triangular chocolate game that satisfies the inequality , is a losing state if and only if .

For a proof, see [10].

**Conjecture 2**

For the chocolate game that satisfies the inequalities and , the state is a losing state of this chocolate game if and only if .

The authors have proved this result but have not yet published the proof.

Here the authors present a *Mathematica* program to calculate losing states of the chocolate game whose coordinates satisfy the inequality . First define the function for each state of the chocolate and the function , where is a set of positive integers.

**Definition 3**

**Definition 4**

The function `mex` applied to a set gives the smallest non-negative integer that is not an element of the set (the “minimum excluded ordinal”).

*Example 1*

and .

This defines the Grundy number recursively.

**Definition 5**

**Proposition 5**

This is a well-known fact in the field of combinatorial games.
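As a concrete illustration of Definitions 4 and 5 and Proposition 5, here is a Python sketch (the article's code is in *Mathematica*, and these names are ours), using plain normal-play Nim moves as a stand-in game, since there the Grundy number is known to equal the nim-sum of the heaps:

```python
def mex(s):
    """Minimum excluded ordinal: the smallest non-negative integer
    that is not an element of the set s (Definition 4)."""
    n = 0
    while n in s:
        n += 1
    return n

def grundy(state, moves, memo=None):
    """Grundy number, defined recursively as the mex of the Grundy
    numbers of all states reachable in one move. `moves` is a
    game-specific function returning the reachable states."""
    if memo is None:
        memo = {}
    if state not in memo:
        memo[state] = mex({grundy(s, moves, memo) for s in moves(state)})
    return memo[state]

def nim_moves(heaps):
    """Moves for plain Nim: take any positive number from one heap."""
    out = []
    for i, h in enumerate(heaps):
        for k in range(h):
            out.append(tuple(heaps[:i]) + (k,) + tuple(heaps[i + 1:]))
    return out

# By Proposition 5, losing states are exactly those with Grundy number 0,
# e.g. the heaps (1, 2, 3), whose nim-sum is zero.
```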

This game has two coordinates satisfying an inequality; they are the height and width of the chocolate when we ignore the blue part.

See Figure 8; the parameter there is the upper bound on the number of rows and columns you can remove. Below the game is the table of Grundy numbers.

**Conjecture 3**

- , for any natural number .
- If , then .

By Proposition 5, is a losing state if and only if the Grundy number .

Here is a *Mathematica* program to calculate losing states using Grundy numbers. Here . This *Mathematica* program comes from [11]. The definition for `mex` is as in the code for Figure 8. The definitions for `move3` and `Gr3` are modifications of `move` and `Gr` from that code, so we change their names slightly.

These are the candidate states for a particular case.

This calculates the losing states using Grundy numbers.

Indeed these are losing states.

There are no other losing states.

Here the authors define Grundy-like numbers, which are more efficient to calculate.

**Definition 6**

- If , then let . If , then let . If , then let .
- Suppose that and .
- Suppose that and .
- Suppose that and .

**Proposition 6**

For a proof of this proposition see Theorem 4.2 of [4].

This defines the corresponding *Mathematica* function `G2`.

This calculates the losing states using Grundy-like numbers.

They are exactly all the losing states.

Here are some graphs of the set of losing states of chocolate games. (For more, see [10].)

By Proposition 2 there is a simple formula for the set of losing states of the chocolate games satisfying , where is even. In Figure 9 the graph is a Sierpinski gasket for even .

When is odd, the graph is very different from a Sierpinski gasket. Since the graph is quite complicated, it seems likely that there is no simple formula for the set of losing states. As for the relation between the graphs of losing states and the Sierpinski gasket, see [7] (in Japanese).

Figure 10 shows that the graph is the Sierpinski gasket for odd , so it seems that there may be a simple formula for the set of losing states in that case.

When , by Proposition 3 there is a simple formula for the set of losing states. When , the state is a losing state of this chocolate game if and only if [6]. The authors are now trying to find a simple formula for the set of losing states for an arbitrary odd number .

When is even, the graph is very different from the Sierpinski gasket. Since the graph is quite complicated, it seems that there is no simple formula for the set of losing states.

We can study many kinds of inequalities. For example, the chocolate games that satisfy the inequalities or are interesting, when is even and is a non-negative integer. The authors are studying these games.

Dr. Miyadera taught the rectangular chocolate games presented in [2] to his students and encouraged them to make new chocolate games. Students were very good at proposing new types of chocolate games, including the ones in Figures 3-8. These students were the first people who studied chocolate games that satisfy inequalities.

As for the triangular chocolate game applications on smart phones, see [12], [13], and [14].

The fact that high school students could discover new theorems using computer algebra systems is very important for mathematics education. Many teachers around the world have students rediscover formulas in the classroom: in so-called “learning by doing,” students are supposed to rediscover formulas and theorems that their teachers already know. In the authors’ high school research projects, students discover genuinely new things! The difference between discovery and rediscovery is very big, and the method the authors introduce here could change mathematics education greatly.

The method used in the group is as follows.

First Dr. Ryohei Miyadera introduces some problems to the high school students; after that, the students are required to create new problems by changing the conditions of the original problems. Sometimes they make a problem that is a little different from the original; at other times they propose a completely new problem. When a proposed problem seems very promising to Dr. Miyadera, he and his students begin to research it. They write programs in *Mathematica* and try to find interesting facts by calculation. Once they discover a new fact, they begin to prove it. Even if a proposed problem does not seem interesting enough, Dr. Miyadera encourages students to study it: it is often the case that a problem that seems trivial to him turns out to be a good one. Some students have a kind of intuition that is beyond the imagination of Dr. Miyadera, who is an active mathematician himself. This method of research looks simple, but it is very effective.

In the history of this research group, more than 60 students have participated. Dr. Miyadera discovered that some students are very good at proposing new ideas, while others are good at proving formulas and theorems. According to some researchers in mathematics education, there has never before been a group of high school students that has constantly discovered new formulas and theorems.

This group of students won the first prize in the Canada Wide Virtual Science Fair 2007, 2008, 2009, 2010, 2011, 2012, and 2013. They won the Imai Isao Award in 2007, which is the first prize in the Japan Wide High School Research Contest supported by Kogakuin University in Japan. They won the first prize in the Japan Science & Engineering Challenge 2008, and represented Japan in the International Science and Engineering Fair. (Japan declined to send them to the fair because of the epidemic of swine influenza.) They became finalists in the Japan Science & Engineering Challenge in 2011, 2012, and 2013. They became Google Science Fair Regional Finalists in 2012 and became semifinalists in the Yau High School Mathematics Awards in 2012 and 2013.

For a detailed explanation about the authors’ high school mathematics research project, see [15], [16], [17], [18], [19], [20], and [21]. The authors have presented their research at the International *Mathematica* Symposium in [22], [23], and [24].

They have also presented their research in the *Mathematica* Demonstrations Project, and Ryohei Miyadera was selected as a featured contributor. See [25], [26], and [27].

Dr. Miyadera received the Excellent Teacher Award from the Ministry of Education of Japan in 2012, and he received the Wolfram Innovator Award in 2012 for his research with high school students.

If you are interested in doing a high school or undergraduate mathematics research project, one of the best books to use is by Cowen and Kennedy [11].

If students use *Mathematica* freely with an experienced mathematician, they can discover a lot of new and interesting facts of mathematics. The difference between a true discovery and rediscovery is great. We think that the best way to learn how to be creative is to create new and interesting things.

We would like to thank Professor Robert Cowen of the City University of New York for his encouragement when we met [22]; he gave his book [11] to us. We learned how to use *Mathematica* in our research by using his book. We also would like to thank Ed Pegg Jr of Wolfram Research and Professor Tadashi Takahashi for giving us valuable advice.

[1] | Wikipedia. “Chomp.” (Dec 4, 2013) en.wikipedia.org/wiki/Chomp. |

[2] | A. C. Robin, “A Poisoned Chocolate Problem,” Problem Corner, The Mathematical Gazette, 73(466), 1989 pp. 341-343. (An answer for the above problem is in 74(468), 1990 pp. 171-173.) |

[3] | R. Miyadera, S. Nakamura, and R. Hanafusa, “New Chocolate Games—Variants of the Game of Nim,” Proceedings of the Annual International Conference on Computational Mathematics, Computational Geometry & Statistics, Singapore, 2012 pp. 122-128. dl4.globalstf.org/?wpsc-product=new-chocolate-game-variants-of-the-game-of-nim. |

[4] | R. Miyadera, S. Nakamura, and R. Hanafusa, “New Chocolate Games,” GSTF Journal of Mathematics, Statistics and Operations Research, 1(1), 2012 pp. 111-116. dl4.globalstf.org/?wpsc-product=new-chocolate-games. |

[5] | S. Nakamura, Y. Naito, R. Miyadera et al., “Chocolate Games That Are Variants of Nim and Interesting Graphs Made by These Games,” Visual Mathematics, 14(2), 2012. www.mi.sanu.ac.rs/vismath/miyaderasept2012/index.html. |

[6] | T. Yamauchi, T. Inoue, and Y. Tomari, “Variants of the Game of Nim That Have Inequalities as Conditions,” Rose-Hulman Institute of Technology, Undergraduate Math Journal, 10(2), 2009. www.rose-hulman.edu/mathjournal/archives/2009/vol10-n2/paper12/v10n2-12pd.pdf. |

[7] | R. Miyadera and S. Nakamura, “A Chocolate Game,” Mathematics of Puzzles, Journal of Information Processing (special issue), 53(6), 2012 pp. 1582-1591 (in Japanese). |

[8] | M. Naito, T. Yamauchi, T. Inoue et al., “Discrete Mathematics and Computer Algebra System,” Proceedings of the 9th Asian Symposium on Computer Mathematics-Mathematical Aspects of Computer and Information Sciences Joint Conference (ASCM-MACIS), Fukuoka, Japan: Kyushu University, 2009 pp. 127-136. gcoe-mi.jp/temp/publish/a2019bae287f4628e4bfc9d273150000.pdf. |

[9] | MathPuzzle.com. “The Bitter Chocolate Problem.” Material added 8 Jan 06 (Happy New Year). www.mathpuzzle.com/26Feb2006.html. |

[10] | T. Nakaoka, R. Miyadera et al., “Combinatorial Games and Beautiful Graphs Produced by Them,” Visual Mathematics, 11(3), 2009. www.mi.sanu.ac.rs/vismath/miyaderasept2009/index.html. |

[11] | R. Cowen and J. Kennedy, Discovering Mathematics with Mathematica, Erudition Books, 2001. |

[12] | M. Fukui, “Bitter Chocolate Games 1.” itunes.apple.com/jp/app/bitter-chocolate-games-1/id495880660?mt=8. |

[13] | M. Fukui, “Bitter Chocolate Games 2.” itunes.apple.com/jp/app/bitter-chocolate-games-2/id502476677?mt=8. |

[14] | S. Nakamura, “Poison Chocolate Game.” market.android.com/details?id=com.kgshmathclub.chocoapp. |

[15] | R. Miyadera, T. Inoue, and S. Nakamura, “High School Mathematics Research Project Using Computer Algebra System,” Proceedings of the 16th International Seminar on Education of Gifted Students in Mathematics, The Korean Society of Mathematical Education, Chungnam National University, Daejeon, Korea, 2011 pp. 303-316. |

[16] | S. Hashiba, D. Minematsu, H. Shimoda, and R. Miyadera, “How High School Students Can Discover Original Ideas of Mathematics Using Mathematica,” Mathematica in Education and Research 11(3), 2006. |

[17] | H. Matsui, D. Minematsu, T. Yamauchi, and R. Miyadera, “Pascal-Like Triangles and Fibonacci-Like Sequences,” Mathematical Gazette, 94(529), 2010 pp. 27-41. |

[18] | R. Miyadera, T. Hashiba, Y. Nakagawa, T. Yamauchi, H. Matsui, S. Hashiba, D. Minematsu, and M. Sakaguchi, “Pascal-Like Triangles and Sierpinski-Like Gaskets,” Visual Mathematics, 9(1), 2007. www.mi.sanu.ac.rs/vismath/miyad/pascalliketriangle.html. |

[19] | H. Matsui and T. Yamauchi, “Formulas for Fibonacci-Like Sequences Produced by Pascal-Like Triangles,” Rose-Hulman Institute of Technology Undergraduate Math Journal, 9(2), 2008. www.rose-hulman.edu/mathjournal/archives/2008/vol9-n2/paper11/v9n2-11p.pdf. |

[20] | H. Matsui, N. Saita, K. Kawata, Y. Sakurama, and R. Miyadera, “Elementary Problems, B-1019,” Fibonacci Quarterly, 44(3), 2006. www.fq.math.ca/Problems/August2006elementary.pdf. |

[21] | R. Miyadera, “General Theory of Russian Roulette.” library.wolfram.com/infocenter/MathSource/5710. |

[22] | R. Miyadera, K. Fujii et al., “How High School Students Could Present Original Math Research Using Mathematica,” The International Mathematica Symposium, 2003. library.wolfram.com/infocenter/Conferences/4983. |

[23] | R. Miyadera and D. Minematsu, “Curious Properties of an Iterative Process: How Could High School Students Present Original Mathematics Research Using Mathematica?,” The International Mathematica Symposium, 2004. library.wolfram.com/infocenter/Conferences/6055. |

[24] | R. Miyadera, “Using Mathematica in Doing the Research of Mathematics with High School Students,” Proceedings of the Fourth International Mathematica Symposium (IMS2001), Tokyo, 2001. |

[25] | H. Matsui, T. Yamauchi, D. Minematsu, and R. Miyadera. “Pascal-Like Triangles Made from a Game” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/PascalLikeTrianglesMadeFromAGame. |

[26] | H. Matsui, T. Yamauchi, D. Minematsu, and R. Miyadera. “Pascal-Like Triangles Mod k” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/PascalLikeTrianglesModK. |

[27] | T. Inoue, T. Nakaoka, and R. Miyadera. “A Bitter Chocolate Problem That Satisfies an Inequality” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/ABitterChocolateProblemThatSatisfiesAnInequality. |

R. Miyadera, S. Nakamura, Y. Okada, T. Ishikawa, and R. Hanafusa, “Chocolate Games,” The Mathematica Journal, 2013. dx.doi.org/doi:10.3888/tmj.15-12.

Ryohei Miyadera received a Ph.D. (mathematics) at Osaka City University and received a second Ph.D. (mathematics education) at Kobe University. He has two fields of research: probability theory of functions with values in an abstract space and applications of *Mathematica* to discrete mathematics. He and his high school students have been doing research in discrete mathematics for more than 15 years. They have talked at IMS 2001 [24], IMS 2003 [22], and IMS 2004 [23]. They have also published more than 22 refereed articles. Ryohei Miyadera received the Excellent Teacher Award from the Ministry of Education of Japan in 2012, and he received the Wolfram Innovator Award in 2012. His hobby is long-distance running, and he is one of the best runners in the over-55-years-old group of his prefecture.

Shunsuke Nakamura has graduated from Kwansei Gakuin High School and is preparing for the entrance examination to university. He did mathematical research with Dr. Miyadera when he was a high school student at Kwansei Gakuin. He has published papers in [3], [4], [5], and [7]. He is a member of the Information Processing Society of Japan. Nakamura’s hobbies are playing the piano and playing video games.

Yu Okada is a high school student. He is a very serious player of Massively Multiplayer Online RPGs.

Tomoki Ishikawa is a university student at Kwansei Gakuin. He did mathematical research with Dr. Miyadera when he was a high school student at Kwansei Gakuin. He has published papers on origami mathematics. He is a photographer who received an excellent photography award in the Japan Wide Cultural Festival.

Ryo Hanafusa is a university student at Kwansei Gakuin. He did mathematical research with Dr. Miyadera when he was a high school student at Kwansei Gakuin. He has published papers in [3] and [4]. He is a photographer who received an excellent photography award in the Japan Wide Cultural Festival.

**Ryohei Miyadera**

*Mathematics Department
Kwansei Gakuin High School*

**Shunsuke Nakamura**

*Kwansei Gakuin High School*

*nakashun1994@gmail.com*

**Yu Okada**

*Kwansei Gakuin High School*

*alm1_a1m_aim@ezweb.ne.jp*

**Tomoki Ishikawa**

*Kwansei Gakuin High School*

*tom94826@gmail.com*

**Ryo Hanafusa**

*Kwansei Gakuin High School*

*ryo3waygate@yahoo.co.jp*

Affymetrix gene expression microarrays (“chips”) are a commercial implementation of a powerful concept originally introduced to the world by Schena and colleagues in 1995 [1]. When successfully implemented, gene expression microarrays let a biologist measure the expression of thousands of genes simultaneously in a biological sample, such as heart tissue, and, further, compare that measure of expression between two biological states, such as diseased heart tissue and healthy heart tissue. In many ways, the introduction of microarray technology has created a revolution in biology, transforming the field into a “big data” science like its sister disciplines of physics and chemistry.

Gene expression microarrays (Figure 1A) are manufactured by attaching strands of deoxyribonucleic acid (DNA), corresponding to different genes of an organism, across the surface of a glass slide. The basic process of identifying genes that are expressed begins with the extraction of messenger RNA (mRNA) from a source, for example, healthy heart cells (Figure 1B). The molecule mRNA is made by cells when a gene is expressed, meaning its physical presence—assuming it can be reliably detected—is an indicator of gene expression. By fluorescently labeling the mRNA and hybridizing it to the surface of the chip, it is possible to quantify the intensity of multiple genes’ expression from that biological source by using a scanner able to measure a fluorescent signal. When this process is repeated on a different biological source, such as diseased heart tissue, a separate gene expression profile is created for the diseased tissue, which can then be computationally compared to the expression profile from the healthy tissue. In this manner, genes that are more highly expressed or more highly repressed in diseased tissue can be identified by comparing their expression profile to the expression profile of the same genes in the healthy tissue. This has obvious implications for determining which genes may be playing a role in disease development or any other biological process of interest, from cancer metastasis and drug resistance in medicine to fruit ripening and drought resistance in agriculture.

**Figure 1.** (A) Basic microarray design and layout. (B) Extraction of mRNA—expressed from the nucleus of a healthy heart cell—and its subsequent fluorescent labeling. (C) Hybridization of the fluorescently labeled mRNA (“target”) to the surface of an Affymetrix microarray chip, and its subsequent scanning to quantify the fluorescent signal, assumed to be proportional to gene expression.

The Affymetrix implementation of gene expression microarrays utilizes probesets, synthesized in place, on the surface of the microarray chip (Figure 1C). Probesets are groups of small DNA fragments that are complementary to different regions of the same mRNA molecule made whenever a gene is expressed. By combining the fluorescent signal of the probeset group, a single measure of gene expression is arrived at computationally, which is the primary focus of the algorithm presented here. Probesets are composed of groups of perfect match (“PM”) and mismatch (“MM”) probes. In the context of microarray analysis, the term “probe” refers to the strands of DNA physically tethered to the microarray chip and the term “target” refers to the fluorescently labeled sample of mRNA obtained from a biological source, which will be hybridized to the probes to measure gene expression. Perfect match probes are single-strand sequences of DNA, usually 25 nucleotides in length, which have perfect complementarity to the mRNA sequence to which they are designed to hybridize. Mismatch probes are identical to PM probes, with the exception of one nucleotide in the center of the molecule (typically at position 13) that is not a proper match to the mRNA with which it is designed to hybridize. The purpose of MM probes is to measure background fluorescent signal off the surface of the chip, which is one source of technical noise.

The analysis of microarray data involves numerous steps, some of which are not universally performed, but whose combination is collectively referred to as an “analysis pathway.” The steps in an analysis pathway typically involve:

- background correction: performed to remove fluorescent signal not due to biology
- probe normalization: used to place the individual datasets of an experiment on the same scale, so the datasets can be compared accurately
- perfect match (PM) probe correction: used to correct biases in the PM probe signal, often due to differences in DNA sequence between the probes
- summarization: performed to obtain a single measure of gene expression from the multiple measurements obtained by each probeset; this process often attempts to correct “probe” and “chip” technical noise
- probeset normalization: sometimes performed to make the probesets between datasets more directly comparable
- differentially expressed genes (DEG) test: uses a statistical test to identify “true” differentially expressed genes

Despite all of its positive aspects, microarray technology requires considerable knowledge to use effectively, as the raw signal that is generated from the technology is almost always noisy. The scientific literature is rife with algorithms designed to remove various sources of known error, and it is immediately clear that there is no “perfect” algorithm that is universally useful in all experimental situations. Even so, a seminal article by Zhu and colleagues [2] evaluated different combinations of commonly used algorithms—representing over 40,000 different analysis pathways—using a precisely controlled “spike-in” dataset, which allowed the researchers to identify the most important steps common to a good analysis pathway.

The algorithm presented here represents a merging of several of the “best step” analysis pathway practices identified in [2]. Specifically, the algorithm presented here uses the following analysis pathway:

- background correction: none
- probe normalization: performed using quantile normalization [3]
- perfect match (PM) probe correction: none
- summarization: performed using median polish [4]
- probeset normalization: none
- differentially expressed genes (DEG) test: while probability data is provided to aid interpretation, the identification of differentially expressed genes is not performed with statistical methods, but instead relies on graphical interpretation of the processed data

The algorithm presented here does not perform background correction, perfect match probe correction, or probeset normalization because the evidence presented in [2] suggests that these steps are at best unnecessary and sometimes even detrimental. Readers interested in a deeper discussion of microarray technology and data analysis are referred to the excellent reviews in [5, 6].

The AffyDGED algorithm is template-driven, meaning that the algorithm expects several pieces of user-defined information to be provided in a notebook cell used as a template for entering the information.

The example above contains several variables that must be completed by the user to let AffyDGED do its job properly.

The variables requiring user input are:

- `cellocation`: This variable holds the directory location for finding the CEL data of the microarray hybridizations. CEL files contain the raw fluorescent data from a microarray experiment using Affymetrix technology.
- `affycdflocation`: This variable holds the directory location for finding the Affymetrix CDF library file. CDF stands for “chip description file” and refers to an Affymetrix file that describes, among other things, the location of the probes on the specific type of Affymetrix chip being used.
- `savelocationroot`: This variable holds the location where the user would like the final results of the analysis to be saved.
- `qnormversion`: This variable lets the user select between two options. The option “all” can be entered to perform quantile normalization using all the chips involved in an experiment at once, or alternatively, the option “condition” can be entered, which directs the algorithm to perform quantile normalization by condition, that is, perform quantile normalization twice, once using the data in the experimental condition (such as diseased heart tissue) and then a second time using the control condition data (such as healthy heart tissue). When in doubt, it is recommended that “all” be used.
- `studyname`: This variable lets the user name the experiment/study being processed by the AffyDGED algorithm. The output of AffyDGED is saved using this name to the location provided in `savelocationroot` above.
- `experimentchips`: This variable contains a list of the experimental condition datasets being studied.
- `controlchips`: This variable contains a list of the control condition datasets being studied.

To illustrate the features of the AffyDGED algorithm, we use data from a modestly sized microarray experiment involving the detection of differentially expressed genes between the saliva of pancreatic cancer patients and healthy individuals [7]. All microarray data used in this study and presented here is publicly available at NCBI’s Gene Expression Omnibus portal (www.ncbi.nlm.nih.gov/geo), using the accession number GSE14245.

The first tasks completed by AffyDGED include the loading of raw data, the determination of the physical dimensions of the chip data, and the conversion of Affymetrix probe position coordinates to equivalent *Mathematica* indices.

The `chipdimensions` and `affyindextoMMAindices` modules are designed to establish the number of rows and columns of probe data on the microarray chips, as well as to convert the probe position coordinates as assigned by Affymetrix to equivalent *Mathematica* indices.

For example, the chips used here (Human Genome U133 Plus 2.0) happen to be square, with 1,164 rows of information and 1,164 columns of information.

Affymetrix uses a single-number index (present in the Affymetrix CDF file) created from the coordinates of the probes present on their microarray chips. The single-number index is derived from a formula used by Affymetrix that assumes an index of 0 refers to the uppermost-leftmost position of the chip. To successfully load the probeset data into usable groups in *Mathematica*, it is necessary to shift the zero-based coordinates used by Affymetrix into the one-based indexing used by *Mathematica*.
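Assuming the single-number index is row-major and zero-based (index = column + row × number of columns, so index 0 is the top-left cell), the shift to one-based row/column indices can be sketched in Python; the function name and the row-major assumption are illustrative, not taken from Affymetrix documentation.

```python
def affy_index_to_rc(index, ncols):
    """Convert a zero-based, row-major Affymetrix single-number probe
    index (0 = uppermost-leftmost cell) into a one-based (row, column)
    pair of the kind Mathematica uses."""
    row = index // ncols + 1
    col = index % ncols + 1
    return row, col
```

For a 1,164-column chip such as the one used here, index 0 maps to position (1, 1) and index 1164 wraps to the start of the second row, (2, 1).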

Here is an example of the data contained within the Affymetrix CDF file.

The AffyDGED algorithm parses out the single-number indices for the perfect match probes of each probeset and converts that positional information into usable indices for *Mathematica*. The mismatch probes are purposely ignored in the AffyDGED algorithm because they often produce signals higher than the perfect match probes, which is an indication that the mismatch probes are not performing as originally intended by Affymetrix engineers.

There are 11 individual numbers, referring to the positions (in Affymetrix coordinates) of 11 perfect match probes of a single probeset used to measure the expression of a specific gene.

Here is the same positional information, now expressed in *Mathematica*’s indexing system.

There are now 11 groups of *Mathematica* indices referring to the same data and accessible by conventional *Mathematica* indexing.

We can now take a look at the raw data from a single chip used in the pancreatic cancer study that has been loaded. Notice the extreme range of values typically seen in microarray experiments.

For this reason, we look at a histogram of the data using a logarithmic scale (Figure 2).

**Figure 2.** A histogram of the raw fluorescence intensity data (log scale) contained in the microarray chip GSM356796, used to measure the expression of genes in the saliva of a pancreatic cancer patient.

The majority of the data has an approximately Gaussian appearance (on a log scale), while still harboring extreme values on the rightward tail (look carefully along the horizontal axis). This shape is very characteristic of microarray data.

The next steps in processing include transforming the raw data to a log2 scale, determining the probeset size specific to the chip in use, and performing quantile normalization.

The convention of transforming raw microarray data by log2 is almost universally used in the microarray community, because it performs two useful functions. First, it makes the distribution of raw data more Gaussian (although certainly not perfectly so). Second, it aids interpretation of gene expression ratios for the end user, because it is easier to appreciate that values of +1 and -1 on a log2 scale indicate the same degree of “up” and “down” regulation for a gene, as opposed to 2 and 0.5 on an absolute scale.
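The symmetry is easy to verify in Python, taking a two-fold change in each direction as the example:

```python
import math

raw_up, raw_down = 2.0, 0.5   # two-fold up and two-fold down on an absolute scale
log_up = math.log2(raw_up)
log_down = math.log2(raw_down)
# On the log2 scale the two changes become symmetric about zero: +1 and -1.
```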

In the example shown here, this is the probeset size (the number of probes making up each probeset).

Quantile normalization is performed to place each dataset of the experiment on a common scale so the datasets can be appropriately compared to each other.
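The method of [3] can be sketched in a few lines of Python: sort each chip’s values, average the sorted columns rank by rank, and hand each chip back the averaged values in its own original rank order. This sketch ignores the tie-handling refinements of the reference method, and the function name is mine.

```python
def quantile_normalize(columns):
    """Quantile normalization sketch: each column (one chip's probe
    intensities) is replaced by the mean of the sorted columns,
    assigned back according to that column's own ranks."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    mean_quantiles = [sum(col[i] for col in sorted_cols) / len(columns)
                      for i in range(n)]
    result = []
    for col in columns:
        ranks = sorted(range(n), key=lambda i: col[i])  # indices by value
        out = [0.0] * n
        for rank, i in enumerate(ranks):
            out[i] = mean_quantiles[rank]
        result.append(out)
    return result
```

After normalization every chip carries exactly the same multiset of values, so between-chip comparisons reflect rank differences rather than scale differences.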

Using `BoxWhiskerChart`, this shows the difference between the pre-quantile normalized data and the post-quantile normalized data (Figure 3).

**Figure 3.** A box-and-whisker comparison of all 24 microarray chips used to compare gene expression between the saliva of pancreatic cancer and healthy patients, before and after quantile normalization.

Following quantile normalization, the probesets are summarized (i.e., a single measure of gene expression is generated) by first performing median polish and then taking the mean of the polished values for all probes within a probeset. After the probesets are summarized, the differential expression of each gene is obtained by subtracting the expression of a gene in the control condition from the expression of the gene in the experimental condition. For example, if the expression of a gene in the saliva of pancreatic cancer patients (the “experimental” condition) is 2.1 and the expression of the same gene in the saliva of healthy patients (the “control” condition) is 1.4, then the differential expression of the gene is 2.1 − 1.4 = 0.7.
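AffyDGED’s own summarization code is not reproduced here, but the core of median polish [4] is compact enough to sketch in Python: alternately sweep row (probe) and column (chip) medians out of the probeset matrix until only residuals remain. In this sketch (all names mine), a chip-level summary can be read off as the overall effect plus that chip’s column effect.

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def median_polish(matrix, iterations=10):
    """Tukey's median polish on a probes x chips matrix: alternately
    sweep out row (probe) and column (chip) medians.
    Returns (overall, row_effects, col_effects, residuals)."""
    rows, cols = len(matrix), len(matrix[0])
    resid = [row[:] for row in matrix]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(iterations):
        for i in range(rows):                       # sweep row medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(row_eff)
        overall += m
        row_eff = [v - m for v in row_eff]
        for j in range(cols):                       # sweep column medians
            m = median([resid[i][j] for i in range(rows)])
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        m = median(col_eff)
        overall += m
        col_eff = [v - m for v in col_eff]
    return overall, row_eff, col_eff, resid
```

On a perfectly additive matrix such as `[[11, 21], [12, 22]]` the residuals polish to zero and the column effects recover the 10-unit difference between the two chips.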

Here are the distributions of gene expression in the saliva of pancreatic cancer and healthy patients, as well as the distribution of differentially expressed genes between the two states (Figure 4).

**Figure 4.** Histograms of the summarized (post median polish) gene expression values contained in the saliva of pancreatic cancer and healthy patients, as well as the difference in gene expression between those two biological groups.

From these results, simulations are performed to calculate probability values (p-values), which answer the question: if we were to repeat this experiment many times, under the same experimental conditions as this study, how often would we find results as (or more) extreme than we have observed for each gene in the current study? The simulations performed to calculate the p-values use the corrected fluorescent signal values contained within the experimental and control datasets and thus make the assumptions that the corrections are valid and that the data is now a good representative of the biological “truth” being studied. If these assumptions are correct, the probability values reported help the end user to gauge how rare a gene’s measure of differential expression is, but not—as is traditionally done in the microarray community—to make a decision about statistical or biological significance.
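The exact simulation AffyDGED runs is not shown here. Purely as an illustration of the idea of an empirical p-value, the sketch below pools the corrected signal values, repeatedly draws a difference under the null hypothesis of no condition effect, and reports the fraction of draws at least as extreme as the observed differential expression; the resampling scheme, names, and defaults are all assumptions made for the example.

```python
import random

def empirical_p_value(observed_diff, exp_values, ctrl_values,
                      trials=10000, seed=1):
    """Empirical p-value sketch: pool the corrected signals, resample
    differences under the null of no condition effect, and count draws
    as extreme as the observed differential expression."""
    pool = list(exp_values) + list(ctrl_values)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.choice(pool)
        b = rng.choice(pool)
        if abs(a - b) >= abs(observed_diff):
            hits += 1
    return hits / trials
```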

Following this, the algorithm organizes the output into a more human-readable table.

Here is a random sample of our current results.

The output columns are as follows:

Column 1: the signal-corrected fluorescence intensity of gene expression in the experimental condition

Column 2: the signal-corrected fluorescence intensity of gene expression in the control condition

Column 3: the measure of differential expression, obtained by subtracting column 2 from column 1

Column 4: the p-value obtained through simulation

Column 5: the probeset name as assigned by Affymetrix

Column 6: descriptive information for the probeset, including the GenBank accession number, gene name, and gene product information

Due to the length of descriptive data in column 6, the output here has been purposefully rearranged to print column 6 underneath each data point’s first five column entries.

The final steps in processing the data include establishing the upper and lower thresholds for determining when a gene is considered up-regulated (turned “on”) in the experimental condition versus the control condition, and when a gene is considered down-regulated (turned “off”). Further, the algorithm saves the final results and provides a summary report of the completed analysis.

AffyDGED saves all the data and a list of differentially expressed genes to the location defined by the user in the template above. Further, both lists of data are saved in two formats, one convenient for users to continue to explore the data in *Mathematica* and the other conveniently readable in Microsoft Excel.

Affymetrix gene expression array technology utilizes a suite of library files containing important annotation information about the layout and content of its microarray products. AffyDGED can analyze any Affymetrix gene expression experiment as long as the .cdf (chip description file) and .gin (gene information) library files associated with the specific type of microarray chip being used are provided. These files are freely available to the public at www.affymetrix.com/support/technical/libraryfilesmain.affx. The variable `affycdflocation`, defined in the user template, holds the directory location where the user has stored the .cdf file, and AffyDGED expects the .gin file to be stored in the same location. Having the .cdf and .gin files together happens naturally, as they are packaged together by Affymetrix and unzip to the same location upon download.

As mentioned previously, there is no universally correct algorithm for processing microarray data in all experimental circumstances. AffyDGED was developed under the guidance of [2], because the exact expression of genes in that study was precisely controlled and is therefore a useful gauge of how effectively a microarray analysis algorithm is performing.

Supplemental file 5 in [2] describes the 1,944 differentially expressed genes and 3,426 non-differentially expressed genes that were purposefully “spiked-in” to the microarray experiment. When this dataset is analyzed by the AffyDGED algorithm described here, the upper threshold for determining “up” gene expression is calculated to be 0.225 (the red lines in Figure 5 mark the upper and lower thresholds). Establishing the thresholds for determining differential expression relies on the observation illustrated in Figure 5, that a plot of the processed data always reveals a tight clustering of data about the line y = 0. As the reader scans above and below that axis, note how the density of data noticeably separates from the cluster along that line. This observation was used to develop code that scans vertically up and down in small increments and establishes a breakpoint in each direction any time the density of data at a vertical position is 50% less than it was at the previous increment. These breakpoints become the thresholds for determining differentially expressed up and down genes.
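AffyDGED’s breakpoint code itself is not reproduced here; the following Python sketch captures the stated rule (scan away from y = 0 in small increments and stop when the count of points in an increment drops below 50% of the previous increment’s count). The step size, the function name, and the exact binning are illustrative choices, not AffyDGED’s.

```python
def find_threshold(diffs, step=0.025, direction=+1):
    """Scan away from y = 0 in fixed increments; declare a breakpoint
    the first time the count of points in an increment falls below
    50% of the count in the previous increment."""
    pos = 0.0
    prev = None
    while True:
        lo, hi = pos, pos + step
        count = sum(1 for d in diffs if lo <= direction * d < hi)
        if prev is not None and count < 0.5 * prev:
            return direction * hi
        if count == 0 and prev == 0:
            return direction * hi   # ran out of data without a sharp drop
        prev = count
        pos += step
```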

**Figure 5.** The resulting differential expression threshold determination plot by AffyDGED when processing the data in [2] (accession number GSE21344).

The AffyDGED algorithm identifies 1,832 genes as differentially expressed in [2]. Of these, 1,591 genes overlap with the true 1,944 differentially expressed genes, for a true positive rate of 1,591/1,944 ≈ 82%. This means AffyDGED was unable to correctly identify 18% of the true list of differentially expressed genes. While perhaps surprising to readers unfamiliar with microarray analysis, this places AffyDGED’s performance among the top performers of leading algorithms identified by [2]. State of the art at this time in microarray analysis means accepting a 1 out of 5 “miscall” rate in differential expression detection. Because of the complexity of microarray technology and of all the steps that occur prior to the actual algorithmic analysis of the data, it is important to realize that at least some of the miscalls by any microarray algorithm are really due to factors that are poorly controlled for by the underlying technology, and do not indicate a weakness in the algorithm itself [8].

To gauge the performance of AffyDGED, several publicly available datasets of different sizes and complexity were profiled. The first row of Table 1 shows the series accession number for each dataset available at NCBI’s Gene Expression Omnibus. All timings were obtained using quantile normalization with all chips (i.e., qnormversion set to “all”). Timings were acquired running *Mathematica* 9.0 under Windows 7 (64 bit) using an Intel Core i5-2500K processor overclocked to 4.48 GHz.

AffyDGED performs very well, in all cases completing its analysis in less than seven and a half minutes. The largest chip in this comparison is the Human chip, where each of the processed files contains 13.2 Mb of data. AffyDGED is able to process the combined data in a practically usable time frame.

Microarray technology continues to be heavily used by the biomedical and basic science research communities throughout the world. AffyDGED brings a contemporary algorithm useful in the real world to the *Mathematica* user community interested in exploring fundamental biology questions with their favorite computational tool chest.

[1] M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, “Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray,” Science, 270(5235), 1995 pp. 467-470. www.jstor.org/stable/2889064.

[2] Q. Zhu, J. C. Miecznikowski, and M. S. Halfon, “Preferred Analysis Methods for Affymetrix GeneChips. II. An Expanded, Balanced, Wholly-Defined Spike-in Dataset,” BMC Bioinformatics, 11(285), 2010. doi:10.1186/1471-2105-11-285.

[3] B. M. Bolstad, R. A. Irizarry, M. Åstrand, and T. P. Speed, “A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias,” Bioinformatics, 19(2), 2003 pp. 185-193. doi:10.1093/bioinformatics/19.2.185.

[4] J. W. Tukey, Exploratory Data Analysis, Reading, MA: Addison-Wesley, 1977.

[5] C. A. Harrington, C. Rosenow, and J. Retief, “Monitoring Gene Expression Using DNA Microarrays,” Current Opinion in Microbiology, 3(3), 2000 pp. 285-291. doi:10.1016/S1369-5274(00)00091-6.

[6] J. Lovén, D. A. Orlando, A. A. Sigova, C. Y. Lin, P. B. Rahl, C. B. Burge, D. L. Levens, T. I. Lee, and R. A. Young, “Revisiting Global Gene Expression Analysis,” Cell, 151(3), 2012 pp. 476-482. doi:10.1016/j.cell.2012.10.012.

[7] L. Zhang, J. J. Farrell, H. Zhou, D. Elashoff, D. Akin, N.-H. Park, D. Chia, and D. T. Wong, “Salivary Transcriptomic Biomarkers for Detection of Resectable Pancreatic Cancer,” Gastroenterology, 138(3), 2010 pp. 949-957.e7. doi:10.1053/j.gastro.2009.11.010.

[8] R. A. Rubin, “A First Principles Approach to Differential Expression in Microarray Data Analysis,” BMC Bioinformatics, 10(292), 2009. doi:10.1186/1471-2105-10-292.

T. Allen, “Detecting Differential Gene Expression Using Affymetrix Microarrays,” The Mathematica Journal, 2013. dx.doi.org/doi:10.3888/tmj.15-11.

Todd Allen is an associate professor of biology at HACC-Lancaster. His interest in computational biology using *Mathematica* took shape during his postdoctoral research years at the University of Maryland, where he developed a custom cDNA microarray chip to study gene expression changes in the fungal pathogen *Cryphonectria parasitica*.

**Todd D. Allen, Ph.D.**

*Harrisburg Area Community College (Lancaster Campus)
East 206 R
1641 Old Philadelphia Pike
Lancaster, PA 17602*

*Mathematica*’s industrial-strength Boolean computation capability is not used as often as it should be. There probably are several reasons for this lack of use, but it is our view that a primary reason is lack of experience in expressing mathematical problems in the form required for Boolean computation. We look at a typical problem that is susceptible to Boolean analysis and show how to translate it so that it can be tested for satisfiability with *Mathematica*’s built-in function `SatisfiableQ`. The problems we investigate come from an area of mathematics called Ramsey theory. Although Ramsey theory has been studied extensively for over 80 years and still provides many challenges, we neglect the theory (for the most part) and instead concentrate on translating the problems so that they are amenable to Boolean computation and then see what can be accomplished by computation alone. Those interested in learning a little more about Ramsey theory can consult [1]; for a standard reference, see [2].

We only concern ourselves with Boolean formulas in *conjunctive normal form* (*cnf*). A cnf is a conjunction of disjunctions of statements or their negations; the disjunctions are often called *clauses*.

An example of a cnf is (p ∨ ¬q) ∧ (¬p ∨ q); letters represent statements and the symbols ∨, ∧, and ¬ stand for “or,” “and,” and “not.” A propositional formula is *satisfiable* if there is an assignment of *true* and *false* to its statements that makes the formula *true* when evaluated using the usual truth table rules.

Before *Mathematica* can test whether this formula is satisfiable using `SatisfiableQ`, we must replace ∨, ∧, and ¬ by `||`, `&&`, and `!`. Here is the translation to a *Mathematica* expression.

It turns out that the formula is satisfiable.
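`SatisfiableQ` uses sophisticated Boolean algorithms internally; purely to make the definition of satisfiability concrete, here is a naive Python sketch that decides it by trying every truth assignment. The clause representation, a list of (variable, polarity) pairs per clause, is my own.

```python
from itertools import product

def satisfiable(cnf, variables):
    """Exhaustive satisfiability test: cnf is a list of clauses, each
    clause a list of (variable, polarity) literals; a clause is true
    when some literal matches the assignment."""
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[v] == pol for v, pol in clause)
               for clause in cnf):
            return True
    return False

# (p or not q) and (not p or q) -- satisfiable, e.g. with p = q = True:
example = [[("p", True), ("q", False)], [("p", False), ("q", True)]]
```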

A well-known problem states that at any party with at least six people, there are at least three mutual acquaintances (each knows the other two) or three mutual strangers (each does not know the other two). In the language of graph theory, if the edges of the complete graph on six vertices, K_6, are colored red or blue, there must be either a red or a blue triangle.

We translate this into a satisfiability problem in propositional logic and then use `SatisfiableQ` to prove this result. More precisely, we show that it is not possible to color the edges of K_6 either red or blue without forming either a red or blue triangle, by building a cnf whose satisfaction is equivalent to the existence of such a coloring and then showing that this cnf cannot be satisfied.

More generally, Ramsey theory considers problems of this form: given a complete graph K_n and integers i, j with i, j ≤ n, is it possible to color the graph’s edges red or blue without obtaining a red K_i or a blue K_j as a subgraph?

Begin by numbering the vertices of K_6 from 1 to 6 and name its edges with ordered pairs of vertex numbers, {i, j}, i < j. For each such pair, generate two propositional variables, r_ij and b_ij, which intuitively express coloring the edge {i, j} red or blue.

The cnf needs two sets of Boolean clauses.

- For each edge {i, j}, the clauses (r_ij ∨ b_ij) and (¬r_ij ∨ ¬b_ij) express that the edge is either red or blue, but not both.
- For each triangle with edges {i, j}, {j, k}, {i, k}, the clauses (¬r_ij ∨ ¬r_jk ∨ ¬r_ik) and (¬b_ij ∨ ¬b_jk ∨ ¬b_ik) express that not all edges of the triangle can be the same color.

If the cnf that is the conjunction of all these clauses is satisfiable, then it is possible to color the edges of K_6 red or blue without obtaining either a red or blue triangle. Moreover, any truth value assignment satisfying this cnf would lead immediately to a coloring of the edges by coloring the edge {i, j} blue exactly when b_ij is assigned to be true. Conversely, a red-blue coloring of the edges of K_6 with no monochromatic triangle leads directly to a satisfying assignment of the cnf; simply assign b_ij to be true if and only if the edge {i, j} is colored blue, and so on. We will show that the cnf is unsatisfiable.

The function `ColorEdges` generates the first set of clauses, where *Mathematica* expressions play the roles of r_ij and b_ij. The function states that each edge of the graph gets one or the other of the given colors. It is generalized to three colors in the last section.

Here is the first set of clauses for the party problem.

The function `NoCompleteSubgraph` can generate the second set of clauses; it gives `True` exactly when the graph does not contain a complete subgraph of the given size with all edges of the given color.

The functions `ColorEdges` and `NoCompleteSubgraph` can be extended to treat Ramsey problems for complete graphs with any number of colors and complete subgraphs.

The function `RamseyTest` puts it all together to obtain our test formula.

Here is an example.

The Boolean formula tests the party problem.

Thus, it is not possible to color the edges of K_6 red or blue and avoid a monochromatic triangle.
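The article reaches this conclusion by handing the cnf to `SatisfiableQ`; as an independent cross-check, the party problem is small enough (2^15 red/blue colorings of K_6’s 15 edges) to settle by brute force, sketched here in Python:

```python
from itertools import combinations, product

def has_triangle_free_coloring(n):
    """Return True if some red/blue coloring of the edges of the
    complete graph K_n avoids a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    triangles = list(combinations(range(n), 3))
    for colors in product((0, 1), repeat=len(edges)):
        color = dict(zip(edges, colors))
        if not any(color[(i, j)] == color[(j, k)] == color[(i, k)]
                   for i, j, k in triangles):
            return True   # found a coloring with no monochromatic triangle
    return False
```

Brute force agrees with the Boolean computation: K_6 has no such coloring, while K_5 does; in other words, R(3, 3) = 6.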

Although `RamseyTest` is very fast, there are ways to speed it up. (This is unimportant so far but becomes crucial when considering much larger problems!)

Since vertex 1 is connected to five other vertices, at least three of the connecting edges must be the same color; for if there were at most two red and at most two blue edges there would be at most four connecting edges instead of five. Therefore assume that the first three edges are red and not blue; by symmetry, it makes no difference which color we choose. (Replacements by one-element or unit clauses often result in significantly faster running times in automated theorem provers.)

Of course, the length of is , which is .

The edges of K_5 can be colored blue or red without any monochromatic triangles. This is easily shown by hand or with the following computation.

In fact, *Mathematica* can suggest a coloring. First generate the list `v` of propositional variables in the test formula. Then use `SatisfiabilityInstances` to indicate the coloring.

This draws the suggested edge-colored graph.

It turns out there are always at least two monochromatic triangles in K_6 [3, 4]! Since there is at least one monochromatic triangle in K_6, assume it is formed by the edges {1, 2}, {1, 3}, and {2, 3} and, by symmetry, assume that its edges are all red. The next command modifies the cnf accordingly. The first replacement drops the uncertainty about the color of those three edges, the second replacement drops the conditions that those edges are not all the same color, and the `Append` asserts that indeed they are all red. The result `False` means that it is impossible to deny that there is a second monochromatic triangle, or, more simply, there is a second monochromatic triangle.

The Ramsey number, R(i, j), is defined to be the least integer n such that any red-blue edge coloring of K_n results in either a blue K_i or a red K_j. (The previous section showed that R(3, 3) = 6.) Not many Ramsey numbers are precisely known [1, 2].

This confirms the result .

This confirms the result .

The calculation took too long and had to be aborted, but the simple idea used to speed up the party problem in the previous section makes it possible to show R(4, 4) = 18: in any K_n, at least half the edges from vertex 1 must be the same color; that is, at least (n − 1)/2 edges, rounded up, must be the same color. Unless there is symmetry (i.e., i = j), we must consider two separate cases, that this color is blue or red. The function `QuickerRamseyTest` is the faster version of `RamseyTest`.

For the symmetric case i = j, only one test suffices for each of K_17 and K_18.

There are more than 6000 clauses in the cnf constructed to show that R(4, 4) ≤ 18, since two clauses are needed to rule out any K_4’s being all red or all blue and there are 3060 K_4 subgraphs in K_18 (C(18, 4) = 3060). Also, each of these clauses has six negated propositional letters, one for each edge of the K_4 (a K_n has n(n − 1)/2 edges). In addition, there are the clauses that require each edge of K_18 to be red or blue but not both; K_18 has 153 edges, and so on. It seems that *Mathematica*’s claim of “industrial strength” Boolean capability is fully justified.
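The clause counts quoted above are pure binomial arithmetic and easy to verify in Python:

```python
from math import comb

k4_per_k18 = comb(18, 4)        # number of K_4 subgraphs inside K_18
mono_clauses = 2 * k4_per_k18   # one "not all red" + one "not all blue" each
edges_k18 = comb(18, 2)         # edges of K_18, each needing color clauses
edges_k4 = comb(4, 2)           # negated literals per monochromatic clause
```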

Ramsey theory also considers edge colorings with more than two colors. The classical result for three colors is R(3, 3, 3) = 17; that is, it is not possible to color the edges of K_17 with three colors without a monochromatic triangle; however, K_16 has such a coloring.

This generalizes `ColorEdges` from two to three colors so that each edge gets exactly one of three colors.

The generalization of `RamseyTest` takes an additional argument, k, and a third color. It tests whether the edges of K_n can be colored with three colors without a monochromatic K_i, K_j, or K_k.

To speed up the calculation, we make use of the following observation. Vertex 1 of K_17 is connected by 16 edges to the remaining vertices. Surely, at least six of these edges must get the same color, say red; for if at most five edges got each color, there would be at most 15 edges. This leads to a quicker test formula than `RamseyTest`.

To check R(3, 3, 3) = 17, it suffices to check K_16 and K_17.

Here is K_16 with its edges colored red, blue, and green, separately and together. There are no monochromatic triangles. Each of the three one-color graphs is known to be isomorphic to the Clebsch graph; [5] shows them with their vertices permuted in all possible ways.

Suppose we remove an edge from K_6; will it still be the case that any two-coloring of the edges has a monochromatic triangle? Or, if we remove an edge from K_18, will the resulting graph still have the property that any two-coloring of its edges must yield a monochromatic K_4? We show that Boolean computation is also well suited to investigate these kinds of problems.

Remove from the cnf the clauses that require the edge to be blue or green, but not both, and add a statement to color that edge white. The result is satisfiable, which means that removing just one edge from K_6 gives a graph that can be two-colored without monochromatic triangles.

This defines an auxiliary function.

Here is the coloring.

Recall that R(4, 4) = 18; however, removing even one edge from K_18 allows a two-coloring without a monochromatic K_4.

Unlike the case of K_6 with an edge removed, it is too hard to see by inspection that the blue and green edge-colored subgraphs contain no monochromatic K_4. `FindClique` finds the largest complete subgraph, which in both cases is a triangle, not a K_4.

*Mathematica*’s Boolean computation is a useful tool for doing research in mathematics.

We thank the editor and referee for their generous help, which greatly improved this work.

[1] | E. W. Weisstein. “Ramsey Number” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/RamseyNumber.html. |

[2] | R. L. Graham, B. L. Rothschild, and J. H. Spencer, Ramsey Theory, 2nd ed., New York: Wiley-Interscience, 1990. |

[3] | A. W. Goodman, “On Sets of Acquaintances and Strangers at Any Party,” American Mathematical Monthly, 66(9), 1959 pp. 778-783. www.jstor.org/stable/2310464. |

[4] | G. Beck. “Among Six People, Either Three Know Each Other or Three Are Strangers to Each Other” from the Wolfram Demonstrations Project—A Wolfram Web Resource. www.demonstrations.wolfram.com/AmongSixPeopleEitherThreeKnowEachOtherOrThreeAreStrangersToE. |

[5] | E. Pegg Jr. “Ramsey(3,3,3)>16” from the Wolfram Demonstrations Project—A Wolfram Web Resource. www.demonstrations.wolfram.com/Ramsey33316. |

R. Cowen, “Using Boolean Computation to Solve Some Problems from Ramsey Theory,” The Mathematica Journal, 2013. dx.doi.org/doi:10.3888/tmj.15-10.

Robert Cowen is a professor of mathematics emeritus at Queens College, CUNY. He uses *Mathematica* in his own research and has written a textbook with John Kennedy called *Discovering Mathematics with Mathematica*. His web page can be found at sites.google.com/site/robertcowen.

**Robert Cowen**

*Department of Mathematics
Queens College, CUNY
Flushing, NY 11367*

The World Wide Web (WWW), commonly referred to as simply “the web,” is a vast information network accessible via the Internet. Initially proposed in 1989 [1], the web grew out of the work of Tim Berners-Lee while at the European Organization for Nuclear Research, known as CERN.

Two software technologies form the core of the web, namely the HyperText Markup Language (HTML) and the Hypertext Transfer Protocol (HTTP). The HTML (or source) of a web page contains elements known as tags that describe the content of the document. For instance, the anchor tag defines a hyperlink, or link, to another document. Each file (or resource) on the web is identified by a Uniform Resource Locator (URL). A browser or other user agent may request a file via HTTP by specifying its URL. The response—typically the requested file—is again transmitted by HTTP from the web server to the client.

The topology of large-scale complex networks, such as the web, can be explored using graph theoretic methods (see [2] and references therein). Specifically, the web can be viewed as a directed graph, where the web pages are vertices and the hyperlinks are edges. Unfortunately, two problems exist due to the nature of the web: (1) it cannot be indexed (or mapped) in its entirety; and (2) analyzing the corresponding graph would be highly computationally intensive. In fact, a recent announcement [3] suggests that the web may contain at least one trillion unique URLs.

Despite its sheer size and complexity, it is possible to extract meaningful measures that can be used to quantify the web’s structure. Using the method of random walks, one can sample the network and examine a subgraph of the web.

In this method, one begins at a specified URL. The client requests the web page at that URL, and the server responds with the document’s HTML. Next, the client extracts all URLs from the web page and chooses one of them at random. Again, the client makes a request for the document at the chosen URL. If the server cannot be reached or the web page cannot be found, the client simply chooses another URL at random. The entire process is then repeated a finite number of times.
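The package itself is written in the Wolfram Language; purely for illustration, the walk just described can be sketched in Python. The `fetch` callback and the regex-based link extraction are our simplifications, not the package's actual implementation.

```python
import random
import re

def random_walk(start_url, max_steps, fetch, rng=None):
    """One random walk of at most max_steps steps.

    fetch(url) returns the page's HTML, or None if the server cannot
    be reached; link extraction uses a crude regex on href attributes.
    """
    rng = rng or random.Random(0)
    visited = [start_url]
    html = fetch(start_url)
    for _ in range(max_steps):
        if html is None:
            break
        links = re.findall(r'href="(https?://[^"]+)"', html)
        html = None
        while links:
            url = rng.choice(links)       # pick an outgoing link at random
            links.remove(url)
            page = fetch(url)
            if page is not None:          # unreachable page: try another
                visited.append(url)
                html = page
                break
    return visited
```

On a tiny “fake web” given as a dictionary, `random_walk("http://a", 3, web.get)` bounces between the two reachable pages and skips the dead link.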

The focus of this article is on the application of random walks to the study of the World Wide Web. In Section 2, we provide a brief overview of *RandomWalkWeb*, a package developed to perform random walks on the web and to visualize the resulting data. Next, in Section 3, we build upon the package’s functionality and use it to perform a random walk to sample the web. The collected empirical network data is then analyzed and several properties of the web are estimated. Finally, in Section 4, we provide a summary of our work and give our concluding remarks.

In this section, we give a brief overview of *RandomWalkWeb* and demonstrate some of its functionality.

The *RandomWalkWeb* Package (www.mathematica-journal.com/data/uploads/2013/09/RandomWalkWeb.zip) is built upon the graph and network functionality introduced in *Mathematica* 8. In addition, connectivity to the web is provided through .*NET/Link* and requires the .NET Framework 2.0 (or higher).

The package consists of 28 public symbols covering the following four areas: (1) data collection and visualization; (2) web page components; (3) operations on URLs; and (4) message logging. Each symbol is fully documented and can be easily accessed through the *Mathematica* help system.

We begin by loading the package.

A single random walk on the web can be executed, as described in Section 1, by using the package’s namesake, `RandomWalkWeb`. The start (or origin) URL is specified, along with the maximum number of steps to be taken.

`RandomWalkWeb` returns a list of successfully visited URLs, displayed here as a column. If the function is evaluated from a notebook-based front end, the walk’s progress is displayed in the window status area. In the event that it reaches a URL with zero valid outgoing links, `RandomWalkWeb` attempts to backtrack at most one step. The function may exit prematurely (that is, the number of steps returned is less than the specified maximum) if all previous hyperlinks have been exhausted.

One may wish to perform multiple random walks from the same URL. This can be accomplished by using `PerformRandomWalks`. Like `RandomWalkWeb`, we specify the start URL and the maximum number of steps to be taken. Additionally, we pass the number of random walks to be performed as the function’s second argument.

The value that is returned indicates the number of successfully exported data files.

The root directory used to store the data files is specified by `$BaseDataDirectory`. By default, this is set to the current working directory. For each unique URL passed to `PerformRandomWalks`, a folder is created in the root data directory whose name is a 32-character, hexadecimal-formatted MD5 hash of that URL. The successfully visited URLs from each walk are exported as separate human-readable plain text files. The name of each file is a combination of a label, specified by `$DataFilePrefix`, and the walk number.
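The folder-naming scheme is easy to reproduce outside the package. A Python sketch follows; the UTF-8 encoding of the URL is our assumption, since the package does not document its encoding.

```python
import hashlib

def data_folder_name(url):
    """32-character, hexadecimal-formatted MD5 hash of a URL
    (the URL is assumed to be encoded as UTF-8 bytes)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()
```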

In the next example, we examine previously collected network data (available at www.mathematica-journal.com/data/uploads/2013/09/RW_Data_1.zip). We begin by specifying both `$BaseDataDirectory` and `$DataFilePrefix`.

The data can be easily imported and visualized by using `RandomWalkGraph`. Here, we are interested in the first seven steps extracted from the second random walk.

The first part of the returned list is a `Graph` object, while the last part contains a list of enumerated vertices. All graphs returned by `RandomWalkGraph` are simple directed graphs; that is, they contain neither loops nor multiple edges.

Similarly, one can construct a graph from multiple data files. Here, we combine all steps from the first and third random walks.

The graphs can be visually enhanced by using the `VertexIcon` option. With `VertexIcon` set to `True`, `RandomWalkGraph` attempts to download each vertex’s associated favorite icon and uses it in place of the default vertex shape. `RandomWalkGraph` also accepts the same options as the built-in function `Graph`.

In this section, we build upon *RandomWalkWeb*’s functionality and use it to perform a random walk to sample the web. The collected empirical network data is then analyzed and several properties of the web are estimated.

**A Note on Timings**

The timings reported in this section were measured on a custom workstation PC using the built-in function `AbsoluteTiming`. The system consists of an Intel® Core i7 CPU 950 @ 4 GHz and 12 GB of DDR3 memory. It runs Microsoft® Windows 7 Professional (64-bit) and scores 1.23 on the *MathematicaMark8* benchmark.

Let us define a new function called `RandomWalkWithJumps`.

The function behaves like `PerformRandomWalks` (see Section 2) except that, instead of returning to the start URL, `RandomWalkWithJumps` “jumps” to a URL chosen at random from the walk’s history. From there, the function attempts to perform additional steps, and the process is repeated. The number of walks completed is one more than the specified number of jumps.

Before using `RandomWalkWithJumps` to collect empirical network data, we set up a few package parameters.

Next, we change the location of the log file (see Appendix).

Finally, we specify a generic user agent string (see Appendix).

For this particular walk, our goal is to perform a total of 150,000 steps.

We now evaluate `RandomWalkWithJumps`.

The function exits after nearly 26 hours on the web. The returned list shows that approximately 89% of the requested number of steps was completed. Additionally, we see that 35,616 unique URLs were visited.

We proceed by importing the collected empirical network data (available at www.mathematica-journal.com/data/uploads/2013/09/RW_Data_2.zip) using `RandomWalkGraph`. The built-in function `Range` is used to generate a complete list of file numbers.

It takes approximately 68 seconds for the function to return the `Graph` object and list of enumerated vertices.

Together, they contain all the information needed to perform a domain-level analysis.

Using *Mathematica*’s built-in functions, we extract a few basic graph measures from our data.

Let n be the number of vertices (i.e., domain names) in the graph.

Similarly, let m be the number of edges (i.e., links).

In general, the degree k_i of vertex i is a count of the edges attached to it. For example, we can use `VertexDegree` to get the number of links connected to a given domain name.

The mean vertex degree of a graph is given by

$\langle k\rangle = \frac{1}{n}\sum_{i=1}^{n} k_i = \frac{2m}{n}.$   (1)

If a vertex is not specified, `VertexDegree` returns a list of degrees for all vertices in the graph. Calculating ⟨k⟩ from our empirical network data is then straightforward.
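For illustration, the same quantity can be computed directly from an edge list; a minimal Python sketch on toy data (not the collected network):

```python
from collections import defaultdict

def mean_degree(edges):
    """<k> = (1/n) * sum of vertex degrees = 2m/n; each directed edge
    contributes one to the degree of both of its endpoints.
    (Isolated vertices never appear in an edge list.)"""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree.values()) / len(degree)

# Toy graph with n = 3 vertices and m = 4 edges, so <k> = 8/3.
graph_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
```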

Here, we report the mean absolute deviation from ⟨k⟩.

For a directed graph, a vertex has both an in- and out-degree equal to the number of ingoing and outgoing edges, respectively.

Let p_k^in be the fraction of vertices in a directed graph that have in-degree k. Similarly, let p_k^out be the fraction of vertices with out-degree k. We define two functions to calculate these quantities.

Both p_k^in and p_k^out can be viewed as the probability that a vertex chosen at random will have in- and out-degree k, respectively. For example, using our empirical network data, we can calculate the probability of randomly choosing a domain name with seven ingoing links.
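The two fractions are simple tallies over the vertex set. A Python sketch on a toy edge list (the real computation runs over the collected graph):

```python
from collections import Counter

def in_degree_fractions(edges):
    """p_k = fraction of vertices whose in-degree is k (out-degrees
    are handled the same way, tallying the first component of each
    edge instead of the second)."""
    vertices = {v for edge in edges for v in edge}
    indeg = Counter(v for _, v in edges)
    n = len(vertices)
    tally = Counter(indeg.get(v, 0) for v in vertices)
    return {k: count / n for k, count in tally.items()}

# Toy edge list: vertex "b" has in-degree 2, "c" has 1, "a" has 0.
toy_edges = [("a", "b"), ("c", "b"), ("a", "c")]
```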

We use the built-in function `Histogram` to visualize the degree distributions.

Evidence suggests [4] that the in- and out-degree distributions of the web exhibit power-law behavior. Here, we attempt to reproduce those results using our collected empirical network data. We proceed by defining the in-degree cumulative distribution function (CDF):

$P^{\mathrm{in}}(k) = \sum_{k'=k}^{k^{\mathrm{in}}_{\max}} p^{\mathrm{in}}_{k'},$   (2)

where k_max^in is the maximum vertex in-degree of the graph and p_k^in is the fraction of vertices with in-degree k, as defined earlier. A similar expression can be written for the out-degree CDF, P^out(k).
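A Python sketch of this cumulative distribution, assuming the “degree at least k” convention that is standard in power-law analyses:

```python
def degree_ccdf(pk):
    """Cumulative distribution P(k) = sum of p_{k'} for k' >= k,
    computed for every k up to the maximum degree present in pk."""
    kmax = max(pk)
    return {k: sum(p for kp, p in pk.items() if kp >= k)
            for k in range(kmax + 1)}
```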

Next, we use both functions to generate data spanning their entire degree domains.

The resulting CDF data is visualized using the built-in function `ListLogLogPlot`.

The log-log plots reveal approximate power-law behavior in the degree distributions of the web.

Now, let us assume that the degree distributions are proportional to k^(−γ) for k ≥ k_min, where k_min is some minimum degree for which the power law holds. Following [5], we use the maximum likelihood estimator (MLE) to estimate the power-law exponents:

$\hat{\gamma} \simeq 1 + N\left[\sum_{i=1}^{N} \ln\frac{k_i}{k_{\min} - 1/2}\right]^{-1},$   (3)

where N is the number of vertices with degree k ≥ k_min. This approximation remains accurate provided k_min is not too small (roughly k_min ≥ 6 [5]). The standard error on γ̂ is given by

$\sigma = \frac{\hat{\gamma} - 1}{\sqrt{N}}.$   (4)

We encapsulate equations (3) and (4) in the following function.

Evaluating `PLExponentEstimated` yields an estimate of the power-law exponent.

Here, the smallest value of k_min for which the power law holds was used. The resulting estimates of γ for the in- and out-degree distributions of the web are in good agreement with those reported in [4].
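Equations (3) and (4) are straightforward to implement. The following Python sketch (the name `pl_exponent` is ours; the package's function is `PLExponentEstimated`) estimates the exponent and its standard error from a list of degrees:

```python
import math

def pl_exponent(degrees, kmin):
    """MLE estimate of the power-law exponent (discrete approximation,
    eq. (3)) and its standard error (eq. (4)). Only degrees >= kmin
    enter the fit."""
    xs = [k for k in degrees if k >= kmin]
    n = len(xs)
    gamma = 1 + n / sum(math.log(k / (kmin - 0.5)) for k in xs)
    sigma = (gamma - 1) / math.sqrt(n)
    return gamma, sigma
```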

We examine the distribution of top-level domains (TLDs) in our data. Examples of TLDs include `com`, `net`, and `org`.

First, the domain names are extracted from the list of enumerated vertices. We then extract the last part of each domain name and use `EffectiveTLDNameQ` (see Appendix) to filter the results.

Next, we get the total number of TLDs and tally the list.

The relative frequency of each TLD is then calculated and the data is sorted—both numerically and alphabetically.
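The tally-and-sort step looks like this in outline (Python for illustration; the set `known_tlds` stands in for the package's `EffectiveTLDNameQ` test):

```python
from collections import Counter

def tld_frequencies(domains, known_tlds):
    """Relative frequency of each TLD among the given domain names,
    sorted by decreasing frequency and then alphabetically."""
    tlds = [d.rsplit(".", 1)[-1] for d in domains]
    tlds = [t for t in tlds if t in known_tlds]   # filter invalid TLDs
    total = len(tlds)
    return sorted(((t, c / total) for t, c in Counter(tlds).items()),
                  key=lambda pair: (-pair[1], pair[0]))
```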

Finally, we visualize the resulting data using the built-in function `BarChart`.

Here, the relative frequencies of the top 15 TLDs are compared.

In this article, we have presented *RandomWalkWeb*, a package developed to perform random walks on the World Wide Web and to visualize the resulting data. Building upon the package’s functionality, we collected empirical network data consisting of 35,616 unique URLs. A domain-level analysis was performed and several properties of the web’s structure were measured. We examined the in- and out-degree distributions and verified their approximate power-law behavior. The power-law exponents were estimated to be and , in good agreement with previously published results.

The *RandomWalkWeb* Package relies upon the graph and network functionality introduced in *Mathematica* 8. In addition, the package was designed to take advantage of the client-server communication features provided by the .NET Framework. The choice to use .*NET/Link* affects only a small number of package functions, and it would be a straightforward task to reimplement those functions to utilize other technologies, e.g., *J/Link*.

*RandomWalkWeb* can also be improved and expanded in many different ways. For instance, one could modify the code to allow functions like `RandomWalkWeb` to be evaluated on parallel subkernels. Another possibility would be to construct a full-featured *Mathematica-*based crawler capable of exploring the web’s structure more methodically.

Finally, since *Mathematica* forms the foundation of Wolfram|Alpha, one could easily imagine the web-based computational knowledge engine returning graph theoretic answers to users’ queries regarding the World Wide Web.

This appendix provides some details on the design and implementation of the *RandomWalkWeb* Package. Readers are strongly encouraged to review the fully documented source code.

The first operation in performing a random walk on the web is to request and obtain the HTML (or source) of a web page. The most straightforward way to accomplish this is to use the built-in function `Import`.

There is, however, at least one drawback to this method. A website may be configured to serve different content to different devices (e.g., mobile versus desktop). Various methods exist for detecting the type of device making the request.

One technique in particular involves the server parsing the User-Agent HTTP header sent by the client. The client software uses this header to identify itself to the server during requests. For instance, if we pass a URL to `Import` and evaluate it using *Mathematica* 8, a server would see *Mathematica*’s default user agent string. Unfortunately, this string is immutable.

To circumvent this constraint, *RandomWalkWeb* implements its own HTML import function called `GetSource`. The function uses .*NET/Link* to communicate with the .NET runtime. During HTTP requests, `GetSource` transmits the string assigned to `$UserAgent`.

We can request and obtain the HTML of a web page using `GetSource`.

The first part of the returned list is the responding address. Typically, this address is the same as the requested URL. However, it may differ due to one or more redirects. The last part contains the HTML of the web page.

Let us now set `$UserAgent` to mimic a popular mobile device.

This time we pass a list of URLs to `GetSource` and inspect the responding addresses.

Here, we see that `GetSource` follows the redirects and obtains the HTML from mobile-specific addresses. Having the ability to modify the user agent string allows us to perform random walks on the so-called “mobile web.”

*RandomWalkWeb* contains several functions that perform useful operations on URLs. One in particular, `DomainName`, is used both in standalone form and internally by `RandomWalkGraph` and related functions. As its name suggests, it extracts the domain name from the specified URL.

At the heart of `DomainName` lies a list of known effective TLDs, or public suffixes, that is imported from a data file and stored in memory during the package’s initialization. Examples of effective TLDs include `com` and `co.uk`. The file is located in the Data folder under the package’s root directory. It consists of a base list [6] augmented by user-specified additions.

We can evaluate `$ETLDNInfo` and inspect the last part of the returned list to determine the number of effective TLD names in the data file.

`DomainName` works by first splitting the hostname into a list of components.

Next, the function takes the last part of the returned list and uses `EffectiveTLDNameQ` to test whether the string is a known effective top-level domain. If it is, the next-to-last part of the list is prepended to the string and joined with a dot (a period). Again, the string is tested. The process of growing and testing the string continues until `EffectiveTLDNameQ` gives `False`. The result is the domain name.

If the effective TLD cannot be determined, `DomainName` returns the equivalent of `Hostname`.
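The grow-and-test loop, including the fallback, can be sketched as follows (Python for illustration; the set `effective_tlds` stands in for the package's imported data file):

```python
def domain_name(hostname, effective_tlds):
    """Grow a suffix of the hostname label by label; the first grown
    suffix that is NOT a known effective TLD is the domain name."""
    labels = hostname.lower().split(".")
    if labels[-1] not in effective_tlds:
        return hostname          # eTLD unknown: fall back to hostname
    suffix = labels[-1]
    for label in reversed(labels[:-1]):
        grown = label + "." + suffix
        if grown in effective_tlds:
            suffix = grown       # still an eTLD: keep growing
        else:
            return grown         # eTLD plus one label: the domain name
    return suffix                # the hostname itself is an effective TLD
```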

If message logging is enabled (see later in this appendix), `DomainName` logs the error and hostname for later review. The user can then decide whether to add the missing effective TLD to the appropriate section of the data file.

During code development, it is often necessary to examine detailed error messages (e.g., .NET exceptions) or to trace the execution path of a function. This can be especially difficult if the function is called repeatedly hundreds or even thousands of times. For these reasons, several of the functions in *RandomWalkWeb* have been designed to write error, informational, and debug-level messages to a plain text file.

By default, messages are written to a log file located in the directory given by `$TemporaryDirectory`. Evaluating `$LogFileName` shows the fully qualified name of the log file.

One is free to change the name or location of the file. Also, message logging can be disabled entirely by assigning an empty string to `$LogFileName`.

We use `LogMessage` to write a message to the log file, specifying its level of importance.

We can verify that the message was appended to the file.

The author is grateful to his family for their unwavering patience and support.

[1] | T. Berners-Lee. “Information Management: A Proposal.” (May, 1990) www.w3.org/History/1989/proposal-msw.html. |

[2] | M. Newman, Networks: An Introduction, Oxford, UK: Oxford University Press, 2010. |

[3] | J. Alpert and N. Hajaj. “We Knew the Web Was Big…” Google Official Blog (blog, Google, owner). (Jul 25, 2008) googleblog.blogspot.com/2008/07/we-knew-web-was-big.html. |

[4] | A. Barabási, R. Albert, and H. Jeong, “Scale-Free Characteristics of Random Networks: The Topology of the World-Wide Web,” Physica A, 281(1-4), 2000 pp. 69-77. doi:10.1016/S0378-4371(00)00018-2. |

[5] | A. Clauset, C. Shalizi, and M. Newman, “Power-Law Distributions in Empirical Data,” SIAM Review, 51(4), 2009 pp. 661-703. doi:10.1137/070710111. |

[6] | Mozilla Foundation. “Public Suffix List.” (Aug 2, 2012) publicsuffix.org. |

T. Silvestri, “Random Walks on the World Wide Web,” The Mathematica Journal, 2013. dx.doi.org/doi:10.3888/tmj.15-9.

Todd Silvestri received his undergraduate degrees in physics and mathematics from the University of Chicago in 2001. As a graduate student, he worked briefly at the Thomas Jefferson National Accelerator Facility (TJNAF), where he helped to construct and test a neutron detector used in experiments to measure the neutron electric form factor at high momentum transfer. From 2006 to 2011, he worked as a physicist at the US Army Armament Research, Development and Engineering Center (ARDEC). During his time there, he cofounded and served as principal investigator of a small laboratory focused on improving the reliability of military systems. He is currently working on several personal projects.

**Todd Silvestri**

*New Jersey, United States*

*todd.silvestri@optimum.net*

The standard definitions of the complete elliptic integrals of the first and second kind (see [1], [2], [3], [4]) are respectively:

$K(k) = \int_0^{\pi/2} \frac{d\theta}{\sqrt{1 - k^2 \sin^2\theta}}, \qquad E(k) = \int_0^{\pi/2} \sqrt{1 - k^2 \sin^2\theta}\, d\theta.$   (1)

In *Mathematica*, these are `EllipticK` and `EllipticE`.

We also have

(2) |

(3) |

The elliptic singular modulus k_r is defined to be the solution of the equation

$\frac{K\left(\sqrt{1 - k_r^2}\right)}{K(k_r)} = \sqrt{r}.$   (4)

In *Mathematica*, is computed using .

The complementary modulus is given by k′_r = √(1 − k_r²). (For evaluations of k_r, see [7], [8], [9].)

We need the following relation satisfied by the elliptic alpha function (see [7]):

(5) |

Our method requires finding derivatives of powers of the elliptic integrals K and E; these can always be expressed in terms of K, E, and k. This article uses *Mathematica* to carry out these evaluations.

The elliptic alpha function is not widely known (see [7, 10]). Like the singular modulus, the elliptic alpha function can be evaluated from modular equations. One such case is given in [7], Chapter 5:

(6) |

In view of [7], [11], and [5], the formula for is

(7) |

where is a root of the polynomial equation

(8) |

In the next section, we review and extend the method for constructing a series for powers of 1/π. These Ramanujan-type formulas are presented here for the first time. The only formulas that were previously known are of orders 1, 2, and 3 ([12], [13]). There are few general formulas of order 2 and only one of order 3, due to B. Gourevitch (see references [14], [15], [5], [16], [17], [18]):

(9) |

In the last section we prove a formula for the evaluation of in terms of .

We have (see [16]):

(10) |

This is the *Mathematica* definition.

Define , , such that

(11) |

It turns out that

(12) |

Here are the *Mathematica* definitions for for .

Consider the following equation for the function :

(13) |

Set ; then and , for suitable values of , is a function of and , so is an algebraic number when . The and can be evaluated from (13). Higher values of and give more accurate and faster formulas for and .

The general formula produced by our method for is

(14) |

This computes the polynomial in the variable in the sum (13).

To find the , the function `Arules` replaces by and by and sets all the Taylor expansion coefficients with respect to to 0.

Choose `M` large enough to get a solution for all the for . (Here and .)

Now that we have the `A[i]`, this computes the sum on the left-hand side of (13).

This computes the right-hand side of (13).

Example 1. From [19] and [7], for and , we have and . Hence we get the formula

(15) |

We verify this numerically.

Example 2. Here is another example for that we verify numerically.

The coefficients of and the parameters for the formula are obtained using the same method as for . (The same can be done as well for , of course.) Higher values of and give more accurate and faster formulas for and .

For we get

(16) |

This calculates the .

Example 3. For ,

(17) |

Example 4. For , we have and ; then

(18) |

We verify this numerically.

Example 5. For , we have and ; then

(19) |

It is clear from the results in the previous section that getting rapidly convergent series for and its even powers requires values of the alpha function for large , say (see [14], [20], [5]). In this section we address this problem.

From (4), (7), and [2] pages 121-122, Chapter 21, if we set , , , , then

(20) |

From the duplication formula

and

equation (20) becomes

(21) |

Setting

(22) |

gives the following proposition.

(23) |

This connects Ramanujan’s results of Chapter 21 in [2] with the evaluation of the alpha function and the evaluations of . Solving (23) with respect to gives

Equations (21), (22), and (23) give another interesting formula,

where

(24) |

Entry 4 of [2], p. 436 is

(25) |

where and .

Set

(26) |

where is the Rogers-Ramanujan continued fraction (see [2], [21], [22]):

(27) |

and

(28) |

this gives

(29) |

and hence the evaluation

(30) |

But for the evaluation of the Rogers-Ramanujan continued fraction, from [22] we have

If and is a positive real, then

(31) |

with

(32) |

(33) |

(34) |

with (see [22])

(35) |

In some cases, the next formula from [9] is very useful:

(36) |

Here the function is , where , , and are as defined in [9] and is the iterate of .

The coefficient was defined in (24) and occurred in (32); also satisfies the equation

If we know and , we can evaluate from (31) and then we can evaluate .

The following conjecture is most compactly expressed in terms of the quantity

(37) |

The function is the -invariant (see [23], [8]). For more properties of and see [24].

Numerical results calculated with *Mathematica* indicate that whenever , then .

For a given and , , or , if the smallest nested root of is , then we can evaluate the Rogers-Ramanujan continued fraction with integer parameters.

**1.** When ,

(38) |

with , where , , are positive integers.

**2.** When ,

a) If , then

(39) |

where

(40) |

and where is the positive integer solution of . Hence and is a positive integer. The parameter is a positive rational and can be found directly from the numerical value of .

b) If , then

(41) |

where we set . Then a starting point for the evaluation of the integers , is

(42) |

the square of an integer.

**3.** When , then we can evaluate .

The degree of is 8 and the minimal polynomial of is of degree 4 or 8 and symmetric. Hence the minimal polynomial can be reduced to at most a fourth-degree polynomial and so it is solvable. With the help of step 2, we can evaluate .

(43) |

where , , , , , are integers, and

(44) |

Here are some values that can be found with the *Mathematica* built-in function `Recognize` or by solving Pell’s equation and applying the conjecture.

(45) |

(46) |

(47) |

(48) |

(49) |

(50) |

(51) |

(52) |

(53) |

(54) |

Example 6. If , from (54) we have , hence

(55) |

(56) |

Hence

(57) |

[1] | M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, New York: Dover, 1972. |

[2] | B. C. Berndt, Ramanujan’s Notebooks, Part III, New York: Springer-Verlag, 1991. |

[3] | J. V. Armitage and W. F. Eberlein, Elliptic Functions, New York: Cambridge University Press, 2006. |

[4] | E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, 4th ed., Cambridge: Cambridge University Press, 1927. |

[5] | N. D. Bagis and M. L. Glasser, “Conjectures on the Evaluation of Alternative Modular Bases and Formulas Approximating 1/π,” Journal of Number Theory, 132(10), 2012 pp. 2353-2370. |

[6] | N. D. Baruah and B. C. Berndt, “Eisenstein Series and Ramanujan-Type Series for 1/π,” Ramanujan Journal, 23(1-3), 2010 pp. 17-44. link.springer.com/article/10.1007/s11139-008-9155-8. |

[7] | J. M. Borwein and P. B. Borwein, Pi and the AGM, New York: Wiley, 1987. |

[8] | D. Broadhurst, “Solutions by Radicals at Singular Values from New Class Invariants for .” arxiv.org/abs/0807.2976. |

[9] | N. Bagis, “Evaluation of Fifth Degree Elliptic Singular Moduli.” arxiv.org/abs/1202.6246v1. |

[10] | E. W. Weisstein, “Elliptic Alpha Function” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/EllipticAlphaFunction.html. |

[11] | J. M. Borwein and P. B. Borwein, “A Cubic Counterpart of Jacobi’s Identity and the AGM,” Transactions of the American Mathematical Society, 323(2), 1991 pp. 691-701. www.ams.org/journals/tran/1991-323-02/S0002-9947-1991-1010408-0/S0002-9947-1991-1010408-0.pdf. |

[12] | N. D. Baruah and B. C. Berndt, “Ramanujan’s Series for 1/π Arising from His Cubic and Quartic Theories of Elliptic Functions,” Journal of Mathematical Analysis and Applications, 341(1), 2008 pp. 357-371. doi:10.1016/j.jmaa.2007.10.011. |

[13] | N. D. Baruah, B. C. Berndt, and H. H. Chan, “Ramanujan’s Series for 1/π: A Survey,” American Mathematical Monthly, August-September, 2009 pp. 567-587. www.math.uiuc.edu/~berndt/articles/monthly567-587.pdf. |

[14] | N. Bagis, “Ramanujan-Type Approximation Formulas.” arxiv.org/abs/1111.3139v1. |

[15] | S. Ramanujan, “Modular Equations and Approximations to π,” Quarterly Journal of Pure and Applied Mathematics, 45, 1914 pp. 350-372. |

[16] | W. Zudilin, “Ramanujan-Type Formulae for 1/π: A Second Wind?” arxiv.org/abs/0712.1332. |

[17] | E. W. Weisstein, “Pi Formulas,” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/PiFormulas.html. |

[18] | The Mathematics Genealogy Project. “Jesús Guillera.” (Jul 17, 2013) genealogy.math.ndsu.nodak.edu/id.php?id=124102. |

[19] | E. W. Weisstein, “Elliptic Lambda Function” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/EllipticLambdaFunction.html. |

[20] | B. C. Berndt and H. H. Chan, “Eisenstein Series and Approximations to π,” Illinois Journal of Mathematics, 45, 2001 pp. 75-90. www.math.uiuc.edu/~berndt/publications.html. |

[21] | B. C. Berndt, Ramanujan’s Notebooks, Part V, New York: Springer-Verlag, 1998. |

[22] | N. D. Bagis, “Parametric Evaluations of the Rogers-Ramanujan Continued Fraction,” International Journal of Mathematics and Mathematical Sciences, #940839, 2011. doi:10.1155/2011/940839. |

[23] | B. C. Berndt and H. H. Chan, “Ramanujan and the Modular j-Invariant,” Canadian Mathematical Bulletin, 42(4), 1999 pp. 427-440. cms.math.ca/10.4153/CMB-1999-050-1. |

[24] | N. Bagis, “On a General Polynomial Equation Solved by Elliptic Functions.” arxiv.org/abs/1111.6023v1. |

N. D. Bagis, “A General Method for Constructing Ramanujan-Type Formulas for Powers of 1/π,” The Mathematica Journal, 2013. dx.doi.org/doi:10.3888/tmj.15-8.

Nikos D. Bagis is a mathematician with a PhD in Mathematical Informatics from Aristotle University of Thessaloniki.

**N. D. Bagis**

*Stenimahou 5 Edessa Pellas
58200 Greece
*

Euclid’s *Elements* marked a revolutionary achievement in the development of mathematics and is by far the most famous and most often printed mathematical work of all time, making its author the most celebrated mathematician ever. Bertrand Russell said of this work, “I had not imagined that there was anything so delicious in the world.” The thirteen books of the *Elements* deal with a systematic gathering and orderly presentation of mathematical thought up to the time it was written, around 300 BC. Euclid’s merit consists in deriving a wealth of original and fundamental ideas from a few axioms, using a methodology involving definitions, lemmas, theorems, corollaries, and postulates; in doing so, he introduced a style of rigorous proof to mathematics. The *Elements* contains “the oldest nontrivial algorithm that has survived to the present day” [1], namely Euclid’s algorithm to compute the GCD of two natural numbers. Euclidean geometry is based on the tradition of the *Elements*. Non-Euclidean geometry arose from attempts to prove the fifth (or parallel) postulate, which, in its equivalent Playfair form, states that through a given point not on a given line there passes exactly one line parallel to that line. After centuries of unsuccessful attempts at deriving this postulate from the others, its independence was eventually demonstrated by Beltrami in 1868. In non-Euclidean geometry, this postulate is negated (either by allowing no parallels or a multiplicity of parallels), following the ideas of Bolyai, Lobachevsky, Gauss (who coined the term non-Euclidean geometry), Riemann, Klein, and Hilbert, among others.
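The algorithm referred to in [1] is short enough to state in full; here it is in modern form (Python for concreteness):

```python
def gcd(a, b):
    """Euclid's algorithm: replace the pair (a, b) by (b, a mod b)
    until the remainder vanishes; the last nonzero value is the GCD."""
    while b:
        a, b = b, a % b
    return a
```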

The approach to computational non-Euclidean geometry in this series of articles is toward the design of algorithms arising from problems set in the context of non-Euclidean geometry. The emphasis is placed on verification and algorithmic implementation, visualization, and testing. In the opinion of the author, inversive geometry is the natural starting point to introduce non-Euclidean ideas. The author of [2] describes the technique of inversion as a “dark art.” A suitable interpretation of this description is offered by [3] as “an advanced technique, which can offer considerable advantage in solving certain problems.” This article examines the basic properties of inversive geometry, starting from the introduction of involutions and the family of generalized circles, the inversion of segments, arcs, triangles, and quadrilaterals with applications to Nicomachus’s theorem, the inversion of tilings made by regular polygons, and an inversive spirograph. This work extends and complements the material in [4, 5, 6].

A transformation T, other than the identity, with the property that applying it twice gives the identity is called an *involution* or an *involutory transformation*, *self-inverse*, or *of period two*. Familiar examples of involutions are multiplication by −1 and the taking of reciprocals in arithmetic, taking complements in set theory, conjugation of complex numbers (i.e., x + iy ↦ x − iy, if x and y are real), geometrical reflection in a line, the matrix transpose and inverse, and the mapping of a number x into a − x, with a an arbitrary number.

An important example of an involution involves permutations. A permutation under the operation of group composition is an involution if it is of order 2; that is, if it can be written as a product of disjoint transpositions (cycles of length 2). For example, a 7-permutation that interchanges two disjoint pairs of elements while leaving the points 2, 3, and 4 fixed factors into two disjoint transpositions and is therefore an involution.

The number of involutory n-permutations grows quickly with n.

These numbers can also be readily obtained by solving a recurrence equation [7].
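The original computes these counts in Mathematica; as a sketch, the standard recurrence I(n) = I(n−1) + (n−1)·I(n−2), which counts all involutions of n elements including the identity, can be implemented in a few lines of Python (subtracting 1 gives the permutations of order exactly two, OEIS A001189):

```python
# Count involutions of n elements via the recurrence
# I(n) = I(n-1) + (n-1) * I(n-2),
# where I(n) includes the identity permutation.
# I(n) - 1 counts permutations of order exactly 2 (OEIS A001189).

def involution_counts(n_max):
    """Return [I(0), I(1), ..., I(n_max)]."""
    counts = [1, 1]  # I(0) = I(1) = 1
    for n in range(2, n_max + 1):
        counts.append(counts[n - 1] + (n - 1) * counts[n - 2])
    return counts[: n_max + 1]
```

For n = 7 this gives I(7) = 232, so there are 231 involutory 7-permutations of order exactly two.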

Consider the involution *reversion*, z ↦ 1/z, which maps a nonzero complex number to its reciprocal. Let U be the unit circle given by the equation |z| = 1 or x² + y² = 1, with z = x + iy. It is easy to see that reversion maps the interior of U to its exterior and vice versa; that is, |1/z| > 1 iff |z| < 1. Moreover, 1/z = z iff z = ±1. Reversion maps the number z into the number z̄/|z|², where z̄ is the conjugate of z. Considering a complex number as a point in the complex plane, if a point is close to the origin, its reversion is far away, and vice versa.

The involution *inversion*, z ↦ 1/z̄, is similar to reversion. For inversion, z and 1/z̄ are collinear with the origin for nonzero z. The point (x, y) inverts to the point (x, y)/(x² + y²).
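The article compares these maps interactively in Mathematica; a minimal Python sketch of the two transformations, using the formulas above with the unit circle as circle of inversion, is:

```python
# Reversion and inversion in the unit circle, acting on complex points.
# Both are involutions: applying either map twice returns the input.

def reversion(z: complex) -> complex:
    """Map a nonzero complex number to its reciprocal, 1/z."""
    return 1 / z

def inversion(z: complex) -> complex:
    """Invert a nonzero point in the unit circle: z -> 1/conj(z) = z/|z|^2."""
    return 1 / z.conjugate()
```

Note that inversion(z) is a positive real multiple of z (collinear with the origin through z), while reversion additionally reflects across the real axis.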

The following `Manipulate` compares reversion, inversion, and some similar transformations. Drag the point z; the arrow starts at z and ends at its image under the chosen transformation; the interior of the unit circle is colored blue. The transformations can be expanded into the form u + iv, with u and v real.

A linear transformation maps the family of lines to itself. The following theorem suggests a family that may be preserved by inversion.

**Theorem 1**

Points that satisfy the equation a(x² + y²) + bx + cy + d = 0, with a, b, c, d real numbers, invert into points that satisfy the equation d(x² + y²) + bx + cy + a = 0.

Applying the transformation corresponding to inversion of Cartesian points verifies this.

Let G(a, b, c, d) be the set of points that satisfy the first equation in theorem 1. This four-parameter family includes points (for instance, the origin alone, when a = 1 and b = c = d = 0), lines (when a = 0 and b, c are not both 0), circles (when a ≠ 0 and b² + c² − 4ad > 0), the whole plane (when a = b = c = d = 0), and the empty set (when a = b = c = 0 and d ≠ 0).

Define a *generalized circle* to be a line or circle. (The function `genCircle` defined below constructs the graphics objects to draw the line or circle.) Under inversion, the generalized circle G(a, b, c, d) transforms into the generalized circle G(d, b, c, a). If a = 0 (and b, c are not simultaneously 0), then G is a line. If a ≠ 0, the equation corresponds to a circle of center (−b/(2a), −c/(2a)) and radius √(b² + c² − 4ad)/(2|a|), provided that b² + c² − 4ad > 0. If d = 0, G passes through the origin. For any nonzero t, G(ta, tb, tc, td) is the same as G(a, b, c, d). Other families of polynomials in two variables that include generalized circles do not transform their members into members of the same family; that is, they are not closed under inversion. For instance, the six-parameter family of equations ax² + bxy + cy² + dx + ey + f = 0 includes conic sections, but some invert into quartics.
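Theorem 1's parameter swap makes inversion of generalized circles trivial to compute. The following Python sketch (function names invented here, not from the article's Mathematica code) inverts and classifies a parameter 4-tuple:

```python
import math

# A generalized circle is G(a, b, c, d): a(x^2 + y^2) + b x + c y + d = 0.
# By theorem 1, inversion in the unit circle swaps a and d.

def invert_gen_circle(a, b, c, d):
    """Invert G(a, b, c, d) in the unit circle."""
    return (d, b, c, a)

def classify(a, b, c, d):
    """Classify the solution set of a(x^2+y^2) + bx + cy + d = 0."""
    if a == 0:
        if b == 0 and c == 0:
            return "plane" if d == 0 else "empty"
        return "line"
    disc = b * b + c * c - 4 * a * d
    if disc > 0:
        return "circle"
    return "point" if disc == 0 else "empty"

def circle_center_radius(a, b, c, d):
    """Center and radius when G(a, b, c, d) is a circle (a != 0)."""
    disc = b * b + c * c - 4 * a * d
    return (-b / (2 * a), -c / (2 * a)), math.sqrt(disc) / (2 * abs(a))
```

For example, the line x = 1 is G(0, 1, 0, −1); it inverts to G(−1, 1, 0, 0), a circle through the origin with center (1/2, 0) and radius 1/2.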

**Theorem 2**

The parameters needed for the description of such a line are obtained by solving the following equation.

**Theorem 3**

This result is readily obtained by simply expanding the Cartesian equation of such a circle.

From a structural point of view, if a generalized circle is mapped into itself, it is said to be *preserved* by inversion. (Warning: that is not the same as saying that a point on a generalized circle is mapped to itself; for instance, a rotation about the center of a circle preserves the circle but moves each point of the circle.) In that case, a = d. If a = d = 0, the line passes through the origin and each point is inverted into its negative. If a = d ≠ 0 and b² + c² > 4a², the circle is preserved. Assume its center is at a point of the form (h, 0), with h > 0 and radius r. Then, in particular, the point (h − r, 0) is inverted into the point (h + r, 0).

Hence (h − r)(h + r) = 1; that is, h² = 1 + r², so the origin, the center, and either of the two intersection points of the circle and the unit circle form a right triangle; hence the circle is orthogonal to the unit circle. This condition turns out to be necessary and sufficient for a generalized circle to be preserved. Theorem 1 implies the following results, which also apply conversely.

- Any circle not passing through the origin inverts into a circle not passing through the origin (in fact passing through its intersection points with the unit circle, if any).
- Any circle passing through the origin inverts into a line not passing through the origin (in fact passing through its intersection points with the unit circle, if any; in general, this line is parallel to the tangent of the circle at the origin).
- Any line passing through the origin is preserved by inversion (any point is mapped into its negative).
- Any line not passing through the origin inverts into a circle passing through the origin (and its intersection points with the unit circle, if any).

To make inversion continuous, define the inversion of the origin to be a legal point, the “point at infinity,” ∞, making the phrase “nonzero” unnecessary when talking about inversion. Inversion is thus a one-to-one map of the extended plane.

Conversions of generalized circles to and from graphics primitives are handled by the following functions.

For example, the following `Manipulate` shows how a line and a circle behave under inversion. To vary their positions, drag the two small disks. For a line, the disks are points on the line; for a circle, one disk is the center and the other is on the circumference. You can also choose a family of parallel lines or a family of concentric circles. The arrows show the action of the inversion on the two control points.

Concentric circles do not invert into concentric circles, and the center of a circle does not invert into the center of its inversion (although the two centers are collinear with the origin). The center of the inverted circle can even be outside the original circle! (This particularly applies to the unit circle itself; its interior gets mapped into its exterior and its center gets mapped to ∞.) The next section shows how to locate the center of the inversion of a given circle.

As a circle is an ellipse, it is interesting to see the inversion of a general ellipse. The following `Manipulate` shows such an inversion; you can drag the center of the ellipse with a locator and vary the lengths of its axes with a pair of sliders. Contour lines showing concentric ellipses are optional; increase the zoom slider if the display is all one color.

Up to now, inversion has been in the unit circle; it can be generalized to use any circle C. Temporarily using complex numbers to represent points in the plane, assume C has center p and radius r. The generalization takes three steps: first transform C into the unit circle by translating p to the origin and scaling by 1/r. Second, invert in the unit circle. Third, scale back by r and translate back by p. As these operations are similarities, all the properties of inversion apply to the generalized inversion. Generalized inversion is implemented by the function `invZ`.
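The article implements `invZ` in Mathematica; an equivalent Python sketch of the three-step construction, which composes into the closed form p + r²/conj(z − p), is:

```python
# Inversion of a complex point z in the circle with center p and radius r.
# Composing translate -> scale -> unit-circle inversion -> scale -> translate
# gives the closed form p + r^2 / conj(z - p).

def inv_z(z: complex, p: complex = 0j, r: float = 1.0) -> complex:
    """Invert z in the circle of center p and radius r (z != p)."""
    w = (z - p) / r          # translate p to the origin and scale by 1/r
    w = 1 / w.conjugate()    # invert in the unit circle
    return p + r * w         # scale back by r and translate back by p
```

Points on the circle itself are fixed, and the map is its own inverse.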

The explicit Cartesian coordinates for inverting points come from expanding the following expression.

Then, operating with Cartesian coordinates, the function `inver` inverts a given point in the given circle.

For example, the functions give matching results when applied to corresponding representations of the same point in the plane. The latter form is more convenient here.

An interesting property of inversion is adopted by many authors as an alternative way of defining inversion. Let q be the inversion of the point z in the circle C with center p and radius r. It is easy to verify that the product of the distance of z to p and the distance of q to p is a constant equal to r².

Therefore, define the point q as the inverse of z if q is the unique point on the ray from p through z such that its distance to p times the distance of z to p is r² [8].

Before obtaining the Cartesian coordinates corresponding to inverting circles and lines, first consider the following problem. What is the center of the circle through three different noncollinear points?

The function `disSq` computes the square of the distance between two given points, and is used by the function `cir3` that computes the required center.

For example, here is an explicit formula for the center of such a circle through three symbolic points.

This is its radius.

The denominators of the expressions for the center and radius contain a factor that is zero only when the three points are collinear.
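The Mathematica functions `disSq` and `cir3` are not reproduced here; a Python equivalent (hypothetical names) computes the circumcenter and circumradius from the standard determinant formulas, whose common denominator vanishes exactly when the points are collinear:

```python
import math

# Circumcenter and circumradius of the circle through three
# noncollinear points, via the standard determinant formulas.

def circumcircle(p1, p2, p3):
    """Return (center, radius) of the circle through p1, p2, p3."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if d == 0:
        raise ValueError("points are collinear")
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return (cx, cy), math.hypot(x1 - cx, y1 - cy)
```

For instance, the circle through (0, 0), (1, 0), and (0, 1) has center (1/2, 1/2) and radius √2/2.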

Now consider an arbitrary circle c that does not pass through the center p of the circle of inversion C. Invert p in c to a point q, and then invert q in C to a point q′. Then the inversion of c in C is a circle with center q′. The equality of the following two results verifies this property.

**Theorem 4**

The vertical bars in the last expression have two different meanings: the modulus of a vector and the absolute value of a number. Theorem 4 arises from using the property mentioned in the previous paragraph. First we obtain the square of the radius of the inverted circle.

Matching the center of the inverted circle with the point constructed above gives the following.

So a circle is inverted into a concentric circle when its center is outside the circle of inversion and the square of the distance between the two centers equals the sum of the squares of the two radii. This implies that the circle and the circle of inversion are orthogonal. Therefore, a circle inverts into a concentric circle (and necessarily of the same radius, i.e. itself) iff it is orthogonal to the circle of inversion. To visualize these ideas, consider the following `Manipulate` that inverts a pattern of tangent circles.

The inversion of a polygon is a closed sequence of arcs of circles that go through the center of inversion. Consider the problem of finding an arc through three noncollinear points. As stated, this problem has many solutions; by assuming one of the points is not an endpoint of the arc, there is a unique answer. The function `seg3` constructs such an arc and includes an argument to control whether the arc passes through that point.

The following `Manipulate` applies the function `seg3` to any three noncollinear points, optionally passing through one of them.

Let the inversion of the point P be P′ and the inversion of the point Q be Q′, and let O be the center of inversion. There are four cases for the inversion of the finite line segment PQ.

- If P = Q, the segment is a point and inverts into the point P′.
- If P = O and Q ≠ O, the segment inverts into a ray starting at Q′, going away from O.
- If P, Q, and O are collinear (with P, Q ≠ O), there are two subcases:
  - If O is in between P and Q, the inverse is the union of two rays, one starting at P′, the other starting at Q′, and both going away from O.
  - Otherwise, P and Q are on the same side of O, and the inverse of PQ is a line segment joining P′ and Q′ on the same side of O as P and Q, but now switched around.
- If P and Q are not collinear with O, consider the circle passing through P, Q, and O. The inversion of PQ is the arc joining P′ and Q′ that does not pass through O.

Similar results apply to the inversion of an arc. The following `Manipulate` shows all these cases as they apply to line segments and arcs. The initial segments or arcs are shown in red and their inversions in blue. Control locators are drawn in red. When a segment/arc is such that its corresponding line/circle passes through the center of inversion O, say that it does so *directly* if O is part of the segment/arc and *indirectly* otherwise.

The function `invSeg` encapsulates the four cases appearing in the inversion of a segment mentioned in the previous section.

The function `invSeg` is used to invert a general triangle and a general quadrilateral. As mentioned before, the inversion of a polygon is a figure made by adjoining arcs of circles that pass through the center of inversion. (The geometry of coincident arcs is the same as the geometry of polygons. Just invert them!) However, the interior of a polygon does not invert to the interior of its inverse. The function `fill` fills this interior for aesthetics.

The following `Manipulate` shows a triangle or quadrilateral and the corresponding filled inversion. You can zoom by changing the radius of the circle of inversion. For what positions of the center of inversion do the sides of a given triangle invert into three congruent circles? Hint: it has something to do with its incircle. What about a quadrilateral?

*Nicomachus’s theorem* [9, 10] states that 1³ + 2³ + ⋯ + n³ = (1 + 2 + ⋯ + n)². This section inverts a well-known pattern showing a proof without words of this identity.

To justify the identity, consider the pattern made by half-squares and squares of increasing integer side lengths. The area of the big square is the square of the sum of the side lengths on the base, (1 + 2 + ⋯ + n)². On the other hand, this area is the sum of the individual areas of the interior squares, taking into account that two half-squares make a whole square: 1³ + 2³ + ⋯ + n³. Logically these quantities are the same; hence Nicomachus’s theorem.
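The identity itself is easy to verify numerically; a short Python check of both sides:

```python
# Nicomachus's theorem: 1^3 + 2^3 + ... + n^3 = (1 + 2 + ... + n)^2.

def sum_of_cubes(n):
    """Left side of the identity."""
    return sum(k ** 3 for k in range(1, n + 1))

def square_of_sum(n):
    """Right side of the identity."""
    return sum(range(1, n + 1)) ** 2
```

For n = 3, both sides equal 36.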

In order to get a symmetrical version, promote the half-squares into squares.

Add replicas around the lower-left vertex of the unit square made by rotating this pattern 90, 180, and 270 degrees to obtain a symmetrical pattern. Then eliminate repeated squares; moreover, eliminate repeated segments from adjacent squares using `filter`.

The function `nichoSegs` computes the segments needed to form the next pattern shown, which gives a visual proof of Nicomachus’s theorem [10].

The function `nichoSegs` computes the minimum number of segments necessary to produce the pattern. Here is the resulting segment count as a function of n.

Counting segments and squares shows that not removing unnecessary segments would, for large n, require more than four times as many segments. The following `Manipulate` shows Nicomachus’s pattern in gray along with its inversion in red. You can change the center of inversion by dragging the locator, and you can vary n with the slider.

There are only three ways to tile the plane with a regular polygon: using an equilateral triangle, a square, or a regular hexagon. The goal of this section is to show the patterns arising from inverting these tilings. To that end, the functions `ring3`, `ring4`, and `ring6` generate the corresponding set of pieces in a ring surrounding the circle of inversion (in red).

Here is the triangular case for the first level of complexity.

The corresponding functions `inv3`, `inv4`, and `inv6` invert each of the segments forming the tiling. For instance, here is the pattern for the triangular case.

Although the interiors of triangles are not preserved by inversion, they are filled to show the interference patterns they produce. The idea is shown in the next figure; to avoid clutter, only the first four members of the ring are drawn. Lines join the vertices of the triangles to their inversions.

Although the triangles do not overlap, the interiors of their inverses do. In fact, the inversion of the central triangle contains the interiors of all the rest. The next level of complexity renders the following pattern.

Finally, a detail showing the sixth level.

Similarly, in the case of tiling with squares, here are the corresponding functions.

Here is a detail corresponding to the first four squares of the arrangement.

And here is a detail of the sixth level. (The color assignment has to be made explicit and does not rely only on overlapping as it did before.)

The tiling using regular hexagons cannot be colored with two colors, and there are too many segments to place the center of inversion among them. So here is a line pattern.

The second level of complexity corresponding to the above pattern is the following.

This detail shows the sixth level of complexity, made with shapes formed with six arcs, all passing indirectly through the center of inversion.

Finally, the following `Manipulate` shows an animated circle (orange) rotating inside a circle (pale brown) and the patterns generated by a point at the end of a line at a variable distance from the center of the circle. By varying the center and radius of the inversive circle, you can zoom in; the rotating radial line inverts into an arc orthogonal to the inversive path. You can enlarge the inversion by dragging the center of the inversive circle (light blue).

[1] D. Knuth, The Art of Computer Programming, Volume 1: Fundamental Algorithms, 3rd ed., Reading, MA: Addison-Wesley Professional, 1997.

[2] G. Smith, A Mathematical Olympiad Primer, London: United Kingdom Mathematics Trust, 2008.

[3] A. Goucher. “Complex Projective 4-Space.” (Jun 24, 2013). cp4space.wordpress.com/2012/11/04/final-chapter-of-moda.

[4] J. Rangel-Mondragon, “Inversive Patterns, Part I: Complex Inversion,” Mathematica in Education and Research, 12(2), 2007 pp. 162-183.

[5] J. Rangel-Mondragon, “Inversive Patterns, Part II: Fundamental Properties of Inversion,” Mathematica in Education and Research, 12(4), 2007 pp. 330-354.

[6] J. Rangel-Mondragon, “Inversive Patterns, Part III: Common Tangents, Mandalas and Gothic Windows,” Mathematica in Education and Research, 12(4), 2007 pp. 355-376.

[7] N. J. A. Sloane. Sequence A001189 in The On-Line Encyclopedia of Integer Sequences. oeis.org.

[8] H. S. M. Coxeter and S. L. Greitzer, Geometry Revisited, New York: Random House, 1967.

[9] E. W. Weisstein. “Nicomachus’s Theorem” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/NicomachussTheorem.html.

[10] M. Schreiber. “A Visual Proof of Nicomachus’s Theorem” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/AVisualProofOfNicomachussTheorem.

J. Rangel-Mondragon, “Selected Themes in Computational Non-Euclidean Geometry: Part 1,” The Mathematica Journal, 2013. dx.doi.org/doi:10.3888/tmj.15-7.

Jaime Rangel-Mondragon received M.Sc. and Ph.D. degrees in applied mathematics and computation from the University College of North Wales in Bangor, UK. He has been a visiting scholar at Wolfram Research, Inc. and held positions in the Faculty of Informatics at UCNW, the College of Mexico, the Center of Research and Advanced Studies, the Monterrey Institute of Technology, the Queretaro Institute of Technology and the University of Queretaro in Mexico, where he is presently a member of the Faculty of Informatics. His current research includes combinatorics, the theory of computing, computational geometry, urban traffic, and recreational mathematics.

**Jaime Rangel-Mondragon**

*UAQ, Facultad de Informatica
Queretaro, Qro. Mexico*

Negative binomial regression is a type of generalized linear model in which the dependent variable is a count of the number of times an event occurs. A convenient parametrization of the negative binomial distribution is given by Hilbe [1]:

(1)  Pr(Y = y) = [Γ(y + 1/α) / (Γ(y + 1) Γ(1/α))] (1/(1 + αμ))^(1/α) (αμ/(1 + αμ))^y,  y = 0, 1, 2, …

where μ is the mean of Y and α is the heterogeneity parameter. Hilbe [1] derives this parametrization as a Poisson-gamma mixture, or alternatively as the number of failures before the (1/α)th success, though we will not require 1/α to be an integer.

The traditional negative binomial regression model, designated the NB2 model in [1], is

(2)  ln μ = β₀ + β₁x₁ + ⋯ + βₚxₚ

where the predictor variables x₁, …, xₚ are given, and the population regression coefficients β₀, β₁, …, βₚ are to be estimated.

Given a random sample of n subjects, we observe for subject i the dependent variable yᵢ and the predictor variables x₁ᵢ, …, xₚᵢ. Utilizing vector and matrix notation, we let β = (β₀, β₁, …, βₚ)ᵀ, and we gather the predictor data into the n × (p + 1) design matrix X, whose first column is all 1s and whose remaining entries are the observed predictor values.

Designating the iᵗʰ row of X to be xᵢ, and exponentiating (2), we can then write the distribution (1) with mean μᵢ = exp(xᵢβ) for subject i.

We estimate β and α using maximum likelihood estimation. The likelihood function is the product of the probabilities (1) over the n subjects,

and the log-likelihood function is

(3)  L(β, α) = Σᵢ { yᵢ ln(αμᵢ/(1 + αμᵢ)) − (1/α) ln(1 + αμᵢ) + ln Γ(yᵢ + 1/α) − ln Γ(yᵢ + 1) − ln Γ(1/α) },  with μᵢ = exp(xᵢβ).

The values of β and α that maximize the log-likelihood will be the maximum likelihood estimates we seek, and the estimated variance-covariance matrix of the estimators is −H⁻¹, where H is the Hessian matrix of second derivatives of the log-likelihood function. Then the variance-covariance matrix can be used to find the usual Wald confidence intervals and p-values of the coefficient estimates.

We will use *Mathematica* to replicate some examples given by Hilbe [1], who uses R and Stata. We start with simulated data generated with known regression coefficients, then recover the coefficients using maximum likelihood estimation. We will generate a sample of observations of a dependent random variable that has a negative binomial distribution with mean given by (2), using known values of the regression coefficients and of α. The design matrix X will contain independent standard normal variates.
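The article's simulation is done in Mathematica; an equivalent Python sketch (the parameter values below are illustrative choices, not the article's) draws the counts as a Poisson-gamma mixture, which is exactly the NB2 distribution (1):

```python
import numpy as np

# Simulate NB2 data: y_i ~ NegBin with mean mu_i = exp(x_i . beta)
# and heterogeneity alpha, generated as a Poisson-gamma mixture.
# Parameter values here are illustrative, not from the article.
rng = np.random.default_rng(seed=42)

n = 5000
beta = np.array([1.0, 0.5, -0.5])   # intercept and two slopes (chosen for the sketch)
alpha = 0.5                          # heterogeneity parameter

# Design matrix: a column of 1s plus independent standard normal predictors.
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
mu = np.exp(X @ beta)

# Poisson-gamma mixture: lambda_i ~ Gamma(1/alpha, scale alpha*mu_i),
# then y_i ~ Poisson(lambda_i); the mixture has mean mu_i and
# variance mu_i + alpha*mu_i^2, the NB2 variance function.
lam = rng.gamma(shape=1 / alpha, scale=alpha * mu)
y = rng.poisson(lam)
```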

Now we define and maximize the log-likelihood function (3), obtaining the estimates of β and α. Some experimentation with starting values for the search may be required, and the accuracy goal may need to be lowered; we could obtain good starting values for β using Poisson regression via `GeneralizedLinearModelFit`, while α is usually between 0.0 and 4.0 [1].

But we arbitrarily set all starting values to 1.0 and successfully find the correct estimates.
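In Mathematica the search is done with a maximization command; a Python sketch of the same estimation (minimizing the negative of log-likelihood (3) with scipy's derivative-free Nelder-Mead, keeping α positive by optimizing ln α, on freshly simulated illustrative data) is:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Simulate illustrative NB2 data with known parameters.
rng = np.random.default_rng(seed=7)
n = 2000
beta_true, alpha_true = np.array([1.0, 0.5, -0.5]), 0.5
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
mu_true = np.exp(X @ beta_true)
y = rng.poisson(rng.gamma(1 / alpha_true, alpha_true * mu_true))

def negloglik(params):
    """Negative of log-likelihood (3); params = (beta..., log alpha)."""
    beta, alpha = params[:-1], np.exp(params[-1])
    mu = np.exp(X @ beta)
    inv_a = 1 / alpha
    ll = (y * np.log(alpha * mu / (1 + alpha * mu))
          - inv_a * np.log1p(alpha * mu)
          + gammaln(y + inv_a) - gammaln(y + 1) - gammaln(inv_a))
    return -ll.sum()

# Start all coefficients at 1.0 (and log(alpha) = 0, i.e. alpha = 1).
fit = minimize(negloglik, x0=np.array([1.0, 1.0, 1.0, 0.0]), method="Nelder-Mead")
beta_hat, alpha_hat = fit.x[:-1], np.exp(fit.x[-1])
```

With a sample of this size the estimates land close to the true values used in the simulation.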

Define two helper functions.

Next, we find the standard errors of the estimates. The standard errors are the square roots of the diagonal elements of the variance-covariance matrix, which as mentioned above is given by −H⁻¹, where H is the Hessian matrix of second derivatives of the log-likelihood function. First, define the Hessian for any function.

Then we find the Hessian and −H⁻¹ at the values of our parameter estimates.

Finally, these are our standard errors.
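The article defines the Hessian symbolically in Mathematica; numerically, a central-difference Hessian works for any smooth function, and the standard errors are then the square roots of the diagonal of −H⁻¹. A small self-contained sketch, tested on a function with a known Hessian:

```python
import numpy as np

# Central-difference Hessian of a scalar function f at point x.
# For MLEs, standard errors are sqrt(diag(inv(-H))) at the maximum.

def hessian(f, x, h=1e-5):
    """Numerical Hessian of f: R^k -> R at x, by central differences."""
    x = np.asarray(x, dtype=float)
    k = x.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            e_i, e_j = np.zeros(k), np.zeros(k)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

# Toy example: f(x, y) = -(2x^2 + xy + y^2) has constant Hessian [[-4, -1], [-1, -2]].
f = lambda v: -(2 * v[0] ** 2 + v[0] * v[1] + v[1] ** 2)
H = hessian(f, np.array([0.3, -0.7]))
se = np.sqrt(np.diag(np.linalg.inv(-H)))  # "standard errors" for this toy Hessian
```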

We can now print a table of the results: the estimates of the coefficients, their standard errors, and the Wald statistics, p-values, and confidence intervals.

We see that in each case the confidence interval has captured the population parameter.

If the dependent variable y counts the number of events during a specified time interval t, then the observed rate y/t can be modeled by using the traditional negative binomial model above, with a slight adjustment. We note that t can also be thought of as area or subpopulation size, among other interpretations that lead to considering a rate.

Since E(y/t) = μ/t, we make the following adjustment to model (2) above:

ln(μ/t) = β₀ + β₁x₁ + ⋯ + βₚxₚ,

which can also be written as:

(4)  ln μ = β₀ + β₁x₁ + ⋯ + βₚxₚ + ln t.

This last term, ln t, is called the *offset*. So in our log-likelihood function, instead of replacing μᵢ with exp(xᵢβ), we replace μᵢ with exp(xᵢβ + ln tᵢ), resulting in the following:

(5)  L(β, α) = Σᵢ { yᵢ ln(αμᵢ/(1 + αμᵢ)) − (1/α) ln(1 + αμᵢ) + ln Γ(yᵢ + 1/α) − ln Γ(yᵢ + 1) − ln Γ(1/α) },  with μᵢ = exp(xᵢβ + ln tᵢ).

Then we proceed as before, maximizing the new log-likelihood function in order to estimate the parameters.

The *Titanic* survival data, available from [2] and analyzed in [1] using R and Stata, is summarized in Table 1, with crew members deleted.

Why did fewer first-class children survive than second class or third class? Was it because first-class children were at extra risk? No, it was because there were fewer first-class children on board the *Titanic* in the first place. So we do not want to model the raw number y of survivors; instead, we want to model the proportion y/t of survivors, which is the survival rate. So in (4) we need t to be the number of cases.

We set up the design matrix, with indicator variables equal to 1 for adults and for males, and with indicator variables for second class and third class, which means first class will be the reference.

Then we set up the dependent variable and the offset.

We define the log-likelihood (5).
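The offset log-likelihood (5) differs from (3) only in how μᵢ is computed; a Python sketch (toy data invented here, not the Titanic counts) makes that explicit, and the reduction to (3) when all tᵢ = 1 gives a convenient correctness check:

```python
import numpy as np
from scipy.special import gammaln

# NB2 log-likelihood (5) with an offset ln(t): mu_i = exp(x_i . beta + ln t_i).
def loglik_offset(beta, alpha, X, y, t):
    """Log-likelihood (5) for coefficients beta, heterogeneity alpha."""
    mu = np.exp(X @ beta + np.log(t))
    inv_a = 1 / alpha
    return np.sum(y * np.log(alpha * mu / (1 + alpha * mu))
                  - inv_a * np.log1p(alpha * mu)
                  + gammaln(y + inv_a) - gammaln(y + 1) - gammaln(inv_a))

# Toy example: 4 groups, y survivors out of t cases, one binary predictor.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([10, 4, 12, 3])
t = np.array([20.0, 20.0, 25.0, 15.0])
ll = loglik_offset(np.array([-0.5, -0.8]), 0.4, X, y, t)
```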

Now we maximize it to find the coefficients.

Then we find the standard errors of the coefficients.

And again we can print a table of the results.

But perhaps more useful for interpretation of the coefficients would be the Incidence Rate Ratio (IRR) for each variable, which is obtained by exponentiating each coefficient. For example, out of a sample of t adults, we expect from our model (4) that the survival rate will be exp(β₀ + β₁·1 + ⋯), while for an identical number of children we expect their survival rate to be exp(β₀ + β₁·0 + ⋯), where β₁ is the coefficient of the adult indicator and the remaining terms are identical. So by dividing the two rates, we obtain the ratio of rates (IRR) exp(β₁),

which we estimate from the fitted coefficient. Thus, our interpretation is that adults survived at roughly half the rate at which children survived, among those of the same sex and class. The standard error of the IRR is found by multiplying the estimated IRR by the standard error of the coefficient (see [1]), while a confidence interval for the IRR is found by exponentiating the confidence interval for the coefficient. Thus we obtain the following.

We do not need the IRR for the constant term or for α, so we drop them and then print the resulting table.

The confidence interval for the variable `class2` contains 1.0, consistent with the lack of significance of its coefficient, and indicating that the survival rate of second-class passengers was not significantly different than that of first-class passengers. We will address this after computing some model assessment statistics and residuals.

Various types of model fit statistics and residuals are readily computed. We use definitions given in [1]; alternate definitions exist and would require only minor changes.

Commonly used model fit statistics include the log-likelihood, deviance, Pearson chi-square dispersion, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC).

We already have the log-likelihood as a byproduct of the maximization process. The deviance is defined as

D = 2 { L(y; y) − L(μ̂; y) },

where L(μ̂; y) is our log-likelihood function (5) evaluated at the fitted means, and L(y; y) is the log-likelihood function with y replacing μ̂. For our NB2 model, this simplifies to D = Σᵢ dᵢ², where

(6)  dᵢ² = 2 { yᵢ ln(yᵢ/μ̂ᵢ) − (yᵢ + 1/α) ln((1 + αyᵢ)/(1 + αμ̂ᵢ)) }.

The Pearson chi-square dispersion statistic is given by χ²/(n − k), with χ² = Σᵢ (yᵢ − μ̂ᵢ)²/(μ̂ᵢ + αμ̂ᵢ²) and k the number of estimated parameters, while AIC and BIC are defined as

AIC = −2L + 2k

and

BIC = −2L + k ln n.

We compute these for the *Titanic* data above and display them.

These model assessment statistics are most useful when compared to those of a competing model, which we pursue in the next section after computing residuals.

The raw residuals are of course yᵢ − μ̂ᵢ, while the Pearson residuals are (yᵢ − μ̂ᵢ)/√(μ̂ᵢ + αμ̂ᵢ²), and the deviance residuals are sign(yᵢ − μ̂ᵢ)√(dᵢ²), with dᵢ² as defined in (6).

These residuals can be standardized by dividing by √(1 − hᵢ), where the hᵢ are the leverages obtained from the diagonal of the hat matrix H = W^(1/2) X (Xᵀ W X)⁻¹ Xᵀ W^(1/2), for W equal to the diagonal matrix with μ̂ᵢ as the iᵗʰ element of the diagonal.

Here are the unstandardized residuals for the *Titanic* data.

And here are the leverages and the standardized residuals.
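A Python sketch of the leverage computation (toy design matrix and weights invented here, following the hat-matrix formula above):

```python
import numpy as np

# Leverages h_i: diagonal of H = W^(1/2) X (X' W X)^(-1) X' W^(1/2).
def leverages(X, w):
    """Diagonal of the hat matrix for design X and diagonal weights w."""
    ws = np.sqrt(w)
    Xw = X * ws[:, None]                       # W^(1/2) X
    M = Xw @ np.linalg.inv(Xw.T @ Xw) @ Xw.T   # the hat matrix
    return np.diag(M)

# Toy example: 6 observations, intercept + one predictor, positive weights.
X = np.column_stack([np.ones(6), np.arange(6.0)])
w = np.array([1.0, 2.0, 1.5, 0.5, 1.0, 2.5])
h = leverages(X, w)
```

Since the hat matrix is a projection onto the column space of W^(1/2)X, the leverages lie in (0, 1) and sum to the number of columns of X.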

Hilbe recommends plotting the Standardized Pearson residuals versus the leverages hᵢ, with a poor model fit indicated by residuals that are outside the interval [−2, 2] when the leverage is high.

We have two Standardized Pearson residuals that are not within the range [−2, 2], one of which has a high leverage. We also recall that the variable `class2` was not significant. Perhaps the model will be improved if we remove `class2`. All that is required is to remove `class2` from the design matrix X, remove the corresponding starting value from the maximizing command, and run the model again. We obtain the following assessment statistics and standardized residuals for the revised model with `class2` removed.

We set up the design matrix and find the coefficients.

Comparing to the full model, we see that the assessment statistics have improved (they are smaller, indicating a better fit), and the Standardized Pearson residuals with high leverages are within the recommended boundaries. It appears that the model has been improved by dropping `class2`.

The traditional negative binomial regression model (NB2) was implemented by maximum likelihood estimation without much difficulty, thanks to the maximization command and especially to the automatic computation of the standard errors via the Hessian.

Other negative binomial models, such as the zero-truncated, zero-inflated, hurdle, and censored models, could likewise be implemented by merely changing the likelihood function.

The author acknowledges suggestions and assistance by the editor and the referee that helped to improve this article.

[1] J. Hilbe, Negative Binomial Regression, 2nd ed., New York: Cambridge University Press, 2011.

[2] “JSE Data Archive.” Journal of Statistics Education. (Nov 19, 2012) www.amstat.org/publications/jse/jse_data_archive.htm.

M. L. Zwilling, “Negative Binomial Regression,” The Mathematica Journal, 2013. dx.doi.org/10.3888/tmj.15-6.

**Michael L. Zwilling**

*Department of Mathematics
University of Mount Union
1972 Clark Avenue
Alliance, OH 44601*