This article presents a compact analytic approximation to the solution of a nonlinear partial differential equation of the diffusion type by using Bürmann’s theorem. Expanding an analytic function in powers of its derivative is shown to be a useful approach for solutions satisfying an integral relation, such as the error function and the heat integral for nonlinear heat transfer. Based on this approach, series expansions for solutions of nonlinear equations are constructed. The convergence of a Bürmann series can be enhanced by introducing basis functions depending on an additional parameter, which is determined by the boundary conditions. A nonlinear example, illustrating this enhancement, is embedded into a comprehensive presentation of Bürmann’s theorem. Besides a recursive scheme for elementary cases, a fast algorithm for multivalued Bürmann expansions and inverse functions is developed using integer partitions. The present approach facilitates the search for expansions of analytic functions superior to commonly used Taylor series and shows how to apply these expansions to nonlinear PDEs of the diffusion type.

For most nonlinear problems in physics, analytic closed-form solutions are not available. Thus the investigator initially searches for an approximate analytic solution. This approximation must be reliable enough to correctly describe the dependence of the solution on all essential parameters of the system. This article aims to show that Bürmann’s theorem can serve as a powerful tool for gaining approximations fulfilling such demands. We have chosen canonical examples [1, 2, 3] from the field of linear and nonlinear heat transfer to illustrate this technique.

A Bürmann series may be regarded as a generalized form of a Taylor series: instead of a series of powers of the independent variable, we have a series of powers of an analytic function of that variable.

Starting at an elementary level, we present a recursive calculation scheme for the coefficients of a Bürmann series. Such a recursive formulation is easily implemented in *Mathematica* and can find the Bürmann expansion for all elementary cases. For instances where we have to deal with series expansions of in terms of powers of functions of the form

that is, functions for which the first derivatives vanish at some point of the complex plane, we approach the limits of the recursive account. To calculate such expansions using *Mathematica* efficiently, we give a generalized formulation of the coefficients of the Bürmann series, using the expansion coefficients of the reciprocal power of an analytic function :

This formulation avoids the time-consuming process of symbolic differentiation usually used. The calculation of the coefficients is based on finding all partitions of the index in terms of the frequencies of the part of ,

These sets of frequencies for the partitions are tabulated by using the function `FrobeniusSolve`.

Once the coefficients are determined by using the tabulated solutions for , the calculation of the coefficients of a generalized Bürmann series is a simple task. A special case of a Bürmann series, representing a function as a series of powers of its own derivative, is of particular importance:

Expansions of this type will be applied to functions defined by integrals. For linear and also for nonlinear processes of heat transfer, these series expansions will serve us to find valuable approximations. This is due to the fact that the integral representation for the error function leads to an expansion in fractional powers of the integrand. It turns out that a similar strategy can be applied to find approximate solutions for nonlinear cases, since these solutions obey integral equations closely related to the integral defining the error function. Finally, by introducing a free parameter, the convergence of a Bürmann expansion can be improved. The free parameter is determined by the boundary conditions. By this procedure, we find reliable analytical approximations for the heat transfer in ZnO [3], comprising only a few terms.

The common analytic solutions to these problems use Taylor series or numerical evaluations, which do not exploit the structure revealed by the integral relation fulfilled by the exact solutions. We mention here that a similar procedure can also be applied successfully to the diffusion of metal cations in a solid solvent [4].

The article is organized in such a way as to offer the formulas to the reader, together with brief remarks concerning their origin. Necessary details of deriving the formulas are displayed in the corresponding appendices.

Bürmann’s theorem [5] states that it is possible to find a convergent expansion of an analytic function as a sum of powers of another analytic function . The simplest form of such an expansion, supposed to be valid around some point in the complex plane, is given by

(1) |

or transferred to another notation, for some purposes more convenient,

(2) |

where the functions and are called the basis functions of the Bürmann series. The functions and have to fulfill certain conditions in order to guarantee the convergence of the series in (1) and (2). These conditions will be discussed later in this article. In their classic work *A Course of Modern Analysis* [5], Whittaker and Watson give a formula for the coefficient (Bürmann coefficient) of a Bürmann series. Transferred to the notation used in (1), their formula is

(3) |

This formula is widely cited by numerous authors [6, 7]. Actually determining the limit value of a higher-order derivative is a cumbersome procedure, which is shown in an example. Expanding the function in powers of around gives the following coefficients , for which CPU time increases dramatically for .
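For orientation, the classical statement can be written in Whittaker and Watson's own notation, which is not necessarily the notation of (1)-(3). Here $a$ is the expansion point, $b=\phi(a)$, and $\phi'(a)\neq 0$ is assumed:

```latex
f(z) = f(a) + \sum_{m=1}^{n-1} \frac{\{\phi(z)-b\}^{m}}{m!}\,
       \frac{d^{m-1}}{da^{m-1}}\Bigl[f'(a)\,\{\psi(a)\}^{m}\Bigr] + R_n ,
\qquad
\psi(z) = \frac{z-a}{\phi(z)-b} .
```

Each coefficient requires the $(m-1)$-th derivative of a power of $\psi$, a quotient with a removable singularity at $z=a$, which is why the direct evaluation becomes expensive for symbolic software.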

This section shows how to calculate the coefficients of the Bürmann series recursively, which is easier to handle than (3) and more efficient when translated to symbolic programs. If we use (1) and (2) to find convergent series representations of solutions to differential equations, it is important to simplify the algorithms necessary to determine the expansion coefficients.

For basis functions of the general form

(4) |

we get the recursion in terms of the representation used in (1),

(5) |

with the initial coefficient given by

(6) |

Hence, the nested expression for is

(7) |

The recursion (5) is more efficient to calculate than the expression in (3) and is easily implemented.
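A minimal Python sketch of a recursion of this kind follows. It assumes the expansion is written as f(z) = Σ b_n (φ(z) − φ(0))^n and that the coefficients are generated by repeatedly applying the operator (1/φ′) d/dz, so that b_n = (1/n!) [(1/φ′) d/dz]^n f at the expansion point. The symbols `f`, `phi`, `b_n`, the function names, the expansion point 0, and the use of truncated Taylor coefficients with exact rationals are all illustrative assumptions; the article's own implementation is in *Mathematica* and is not reproduced here.

```python
from fractions import Fraction
from math import factorial


def derivative(series):
    """Coefficients of the derivative of a truncated power series."""
    return [k * series[k] for k in range(1, len(series))]


def divide(num, den, length):
    """First `length` coefficients of num/den; requires den[0] != 0."""
    quotient = []
    for k in range(length):
        acc = num[k] if k < len(num) else Fraction(0)
        for j in range(1, k + 1):
            if j < len(den):
                acc -= den[j] * quotient[k - j]
        quotient.append(acc / den[0])
    return quotient


def buermann_coefficients(f, phi, order):
    """Return b_0..b_order with f(z) = sum_n b_n (phi(z) - phi(0))**n near z = 0.

    f and phi are lists of Taylor coefficients at 0.
    Recursion: g_0 = f, g_n = g_{n-1}' / (n * phi'), b_n = g_n(0).
    """
    m = min(len(f), len(phi))
    if order >= m:
        raise ValueError("need Taylor coefficients up to at least z**order")
    g = [Fraction(c) for c in f[:m]]
    dphi = derivative([Fraction(c) for c in phi[:m]])
    if dphi[0] == 0:
        raise ValueError("phi'(0) = 0: this simple recursion does not apply")
    coeffs = [g[0]]
    for n in range(1, order + 1):
        dg = derivative(g)
        g = [c / n for c in divide(dg, dphi, len(dg))]
        coeffs.append(g[0])
    return coeffs


# Taylor coefficients at 0, up to z**7 (example functions chosen for illustration)
sin_series = [0, 1, 0, Fraction(-1, 6), 0, Fraction(1, 120), 0, Fraction(-1, 5040)]
exp_series = [Fraction(1, factorial(k)) for k in range(8)]
identity = [0, 1, 0, 0, 0, 0, 0, 0]

exp_in_sin = buermann_coefficients(exp_series, sin_series, 4)    # exp(z) in powers of sin(z)
arcsin_coeffs = buermann_coefficients(identity, sin_series, 7)   # z in powers of sin(z)
```

Expanding the identity function in powers of sin z reproduces the Taylor coefficients of the arcsine, which gives a quick consistency check of the recursion. The guard on `dphi[0]` marks the restriction to basis functions with a nonvanishing first derivative at the expansion point.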

The Bürmann series for up to order in is calculated with .

We now show the expansion for the same problem shown in the previous section (i.e. expanding around into powers of ). It can be easily expanded to order 25 in a reasonable amount of time. This is the explicit truncated Bürmann series.

The result is validated in terms of a Taylor series. This shows that the error is at least of order 26.

Here are the coefficients as they are calculated with this procedure in their explicit analytic form, according to the definition given in (2).

A useful application of the recursion (5)-(7) to the case of the expansion of the inverse function of around follows immediately: we have and it follows that the inverse function is the Bürmann series of in powers of (see [8]). By writing

(8) |

we obtain

(9) |

The following program calculates the first three coefficients for the inverse function in general, which corresponds to the expression shown in [8].
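As a point of comparison, the standard series-reversion result for the lowest orders is shown here in an assumed notation; the symbols $a_k$ and $A_k$ are illustrative and need not match (8) and (9). If $w = a_1 z + a_2 z^2 + a_3 z^3 + \dots$ with $a_1 \neq 0$, then $z = A_1 w + A_2 w^2 + A_3 w^3 + \dots$ with

```latex
A_1 = \frac{1}{a_1}, \qquad
A_2 = -\frac{a_2}{a_1^{3}}, \qquad
A_3 = \frac{2a_2^{2} - a_1 a_3}{a_1^{5}} .
```

For $w=\sin z$ (so $a_1=1$, $a_2=0$, $a_3=-1/6$) this gives $A_3 = 1/6$, the familiar cubic coefficient of $\arcsin w$.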

As an example, the inverse of the sine function is expanded, and the result is displayed for order 11.

Compare it to the result of the *Mathematica* built-in function `InverseSeries` (see [9]).

If we choose the basis function to be equal to the first derivative of , we find, by using formula (5), the recursive expression and the first three coefficients:

(10) |
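As a sketch of how such coefficients arise, assume the series is normalized as $f(z) = f(z_0) + \sum_{n\ge 1} b_n\,\bigl(f'(z)-f'(z_0)\bigr)^n$; this normalization and the symbol $b_n$ are assumptions and may differ from (10) by constant factors. Applying the operator $\frac{1}{f''(z)}\frac{d}{dz}$ repeatedly and evaluating at $z_0$ gives the first two coefficients:

```latex
b_1 = \frac{f'(z_0)}{f''(z_0)}, \qquad
b_2 = \frac{1}{2}\,\frac{f''(z_0)^{2} - f'(z_0)\,f'''(z_0)}{f''(z_0)^{3}} .
```

A quick check with $f(z)=e^{z}$ at $z_0=0$ yields $b_1=1$ and $b_2=0$, consistent with the exact identity $e^{z} = 1 + (e^{z}-1)$.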

The idea of expanding an analytic function using its derivative as a basis function is fruitful for cases where the function is defined by an integral. It will be shown that solutions to linear and nonlinear problems of diffusion or heat transfer can be expressed as integrals. We get

To illustrate the advantage of this technique, we choose the expansion of the transcendental function . The function is defined by the integral

Using the results listed in (10) for the Bürmann coefficients, we arrive at once at the expansion around :

Here is the series to order 11.

It can be simplified.

Although the representation given by the recursive formula in (5) is more efficient in terms of CPU time compared to formula (3), there is still the restriction of using basis functions with nonvanishing first derivative at the expansion position , since appears in the denominator of the coefficients in (5)-(7). To overcome this limitation, we introduce an alternative representation of the Bürmann coefficients based on a combinatorial approach [10] that can be generalized.

Actually, the Bürmann expansions are related to Taylor series of reciprocal powers of analytic functions represented by

(11) |

The function is uniquely determined by the basis function . We will show that explicit expressions for the expansion of and the corresponding Bürmann coefficients as defined in (2) can be derived using the coefficients . The Bürmann expansion in this representation reads as

(12) |

The formula for the Bürmann coefficient in terms of the coefficients is thus given by

(13) |

The explicit formulation of generalized Bürmann series using powers of functions with derivatives, vanishing up to the order at , is given by

(14) |

and the expression for the Bürmann coefficient is

(15) |

Also, the special case of expanding the inverse function of can be derived in this general way, and one gets

(16) |

The standard case (i.e. and ) is obtained by setting in (14)-(16). By using the combinatorial approach as shown in the next subsection, one can also evaluate the coefficients in expansions resulting from the theorem of Teixeira [5], which is a generalization of Bürmann’s theorem to singular functions.

The approach is explained in more detail and demonstrated with examples coded in *Mathematica* in the following.

A partition of the positive integer is a sequence of positive integers with such that , for example, , a partition of 10 that is usually written as . The number of times a part occurs in a partition is its frequency . In the example, the parts occur with frequencies ; the example partition can be written as .

For a partition of , define the vector of frequencies, . Then

(17) |

for convenience, define

(18) |

In the example, and .

In *Mathematica*, `IntegerPartitions` gives all possible partitions of an integer.

The frequencies of the parts can be found with `FrobeniusSolve`.

However, `FrobeniusSolve` is slow for integers larger than about 30.

The function `PartitionsM`, based on `IntegerPartitions`, is significantly faster.

The function `PartitionsJ`, based on an undocumented but highly efficient function [11], is even faster.
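The frequency representation can be sketched in a few lines of Python. The generator below is an illustrative stand-in, not the article's `PartitionsM` or `PartitionsJ`; it enumerates all nonnegative integer solutions of 1·k_1 + 2·k_2 + … + n·k_n = n, which is the same set a Frobenius-type solver returns for the parts 1, …, n. The name `partition_frequencies` is hypothetical.

```python
def partition_frequencies(n):
    """Yield every partition of n as a frequency tuple (k_1, ..., k_n).

    k_i is the number of times the part i occurs, so sum(i * k_i) == n.
    """
    def rec(part, remaining):
        if part == 0:
            if remaining == 0:
                yield ()
        elif part == 1:
            # Only ones are left: the frequency of part 1 is forced.
            yield (remaining,)
        else:
            for k in range(remaining // part + 1):
                for rest in rec(part - 1, remaining - k * part):
                    yield rest + (k,)

    yield from rec(n, n)


freqs10 = list(partition_frequencies(10))
```

For instance, the partition 10 = 5 + 3 + 1 + 1 appears as the tuple `(2, 0, 1, 0, 1, 0, 0, 0, 0, 0)`: two ones, one three, one five.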

According to

(19) |

the coefficients of reciprocal powers of analytic functions can be derived explicitly on the basis of combinatorics and analysis, recapitulated in appendix A. When is an integer, the coefficients are

(20) |

The case when is rational, , is relevant for Bürmann expansion with basis functions with vanishing derivatives . In that case,

(21) |
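The partition-based evaluation can be illustrated with the generalized binomial theorem. The Python sketch below computes the Taylor coefficients of (1 + a_1 z + a_2 z² + …)^α for an integer or rational exponent α by summing over the frequency vectors of all partitions of n. The normalization to a leading coefficient 1 and all names are assumptions made for illustration; the formulas (20) and (21) may be arranged differently.

```python
from fractions import Fraction
from math import factorial


def partition_frequencies(n):
    """Yield each partition of n as frequencies (k_1, ..., k_n), sum(i * k_i) == n."""
    def rec(part, remaining):
        if part == 0:
            if remaining == 0:
                yield ()
        elif part == 1:
            yield (remaining,)
        else:
            for k in range(remaining // part + 1):
                for rest in rec(part - 1, remaining - k * part):
                    yield rest + (k,)

    yield from rec(n, n)


def falling_factorial(alpha, k):
    """alpha * (alpha - 1) * ... * (alpha - k + 1), equal to 1 for k == 0."""
    product = Fraction(1)
    for i in range(k):
        product *= alpha - i
    return product


def series_power(a, alpha, order):
    """Coefficients c_0..c_order of (1 + a[1] z + a[2] z**2 + ...)**alpha.

    a[0] is ignored and treated as 1; len(a) must exceed `order`.
    c_n = sum over partitions of n of
          falling_factorial(alpha, K) * prod_i a_i**k_i / k_i!,  K = sum_i k_i.
    """
    coeffs = []
    for n in range(order + 1):
        total = Fraction(0)
        for freqs in partition_frequencies(n):
            term = falling_factorial(alpha, sum(freqs))
            for i, k in enumerate(freqs, start=1):
                term *= Fraction(a[i]) ** k / factorial(k)
            total += term
        coeffs.append(total)
    return coeffs


# Reciprocal square of exp(z): the result must be the series of exp(-2 z).
exp_coeffs = [Fraction(1, factorial(i)) for i in range(6)]
inverse_square = series_power(exp_coeffs, -2, 5)

# Rational exponent: (1 + z)**(-1/2)
inverse_root = series_power([1, 1, 0, 0, 0], Fraction(-1, 2), 4)
```

No symbolic differentiation is involved: each coefficient is a finite sum of products of the input coefficients, which is the property the combinatorial formulation exploits.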

Now we show how to calculate (21) symbolically. For example, choosing the function , let us calculate .

We use the fact that the *Mathematica* functions `Times`, `Plus`, and `Total` work with empty lists.

To avoid the undefined expression , the differentiation is performed analytically first on the symbolic function . Then raising to the power of (which can be zero) is performed, and finally the symbolic function is substituted out by the function .

Here are the for the symbolic function expanded at .

Now we expand at . For convenience, define `auxf`.

While the expansion is valid for complex-valued functions, the plot shows only the real part of .

Explicit expressions for the Bürmann coefficients as they are defined in (2) can be defined with respect to the coefficients .

Again using (see appendix A), we can formulate the general expressions for Bürmann series using functions with vanishing derivatives . For instance, series of powers of functions of this type can give convergent expansions for functions that are defined by integrals, like the error function

$$\operatorname{erf}(z)=\frac{2}{\sqrt{\pi}}\int_0^{z} e^{-t^{2}}\,dt \tag{22}$$

which plays a key role in the theory of linear and nonlinear heat transfer [1]. Defining the integrand as the basis function of a Bürmann series, as explained in (10), we will find a rapidly converging series representation of .

All the results of the previous section can be applied to get a formula for the Bürmann coefficients that is efficiently implemented in a simple function. The starting point of the derivation of this expression is a formulation of the Bürmann expansion in terms of a complex contour integral, as it is given in [5]. This approach can be found in various presentations [6, 7]. The evaluation of the integral representation of the Bürmann expansion results in

(23) |

The formula for the Bürmann coefficient in terms of the coefficients is thus given by

(24) |

The function `fbür` shows how to apply (23) and (24) to the expansion of in powers of , the same example as presented in the first section. The series is expanded up to order 15, so that the error is at least of order 16.

For the special case of the inverse function of given by (12), in using the transformations indicated in (8),

(25) |

So the expansion of the inverse function can be expressed as

(26) |

Equation (26) is the compact formulation of a result given by Morse and Feshbach [12]. As a result of our approach, we have developed formulas for Bürmann coefficients and for the expansion coefficients of inverse functions that reveal the close relationship of these coefficients to the coefficients for reciprocal powers of an analytic function defined in (20). In the following section, we present a generalization of Bürmann’s theorem, using the solutions of equation (17).

Inspecting formulas (23) and (26), we notice that they cannot be evaluated for cases where or vanishes. This shortcoming must be overcome, for in some cases of interest we will be forced to find Bürmann series using basis functions whose first derivatives at vanish:

(27) |

To this end, define the multivalued function

(28) |

which is cast into the form

(29) |

The function in (29) can be expanded into a Taylor series with , and hence (28) fulfills the condition violated by . Thus, instead of expanding in powers of , we expand in powers of . A reformulation of the contour integral [5] results in the generalized form of Bürmann’s theorem given in (14). Actually, the introduction of the root function (28) in (14) leads to several solution branches. For real-valued functions , the use of the `Sign` and `Abs` functions in the following code extracts the correct branch of the root function for numerical purposes. For a formal proof of the equivalence of the Bürmann series and the Taylor series for , the `Sign` and `Abs` functions can be omitted.

As an example, we calculate an expansion of up to order 15 according to (14) in powers of , a basis function with a vanishing first derivative at (i.e. ):

Use this definition for the formal proof of equivalence.

Use this definition for numerical purposes, such as for plotting.

A faster convergence can be achieved by using a basis function of the form , for which the first and second derivatives vanish at the point (i.e. ).

Using formula (14), it is easy to deduce the expansion of the inverse function of an analytic function of the form

(30) |

The inverse function of comes from (14) by setting and :

(31) |

The error function

$$\operatorname{erf}(z)=\frac{2}{\sqrt{\pi}}\int_0^{z} e^{-t^{2}}\,dt$$

is the first example demonstrating the efficiency of Bürmann series using the first derivative as a basis function. We define the function and the basis function by

The error function will be expanded around the origin , where we find that . This expansion thus calls for the application of the generalized form of the Bürmann expansion given in (14). Hence we have to set, according to (28),

To evaluate (14), we use the following relations for the derivatives of the integrand:

The result of this calculation performed up to order nine in is

(32) |

A function calculating the expansion (32) is given below. To show that this approach is superior to a common Taylor expansion in a plot, we calculate the power series in up to order 10.

The plot shows that the series in (32) has only a small constant offset error for larger values of , whereas the Taylor expansion dramatically deviates for smaller values of , although it converges uniformly for all values of . The series in (32) converges uniformly for all and gives the exact value for the error function. The rearrangement of terms, leading to , is thus justified. Even for the lowest order, we will find a result that shows no unbounded error, unlike the Taylor series.
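The leading coefficients of such an expansion can be reproduced independently with exact rational arithmetic. The Python sketch below assumes the form erf(z) = (2/√π) sgn(z) √u Σ c_k u^k with u = 1 − exp(−z²), and determines the c_k by matching Taylor coefficients in t = z². This is a plain series-matching check, not the generalized formula (14), and the arrangement of terms is an assumption that may differ from (32).

```python
import math
from fractions import Fraction


def truncated_product(a, b, n):
    """First n coefficients of the product of two power series."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


def erf_buermann_coefficients(order):
    """Exact c_0..c_order in
    erf(z) = 2/sqrt(pi) * sgn(z) * sqrt(u) * sum_k c_k u**k,  u = 1 - exp(-z**2).
    """
    n = order + 1
    # All series below are power series in t = z**2.
    erf_over_z = [Fraction((-1) ** k, math.factorial(k) * (2 * k + 1)) for k in range(n)]
    u = [Fraction(0)] + [Fraction((-1) ** (k + 1), math.factorial(k)) for k in range(1, n)]
    u_over_t = [Fraction((-1) ** k, math.factorial(k + 1)) for k in range(n)]

    # s = sqrt(u / t), obtained from s * s = u / t with s[0] = 1
    s = [Fraction(1)]
    for k in range(1, n):
        cross = sum(s[j] * s[k - j] for j in range(1, k))
        s.append((u_over_t[k] - cross) / 2)

    # g = erf_over_z / s: the factor left after pulling out sqrt(u)
    g = []
    for k in range(n):
        g.append(erf_over_z[k] - sum(s[j] * g[k - j] for j in range(1, k + 1)))

    # Re-expand g(t) in powers of u(t); u**k starts at t**k, so the system is triangular.
    coeffs = []
    residual = list(g)
    u_power = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for k in range(n):
        c = residual[k]
        coeffs.append(c)
        residual = [r - c * p for r, p in zip(residual, u_power)]
        u_power = truncated_product(u_power, u, n)
    return coeffs


def erf_buermann(z, coeffs):
    """Evaluate the truncated series numerically for real z."""
    u = -math.expm1(-z * z)
    series = sum(float(c) * u ** k for k, c in enumerate(coeffs))
    return 2 / math.sqrt(math.pi) * math.copysign(1.0, z) * math.sqrt(u) * series


coeffs = erf_buermann_coefficients(3)
```

Because u stays between 0 and 1 for all real z, every term of the truncated series is bounded, whereas a truncated Taylor polynomial in z grows without bound.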

Due to the uniform convergence of (32), we can write:

(33) |

Using and in (33), we find . So by reordering the sums, we automatically get rid of the offset error at infinity. In fact, we can furthermore achieve a practical application of (33) by keeping only a few coefficients . For example, using only and requesting the correct slope at , one gets an approximation of the error function with a relative error smaller than 1.2%. Taking additional terms of (33) with meaningful conditions further improves the approximating series in (33), as also shown in the following plot (choosing and ).
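A numerical check of a two-term approximation of this kind is sketched below in Python. It assumes the form erf(x) ≈ (2/√π) sgn(x) √(1 − e^{−x²}) (√π/2 + c₁ e^{−x²} + c₂ e^{−2x²}) with c₁ = 31/200 and c₂ = −341/8000, the values commonly quoted for this approximation. Both the form and the values are treated as assumptions here, since the text above does not display them.

```python
import math

C1 = 31 / 200       # assumed coefficient of exp(-x**2)
C2 = -341 / 8000    # assumed coefficient of exp(-2 x**2)


def erf_two_term(x):
    """Two-term approximation; the constant sqrt(pi)/2 fixes the limit at infinity."""
    u = -math.expm1(-x * x)   # 1 - exp(-x**2), accurate for small x
    e = 1.0 - u               # exp(-x**2)
    bracket = math.sqrt(math.pi) / 2 + C1 * e + C2 * e * e
    return 2 / math.sqrt(math.pi) * math.copysign(1.0, x) * math.sqrt(u) * bracket


def max_relative_error(approx, xs):
    """Largest relative deviation from math.erf on the sample points xs (all > 0)."""
    return max(abs(approx(x) - math.erf(x)) / math.erf(x) for x in xs)


grid = [0.01 * k for k in range(1, 501)]   # 0.01 ... 5.00
worst = max_relative_error(erf_two_term, grid)
```

The constant term alone guarantees the correct limit at infinity, so the remaining coefficients only have to correct the behavior at small and intermediate arguments.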

This section applies the concept of Bürmann series to get solutions of nonlinear differential equations. After a short introduction, an example of the diffusion type is presented.

In studying nonlinear ordinary differential equations, we cannot, in general, expect to find an exact solution expressible in terms of commonly used algebraic or transcendental functions. This difficulty is illustrated by the equations studied by Fujita, Lee, and Crank, which we will encounter later [13-17]. For the case of a general nonlinear second-order equation

(34) |

where denotes an analytic function of its arguments, one approach is to cast a solution into the form of a series of powers of the independent variable . Depending on the complexity of the expression on the right-hand side of the equation (34), determining the coefficients of this expansion by collecting the powers of and solving the resulting system of equations is a cumbersome procedure. Equations of the form (34) are often encountered in physics, and either their solutions can be determined numerically or their behavior is known qualitatively from experiments. Guided by this prior knowledge about the nature of , we can eventually construct or guess a function that is a more favorable base for a power series than the independent variable itself. We have to cast the representation of the solution of equation (34) into the form

and so we expand the solution using the recursive formula (5). The code below gives an expansion of the first four terms.

The free parameter occurring in will then be determined by the boundary conditions for . In the following, this kind of expansion with suitable basis functions will be applied to the solution of a problem of nonlinear heat transfer, and its convergence will be treated as far as relevant for this special case. For all the cases investigated in this article, we apply the recursive approach (7), since the structure of the chosen basis functions is relatively simple. For more sophisticated basis functions, the combinatorial formulas (12) and (14) have to be implemented in order to reduce CPU time. This may be done in future investigations.

As a canonical example, we study the partial differential equation of transient nonlinear heat transfer with temperature-dependent thermal conductivity . We demonstrate the application of Bürmann series to the practical problem of heat transfer in ZnO ceramics. The half space is filled with this material, which has a constant initial temperature , and the temperature as . The surface temperature at is instantaneously raised to a constant temperature . Using the results of measurements of the thermal conductivity in ZnO [3], we can formulate the problem by writing

(35) |

Using the transformation

Kirchhoff’s transformation

and Boltzmann’s transformation

we find a nonlinear ordinary differential equation that has been extensively studied by Fujita [13-15], Lee [16, 17], and Crank [2]:

(36) |
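A generic sketch of this reduction, in an assumed notation that need not match (35) and (36): write the heat equation as $\partial T/\partial t = \partial_x\bigl(a(T)\,\partial_x T\bigr)$ with a temperature-dependent diffusivity $a(T)$ (constant volumetric heat capacity assumed), and let $a_0$ be a reference value. Kirchhoff's and Boltzmann's transformations then combine to

```latex
\theta = \frac{1}{a_0}\int_{T_0}^{T} a(T')\,dT' ,
\qquad
\eta = \frac{x}{2\sqrt{a_0 t}} ,
\qquad\Longrightarrow\qquad
\frac{d^{2}\theta}{d\eta^{2}} + 2\eta\,\frac{a_0}{a(T)}\,\frac{d\theta}{d\eta} = 0 .
```

In the linear case $a = a_0$ the equation reduces to $\theta'' + 2\eta\,\theta' = 0$, whose solutions are expressed through the error function. This is why the error-function expansion serves as the starting point for the nonlinear problem.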

While the first boundary condition in (35) is regular (i.e. and ), the second one is given in terms of the asymptotic expression . The equivalent value of the derivative has to be estimated, which is performed by calculating the time evolution of the total thermal energy of the semi-infinite half space [18]. The approximate value of this energy integral (in terms of an algebraic expression) is determined in appendix B, where we use the Bürmann series (10) to approximate the energy integral. The explicit expression is displayed in appendix B, equation (56), which describes the dependence of on the parameters with a relative accuracy of 0.37%.

To apply the methods developed in the preceding sections, we establish a convergent iteration scheme for equation 1 of (36). To this end, we transform this equation into an integral equation:

(37) |

The solutions for 1 of equation (36) are strictly decreasing functions of , which implies that for all positive values of we have

From this relation, we conclude that we can define a system of functions , , , … of the form

(38) |

Taking the first term of the expansion (32) of the error function, we have

(39) |

Since the system (39) converges toward the solution , a possible choice for a basis function would be . Instead of , we prefer to introduce a less complicated function that simplifies the calculations. According to (39), this function has to fulfill the following conditions:

The function can be constructed in such a way as to guarantee that all essential boundary conditions are fulfilled by :

The parameter is as yet undetermined. A useful basis for an expansion is obtained by taking

(40) |

which will lead, according to (6) and (7), to the Bürmann series calculated as follows.

Expressed in terms of , this is

(41) |

If we consider in (40) and (41) as a free parameter, we have to investigate how its choice influences the convergence of the corresponding Bürmann series. We write, using (37) and omitting the index for ,

All integrands are representable by uniformly convergent series expansions in powers of , and thus the argument is similar to the one used in proving the convergence of the system (39). Furthermore, we observe that the expansion of the exponential function converges more quickly for higher values of . Thus, as long as can be chosen to guarantee the condition

for some value , an iterative system of functions can be constructed that converges to the solution of equation 1 of (36). This fact can be exploited by determining in such a way as to assure that the approximation

assumes the correct value at infinity. So we have

(42) |

resulting in an algebraic equation of the order in with roots ,

(43) |

where satisfies the condition

(44) |

If there is a solution to equation (43) simultaneously fulfilling condition (44), we get an approximation that converges to the correct value of as . For the third-order approximation , we get from (41) and (43) the cubic equation

(45) |

The relevant real solution to this equation is

(46) |

with

(47) |

We display the explicit expression for the third-order approximation below:

(48) |

In the next two plots, we show a comparison between the approximation (48) and the exact solution found by applying `NDSolve` to 1 of (36). The value for is calculated by using the approximation given by equation 2 of (57). Note that (48) can also be inverted exactly by using common algebraic and transcendental functions. The corresponding procedure is listed below for the same parameters as given in the previous section.

This defines the cubic approximation according to (45)-(48).

For the numerically exact solution, calculated by using `NDSolve`, we impose the condition (i.e. ). This condition is equivalent to selecting a slope at the surface . Bürmann’s theorem is used a second time to find an approximation for the slope by calculating the Bürmann expansion (56) of the energy integral (see appendix B).

The plot shows the numerically exact temperature profiles (the colored curves) and the exact solution’s third-order approximation according to equation (48) (the dotted lines), in the range at .

Additionally the relative error of the third-order approximation (48) is displayed for the same profiles as shown before.

The goal of this work is to give a comprehensive presentation of Bürmann’s theorem and its application to linear and nonlinear DEs and PDEs of heat transfer, using single-valued and multivalued basis functions.

To this end, a reformulation of the formulas of the expansion coefficients of Bürmann series, based on a combinatorial viewpoint, is developed. As a result of this reformulation, an algorithm is presented that accelerates the calculation of expansion coefficients compared to standard methods. Using this approach, the expansion of transcendental functions in powers of their derivative is applied to the error integral, to the solution of nonlinear differential equations, and to the evaluation of the heat integral. By combining these methods, it is possible to show that the approximate solution of nonlinear problems of heat transfer can be given in terms of Bürmann expansions. Finally, it is shown that the introduction of an additional parameter in the basis function can significantly enhance the convergence of a Bürmann series. The value of this parameter can be found by solving algebraic equations that result from the boundary conditions of the problems.

The coefficients , defined by

result from elementary considerations. We write

(49) |

The power of is given by

(50) |

Rearranging equation 1 of (50) in increasing powers , we have two conditions

(51) |

Thus, collecting all powers of over all the contributions arising from all with is equivalent to imposing one single condition, replacing the conditions 1 and 2 of (51):

Since , we define :

(52) |

Using equations 1 of (49), 1 of (50), and (52) we finally arrive at the expansion

A similar result, displayed in (21), is obtained for .

We define the temperature difference and the differential equation it obeys:

(53) |

By integrating equation 2 of (53) over , we find

(54) |

Performing the transformations of Kirchhoff and Boltzmann, we arrive at the solution for :

(55) |

On the other hand, using the definition given in (54), we expand in a Bürmann series in powers of according to (10). This leads to

(56) |

Combining (55) and (56) up to order three gives

According to the transformations in (36), we have

which leads, after some manipulations, to a quartic algebraic equation for and its solution , given by

(57) |

An approximation of (56) up to order six would lead to a sextic equation that is reducible to a cubic equation, and hence to an algebraic expression for . The approximation obtained from the first equation of (57) for is , which shows a maximum relative error of 0.37% compared to the exact value found by numerical methods (i.e. `NDSolve`) using the parameters listed in (35).

The authors wish to thank Prof. Dieter Messner, Lienz, and Dr. Hans Riedler, Graz, for their encouragement and helpful advice. Special thanks go to the editors and reviewers of *The Mathematica Journal* for their constructive suggestions and support.

[1] H. S. Carslaw and J. C. Jaeger, Conduction of Heat in Solids, 2nd ed., Oxford: Clarendon Press, 1959 pp. 482-484.

[2] J. Crank, The Mathematics of Diffusion, 2nd ed., Oxford: Clarendon Press, 1975 pp. 107-110, 119-121.

[3] T. Olorunyolemi, A. Birnboim, Y. Carmel, O. C. Wilson Jr., I. K. Loyd, S. Smith, and R. Campbell, “Thermal Conductivity of Zinc Oxide: From Green to Sintered State,” Journal of the American Ceramic Society, 85(5), 2002 pp. 1249-53. doi:10.1111/j.1151-2916.2002.tb00253.x.

[4] C. Wagner, “Diffusion of Lead Chloride Dissolved in Solid Silver Chloride,” Journal of Chemical Physics, 18, 1950 pp. 1227-1230. doi:10.1063/1.1747915.

[5] E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, 4th ed., Cambridge: Cambridge University Press, 1927 pp. 128-132.

[6] E. W. Weisstein, “Bürmann’s Theorem” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/BuermannsTheorem.html.

[7] Wikipedia. “Lagrange Inversion Theorem.” (Oct 17, 2014) en.wikipedia.org/wiki/Lagrange_inversion_theorem.

[8] H. Stenlund, “Inversion Formula.” arxiv.org/abs/1008.0183.

[9] E. W. Weisstein, “Series Reversion” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/SeriesReversion.html.

[10] L. Comtet, Advanced Combinatorics: The Art of Finite and Infinite Expansions (J. W. Nienhuys, trans.), Dordrecht: D. Reidel, 1974.

[11] E. W. Weisstein, “Faà di Bruno’s Formula” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/FaadiBrunosFormula.html.

[12] P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Vol. I, New York: McGraw-Hill, 1953 pp. 411-413.

[13] H. Fujita, “The Exact Pattern of a Concentration-Dependent Diffusion in a Semi-infinite Medium, Part I,” Textile Research Journal, 22(11), 1952 pp. 757-760. doi:10.1177/004051755202201106.

[14] H. Fujita, “The Exact Pattern of a Concentration-Dependent Diffusion in a Semi-infinite Medium, Part II,” Textile Research Journal, 22(12), 1952 pp. 823-827. doi:10.1177/004051755202201209.

[15] H. Fujita, “The Exact Pattern of a Concentration-Dependent Diffusion in a Semi-infinite Medium, Part III,” Textile Research Journal, 24(3), 1954 pp. 234-240. doi:10.1177/004051755402400304.

[16] C. F. Lee, “On the Solution of Some Diffusion Equations with Concentration-Dependent Diffusion Coefficients—I,” Journal of the Institute of Mathematics and Its Applications (now IMA Journal of Applied Mathematics), 8(2), 1971 pp. 251-259. doi:10.1093/imamat/8.2.251.

[17] C. F. Lee, “On the Solution of Some Diffusion Equations with Concentration-Dependent Diffusion Coefficients—II,” Journal of the Institute of Mathematics and Its Applications (now IMA Journal of Applied Mathematics), 10(2), 1972 pp. 129-133. doi:10.1093/imamat/10.2.129.

[18] T. R. Goodman, “Application of Integral Methods for Transient Nonlinear Heat Transfer,” in Advances in Heat Transfer, Vol. I (T. F. Irvine, Jr. and J. P. Hartnett, eds.), New York: Academic Press, 1964 pp. 51-122.

H. M. Schöpf and P. H. Supancic, “On Bürmann’s Theorem and Its Application to Problems of Linear and Nonlinear Heat Transfer and Diffusion,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-11.

Harald Markus Schöpf was born in Lienz, Austria on May 5, 1965. He studied solid-state physics and technical physics in Graz. After two years as a research assistant at the Graz University of Technology, Austria, he moved to Siemens Matsushita (now EPCOS) in Deutschlandsberg, Austria. Now he works as a teacher in Lienz.

Peter Hans Supancic was born in Graz, Austria on September 7, 1967. He studied theoretical physics in Graz and graduated in materials science at the University of Leoben, Austria. After finishing the habilitation on functional ceramics, he became ao. Prof. at the University of Leoben.

**Harald Markus Schöpf**

*Schillerstrasse 4
A-9900 Lienz*

**Peter Hans Supancic**

Institut für Struktur- und Funktionskeramik/Montanuniversitaet Leoben

*Peter Tunner Strasse 5
A-8700 Leoben*

When my son brought home a paper from school with a grid of numbers on it, I was immediately interested. The goal: cover the puzzle with all the dominoes from the “bone pile,” making sure that each number of the puzzle is covered by the same number on a domino. Many similar puzzles can be found online and in puzzle collections: see [1, 2, 3, 4, 5] for several online resources, which are the source of some of the examples considered here.

**Figure 1.** A partially solved domino grid, with almost half of the 28 dominoes placed on the underlying puzzle grid.

Our first task is to represent the board.

Next, we need the bone pile, the list of available dominoes. In this case, the bone pile consists of all 28 dominoes from the double zero to the double six, but the definition is valid for any non-negative integer $n$, giving a double-$n$ set with a total of $(n+1)(n+2)/2$ dominoes.
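As an illustration, the bone pile can be sketched in a few lines of Python. This is not the article's *Mathematica* definition, and the name `bone_pile` is hypothetical. A double-$n$ set contains every unordered pair of values from 0 to $n$.

```python
def bone_pile(n=6):
    """Return all dominoes of a double-n set as (low, high) pairs."""
    return [(a, b) for a in range(n + 1) for b in range(a, n + 1)]


pile = bone_pile(6)
print(len(pile))  # 28 dominoes in a double-six set
```

The same call with `n=9` gives the 55 pieces of a double-nine set.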

Find possible locations for a given piece .

This is the workhorse of the entire solution, first dividing the puzzle into pairs along each row and looking for matches to the given pair, then repeating the process on the transposed matrix (i.e. along the columns of the original grid) and noting the locations of any matches found. The location of the pair in the partition gives the location of the first half domino in the original grid, but adding the appropriate offset gives the location of the second half domino as well, and both are included as a domino location in the list of locations found.
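The search described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the article's code: the name `find_locations` is hypothetical, coordinates are 0-based, and instead of partitioning rows and transposing the matrix, the sketch simply inspects the right-hand and lower neighbor of every square, which finds the same horizontal and vertical matches.

```python
def find_locations(grid, piece):
    """Return all placements of piece = (a, b) on grid.

    grid is a list of rows; None marks a blank or already covered square.
    Each placement is a pair of 0-based (row, column) coordinates.
    """
    a, b = piece
    rows, cols = len(grid), len(grid[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            # Check the right-hand neighbour and the neighbour below.
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 < rows and c2 < cols:
                    pair = (grid[r][c], grid[r2][c2])
                    if pair in ((a, b), (b, a)):
                        found.append(((r, c), (r2, c2)))
    return found


# A tiny hypothetical grid for the double-one set (three dominoes).
demo = [[0, 0, 1],
        [1, 1, 0]]
print(find_locations(demo, (0, 0)))  # a single horizontal placement in the top row
print(len(find_locations(demo, (0, 1))))  # this piece has several candidate placements
```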

Now for functions to highlight the dominoes within a puzzle. The function `frameDomino` generates the options to include in the `Frame` option of `Grid`.

The function `displayPuzzle` accepts a puzzle grid (a matrix) and a domino list (a list of location pairs) and displays the puzzle grid with frames around the dominoes indicated in the list.

For example, there are two possible locations for the domino in the `m9` puzzle.

A `puzzle` object takes three arguments.

- The matrix `m` contains the puzzle to be solved, a 2D array of integers.
- The filled locations list `filled` is a list of coordinate pairs, each giving the two adjacent squares (neighbors within a row or within a column) covered by a placed domino.
- The bone pile `bones`, the list of unplayed dominoes, consists of a list of pairs of integers.

The `Format` command defines how to format a puzzle: the puzzle matrix has its filled list of dominoes framed, and a tooltip shows the bone pile, if any.

This section shows examples of various puzzles; mouse over a puzzle to see the bone pile. In this puzzle, no dominoes have been played yet.

Here two pieces have been removed from the bone pile and placed on the board.

To ensure that the squares filled by already placed pieces are no longer included, make a version of the board with the affected squares blanked out.

This function finds the forced locations; only one piece can possibly go into a forced location.

Find the forced locations after two particular dominoes have been played.

The forced locations are shown empty.

- Select the pieces that fit in forced locations.
- Use `find` to return a list of all possible locations for playable pieces, and select the pieces that have only one possible location:
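The second item can be sketched in Python as below. The names `find_locations` and `forced_pieces` are hypothetical stand-ins, not the article's functions, and the sketch covers only the pieces with a single possible location, not the forced locations themselves.

```python
def find_locations(grid, piece):
    """All placements of piece on grid (None marks a covered square)."""
    a, b = piece
    rows, cols = len(grid), len(grid[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 < rows and c2 < cols:
                    if (grid[r][c], grid[r2][c2]) in ((a, b), (b, a)):
                        found.append(((r, c), (r2, c2)))
    return found


def forced_pieces(grid, bones):
    """Map every piece that has exactly one possible placement to that placement."""
    forced = {}
    for piece in bones:
        spots = find_locations(grid, piece)
        if len(spots) == 1:
            forced[piece] = spots[0]
    return forced


demo = [[0, 0, 1],
        [1, 1, 0]]
print(forced_pieces(demo, [(0, 0), (0, 1), (1, 1)]))
```

In this small hypothetical grid the two doubles are forced, while the remaining piece still has several candidate locations.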

In this artificial case, there are two forced locations: in each, only one piece can be placed.

The function `step` finds the forced locations and fills them in with the appropriate dominoes taken from the bone pile. Mouse over to see that these dominoes have been removed from the bone pile.

At the beginning, there are no forced locations, but there are four forced pieces: pieces that can only be placed in one location in the puzzle: , , , and . The `step` function plays all four at once.

We are ready to solve the whole puzzle. The next command prints the current state, takes one step, and repeats until the bone pile is empty.

Along the way, multiple partial solutions had to be considered when no forced locations or forced pieces were found, but in the end all but one solution were dropped because of inconsistency. The comments were left in to show the forced locations or forced pieces at each step, but now we turn them off.

There is no reason not to make a prettier display function to show the dominoes with their customary pips (or dots), rather than showing only the grid numbers. We can represent the pip positions by matrices, some of which can be easily created by built-in matrix commands. Since the pip positions of double-9 and double-6 domino sets are consistent, let us build the larger set here. (A double-12 set would require adjusting the pip positions.)

The other matrices could be built by hand or using `SparseArray` or `Table` with appropriate criteria, but it is easy to create them by addition and subtraction.

A pip will be placed on a half-domino square wherever the matrix had a 1.

The function `displayDottedPuzzle` creates a graphical display of the puzzle, optionally replacing numbers by half-domino faces for any locations listed in the “filled list,” outlining any placed dominoes in a way similar to `displayPuzzle`.

The method described here can be thought of as “human-type,” since it uses intelligently chosen criteria for deciding which step to perform and which option to try next. The criteria used can be summarized as follows:

- Seek forced locations: if any locations can take none of the available dominoes, abandon the partial solution currently being constructed; if any locations can take exactly one available domino (and not the same one), fill all of these “forced locations.”
- Else seek forced dominoes: if any of the available dominoes cannot be placed on the board, abandon the partial solution currently being constructed; if any of the available dominoes can only be placed in one location on the board, play all of these “forced dominoes.”
- Else for a minimal case, place one domino in all possible locations, making separate copies of the puzzle for each case.
- Repeat until no further changes occur.
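A simplified version of these criteria can be sketched in Python as a most-constrained-first search. It is not the article's `step` or `solvePuzzle` code, and all names are hypothetical. The sketch always plays the domino with the fewest remaining placements: zero placements abandons the partial solution, one placement is a forced domino, and more than one creates a branch per option. The separate forced-location test is omitted.

```python
def placements(grid, piece):
    """All placements of piece on grid (None marks a covered square)."""
    a, b = piece
    rows, cols = len(grid), len(grid[0])
    spots = []
    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 < rows and c2 < cols:
                    if (grid[r][c], grid[r2][c2]) in ((a, b), (b, a)):
                        spots.append(((r, c), (r2, c2)))
    return spots


def solve(grid, bones):
    """Return every solution as a list of (piece, placement) pairs."""
    if not bones:
        return [[]]
    options = {piece: placements(grid, piece) for piece in bones}
    # The most constrained piece goes first; forced pieces have one option.
    piece = min(bones, key=lambda p: len(options[p]))
    rest = [p for p in bones if p != piece]
    solutions = []
    for spot in options[piece]:  # an empty list abandons this branch
        (r1, c1), (r2, c2) = spot
        blanked = [row[:] for row in grid]
        blanked[r1][c1] = blanked[r2][c2] = None
        for tail in solve(blanked, rest):
            solutions.append([(piece, spot)] + tail)
    return solutions


demo = [[0, 0, 1],
        [1, 1, 0]]
for solution in solve(demo, [(0, 0), (0, 1), (1, 1)]):
    print(solution)
```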

A human can make more complicated arguments eliminating some options; for examples, see the explanations at the sites [1, 2, 3, 4, 5]. (But not all suggested solving strategies turn out to be useful. One common idea, placing the “double” dominoes first, can easily be defeated by a clever puzzle designer.) The order in which the criteria are applied is arbitrary and might be modified, but the method is far faster than the more simplistic, brute-force method presented in the following section.

Here is a list of all possible locations of all dominoes in our original puzzle.

The number of options for the pieces varies wildly.

(You can easily verify that in this puzzle, all the double dominoes have between four and eight possible placement options, making “place doubles first” a poor strategy in this case.) Taking all possible options for all the pieces gives a very large number.

Too many cases to consider! But this method would work, theoretically: Use `Outer` to get all combinations of choices of these options and then use `Select` on those that have no overlapping dominoes. Here do only the first three dominoes.

Using the first three dominoes, there are possibilities, reduced to 19 after elimination of conflicts. Placing the first 13 dominoes involves considering 653184 cases, of which only four have no conflicts.
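The brute-force idea can be sketched in Python, with `itertools.product` playing the role of `Outer` and a list comprehension playing the role of `Select`. The option lists below are hand-written for a hypothetical 2×3 grid, not taken from the article's puzzle, and the function names are illustrative.

```python
from itertools import product


def no_overlap(choice):
    """True if no square is covered by two of the chosen placements."""
    cells = [cell for placement in choice for cell in placement]
    return len(cells) == len(set(cells))


def brute_force(options):
    """Try every combination of one placement per domino and keep the valid ones."""
    return [choice for choice in product(*options) if no_overlap(choice)]


# Placement options for three dominoes on the grid [[0, 0, 1], [1, 1, 0]].
options = [
    [((0, 0), (0, 1))],                                   # domino (0, 0)
    [((0, 0), (1, 0)), ((0, 1), (0, 2)), ((0, 1), (1, 1)),
     ((0, 2), (1, 2)), ((1, 1), (1, 2))],                 # domino (0, 1)
    [((1, 0), (1, 1))],                                   # domino (1, 1)
]
print(len(list(product(*options))))  # number of combinations examined
print(brute_force(options))          # combinations without conflicts
```

The number of combinations is the product of the option counts, which is why this approach becomes impractical for a full set of 28 dominoes.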

So the following code should work, but will take an unreasonable amount of time and memory. It is beautifully simple and short, but do not run it, as it probably would not finish in our lifetimes!

“To a hammer, everything looks like a nail.” A few years ago, I worked out an exhaustive search-and-collision detection algorithm based on the idea of a generalized odometer, and since then I have seen applications for it everywhere. It works here, too.

Create a 28-digit generalized odometer, whose $i$th digit records which option we are trying for the $i$th domino. All digits start as 1; incrementing the odometer does not in general occur at the right end, but at the first digit (from the left) whose domino placement conflicts with that of any previous domino. A digit “rolls over” when it is incremented past its maximum value and must be reset to 1. Whenever a digit rolls over, also increment the digit to its left, just as in a real odometer. Each odometer digit has a separate maximum determined by the number of options available for that domino. When the first digit finally rolls over, all solutions have been found. We also accelerate the procedure by sorting the domino option list by increasing length.
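A minimal Python sketch of such a generalized odometer follows. It is an interpretation of the description above, not the article's `odometerSolve`: digits are 0-based here, the name `odometer_solve` is hypothetical, and the option lists are hand-written for a small hypothetical grid.

```python
def odometer_solve(options):
    """Return all odometer readings (one option index per domino) without conflicts."""
    n = len(options)
    if any(len(opts) == 0 for opts in options):
        return []  # some domino cannot be placed at all
    digits = [0] * n
    solutions = []
    while True:
        # Find the first digit whose placement collides with an earlier one.
        used = set()
        pos = n - 1  # if there is no conflict, advance at the right end
        conflict = False
        for i in range(n):
            cells = set(options[i][digits[i]])
            if used & cells:
                pos, conflict = i, True
                break
            used |= cells
        if not conflict:
            solutions.append(tuple(digits))
        # Increment at pos; a digit that rolls over resets and carries leftward.
        for j in range(pos + 1, n):
            digits[j] = 0
        while pos >= 0:
            digits[pos] += 1
            if digits[pos] < len(options[pos]):
                break
            digits[pos] = 0
            pos -= 1
        if pos < 0:  # the first digit rolled over: the search is complete
            return solutions


# Options for the grid [[0, 0, 1], [1, 1, 0]], sorted by increasing length.
options = [
    [((0, 0), (0, 1))],                                   # domino (0, 0)
    [((1, 0), (1, 1))],                                   # domino (1, 1)
    [((0, 0), (1, 0)), ((0, 1), (0, 2)), ((0, 1), (1, 1)),
     ((0, 2), (1, 2)), ((1, 1), (1, 2))],                 # domino (0, 1)
]
print(odometer_solve(options))
```

Incrementing at the first conflicting digit skips every reading that shares the same conflicting prefix, which is what makes the method much faster than testing all combinations.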

Notice that the first four odometer digits can only be 1; each starts at 1 and has a maximum of 1.

To see or use the parts of options specified by the odometer, we use the function `MapThread`.

Here is the program that more or less immediately returns the answer(s).

As expected, there is only one odometer reading that works; that is, only one choice of domino placements solves the puzzle. The generalized odometer method works best for situations with a large number of variables taking on values that can be calculated in advance, particularly if the possible values are the same for all variables or vary in a way that can be easily specified. Here the options have to be recomputed for each new puzzle, making it less efficient than the previous method.

A “quadrilles” puzzle [5], an idea credited to French mathematician Edouard Lucas, can be divided into blocks, each containing the same number. Since the following figure does not completely fill a rectangular array, we add empty strings.

This particular quadrille has only one solution. At each step there are a large number of forced locations or pieces, and all 28 dominoes are placed in only four iterations.

Now for a puzzle with so many different ways to solve it that one feels that almost anything will work [5]!

If a puzzle is nonrectangular or has intentional gaps in it, such as the one shown below [4], simply embed it in a larger rectangle, and indicate the gaps by empty strings.

It seems likely that the online or downloadable domino puzzle generators effectively lay out the dominoes to create a grid that is guaranteed to be solvable. But even if all puzzles presented can be solved, a number of questions spring to mind:

For given grid dimensions, how many different solutions are there? (The three methods derived above solve individual puzzles, but what if the numbers are rearranged in a given grid in all possible ways?)

For given grid dimensions, what fraction of the possible puzzles has only one solution, and in general, for all $n$, what fraction of the puzzles has $n$ solutions? What is the largest number of solutions possible?

Bear in mind that in the sense of the functions developed here, a “solution” is merely a list of domino locations, so different puzzles of the same dimensions can have the same solution just by permuting the underlying grid numbers or rearranging them in other valid ways. In the interest of increased clarity, define a *solution schema* as a layout of dominoes face-down on a board. Now we can talk about the number of possible distinct schemas for a given puzzle grid.

What about writing a program that generates all solution schemas for a given board, ignoring the numbers? This could be done by modifying either the function `solvePuzzle` or the function `odometerSolve`, neither of which can quite do the job as written. (Yes, I did try them on a board filled with 0 entries, but they would need to be tweaked to expect a bone pile of double-zero dominoes.)

Finally, it is interesting that the first solution method worked so well, basically following how a human would decide which domino to play next. The code for the brute-force method is the simplest, but impractical without massive parallel processing. The odometer method works well, but here not as fast as the “human” method, and in any case may not be as transparent to the reader. There is more than one way to solve a puzzle! And if you spend much time thinking about a puzzle, other methods and other questions will probably occur to you.

I thank my colleagues at Southern Adventist University who have encouraged me, the folks at Wolfram Research who have occasionally helped me, and Claryce, who has put up with me in all my most puzzling moods.

[1] | E. W. Weisstein. “Domino Tiling” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/DominoTiling.html. |

[2] | Domino-Games.com. “Domino Puzzles.” (Sep 4, 2014) www.domino-games.com/domino-puzzles.html. |

[3] | “Dominosa.” (Sep 4, 2014) www.puzzle-dominosa.com. |

[4] | Yoogi Games. “Domino Puzzle Puzzles.” (Sep 4, 2014) syndicate.yoogi.com/domino. |

[5] | J. Köller. “Domino Puzzles.” (Sep 4, 2014) www.mathematische-basteleien.de/dominos.htm. |

K. E. Caviness, “Three Ways to Solve Domino Grids,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-10.

Ken Caviness teaches at Southern Adventist University, a liberal arts university near Chattanooga. Since obtaining a Ph.D. in physics (relativity and nuclear physics) from the University of Massachusetts Lowell, he has taught math and physics in Rwanda, Texas, and Tennessee. His interests include both computer and human languages (including Esperanto), and he has used *Mathematica* since Version 1, both professionally and recreationally.

**Kenneth E. Caviness**

*Department of Physics & Engineering
Southern Adventist University
PO Box 370
Collegedale, TN 37315-0370*

Evaluating molecular integrals has been an active field since the middle of the last century. Efficient algorithms have been developed and implemented in various programs. Detailed accounts of molecular integrals can be found in the references of [1]. In this article, the third in a series describing algorithms for evaluating molecular integrals, we detail the evaluation of the nuclear-electron attraction energy integrals from a more didactic point of view, following the approach of Rys, Dupuis, and King [2] as implemented in the OpenMol program [3].

The energy caused by the attraction between an electron in the region described by the overlap of the orbitals , and a nucleus of charge located at is expressed by the nuclear-electron attraction integral

(1) |

in which is an unnormalized Cartesian Gaussian primitive.

Using the Gaussian product (see, for example [1]) and defining the angular part as :

(2) |

The pole problem can be solved by the Laplace transform

(3) |

which turns the integral into

(4) |

where for now, we have ignored the factor . In the following steps, we will make certain modifications, knowing in advance that they will help simplify the expressions later on. We first reduce the upper limit of to unity by making the changes of variable (recall from the Gaussian product that ):

(5) |

and

(6) |

Replace , , and in , to get

(7) |

We now multiply by the factor :

(8) |

Again, by inserting , we get

(9) |

Having arrived at the desired form, we reinsert the value of the angular part into the expression and separate the term enclosed by the curly brackets into three components , , and :

(10) |

Defining as the function of the component inside the bracket,

(11) |

and similarly for and , we rewrite the integral as

(12) |

We will show that the integrand in the expression for is in fact an overlap between two one-dimensional Gaussians, and we may use the results that have been developed in [1]. First, we expand the exponential parts of the integrand

(13) |

regrouping in terms of and , we have

(14) |

which becomes

(15) |

where and . These definitions let us compare this equation with the result of the Appendix, in which we see that equation (15) is simply

(16) |

where

(17) |

Substituting and ,

(18) |

into equation (13), we have

(19) |

Substitute this result into the definition of to get

(20) |

The integral has the same form as a one-dimensional overlap integral where the integrand is a Gaussian function centered at with an exponential coefficient .

From the observation above, we make use of the results developed for overlap integrals in [1]. For example, for ,

(21) |

In particular, we have the transfer equations

(22) |

The and functions take similar forms. The product is a polynomial in , and if we replace , then the integral in equation (12) is

(23) |

where is the said polynomial. The integral is a combination of the Boys function (see, for example, Reference 4 of [1])

(24) |

a strictly positive and decreasing function.

Aside from the obvious choice of using *Mathematica* to evaluate the Boys function, there are several ways of evaluating the integral. In practice, most programs store pretabulated values of the function at different intervals and interpolation is done as needed (e.g. by Chebyshev polynomials). Here we use the Gauss-Chebyshev quadrature numerical integration [4]. For simplicity, we have adopted almost verbatim the F77 code in [4, p. 46].
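For readers who want to experiment outside *Mathematica*, the Boys function can be approximated with a short Python sketch. Two assumptions are made here: the standard definition $F_n(x)=\int_0^1 t^{2n}e^{-xt^2}\,dt$ is used, and composite Simpson's rule replaces the Gauss-Chebyshev scheme of [4], so this is not the article's quadrature code. The name `boys` is hypothetical.

```python
import math


def boys(n, x, intervals=200):
    """Approximate F_n(x) = integral from 0 to 1 of t^(2n) exp(-x t^2) dt."""
    if intervals % 2:
        intervals += 1  # Simpson's rule needs an even number of intervals
    h = 1.0 / intervals

    def g(t):
        return t ** (2 * n) * math.exp(-x * t * t)

    total = g(0.0) + g(1.0)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0


x = 2.5
closed_form = 0.5 * math.sqrt(math.pi / x) * math.erf(math.sqrt(x))
print(boys(0, x), closed_form)  # the two values should agree closely
```

The comparison uses the closed form $F_0(x)=\tfrac{1}{2}\sqrt{\pi/x}\,\operatorname{erf}(\sqrt{x})$, which holds for $x>0$.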

The function `Nea` evaluates the nuclear-electron attraction integral of two Gaussian primitives; here `alpha`, `beta`, `RA`, `RB`, `LA`, and `LB` are , , , , , and as defined earlier; `RR` is the nuclear position.

As in our two earlier articles [1, 5], we use the same data for the water molecule (, , the geometry optimized at the HF/STO-3G level). The molecule lies in the - plane with Cartesian coordinates in atomic units.

In the STO-3G basis set, each atomic orbital is approximated by a sum of three Gaussians; here are their unnormalized primitive contraction coefficients and orbital exponents.

Here are the basis function origins and Cartesian angular values of the orbitals, listed in the order , , , , , , and .

Specifically, the nuclear-electron attraction energy integral between the first primitive of the orbital of hydrogen atom 1, , the first primitive of the orbital of the oxygen atom, , and atom 1 () is

(25) |

We have

From the Gauss-Chebyshev quadrature, the integral in equation (23) yields . The nuclear-electron integral (25) is . This is calculated as follows.

We first need the normalization factor before evaluating the nuclear-electron energy matrix.

We have provided a didactic introduction to the evaluation of nuclear-electron attraction-energy integrals involving Gaussian-type basis functions by use of recurrence relations and a numerical quadrature scheme. The results are sufficiently general so that no modification of the algorithm is needed when larger basis sets with more Gaussian primitives or primitives with larger angular momenta are employed.

Consider the Gaussian product: . Combine and expand the coefficients to get

Let and substitute in the exponent to get

The first three terms inside the second bracket factor to , and the last two can be reduced to . The original Gaussian product is thus

Here is a verification.
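The same identity can also be checked numerically with a small Python sketch. The symbols are assumptions, since the inline formulas are not reproduced here: `alpha` and `beta` are the two exponents, `A` and `B` the two centers, and the product is taken in its standard one-dimensional form $e^{-\alpha(x-A)^2}e^{-\beta(x-B)^2}=e^{-\frac{\alpha\beta}{\alpha+\beta}(A-B)^2}e^{-(\alpha+\beta)(x-P)^2}$ with $P=(\alpha A+\beta B)/(\alpha+\beta)$.

```python
import math


def product_of_gaussians(x, alpha, A, beta, B):
    """Left-hand side: the product of two one-dimensional Gaussians."""
    return math.exp(-alpha * (x - A) ** 2) * math.exp(-beta * (x - B) ** 2)


def single_gaussian(x, alpha, A, beta, B):
    """Right-hand side: one Gaussian centered at P, scaled by a constant."""
    p = alpha + beta
    P = (alpha * A + beta * B) / p
    K = math.exp(-alpha * beta / p * (A - B) ** 2)
    return K * math.exp(-p * (x - P) ** 2)


# Arbitrary sample values chosen only for the check.
alpha, A, beta, B = 0.8, -0.3, 1.7, 1.1
worst = max(
    abs(product_of_gaussians(x / 10, alpha, A, beta, B)
        - single_gaussian(x / 10, alpha, A, beta, B))
    for x in range(-30, 31)
)
print(worst)  # expected to be at the level of floating-point rounding
```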

[1] | M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals I,” The Mathematica Journal, 14(3), 2012. doi:10.3888/tmj.14-3. |

[2] | J. Rys, M. Dupuis, and H. F. King, “Computation of Electron Repulsion Integrals Using the Rys Quadrature Method,” Journal of Computational Chemistry, 4(2), 1983 pp. 154-157. doi:10.1002/jcc.540040206. |

[3] | G. H. F. Diercksen and G. G. Hall, “Intelligent Software: The OpenMol Program,” Computers in Physics, 8(2), 1994 pp. 215-222. doi:10.1063/1.168520. |

[4] | J. Pérez-Jorda and E. San-Fabián, “A Simple, Efficient and More Reliable Scheme for Automatic Numerical Integration,” Computer Physics Communications, 77(1), 1993 pp. 46-56. doi:10.1016/0010-4655(93)90035-B. |

[5] | M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals II,” The Mathematica Journal, 15(1), 2013. doi:10.3888/tmj.15-1. |

M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-9.

Minhhuy Hô received his Ph.D. in theoretical chemistry at Queen’s University, Kingston, Ontario, Canada, in 1998. He is currently a professor at the Centro de Investigaciones Químicas at the Universidad Autónoma del Estado de Morelos in Cuernavaca, Morelos, México.

Julio-Manuel Hernández-Pérez obtained his Ph.D. at the Universidad Autónoma del Estado de Morelos in 2008. He has been a professor of chemistry at the Facultad de Ciencias Químicas at the Benemérita Universidad Autónoma de Puebla since 2010.

**Minhhuy Hô**

*Universidad Autónoma del Estado de Morelos
Centro de Investigaciones Químicas
Ave. Universidad, No. 1001, Col. Chamilpa
Cuernavaca, Morelos, Mexico CP 92010*

**Julio-Manuel Hernández-Pérez**

Facultad de Ciencias Químicas

Ciudad Universitaria, Col. San Manuel

Puebla, Puebla, Mexico CP 72570

A cellular automaton (CA) is a dynamical system with arbitrarily complex global behavior, despite being governed by very simple local rules [1]. In order to better understand how that kind of complex behavior emerges, many explorations have been made in the context of the power implicit in CA rules. For instance, classical benchmark problems have been used for this, including the density classification task [2, 3] and the parity problem [4]. The density classification task tries to discover the most frequent bit in the initial configuration of the lattice; the parity problem tries to find the parity of the number of 1s in the initial configuration of the lattice. One of the approaches in these contexts is to evaluate every possible CA of a given family in terms of its capabilities to solve the target problem. This approach is possible in small CA families, like the elementary space (composed of 256 CAs), but is not feasible in larger families, like the one-dimensional binary CA family with radius 3, composed of $2^{128}$ rules.

As a strategy to search for CAs in large rule families, evolutionary computation has been extensively used, relying on measures of properties of the candidate rules, such as their degree of internal symmetry, so as to discard or keep candidates according to these property values. This was a key aspect, for instance, that led to finding WdO, currently the best one-dimensional radius-3 rule for the density classification task [5].

An alternative is to constrain the search space to only the CAs that are known to present specific properties. The challenge here is how to constrain the space without the need to enumerate the entire subspace of interest. Here, we introduce the concept of a CA template as a possible way to achieve this goal. A CA template is a data structure associated with the rule tables of the members of a CA family that relies on the use of variables. The introduction of these variables makes it possible for a CA template to represent a set of rules, unlike the standard $k$-ary rule table representation that can only represent one individual CA. By making use of *Mathematica*’s built-in equation-solving capabilities and algorithms that allow finding equality relations among CAs with a given property, we are able to create templates that represent number-conserving CAs (those that preserve the sum of the cell states of the initial configuration; more details below), as well as those with maximal internal symmetry (those displaying invariance under some transformations in their rule tables; also to be explained below). These two cases are given here as examples of the applicability of the template idea, but other properties can also be accounted for.

In the following section, basic notions about CAs are given, followed by a section that presents details about important properties related to the density classification task. Section 4 explains the notion of *template* and presents the implemented algorithms. Section 5 concludes the text, with a discussion on the advantages and limitations of using templates, and gives some ideas for future work.

Cellular automata constitute a class of decentralized dynamical systems, usually discrete in space, time, and states [1]. As systems governed by relatively simple rules, CAs represent a meaningful model for tackling the issue of how interaction among simple components can lead to the solution of global problems.

CAs are composed of a regular lattice of cells whose states change through time, according to a local rule. The lattice can be deployed in any number of dimensions (most commonly one, two, or three) and may have an infinite or fixed number of cells. Cells’ states are commonly represented by numbers or colors out of $k$ possibilities ranging from 0 to $k-1$. The local rule of the CA acts on the neighborhood of every cell, which is the set of neighboring cells meant to influence its subsequent states. The neighborhood is usually expressed by its radius (or range) $r$, meaning the range of cells on each side affecting the one in question. By defining values for these two parameters, a CA rule space or family is defined. The values of $k=2$ and $r=1$ in the one-dimensional case (i.e. a neighborhood has three cells; a cell has two possible states) give rise to the elementary rule space, which is the most well-studied family, due to its small size of only 256 rules but extremely rich phenomenology [1].

For present purposes, whenever we refer to cellular automata, we mean one-dimensional, binary ($k=2$) CAs, with a fixed number of cells in the lattice and periodic boundary conditions (i.e. the lattice is closed at its ends, like a ring).

Every CA is governed by a rule that relates the neighborhood of a cell to the state it takes on at the next time step. Its most common representation is the rule table, which is an explicit listing of every possible state configuration of the neighborhoods, lexicographically ordered, and a corresponding cell state for each. Here we use Wolfram’s lexicographical ordering, where the leftmost neighborhood is formed by the neighborhood configuration where all cells are in the ($k-1$) state, all the way down to the rightmost neighborhood with all cells in the 0 state.

As an illustration, this is the rule table of the elementary CA for rule 184.

This is the ordered set of output cell states from that rule table, the $k$-ary form.

By converting the sequence of digits that defines the $k$-ary form (a binary sequence for $k=2$) into a decimal representation, one obtains the CA rule number, which serves as a unique identifier of a CA in a given rule space [1].

In order to handle operations concerning rule tables, various *Mathematica* functions are defined. So, given a rule table in its $k$-ary form, the function `RuleTableFromkAry` transforms it to its classical representation.

The function `kAryFromRuleTable` reverses the process.

Given a CA’s rule number, `RuleTableFromRuleNumber` determines its rule table.

The inverse function `RuleNumberFromRuleTable` yields the rule number from the rule table.

`WellFormedRuleTableQ` is a predicate that checks whether a rule table in $k$-ary form is valid according to its values of $k$ and $r$.

`RuleOutputFromNeighbourhood` is a utility function to get the output corresponding to a particular neighborhood in a rule table.

Finally, `AllNeighbourhoods` is a utility function giving all possible neighborhoods of a certain rule space.

All these functions are handy to perform rule table manipulation and are used throughout this article.
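To make the conversions concrete, here is a hedged Python sketch of analogous helpers. The names are hypothetical Python counterparts, not the article's *Mathematica* functions, and the defaults $k=2$, $r=1$ correspond to the elementary space.

```python
from itertools import product


def all_neighbourhoods(k=2, r=1):
    """Neighbourhoods in Wolfram's order: all-(k-1) first, all-0 last."""
    return list(product(range(k - 1, -1, -1), repeat=2 * r + 1))


def kary_from_rule_number(number, k=2, r=1):
    """Output states of the rule table, listed in the same order."""
    size = k ** (2 * r + 1)
    digits = []
    for _ in range(size):
        number, digit = divmod(number, k)
        digits.append(digit)
    return digits[::-1]  # most significant digit belongs to the first neighbourhood


def rule_number_from_kary(kary, k=2):
    """Read the k-ary form as a base-k number."""
    number = 0
    for digit in kary:
        number = number * k + digit
    return number


def rule_table_from_kary(kary, k=2, r=1):
    """Pair every neighbourhood with its output state."""
    return list(zip(all_neighbourhoods(k, r), kary))


kary = kary_from_rule_number(184)
print(kary)                         # [1, 0, 1, 1, 1, 0, 0, 0]
print(rule_number_from_kary(kary))  # 184
print(rule_table_from_kary(kary)[0])
```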

In the one-dimensional case, it is possible to visualize the system’s evolution using a space-time diagram, in which time goes from top to bottom, and cell states are represented by colors. For binary CAs, white cells are in the 0 state and black cells in the 1 state. In order to obtain and plot the space-time diagram resulting from a rule execution on a given lattice, one can use *Mathematica*’s built-in functions `CellularAutomaton` and `ArrayPlot`.
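Outside *Mathematica*, the same kind of evolution can be sketched in plain Python for elementary rules on a ring. This is an illustrative stand-in for `CellularAutomaton` and `ArrayPlot`, with hypothetical names and a text rendering instead of a plot.

```python
def ca_step(cells, rule_number):
    """One synchronous update of an elementary CA with periodic boundaries."""
    n = len(cells)
    new = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = 4 * left + 2 * centre + right  # neighbourhood read as a binary number
        new.append((rule_number >> index) & 1)
    return new


def space_time(cells, rule_number, steps):
    """Rows of the space-time diagram, the initial configuration first."""
    rows = [list(cells)]
    for _ in range(steps):
        rows.append(ca_step(rows[-1], rule_number))
    return rows


for row in space_time([1, 1, 0, 0, 1, 0, 0, 0], 184, 5):
    print("".join("#" if cell else "." for cell in row))
```

Time runs from top to bottom, with `#` for cells in state 1 and `.` for cells in state 0.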

In order to better understand the computational power implicit in a CA rule, benchmark problems have been defined for it to tackle; among them, the most common is the density classification task (DCT). In the classical definition of DCT, a one-dimensional binary CA has to lead an arbitrary initial odd-sized configuration into a fixed-point state of all blacks, if the initial condition has a larger number of black cells, or into a fixed-point state of all whites otherwise.

It has been proved that in order to solve the DCT perfectly, a CA would need to be number conserving, that is, it should not change the number of cells in each state from any given initial condition [6]. This fact stands as a contradiction against the classical definition of the DCT, since in order for it to evolve to an all-black or all-white configuration, it would obviously need to change the number of cells in each state throughout time. This means that DCT is unsolvable when formulated according to its classical definition [2, 3].

Currently, the best imperfect DCT solver (known as WdO) was found in [5], by means of a sophisticated evolutionary algorithm that used, among other important properties, the internal symmetry of a rule in its fitness function. In tune with the fact that a perfect DCT solver would need to be number conserving, WdO and other good DCT solvers are known to have a very small Hamming distance from number-conserving rules of the same rule space [7].

All in all, number conservation and internal symmetry are two important properties when determining the ability of a CA to solve the DCT, and serve as good examples for the notion of CA templates. Both are described in detail in the following subsections. But notice, upfront, that these two properties are amenable to being addressed in templates, since they derive from well-established relations among state transitions.

Number conservation is a property presented by some CAs, in which the sum of the states of the individual cells in any initial configuration does not change during the space-time evolution; in particular, for binary CAs, this means that the number of 1s always remains the same. This kind of CA is useful, for instance, to model systems like car traffic, in which a car cannot appear or disappear as time goes by [7]. Elementary CA 184 is an example of a number-conserving CA.

In order for a one-dimensional CA rule to be number conserving, it is established in [8] that the local rule with neighborhood size must respect the following necessary and sufficient conditions for every state transition:

where corresponds to a sequence of 0s of length .

A simplification of the original algorithm from [8] is provided in [9]. Basically, it was shown that for any given rule, it suffices to analyze the state transitions associated with the neighborhood made up of only 0s and the neighborhoods not starting with 0. For $k$ states and neighborhood size $n$, this is a total of $k^n-k^{n-1}+1$ neighborhoods instead of $k^n$, as stated in [8]. This is the condition we employ to obtain templates that represent number-conserving CAs, as will be shown below.
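As a hedged illustration, the following Python sketch tests the number-conservation condition in the form commonly attributed to Boccara and Fukś, $f(x_1,\dots,x_n)=x_1+\sum_{j=1}^{n-1}\bigl[f(0_j,x_2,\dots,x_{n-j+1})-f(0_j,x_1,\dots,x_{n-j})\bigr]$, where $0_j$ stands for $j$ consecutive 0s. The notation and function names are assumptions, and the sketch checks all $k^n$ neighborhoods rather than the reduced set of [9].

```python
from itertools import product


def rule_function(rule_number, k=2, n=3):
    """Local rule f as a function of a neighbourhood tuple (Wolfram numbering)."""
    def f(neighbourhood):
        index = 0
        for state in neighbourhood:
            index = index * k + state
        return (rule_number // k ** index) % k
    return f


def is_number_conserving(rule_number, k=2, n=3):
    """Check the condition above for every neighbourhood of size n."""
    f = rule_function(rule_number, k, n)
    for x in product(range(k), repeat=n):
        total = x[0]
        for j in range(1, n):
            zeros = (0,) * j
            total += f(zeros + x[1:n - j + 1]) - f(zeros + x[0:n - j])
        if f(x) != total:
            return False
    return True


conserving = [rule for rule in range(256) if is_number_conserving(rule)]
print(conserving)  # includes rule 184, the traffic rule
```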

Apart from number conservation, a rule’s internal symmetry also plays an important role in solving the DCT. In order to fully understand how this property works, an explanation about rule transformations and dynamically equivalent rules is required; the presentation is restricted to binary rules, even though this notion extends to the arbitrary $k$-ary case.

Given the rule table of a CA, one can apply three types of transformations on it that will result in dynamically equivalent rules. For the binary case, `BlackWhiteTransform` is obtained by switching the state of all cells in a rule table. The second type of transformation, `LeftRightTransform`, is obtained by reversing the bits of the neighborhoods in a rule table and reordering the set of state transitions. The composition in either order of these two transformations (they commute) yields the third type, `LeftRightBlackWhiteTransform` or `BlackWhiteLeftRightTransform`.

Here is how they work on rule 110.

This checks the first one, `BlackWhiteTransform`.

With these transformations, it becomes straightforward to see which CAs in a given space have equivalent dynamical behavior. For instance, by applying the three transformations on a given CA, say elementary rule 110, elementary rules 137, 124, and 193 are obtained. These four rules are said to be in the same dynamical equivalence class. It is easy to see why, by looking at their space-time diagrams.

By comparing the rule table of a CA with that of the equivalent rule obtained from a given transformation, it is possible to count the number of state transitions they share. In a sense, this provides a measure of the amount of *internal symmetry* of a CA with respect to that transformation, whichever it is. For instance, elementary CA 110 has an internal symmetry value of 2 with respect to the black-white transformation, since it shares two state transitions with its black-white symmetrical rule, which is elementary rule 137.

Repeating this process with rule 150, on the other hand, yields a different result. Rule 150 has an internal symmetry value of 8 according to the black-white transformation. This is the maximum possible value of this measure with elementary CAs. This is quite predictable, as the black-white transformation of rule 150 is rule 150 itself. In fact, any of the three transformations applied to rule 150 yields rule 150 itself, indicating it has the maximum internal symmetry value according to any of the three transformations.
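Counting shared state transitions is a position-by-position comparison of two rule tables. The sketch below, with hypothetical function names, takes the transformed rule number as given (137 for rule 110 and 150 for rule 150, as stated above) rather than computing the transformation itself.

```python
def rule_table(rule):
    """Output bits of an elementary rule for neighborhoods 111, 110, ..., 000."""
    return [(rule >> (7 - i)) & 1 for i in range(8)]

def shared_transitions(rule_a, rule_b):
    """Number of neighborhoods on which the two rules produce the same output."""
    return sum(a == b for a, b in zip(rule_table(rule_a), rule_table(rule_b)))

print(rule_table(110))               # [0, 1, 1, 0, 1, 1, 1, 0]
print(shared_transitions(110, 137))  # 2
print(shared_transitions(150, 150))  # 8
```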

The degree of internal symmetry of a rule can be a relevant measure in any context where a property is shared among all members of a class of dynamical equivalence. In [5] and [7], for instance, rules with maximal internal symmetry with respect to the composite transformation were key to the findings related to the DCT.

A CA template is an enhancement over the rule table representation, obtained by allowing it to have variables in the place of simple cell states as its results. As a consequence, a CA template has the power to represent whole subsets of CA rule spaces, instead of only a single rule.

As a simple example, consider the template . It represents the subset of the elementary CAs with fixed bits at positions 1, 3, 5, 6, and 8 in the list, free variables at positions 2 and 4, and a bit at position 7 that is the complement of the bit at position 2.

Using *Mathematica*’s built-in transformation rules, one can obtain the four CAs represented by this template, as well as their corresponding rule numbers.

The function `RuleTemplateVars` lists the variables in a template.

Extracting the variables from a template and applying a value to each, the template is transformed into one of its represented rule tables. Every template has a number of possible substitutions equal to $k^m$, where $k$ is the number of states and $m$ is the number of variables in the template; however, as will be seen later, some of those may not be valid.

The function `ExpandTemplate` performs this operation by applying values to each variable of a given template. It may receive as an optional argument an integer called `ithSubstitution` in the range `0` to $k^m - 1$, representing which substitution should be made. If omitted, it performs all the possible substitutions for a given template.

After the expansion, one can obtain the list of valid rules represented by the template by using the function `RuleNumbersFromTemplate`.
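The expansion steps above can be sketched in Python for the binary case. Everything here is an assumption made for illustration: the template values, the encoding of entries (an integer for a fixed bit, a string for a free variable, and `("not", name)` for a complemented variable), and the function names, which only loosely mirror `RuleTemplateVars`, `ExpandTemplate`, and `RuleNumbersFromTemplate`.

```python
from itertools import product

# Hypothetical template, listed for neighborhoods 111, 110, ..., 000:
# fixed bits at positions 1, 3, 5, 6, 8; free variables at positions 2 and 4;
# position 7 is the complement of position 2.
TEMPLATE = [1, "x2", 0, "x4", 1, 1, ("not", "x2"), 0]

def template_vars(template):
    """List the distinct variable names in order of first appearance."""
    names = []
    for entry in template:
        if isinstance(entry, str):
            name = entry
        elif isinstance(entry, tuple):
            name = entry[1]
        else:
            continue
        if name not in names:
            names.append(name)
    return names

def substitute(template, env):
    """Replace every variable by its value in env, giving a plain rule table."""
    table = []
    for entry in template:
        if isinstance(entry, str):
            table.append(env[entry])
        elif isinstance(entry, tuple):
            table.append(1 - env[entry[1]])
        else:
            table.append(entry)
    return table

def expand_template(template, ith=None):
    """All 2**m substitutions, or only the ith one if an index is given."""
    names = template_vars(template)
    tables = [substitute(template, dict(zip(names, values)))
              for values in product((0, 1), repeat=len(names))]
    return tables if ith is None else tables[ith]

def rule_number(table):
    """Read a binary rule table (111 first) as a rule number."""
    return int("".join(map(str, table)), 2)

print([rule_number(t) for t in expand_template(TEMPLATE)])  # [142, 158, 204, 220]
```

With two free variables the template expands to four rule tables, which matches the count of substitutions described above.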

With *Mathematica*’s built-in symbolic computation features, it is easy to create templates that represent a whole space. The space of elementary CAs would be represented by the following template.

In [4], the authors analytically found which transitions needed to be fixed, variable, or dependent on other transitions in a CA rule table, in order to have a chance to solve the parity problem perfectly. By fixing those transitions, they restrained the rule space of one-dimensional, binary, radius-2 CAs, composed of 4,294,967,296 rules, to only 16 candidates for perfect parity solvers. Although they used the de Bruijn graph as the primary structure to represent this rule space subset, it could have been easily represented with CA templates.

Empowered by *Mathematica*’s built-in equation-solving capabilities, algorithms can be developed that find the fixed, variable, and dependent state transitions on a rule table, thus leading to templates that are representatives of CAs that share the properties of number conservation and maximal internal symmetry; these are shown below.

In [8], Boccara and Fukś established necessary and sufficient conditions that a CA rule table must meet in order to be conservative (which is another way to say number conserving). These conditions can be translated into an algorithm `BFConservationTemplate` that finds a set of equations that, when solved by *Mathematica*, yields the equivalent of a template that represents all conservative CAs of a determined space.

By running this function for the elementary space, the following template is obtained.

When expanded, the latter yields the following representations.

However, it is clear that not all $k$-ary representations above are valid, since some of them rely on state values outside the range $[0, k-1]$, namely, the states 2 and $-1$. Hence, by discarding those three, we get the complete set of five number-conserving rules of the elementary space.
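The count of five can be cross-checked by brute force. The sketch below does not implement the Boccara-Fukś conditions or `BFConservationTemplate`; it simply applies each elementary rule to every cyclic configuration of three to six cells and keeps the rules that never change the number of 1s. The ring lengths and the function names are assumptions of this sketch.

```python
from itertools import product

def step(config, rule):
    """One synchronous update of an elementary rule on a cyclic configuration."""
    n = len(config)
    return [(rule >> (4 * config[i - 1] + 2 * config[i] + config[(i + 1) % n])) & 1
            for i in range(n)]

def is_number_conserving(rule, max_len=6):
    """True if the rule preserves the sum of states on all rings of 3..max_len cells."""
    return all(sum(step(config, rule)) == sum(config)
               for n in range(3, max_len + 1)
               for config in product((0, 1), repeat=n))

conserving = [rule for rule in range(256) if is_number_conserving(rule)]
print(conserving)  # [170, 184, 204, 226, 240]
```

An exhaustive check like this is only practical for very small rule spaces, which is precisely why a template derived from the rule table is the more useful tool.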

It is important to notice that this kind of strategy can only be employed on properties that derive directly from the CA rule table.

As the internal symmetry of a CA is also a property that derives directly from its rule table, it is a valid candidate to be generalized into a template. By listing a CA rule table along with its respective transformations, it is possible to establish equality relations between them that, when solved by *Mathematica*, yield a template that represents all CAs that have the maximal possible value of internal symmetry, according to any subset of the three transformations.

By establishing that all of the results of the rule tables have to be the same in both the CA and its transformed counterpart, the following function `MaxSymmTemplate` achieves the goal of finding a template that represents all CAs of a given space that present the *maximum* value of internal symmetry, according to a list of transformations received as arguments.

In order to find a template that represents all elementary CAs with maximum symmetry according to the black-white transformation, it suffices to run `MaxSymmTemplate`, then expand the template to generate the rule numbers.

The verification of this result can be achieved by guaranteeing that all these rule numbers yield the same rule tables when transformed.

We can analogously obtain a template representing all CAs with maximum symmetry according to all transformations, from which their expansions also lead to the corresponding rule numbers.

And again, their validity can be checked.

Both the `BFConservationTemplate` and the `MaxSymmTemplate` functions can take another template as an optional argument, which is meant to be used as the starting point of the algorithms. This is the current way to compose the intersection of templates that share a common structure. For instance, in order to generate all the elementary conservative CAs with maximum internal symmetry values according to the black-white transformation, it becomes straightforward to use the template for number-conserving rules of the elementary space as the starting point of `MaxSymmTemplate`. This leads to a template that, once again, can be expanded so as to yield the target rule numbers.

Alternatively, the template with maximal internal symmetry could be used as the starting point of the `BFConservationTemplate` algorithm to obtain the same result.

This article introduced the concept of CA templates, an enhancement of the rule table that can represent a subset of a CA rule space whose rules share a common property. Although the examples used for illustration referred only to one-dimensional, binary rules (the elementary space), the idea seems readily applicable to larger CAs with more states and more dimensions.

We have shown some of the operations applicable to CA templates, as well as some cases of use, in the form of *Mathematica* functions that yield templates representing subsets of the elementary space of CAs with properties related to number conservation and maximum internal symmetry. With respect to the latter, templates can be derived for any subset of the three symmetry-related transformations.

Templates for the rules in the same dynamical class in the elementary space have appeared previously in the CA literature, such as in [10]. But in these cases, the notion was not at all couched in the conceptual framework we have put forward, which allows templates to be effectively defined for rules having maximal internal symmetry value, let alone the possibility of representing further CA properties.

The properties used as examples here can be expressed as well-established relations among the state transitions of the CA, which is a necessary condition for a property to be addressed in the form of templates. As a counterpoint, the notion of reversibility of one-dimensional rules does not seem to be, at least in principle, amenable to template representation, since it is currently not known how to characterize reversibility in terms of the rule table of a CA.

Future work includes finding new algorithms that allow template representations of other properties, and enhancing the current algorithm for internal symmetry templates so that it can generate templates for specific values of internal symmetry, not only the maximal one.

Currently, because of computational demands, template expansion does not scale up well to very big templates; this should also be addressed in a follow-up. In particular, it might be worth defining operations of union and intersection of templates, which might be used to preprocess a template before the operation of template expansion.

Pedro de Oliveira thanks FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo). Maurício Verardo is thankful for a fellowship provided by CAPES, the Brazilian agency of its Ministry of Education.

[1] S. Wolfram, A New Kind of Science, Champaign, IL: Wolfram Media Inc., 2002.

[2] P. P. B. de Oliveira, “Conceptual Connections around Density Determination in Cellular Automata,” Cellular Automata and Discrete Complex Systems (Lecture Notes in Computer Science), 8155, 2013 pp. 1-14. doi:10.1007/978-3-642-40867-0_1.

[3] P. P. B. de Oliveira, “On Density Determination with Cellular Automata: Results, Constructions and Directions,” Journal of Cellular Automata, forthcoming.

[4] H. Betel, P. P. B. de Oliveira, and P. Flocchini, “Solving the Parity Problem in One-Dimensional Cellular Automata,” Natural Computing, 12(3), 2013 pp. 323-337. doi:10.1007/s11047-013-9374-9.

[5] D. Wolz and P. P. B. de Oliveira, “Very Effective Evolutionary Techniques for Searching Cellular Automata Rule Spaces,” Journal of Cellular Automata, 3(4), 2008 pp. 289-312.

[6] H. Fukś, “A Class of Cellular Automata Equivalent to Deterministic Particle Systems,” in Hydrodynamic Limits and Related Topics (S. Feng, A. T. Lawniczak, and S. R. S. Varadhan, eds.), Providence, RI: American Mathematical Society, 2000 pp. 57-69.

[7] J. Kari and B. Le Gloannec, “Modified Traffic Cellular Automaton for the Density Classification Task,” Fundamenta Informaticae, 116(1-4), 2012 pp. 141-156. doi:10.3233/FI-2012-675.

[8] N. Boccara and H. Fukś, “Number-Conserving Cellular Automaton Rules,” Fundamenta Informaticae, 52(1-3), 2002 pp. 1-13.

[9] A. Schranko and P. P. B. de Oliveira, “Towards the Definition of Conservation Degree for One-Dimensional Cellular Automata Rules,” Journal of Cellular Automata, 5(4-5), 2010 pp. 383-401.

[10] W. Li and N. Packard, “The Structure of the Elementary Cellular Automata Rule Space,” Complex Systems, 4(3), 1990 pp. 281-297. www.complex-systems.com/pdf/04-3-3.pdf.

P. P. B. de Oliveira and M. Verardo, “Representing Families of Cellular Automata Rules,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-8. |

Pedro de Oliveira has been a faculty member since 2001 of the School of Computing and Informatics and of the Postgraduate Program in Electrical Engineering at Mackenzie Presbyterian University, São Paulo, Brazil. His research interests are cellular automata, evolutionary computation, and cellular multi-agent systems. Pedro is an alumnus of the 2003 NKS Summer School.

Maurício Verardo is a post-graduate student in Electrical Engineering at Mackenzie Presbyterian University, working with cellular automata ever since his undergraduate senior project for his computer science degree from Mackenzie. Maurício is an alumnus of the 2011 NKS Summer School.

**Pedro P. B. de Oliveira**

*Faculdade de Computação e Informática & Pós-Graduação em Engenharia Elétrica
Universidade Presbiteriana Mackenzie
Rua da Consolação, 930
São Paulo, 01302-907 – Brazil*

**Maurício Verardo**

*Pós-Graduação em Engenharia Elétrica
Universidade Presbiteriana Mackenzie
Rua da Consolação, 930
São Paulo, 01302-907 – Brazil*

*mauricio.verardo@gmail.com*

Among its many interpretations, the term reliability most commonly refers to the ability of a device or system to perform a task successfully when required. More formally, it is described as the probability of functioning properly at a given time and under specified operating conditions [1]. Mathematically, the reliability function is defined by

$$R(t) = P(T > t), \quad t \geq 0,$$

where $T$ is a nonnegative random variable representing the device or system lifetime.

For a system composed of at least two components, the system reliability is determined by the reliability of the individual components and the relationships among them. These relationships can be depicted using a reliability block diagram (RBD).

Simple systems are usually represented by RBDs with components in either a series or parallel configuration. In a series system, all components must function satisfactorily in order for the system to operate. For a parallel system to operate, at least one component must function correctly. Systems can also contain components arranged in both series and parallel configurations. If an RBD cannot be reduced to a series, parallel, or series-parallel configuration, then it is considered a complex system.

This article deals with the generation of an exact analytical expression for the reliability of a complex system. The demonstrated method relies on finding all paths between the source and target vertices in a directed acyclic graph (i.e., RBD), as well as the inclusion-exclusion principle for probability.

**A Note on Timings**

The timings reported in this article were measured on a custom workstation PC using the built-in function `Timing`. The system consists of an Intel® Core i7 CPU 950 @ 4 GHz and 24 GB of DDR3 memory. It runs Microsoft® Windows 7 Professional (64-bit) and scores 1.32 on the *MathematicaMark9* benchmark.

We begin by considering a directed graph that consists of a finite set of vertices together with a finite set of ordered pairs of vertices called directed edges. The built-in function `Graph` can be used to construct a graph from explicit lists of vertices and edges.

This two-dimensional grid graph, labeled , can be constructed much more efficiently by using the built-in function `GridGraph`. Throughout this section, we utilize it to illustrate our functions.

Now, for a vertex $v$ of a directed graph $G$ with vertex set $V$ and edge set $E$, we define the set of out-neighbors as

$$N^{+}(v) = \{\, w \in V : (v, w) \in E \,\},$$

where $(v, w)$ is taken to mean a directed edge from $v$ to $w$. This is implemented in the function `VertexOutNeighbors`.

`VertexOutNeighbors` behaves similarly to the built-in function `VertexOutDegree`. That is, given a graph $G$ and a vertex $v$, the function returns a list of out-neighbors for the specified vertex.

If, however, only the graph is specified, the function will give a list of vertex out-neighbors for all vertices in the graph.

The order in which the out-neighbors are displayed is determined by the order of vertices returned by `VertexList`.

We can implement similar functions to obtain the set of in-neighbors by simply changing $(v, w)$ to $(w, v)$.

The next step toward our goal is to consider a method of traversing a graph. One common approach of systematically visiting all vertices of a graph is known as depth-first search (DFS). In its most basic form, a DFS algorithm involves visiting a vertex, marking it as “visited,” and then recursively visiting all of its neighbors [2]. The function `DepthFirstSearch` implements this algorithm for directed graphs.

Given a graph $G$ and a starting vertex $v$, `DepthFirstSearch` returns a list of vertices in the order in which they are visited.
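The traversal can be sketched in Python as follows. This is not the article's `DepthFirstSearch`: the function names, the edge-list representation, and the small example graph are assumptions chosen to keep the sketch self-contained.

```python
def vertex_out_neighbors(edges, v):
    """Out-neighbors of v, in the order the edges are listed."""
    return [w for (u, w) in edges if u == v]

def depth_first_search(edges, start):
    """Vertices reachable from start, in the order a recursive DFS visits them."""
    visited = []

    def visit(v):
        visited.append(v)  # mark as visited
        for w in vertex_out_neighbors(edges, v):
            if w not in visited:
                visit(w)

    visit(start)
    return visited

# Hypothetical directed graph given as ordered pairs (tail, head).
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
print(depth_first_search(EDGES, 1))  # [1, 2, 4, 5, 3]
```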

We compare this with the result of the built-in function `DepthFirstScan`.

Next, let us define the function `DirectedAcyclicGraphQ`.

If the graph is both directed and acyclic, `DirectedAcyclicGraphQ` yields `True`. Otherwise, it yields `False`.
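One way to test for acyclicity is to repeatedly remove vertices that have no incoming edges (Kahn's algorithm); the graph is acyclic exactly when every vertex can be removed. The sketch below uses that technique, which may differ from how `DirectedAcyclicGraphQ` is implemented, and its names and example graphs are hypothetical.

```python
def directed_acyclic_q(vertices, edges):
    """True if the directed graph (vertices, edges) contains no directed cycle."""
    indegree = {v: 0 for v in vertices}
    for _, w in edges:
        indegree[w] += 1
    ready = [v for v in vertices if indegree[v] == 0]  # no incoming edges left
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for (u, w) in edges:
            if u == v:
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)
    return removed == len(vertices)

VERTICES = [1, 2, 3, 4, 5]
DAG_EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
print(directed_acyclic_q(VERTICES, DAG_EDGES))             # True
print(directed_acyclic_q(VERTICES, DAG_EDGES + [(5, 1)]))  # False
```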

Finally, we consider the problem of finding all paths in a directed acyclic graph $G$ between two arbitrary vertices $s$ and $t$. Typically, we refer to $s$ as the source and $t$ as the target. A path in $G$ is defined as a sequence of vertices $(v_1, v_2, \ldots, v_k)$ such that $(v_i, v_{i+1})$ is a directed edge of $G$ for $i = 1, \ldots, k-1$. Since we have constrained ourselves to a directed acyclic graph, all paths are simple. That is to say, all vertices in a path are distinct.

By modifying the depth-first search algorithm, we arrive at a solution.

Like the original DFS algorithm, we visit a vertex and then recursively visit all of its neighbors. However, instead of checking if a vertex has been marked “visited,” we compare the current vertex to the target. If they do not match, we continue to traverse the graph. Otherwise, the target has been reached and we store the path for later output.

For a given directed acyclic graph $G$, a source vertex $s$, and a target vertex $t$, `FindPaths` returns a list of all paths connecting $s$ to $t$.
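The modified search can be sketched in Python. The function name, the edge-list representation, and the example graph are assumptions of this sketch. Because there is no "visited" check, the recursion terminates only if the graph is acyclic.

```python
def find_paths(edges, source, target):
    """All paths from source to target in a directed acyclic graph."""
    paths = []

    def visit(v, path):
        path = path + [v]  # extend a copy so that branches do not interfere
        if v == target:
            paths.append(path)  # target reached: store the path
            return
        for (u, w) in edges:
            if u == v:
                visit(w, path)

    visit(source, [])
    return paths

# Hypothetical directed acyclic graph given as ordered pairs (tail, head).
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
print(find_paths(EDGES, 1, 5))  # [[1, 2, 4, 5], [1, 3, 4, 5]]
print(find_paths(EDGES, 2, 3))  # []
```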

In this particular instance, the function takes approximately 0.85 milliseconds to return the result.

`FindPaths` works for any pair of vertices.

If no path is found, the function returns an empty list.

Up to this point, we have been working with graphs in an abstract, mathematical sense. We now make the transition from directed acyclic graph to reliability block diagram by associating vertices with components in a system and edges with relationships among them.

Consider a single component in an RBD. Let us imagine a “flow” moving from a source, through the component, to a target. The component is deemed to be functioning if the flow can pass through it unimpeded. However, if the component has failed, the flow is prevented from reaching the target.

The “flow” concept can be extended to an entire system. A system is considered to be functioning if there exists a set of functioning components that permits the flow to move from source to target. We define a path in an RBD as a set of functioning components that guarantees a functioning system. Since we have chosen to use a directed acyclic graph to represent a system’s RBD, all paths are minimal. That is to say, all components in a path are distinct.

Once the minimal paths of a system’s RBD have been obtained, the principle of inclusion-exclusion for probability can be employed to generate an exact analytical expression for reliability. Let $\{P_1, P_2, \ldots, P_n\}$ be the set of all minimal paths of a system. At least one minimal path must function in order for the system to function. We can write the reliability of the system as the probability of the union of all minimal paths:

$$R_S = P\left(\bigcup_{i=1}^{n} P_i\right) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \cdots < i_k \le n} P\left(P_{i_1} \cap \cdots \cap P_{i_k}\right).$$

This is implemented in the function `SystemReliability`.

Given a system’s RBD (represented by a directed acyclic graph $G$), a source vertex $s$, and a target vertex $t$, `SystemReliability` returns an exact analytical expression for the reliability.
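A numerical counterpart of this calculation can be sketched in Python. Unlike `SystemReliability`, it returns a number rather than a symbolic expression. It also assumes that component failures are statistically independent, so that the probability that a set of paths all function is the product of the reliabilities of the distinct components involved. The function name and the example values are hypothetical.

```python
from itertools import combinations
from math import prod

def system_reliability(paths, r):
    """Inclusion-exclusion over minimal paths, assuming independent components.

    paths: list of minimal paths, each a list of component names
    r:     dict mapping component name to its reliability
    """
    total = 0.0
    for size in range(1, len(paths) + 1):
        sign = (-1) ** (size + 1)
        for subset in combinations(paths, size):
            components = set().union(*subset)  # distinct components in the subset
            total += sign * prod(r[c] for c in components)
    return total

# Four components in series: a single minimal path.
r = {"c1": 0.9, "c2": 0.9, "c3": 0.9, "c4": 0.9}
series = system_reliability([["c1", "c2", "c3", "c4"]], r)

# Four components in parallel, with perfectly reliable start and end nodes.
r_par = dict(r, start=1.0, end=1.0)
parallel = system_reliability([["start", c, "end"] for c in ("c1", "c2", "c3", "c4")], r_par)
```

For the series case the result equals the product of the four reliabilities, and for the parallel case it equals one minus the product of the four unreliabilities. The number of terms grows as $2^n - 1$ in the number of minimal paths, which is the main cost of the classical inclusion-exclusion principle.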

Consider the RBD of a simple system with four components in a series configuration.

The reliability of the system is given in terms of the reliability of its four components.

Consider the RBD of a simple system with four components in a parallel configuration.

The “start” and “end” components are not part of the actual system. They are added to ensure the RBD meets the criteria for a directed acyclic graph.

Furthermore, these nonphysical components are taken to have perfect reliability, that is, $R_{\text{start}} = R_{\text{end}} = 1$. Since they have no effect on the system’s reliability, they can be safely removed from the resulting analytical expression. To do so, we simply define a list of replacement rules and apply it to the result of `SystemReliability`.

The reliability of the system is given in terms of the reliability of its four components.

Next, we examine the RBDs of two simple systems with components in a series-parallel configuration.

Component is in series with component , and both components are in parallel with component .

As in previous examples, we use `SystemReliability` to obtain an exact analytical expression for the reliability.

Finally, we examine the RBDs of two complex systems.

The reliability of the system is given in terms of the reliability of its six components.

The result is returned after approximately 0.59 milliseconds.

The reliability of the system is given in terms of the reliability of its fourteen components.

The result is returned after approximately 0.33 seconds.

We now turn our attention to the derivation of a time-dependent expression for the reliability of a complex system based on information contained within its reliability block diagram.

Let us imagine that we have a generic system composed of six subsystems and we know the reliability relationships among them. In addition, the underlying statistical distributions and parameters used to model the subsystems’ reliabilities are known.

We begin by creating the system’s RBD.

In defining the RBD, we have made use of the `Property` function to store information associated with each subsystem. For instance, the custom property `"Distribution"` is used to store a parametric statistical distribution. Labels, images, and other properties can also be specified.

Next, we use `SystemReliability` to generate an exact analytical expression for the reliability.

Now, the reliability function of the $i$th subsystem is given by

$$R_i(t) = 1 - F_i(t),$$

where $F_i(t)$ is the corresponding cumulative distribution function (CDF). For each subsystem, we use `PropertyValue` to extract the symbolic distribution stored in the RBD, and then use the built-in function `CDF` to construct its reliability function.
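The step from a CDF to a reliability function can be illustrated in Python. The exponential and Weibull lifetime models, their parameter values, and the function names are assumptions of this sketch and are not the distributions stored in the article's RBD.

```python
from math import exp

def exponential_cdf(t, rate):
    """CDF of an exponential lifetime with the given failure rate."""
    return 1.0 - exp(-rate * t) if t >= 0 else 0.0

def weibull_cdf(t, shape, scale):
    """CDF of a Weibull lifetime with the given shape and scale."""
    return 1.0 - exp(-((t / scale) ** shape)) if t >= 0 else 0.0

def reliability(cdf, **params):
    """Return R(t) = 1 - F(t) for a CDF with fixed parameters."""
    return lambda t: 1.0 - cdf(t, **params)

r_exp = reliability(exponential_cdf, rate=0.5)
r_wei = reliability(weibull_cdf, shape=1.0, scale=2.0)
print(r_exp(0.0))  # 1.0
```

A Weibull distribution with shape 1 reduces to an exponential distribution with rate equal to the reciprocal of the scale, so `r_exp` and `r_wei` agree for every `t`.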

We extract additional information, for example, subsystem labels, from the RBD and combine it with the reliability functions to create plots for comparison.

In order to transform our static analytical expression into a time-dependent function, we first define a list of replacement rules.

Next, we apply the list of rules to the expression for system reliability.

The result is a time-dependent reliability function for the complex system described by the RBD.

Finally, we generate a plot of the system’s reliability over time.

We have demonstrated a method of generating an exact analytical expression for the reliability of a complex system using a directed acyclic graph to represent the system’s reliability block diagram. In addition, we have shown how to convert an analytical expression for system reliability into a time-dependent function based on statistical information stored in an RBD. While our focus has been on the analysis of complex systems, we have also shown that the combination of path finding and the inclusion-exclusion principle is equally applicable to simple systems in series, parallel, or series-parallel configurations.

Knowing the static analytical expression or time-dependent solution of a system allows us to perform a more advanced reliability analysis. For instance, we can easily calculate the Birnbaum importance

$$I_B(i) = \frac{\partial R_S}{\partial R_i}$$

of the $i$th component using the result of `SystemReliability`. Similarly, we can derive the hazard function, or failure rate, from the system’s time-dependent reliability function.

There are several ways in which the functionality demonstrated in this article can be improved and expanded:

- Increase the efficiency of `SystemReliability` by implementing improvements to the classical inclusion-exclusion principle [3].
- Add functions related to common tasks in reliability analysis, for example, reliability importance, failure rate, and so on.
- Add support for $k$-out-of-$n$ structures, that is, redundancy.
- Add the ability to export and import complete RBDs.
- Add a mechanism, for example, a graphical user interface (GUI), to facilitate the construction and modification of RBDs.

Finally, the code can be combined into a user-friendly package with full documentation.

[1] W. Kuo and M. Zuo, Optimal Reliability Modeling: Principles and Applications, Hoboken, NJ: John Wiley & Sons, 2003.

[2] S. Skiena, The Algorithm Design Manual, 2nd ed., London, UK: Springer-Verlag, 2008.

[3] K. Dohmen, “Improved Inclusion-Exclusion Identities and Inequalities Based on a Particular Class of Abstract Tubes,” Electronic Journal of Probability, 4, 1999 pp. 1-12. doi:10.1214/EJP.v4-42.

T. Silvestri, “Complex System Reliability,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-7. |

Todd Silvestri received his undergraduate degrees in physics and mathematics from the University of Chicago in 2001. As a graduate student, he worked briefly at the Thomas Jefferson National Accelerator Facility (TJNAF) where he helped to construct and test a neutron detector used in experiments to measure the neutron electric form factor at high momentum transfer. From 2006 to 2011, he worked as a physicist at the US Army Armament Research, Development and Engineering Center (ARDEC). During his time there, he cofounded and served as principal investigator of a small laboratory focused on improving the reliability of military systems. He is currently working on several personal projects.

**Todd Silvestri**

*New Jersey, United States*

*todd.silvestri@optimum.net*

The aim of canonical correlation analysis is to find the linear combinations of two multivariate datasets that maximize the correlation coefficient between them. This is particularly useful to determine the relationship between *criterion measures* and the set of their *explanatory factors*. This technique involves, first, the reduction of the dimensions of the two multivariate datasets by projection, and second, the calculation of the relationship (measured by the correlation coefficient) between the two projections of the datasets.

While the correlation coefficient measures the relationship between two simple variables, canonical correlation analysis measures the relationship between two *sets* of variables. Although the correlation measure employed for both techniques is the same, namely

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}, \tag{1}$$

the distinction between the two techniques must be clear: while for the correlation coefficient $X$ and $Y$ must be $n$-dimensional vectors containing realizations of the random variables, for canonical correlation analysis (CCA) $X$ has to be an $n \times p$ and $Y$ an $n \times q$ matrix, with $p$ and $q$ at least 2. In the latter case, $n$ is the number of realizations for all random variables, where $p$ is the number of random variables contained in the set $X$ and $q$ is the number of random variables in the set $Y$.
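For two simple variables, the correlation measure can be computed directly from the realizations. The following Python sketch uses a hypothetical function name and made-up sample data. It relies only on the definition of the correlation coefficient as the covariance divided by the product of the standard deviations.

```python
from math import sqrt

def correlation(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The common 1/n (or 1/(n-1)) factors cancel in the ratio.
    return cov / sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0]
print(correlation(x, [2.0, 4.0, 6.0, 8.0]))  # 1.0
print(correlation(x, [8.0, 6.0, 4.0, 2.0]))  # -1.0
```

In CCA the same measure is applied to the two projected columns, so the canonical correlation is bounded by the same interval from $-1$ to $1$.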

This article calculates, through CCA, the relationship between stock markets of developed and developing countries and performs Bartlett’s test for the statistical significance of the canonical correlation found.

For an introduction to statistics in financial markets, see [1].

The data employed for the CCA in the present work was obtained directly from *Mathematica*’s `FinancialData` function. The variables are divided into two groups: the ETFs representing developed nations and the ETFs representing developing countries. The first group is treated as independent variables and the second group as dependent variables. The idea here is to analyze the relationship between stock markets in these two groups of countries through ETFs traded at the New York Stock Exchange (NYSE).

Although there are several country-specific ETFs traded on the NYSE, not all of them were chosen. The idea is to select, for each group, those ETFs representing countries with large stock markets according to a market capitalization criterion. The market capitalization of all stock markets was obtained from the website of the World Federation of Exchanges (www.world-exchanges.org/statistics). All countries with stock markets greater than 500 billion US dollars in December 2012 were chosen, and only one ETF per country was selected.

These six ETFs were included in the group of developed nations: EWA (Australia), EWC (Canada), EWG (Germany), EWJ (Japan), EWU (UK), and SPY (USA).

Eight ETFs were included in the group of developing countries: EWZ (Brazil), FXI (China), EPI (India), EWW (Mexico), RSX (Russia), EWS (Singapore), EWY (South Korea), and EWT (Taiwan).

These are the monthly returns for the five-year period between March 2008 and February 2013 (60 months).

This checks the number of observations for each variable. Evaluate the previous command again if the lengths are not all 60.

This plots the data for all the variables.

This plots the price behavior of the six ETFs representing developed countries for the 60-month period.

This plots the price behavior of the eight ETFs representing developing countries for the 60-month period.

According to [2], “to use canonical correlation analysis safely for descriptive purposes requires no distributional assumptions.” However, they still state that “to test the significance of the relationships between canonical variates, (…), the data should meet the requirements of multivariate normality and homogeneity of variance” ([2], p. 339). Is the data normally distributed in this sense?

As can be seen, the null hypothesis of normality cannot be rejected for all variables at the 5% significance level.

In order to perform the canonical correlation analysis, it is necessary to organize the data into two groups of variables: $X$ (representing the developed countries) and $Y$ (representing the developing countries);

$$X = (x_1, \ldots, x_6), \qquad Y = (y_1, \ldots, y_8),$$

where $x_1$ to $x_6$ represent the developed countries’ ETFs and $y_1$ to $y_8$ represent the developing countries’ ETFs.

In canonical correlation analysis, and , and the problem is to find the “most interesting” linear combinations

for the two sets of variables, that is, those values that maximize

(2) |

Let be the concatenation of the matrices and ,

so

where and are the (empirical) variance-covariance matrices and and are the mean vectors of and , respectively. represents the covariance matrix of and , and is its transpose.

From equation (1) and from the properties

(3) |

(4) |

where and are conformable and is a constant,

(5) |

where

CCA can be performed either on variance-covariance matrices or on correlation matrices. If the random variables and are standardized to have unit variance, the variance-covariance matrix becomes a correlation matrix.

After partitioning the variance-covariance matrix, and given equation (5), the main objective is to solve

(6) |

subject to

To solve this problem, define:

(7) |

A singular value decomposition of gives

(8) |

where

(9) |

(10) |

(11) |

and are column orthonormal matrices , and is a diagonal matrix with positive elements, namely, the eigenvalues of . (For detailed information about singular value decomposition, see [3].) From the property

and from equation (7),

For this solution procedure, the largest eigenvalue of is the canonical correlation of our analysis. and can also be found through

(12) |

(13) |

The problem in this case is to solve the following canonical equations [2, 4]:

(14) |

and

(15) |

where is the identity matrix and is the largest eigenvalue for the characteristic equations

(16) |

and

(17) |

The largest eigenvalue of the product matrices

is the squared canonical correlation coefficient. Furthermore, it can be shown that

(18) |

and

(19) |

which means that only one of the characteristic equations needs to be solved in order to find or .
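For orientation, the two solution procedures can be summarized in the standard textbook form. The notation below is an assumption and may differ from the numbered equations: $\Sigma_{XX}$, $\Sigma_{YY}$, and $\Sigma_{XY} = \Sigma_{YX}^{\top}$ denote the blocks of the partitioned variance-covariance matrix, $\gamma_i$ and $\delta_i$ the columns of the two orthonormal factors of the singular value decomposition, and $\lambda$ the squared canonical correlation.

```latex
\begin{aligned}
\rho(a,b) &= \frac{a^{\top}\Sigma_{XY}\,b}
                  {\sqrt{a^{\top}\Sigma_{XX}\,a}\;\sqrt{b^{\top}\Sigma_{YY}\,b}}, \\[4pt]
K &= \Sigma_{XX}^{-1/2}\,\Sigma_{XY}\,\Sigma_{YY}^{-1/2}
   = \Gamma\,\Lambda\,\Delta^{\top}, \\[4pt]
a_i &= \Sigma_{XX}^{-1/2}\,\gamma_i, \qquad
b_i = \Sigma_{YY}^{-1/2}\,\delta_i, \\[4pt]
\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\,a &= \lambda\,a, \qquad
\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}\,b = \lambda\,b .
\end{aligned}
```

The largest singular value of $K$ and the square root of the largest $\lambda$ coincide, which is why the checks performed with either procedure should agree.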

This transposes the data.

This checks the dimensions of `Z`; it has 60 rows (months) and 14 columns (ETFs).

There are 14 random variables (six in the first set and eight in the second); the dimensions of the submatrices are 60×6 for , 60×8 for , 6×6 for , 6×8 for , 8×6 for , and 8×8 for .

Define `M1` to be the variance-covariance matrix of `Z`. Here are the first seven columns of `M1`.

Partition `M1` into the four submatrices , , , and .

To better understand the relationship between the random variables, here is `M2`, the correlation matrix of `Z`.

This defines `K`.

This performs the singular value decomposition on `K`.

This is the largest eigenvalue of `K`.

This checks by computing the square root of the eigenvalues of

and

according to the second solution procedure. (`Chop` replaces numbers that are close to zero by the exact integer 0.)

Performing a spectral decomposition on and and calculating the square roots of their eigenvalues is another check of the canonical correlation coefficient.

The checks agree.

The last step in this analysis is to find the canonical correlation vectors, which maximize the correlation between the canonical variates. According to equations (12) and (13), this computes the canonical correlation vectors.

The canonical correlation matrix `B` is computed using , not , because

Given that

the canonical correlation vectors and are the columns of and .

In terms of the canonical correlation vectors, the canonical variates are

where, as before,

Given that

(20) |

only and are needed in order to find . Thus, the only canonical variates needed are and .

The interpretation of canonical correlation coefficients, canonical correlation vectors, and canonical variates is one of the most difficult tasks in the whole analysis. CCA would be better understood relating the original data matrix to the matrix computed using the canonical correlation vectors, which is simply a reduction of the data matrix through linear combinations of its elements. It should be easier to understand that the canonical correlation coefficient is merely the ordinary Bravais-Pearson correlation between the two columns of the reduced matrix.

In principle, one can say that the highest canonical correlation coefficient that was found is the maximum possible correlation between the two columns of the reduced matrix. In this case, it is usual to say that this coefficient represents the relationship between the two datasets, and , in the sense of a correlation measure. Thus, if is the matrix containing the explanatory factors of , the matrix containing the criterion measures (or criterion variables), it is possible to say that the explanatory factors would perfectly explain the criterion variables if . If , the explanatory factors have no influence on the criterion variables, and any value between 1 and 0 is merely an interpolation of these extreme cases.
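The idea that the first canonical correlation is the largest Bravais-Pearson correlation attainable between one linear combination from each set can be illustrated by brute force. This is only a sketch under stated assumptions: the two small datasets are invented, each set has just two variables, and a grid search over weight directions stands in for the singular value decomposition used in the article.

```python
import math

def pearson(xs, ys):
    """Bravais-Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def variate(rows, weights):
    """Reduce each row to one number: a linear combination of its entries."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]

# Hypothetical data: five observations, two variables in each set.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
Y = [(1.5, 0.5), (1.0, 2.0), (3.5, 1.0), (3.0, 3.5), (5.5, 2.0)]

def best_correlation(X, Y, steps=180):
    """Grid search over unit weight vectors (correlation ignores scale)."""
    best = (-1.0, None, None)
    for i in range(steps):
        a = (math.cos(math.pi * i / steps), math.sin(math.pi * i / steps))
        u = variate(X, a)
        for j in range(steps):
            b = (math.cos(math.pi * j / steps), math.sin(math.pi * j / steps))
            r = abs(pearson(u, variate(Y, b)))
            if r > best[0]:
                best = (r, a, b)
    return best

rho, a, b = best_correlation(X, Y)
```

Here `rho` approximates the first canonical correlation of the toy data, and `a` and `b` approximate the corresponding weight vectors up to sign and scale. No other pair of weights on the grid gives a higher correlation between the two reduced columns.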

In the next inputs we will compute and show (partially) the reduced data matrix. In order to demonstrate the validity of the CCA theory, we also compute the correlation for the other (not so interesting for our analysis) canonical variates. We start by defining and .

The first column of our reduced data matrix is .

The first value of , for instance, refers to the linear combination between EWA, EWC, EWG, EWJ, EWU, and SPY for March 2008, such that

We can also define .

Thus, after assigning the values to the canonical variates, , , , and , we have four vectors with the values of the linear combinations of and . Now we can simply compute the Bravais-Pearson correlation between all the canonical variables.

We also verify equation (20).

The correlation between the canonical variates can be better interpreted graphically. First we show the reduced matrix computed using the canonical correlation vectors and , whose canonical correlation coefficient is .

Now we show the reduced matrix computed using the canonical correlation vectors and , whose canonical correlation coefficient is .

Finally, we compute the *canonical loadings*, that is, the correlation between every single ETF and its respective canonical variate.

We can also compute the *canonical cross-loadings*, that is, the correlation between every single ETF and its opposite canonical variate.

It might be of interest to compute the canonical loadings for the *second canonical variate*, that is, the linear combination of variables with correlation coefficient .

Finally, we compute the canonical cross-loadings for the second canonical variate, that is, the linear combination of variables with correlation coefficient .

It is possible to compute canonical loadings and cross-loadings for all the six canonical variates. However, only the first two are shown here for descriptive purposes.

In this section we test the hypothesis of no correlation between the two sets and . An approximation for large was provided in [5]:

(21) |

where

We can also test the hypothesis that the individual canonical correlation coefficients are different from zero:

(22) |

where is a parameter to select the canonical correlation coefficient to be tested.

This defines the Bartlett variable.

This assigns values to the .

We calculate Bartlett’s statistic (equation (21)) to test if the two sets of variables and are uncorrelated. Our hypotheses are:

This computes the 99% quantile of the chi-square distribution with 48 (6×8) degrees of freedom.

**Test Conclusion**: The hypothesis of no correlation between the two sets has to be rejected because the Bartlett statistic (here 249.415) is greater than the 99% quantile of the chi-square distribution with 48 degrees of freedom (here 73.6826).
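The decision rule can be sketched in a few lines of Python. The statistic below uses the commonly stated form of Bartlett's large-sample approximation, which is assumed here to match equation (21). The canonical correlations in `rhos` are hypothetical placeholders, not the values obtained in the article, and the critical value is the 99% quantile quoted above.

```python
import math

def bartlett_statistic(n, p, q, canonical_correlations):
    """Bartlett's approximation: -(n - (p + q + 3)/2) * sum(ln(1 - rho_i^2)).

    n is the number of observations; p and q are the sizes of the two sets.
    Under the null hypothesis the statistic is approximately chi-square
    distributed with p*q degrees of freedom.
    """
    factor = n - (p + q + 3) / 2
    return -factor * sum(math.log(1 - r * r) for r in canonical_correlations)

n, p, q = 60, 6, 8            # 60 months, six and eight ETFs
degrees_of_freedom = p * q    # 48
critical_value = 73.6826      # 99% chi-square quantile quoted in the text

# Hypothetical canonical correlations, for illustration only.
rhos = [0.95, 0.80, 0.60, 0.40, 0.20, 0.10]

statistic = bartlett_statistic(n, p, q, rhos)
reject = statistic > critical_value  # True means "reject no correlation"
```

With the article's own canonical correlations in place of `rhos`, the same comparison reproduces the test conclusion.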

This article analyzed the relationship between two sets of variables, namely financial assets represented by NYSE-traded country-specific ETFs. The ETFs were divided into two sets representing developed and developing countries. In the first set a total of six ETFs (representing developed countries) were included, while in the second set a total of eight ETFs were included (representing developing countries). Using monthly return data for a five-year period it was possible to show, through canonical correlation analysis (CCA), that there is a significant relationship between these two sets of ETFs. The highest correlation coefficient found in the present study was and, in an analogous manner to statistics in regression analysis, we could interpret its squared value as the explanatory power of the canonical correlation analysis. In other words, the squared canonical correlation coefficient indicates the proportion of variance a dependent variable linearly shares with the independent variable generated from the observed variable’s set (i.e., the canonical variates).

[1] | J. Franke, W. Härdle, and C. Hafner, Einführung in die Statistik der Finanzmärkte, Berlin: Springer Verlag, 2001. |

[2] | W. R. Dillon and M. Goldstein, Chap. 9 in Multivariate Analysis: Methods and Applications, New York: Wiley, 1984. |

[3] | K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, London: Academic Press, 1979. |

[4] | T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd ed., New York: Wiley, 2003. |

[5] | M. S. Bartlett, “A Note on Tests of Significance in Multivariate Analysis,” Proceedings of the Cambridge Philosophical Society, 35(2), pp. 180-185, 1939. doi:10.1017/S0305004100020880. |

R. L. Malacarne, “Canonical Correlation Analysis,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-6. |

Rodrigo Loureiro Malacarne is a professor of financial mathematics and financial management at the Faculdades Integradas Espirito Santenses (FAESA). His areas of research include statistics of financial markets and financial time series analysis.

**Rodrigo Loureiro Malacarne**

Faculdades Integradas Espirito Santenses (FAESA)

Av. Vitória, 2.220 – Monte Belo

Vitória, ES, Brazil – CEP 29.053-360

malacarne@gmail.com

Motivated by the computational advantages offered by *Mathematica,* I decided some time ago to embark on collecting and implementing properties of the fascinating geometric figure called the arbelos. I have since been impressed by the large number of surprising discoveries and computational challenges that have sprung out of the growing literature concerning this remarkable object. I recall its resemblance to the lower part of the iconic canopied penny-farthing bicycle of the 1960s TV series *The Prisoner*, Punch’s jester cap (of *Punch and Judy* fame), and a yin-yang symbol with one arc inverted; see Figure 1. There is now an online specialized catalog of Archimedean circles (circles contained in the arbelos) [1] and important applications outside the realm of mathematics and computer science [2] of arbelos-related properties.

Many famous names are involved in this fascinating theme, among them Archimedes (killed by a Roman soldier in 212 BC), Pappus (320 AD), Christian O. Mohr (1835-1918), Victor Thébault (1882-1960), Leon Bankoff (1908-1997), and Martin Gardner (1914-2010). Recently, they have been succeeded by Clayton Dodge, Peter Y. Woo, Thomas Schoch, Hiroshi Okumura, and Masayuki Watanabe, among others.

Leon Bankoff was the person who stimulated the extraordinary attention paid to the arbelos over the last 30 years. Schoch drew Bankoff’s attention to the arbelos in 1979 by discovering several new Archimedean circles. He sent a 20-page handwritten note to Martin Gardner, who forwarded it to Bankoff, who then gave a 10-chapter manuscript copy to Dodge in 1996. Due to Bankoff’s death, a planned joint work was interrupted until Dodge reported some discoveries [3]. In 1999 Dodge said that it would take him five to ten years to sort all the material in his possession, which then filled three suitcases. This work is still forthcoming. Not surprisingly, like Volume 4 of *The Art of Computer Programming*, it appears that important work needs substantial time to be developed.

**Figure 1.** *The Prisoner*’s penny-farthing bicycle, Punch and Judy, and a physical arbelos.

The arbelos (“shoemaker’s knife” in Greek) is named for its resemblance to the blade of a knife used by cobblers (Figure 1). The arbelos is a plane region bounded by three semicircles sharing a common baseline (Figure 2). Archimedes appears to have been the first to study its mathematical properties, which he included in propositions 4 through 8 of his *Liber assumptorum* (or *Book of Lemmas*). This work might not be entirely by Archimedes, as was recently revealed through an Arabic translation of the *Book of Lemmas* that mentions Archimedes repeatedly without fully recognizing his authorship (some even believe this work to be spurious [4]). The *Book of Lemmas* also contained Archimedes’s famous *Problema Bovinum* [5].

This article aims at systematically enumerating selected properties of the arbelos, without attempting to be exhaustive. Our purpose is to develop a uniform computational methodology in order to tackle those properties in a pedagogical setting. A sequence of properties is arranged and subsequently verified by testing the computationally equivalent predicates. This work includes some discoveries and extensions contributed by the author.

We refer to the largest semicircle as the *top arc* and the two small ones as the left and right *side arcs*, or just the *side arcs* when there is no need to distinguish them. We use and to denote their respective radii (the top arc thus has radius ). A *segment* between two points is an undirected line segment going from one point to the other, while a *line* through two points is the infinite straight line through the two points. A traditional abuse of notation uses for both the line segment joining the points and and the length of the segment, depending on the context; modern usage is to write for the length of the segment.

This function displays the arbelos.

This draws the basic arbelos.

**Figure 2.** The arbelos.

**Property 1**

In other words, the total length of the side arcs equals the length of the top arc. This property is related to an intriguing paradox [6].

**Property 2**

This was lemma 4 of the *Book of Lemmas* (see Figure 3) [7, 8].

These two properties are easily verified by simultaneously testing two equalities.
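A numerical version of the same check can be sketched in Python. The names `r1` and `r2` for the radii of the left and right side arcs are assumptions of this sketch. The radical circle is taken to have as its diameter the chord that rises from the tangency point of the side arcs, perpendicular to the baseline, up to the top arc.

```python
import math

def arbelos_checks(r1, r2):
    """Return (top arc length, side arcs length, arbelos area, radical circle area)."""
    top_arc = math.pi * (r1 + r2)
    side_arcs = math.pi * r1 + math.pi * r2

    # Area between the top semicircle and the two side semicircles.
    arbelos_area = 0.5 * math.pi * ((r1 + r2) ** 2 - r1 ** 2 - r2 ** 2)

    # With the origin at the leftmost point, the side arcs touch at x = 2*r1.
    # The top arc satisfies (x - (r1 + r2))^2 + y^2 = (r1 + r2)^2, so its
    # height above the tangency point is the diameter of the radical circle.
    diameter = math.sqrt((r1 + r2) ** 2 - (r1 - r2) ** 2)
    radical_area = math.pi * (diameter / 2) ** 2

    return top_arc, side_arcs, arbelos_area, radical_area

top_arc, side_arcs, arbelos_area, radical_area = arbelos_checks(1.3, 0.7)
```

For any positive radii, the first two values agree (property 1) and the last two agree (property 2); both areas reduce to π times the product of the two radii.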

The function `drawpoints` is used to display specific points as red disks.

**Figure 3.** The area of the circle of diameter (the radical circle) is equal to the area of the arbelos.

The circle in Figure 3 is called the *radical circle* of the arbelos and the line is its *radical axis* (this terminology will be clarified in Generalizations). To illustrate properties 3-11 and 25, 26, we draw and label points and show some coordinates, lines, and circles in Figure 4.

**Figure 4.** Labels, coordinates, lines, and circles referred to in properties 3 through 11 and 25, 26.

**Property 3**

The lines and are orthogonal and intersect the side arcs at points and , joining a common tangent to the side arcs.

To verify the orthogonality of the lines and , we take the inner product of the vectors and .

We employ the following result to obtain the slopes at the points and .

**Theorem 1**

The function `PQ` finds the coordinates of the tangent points and by solving a system of four equations, which places them on the arcs and sets their tangent slopes according to theorem 1.

Besides `PQ`, other definitions in this article for points and quantities are: `VWS`, `HK`, `U`, `EF`, `IJr`, and `LM`.

The function `dSq` computes the square of the distance between two given points.

**Property 4**

As is a diameter of the radical circle, we only need to verify the equality of the distances of and to the center of the radical circle, namely the point .

**Property 5**

Let the line intersect the top arc at points and . Then and lie on a circle with center and radius .

We get the coordinates of the points and by solving a system of equations that places them on the top arc and on the line .

This verifies property 5 by checking that the distances of and to are the same as the distance from to .

**Property 6**

This is equivalent to the fact that the determinant (cross product) of the vectors and is zero.

**Property 7**

This is equivalent to the fact that the inner product of the vectors and is zero.

Let us use the notation for a circle with center and radius .

**Property 8**

The inversion of a point in the circle , is defined to be the unique point such that [9]. The function `inversion` implements this idea.
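As a rough Python counterpart (not the article's `inversion` definition, whose body is not reproduced here), the inverse of a point lies on the ray from the center through the point, at the distance that makes the product of the two distances equal to the squared radius. The argument names are assumptions of this sketch.

```python
def inversion(point, center, radius):
    """Invert `point` in the circle with the given center and radius."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    d2 = dx * dx + dy * dy
    if d2 == 0:
        raise ValueError("the center of inversion has no inverse")
    k = radius ** 2 / d2  # scale factor along the ray from the center
    return (center[0] + k * dx, center[1] + k * dy)

# The point (3, 4) is at distance 5 from the origin; with radius 2 its
# inverse is at distance 4/5, because 5 * (4/5) = 2^2.
q = inversion((3.0, 4.0), (0.0, 0.0), 2.0)
back = inversion(q, (0.0, 0.0), 2.0)  # inverting twice returns the point
```

Points on the circle of inversion are fixed, and applying the map twice gives back the starting point, which is why pairs of inverse points can be used to test orthogonality of circles.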

This verifies property 8, recalling the coordinates of are .

**Property 9**

Let be the circle of inversion. The points , , invert to themselves. The segment inverts to the arc and the segment inverts to the arc . The arcs and invert to themselves. The radical circle inverts to the line .

**Property 10**

This is the same as claiming that the corresponding arcs are orthogonal to the radical circle. By property 8, the arcs are orthogonal to the circle with diameter as they pass through inverse pairs [10, 11].

**Property 11**

This is one of Bankoff’s surprises [12, 13, 14]. As all four points are on the radical circle, we need to verify only that bisects .

The following `Manipulate` illustrates properties 3-11. The easiest way to define the points `P`, `Q`, `H`, `K` is to copy and paste the formulas for them.

Now consider the circle tangent to the side arcs and the top arc, the *incircle* with tangent points , , and as shown in Figure 5 [15, 16]. We also consider points and at the tops of the side arcs.

**Figure 5.** The incircle and coordinates, lines, and points referred to in properties 12 through 15.

Proposition 6 of the *Book of Lemmas* included the value of , the radius of the incircle. The function `U` calculates the coordinates of the center and the radius .
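The incircle can also be checked numerically. The sketch below is in Python rather than *Mathematica* and assumes the origin at the leftmost point of the arbelos, with `r1` and `r2` as the radii of the left and right side arcs (names chosen for this sketch). It uses the classical closed form for the incircle radius and confirms the three tangency conditions by comparing distances between centers.

```python
import math

def incircle(r1, r2):
    """Center and radius of the circle tangent to both side arcs and the top arc."""
    rho = r1 * r2 * (r1 + r2) / (r1 ** 2 + r1 * r2 + r2 ** 2)
    center = ((2 * r1 + r2) * rho / r2, 2 * rho)
    return center, rho

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

r1, r2 = 1.0, 2.0
center, rho = incircle(r1, r2)

left_center = (r1, 0.0)            # center of the left side arc
right_center = (2 * r1 + r2, 0.0)  # center of the right side arc
top_center = (r1 + r2, 0.0)        # center of the top arc

# External tangency to the side arcs, internal tangency to the top arc.
tangency_residuals = (
    dist(center, left_center) - (r1 + rho),
    dist(center, right_center) - (r2 + rho),
    dist(center, top_center) - (r1 + r2 - rho),
)
```

All three residuals vanish up to rounding. The ordinate of the center is twice the radius, in line with the Pappus chain of circles.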

The coordinates of the tangent points , , and are obtained as the intersections of the lines joining the centers of the three arcs of the arbelos and the incircle.

**Property 12**

The points , , and are collinear. The points , , and are collinear. The lines and intersect in a point lying on the incircle.

Using the criterion of the determinant to check for collinearity, we verify the first two claims.
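The determinant criterion itself is short enough to state as code. This is a generic Python sketch with made-up sample points, not the article's input: three points are collinear exactly when the 2×2 determinant of the two difference vectors is zero. The tolerance parameter is an addition for floating-point coordinates.

```python
def collinear(p, q, r, tol=1e-12):
    """True if the points p, q, r lie on one line (up to the tolerance)."""
    # Determinant of the vectors q - p and r - p; twice the signed triangle area.
    det = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(det) <= tol

on_line = collinear((0.0, 0.0), (1.0, 2.0), (3.0, 6.0))
off_line = collinear((0.0, 0.0), (1.0, 2.0), (3.0, 6.1))
```

With symbolic coordinates the determinant simplifies to exactly zero; with numeric coordinates a small tolerance plays the role of `Chop`.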

Let be the point of intersection of the lines and . Confirming that its distance to is equal to verifies the third claim.

**Property 13**

The points , , , and are on a circle with center . Similarly, the points , , , and are on a circle with center .

The following `Manipulate` illustrates property 13 [17]. The option for showing the Bankoff circle as the incircle of the triangle joining the center of the arcs and the incircle corresponds to property 23.

**Property 14**

Let be the diameter of the incircle parallel to and let be the projection of onto . The rectangle between the segments and is a square.

This property is illustrated in the next `Manipulate` and is readily verified here.

**Property 15**

Let and be the intersections of the lines and with the side arcs. Then is a square of almost the same size as the one mentioned in property 14.

First we obtain points and as the intersections of their respective lines and their respective arcs, and keep the result in the variable `replaceEF`.

We verify property 15 by setting to be equal to the vector obtained by rotating around by 90° and setting to be equal to the vector obtained by translating by .

Assuming and, the following plot compares the sizes of the two squares.

This `Manipulate` illustrates properties 14 and 15.

Consider the two gray circles tangent to the radical axis, a side arc, and the top arc in Figure 6. They are called *the twins*, or the *Archimedean circles*. Due to the following remarkable property, they have been extensively studied. We collect many of their extraordinary occurrences in our list of properties [3, 18, 19].

**Figure 6.** The twins.

**Property 16**

The two circles tangent to the radical axis, the top arc, and one of the side arcs of an arbelos have the same radius.

This property appeared as proposition 5 in the *Book of Lemmas*. Solving the following system of six equations finds the values of the radii, verifies they are equal, and computes the centers , .

These four solutions give the centers in pairs: , , , , where and are the reflections of and in the diameter of the arbelos; only the last expression is valid. The result also shows that the twins are indeed of the same radius . Any circle with radius equal to the twins’ radius is called *Archimedean*. A nice interpretation of arises when considering and as resistances: then is the resistance resulting from connecting and in parallel; that is, . The function `IJr` computes the value of the centers and the common value of the radius of the twins.
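The parallel-resistance formula and the tangency conditions can be confirmed numerically. This Python sketch assumes the origin at the leftmost point of the arbelos, so that the radical axis is the vertical line through the tangency point of the side arcs. The names `r1`, `r2`, and `t` are choices made for this sketch.

```python
import math

def twins(r1, r2):
    """Centers of the left and right twin circles and their common radius."""
    t = r1 * r2 / (r1 + r2)  # like two resistances connected in parallel
    left = (2 * r1 - t, 2 * math.sqrt(r1 * t))
    right = (2 * r1 + t, 2 * math.sqrt(r2 * t))
    return left, right, t

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

r1, r2 = 1.0, 2.0
left, right, t = twins(r1, r2)
top_center = (r1 + r2, 0.0)

# Each twin touches the radical axis x = 2*r1 by construction (its center is
# at horizontal distance t from the axis). The residuals below check the
# external tangency to a side arc and the internal tangency to the top arc.
residuals = (
    dist(left, (r1, 0.0)) - (r1 + t),
    dist(left, top_center) - (r1 + r2 - t),
    dist(right, (2 * r1 + r2, 0.0)) - (r2 + t),
    dist(right, top_center) - (r1 + r2 - t),
)
```

All four residuals are zero up to rounding, so both circles share the radius `t` even though their centers sit at different heights.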

**Property 17**

Consider a circle tangent to both twins, with center at point and radius . Then there are two possible values of .

To find the extrema of , we set the derivative of each of the above expressions to zero and solve for .

So the centers of the smallest and largest circles tangent to the twins lie on the radical axis. Moreover, they are concentric, as this result confirms.

Thus, by using property 2, we confirm that the largest tangent circle, which is the smallest enclosing the twins, satisfies property 17. The following `Manipulate` shows the circles tangent to the twins as you vary the radius of the left side arc.

The following plot compares the radii of the two circles tangent to the twins with centers on the radical axis.

**Figure 7.** Labels and lines referred to in properties 18 through 24.

**Property 18**

The common tangent of the left arc and its tangent twin at passes through . Similarly, the common tangent of the right arc and its tangent twin at passes through (see Figure 7).

This computes the tangent points and .

By using theorem 1, we verify both claims.

**Property 19**

We verify both claims simultaneously.

However, the points , , and are not on a circle centered at , nor are the points , , and on a circle centered at ; otherwise, the following expression would be zero.

**Property 20**

As the length of the segment is the ordinate of and the length of the segment is the ordinate of , we only need to verify that the midpoints of those segments lie on the mentioned lines by checking slopes.

**Property 21**

Those circles are the fourth and fifth Archimedean circles discovered by Bankoff [20]. In order to verify this property, we use the following result [21]:

**Theorem 2**

This directed distance is positive if the triangle is traversed counterclockwise and negative otherwise. The function `dAB` implements this.
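A Python counterpart of such a directed distance is sketched below. It is not the article's `dAB` definition: the argument order and the convention that the sign refers to the triangle A, B, P are assumptions of this sketch. The formula is the cross product of the vectors from A to B and from A to P, divided by the length of AB.

```python
import math

def directed_distance(p, a, b):
    """Signed distance from point p to the line through a and b.

    Positive when a, b, p are in counterclockwise order, negative when
    clockwise, and zero when p lies on the line.
    """
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return cross / math.hypot(b[0] - a[0], b[1] - a[1])

above = directed_distance((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))    # counterclockwise
swapped = directed_distance((0.0, 1.0), (1.0, 0.0), (0.0, 0.0))  # line reversed
```

Reversing the two points that define the line flips the sign, which is why the order of the arguments matters whenever a directed distance is compared with a radius.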

Let and be the center and radius of the blue circle on the left side of point in Figure 7. Solving the following system finds the value of .

Similarly, this calculates the radius of the blue circle to the right of , which equals .

Thus, both circles are Archimedean as claimed. The following `Manipulate` shows the twins and these two other circles.

**Property 22**

Archimedes discovered the original twins; Bankoff improved on this by discovering this third circle in 1950 [22]. The coordinates of the center of the Bankoff circle are obtained by equating the distances of to the points , , and .

**Property 23**

The Bankoff circle is the incircle of the triangle formed by joining the centers of the side arcs and the center of the incircle of the arbelos.

Using theorem 2 to compute the distance of to the sides of the triangle, we verify this property (as `dAB` computes a directed distance, the order of the arguments describing the line is important).

**Property 24**

This computes the values of and .

The circle is the one where the ordinate of is positive. Note that is not on the radical axis.

**Property 25**

The circles and tangent to the radical axis, one passing through and the other passing through the point , are both Archimedean (see Figure 4).

**Property 26**

A circle with center and radius tangent to the line is such that the distance from to is

, so this equation holds:

Because the circle passes through ,

Because the circle is tangent to the top arc,

This input uses explicit expressions for , , and that satisfy these three equations.

**Property 27**

Consider the two (red) segments connecting the center of the top arc to the top points and of the left and right arcs of the arbelos. These segments have the same length and are orthogonal. The tangent circles and at and to those lines and the top arc are Archimedean (see Figure 8).

This property was discovered in the summer of 1998 [23].

**Figure 8.** The two pairs of Archimedean circles from property 27.

We have seen that there are some Archimedean circles other than the twins, namely the Bankoff circle and those mentioned in properties 21 through 27. There are also *non-Archimedean twins*, that is, pairs of circles of the same radius, different than that of the twins, appearing at significant places within the arbelos.

The discovery of the *slanted twins* arose from the initial assumption that, besides being tangent to either side arc and the top arc, the two circles-to-be-twins could be tangent to each other and not necessarily to the radical axis. Clearly there are an infinite number of solutions if we do not require these circles to be of equal radius. The idea was that if we started by assuming they are of equal radius, we might end up discovering they are tangent to the radical axis. This turned out not to be the case. Let us consider circles with centers at the points and with common radius . The value of can be obtained by solving a system of five equations.

These expressions involve square roots differing in sign. The ones using the plus sign diverge at and are rejected.

The other one converges.

We conclude that the slanted twins are indeed congruent and that their common radius is

The following comparison between the radii of the twins and the slanted twins shows that their difference turns out to be very small.

This gives the coordinates of the centers of the slanted twins.

The following `Manipulate` shows the slanted twins and, optionally, the twins, as you vary .

In this section we generalize the shape of an arbelos by allowing the arcs to cross and by considering a 3D version. To set the context of the first of those generalizations, we need the concept of the *radical axis of two circles*.

Let be a point and be the circle . The *power* of with respect to is defined to be the real number . The power of is positive, zero, or negative depending on whether lies outside, on, or inside [12]. Let ; if the points of satisfy the equation , then an alternative way to define the power of is to evaluate . (A similar result applies if , when the circle degenerates to a line, in which case the sign of indicates whether is above, on, or below the line.)

Here is a very interesting property of the power of a point. Given a circle and a point , choose an arbitrary line through meeting the circle at points and . Then the product depends only on —it is independent of the choice of line through . This product is equal to the power of .
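This invariance is easy to test numerically. The Python sketch below (function and variable names are chosen for illustration) parametrizes the line through the point by a direction angle, finds the two intersection parameters from a quadratic equation, and multiplies them. By Vieta's formulas the product equals the constant term of the quadratic, which is exactly the power of the point.

```python
import math

def power(point, center, radius):
    """Power of a point: squared distance to the center minus squared radius."""
    return (point[0] - center[0]) ** 2 + (point[1] - center[1]) ** 2 - radius ** 2

def secant_product(point, center, radius, angle):
    """Signed product of the distances from `point` to the two intersections
    of the circle with the line through `point` in direction `angle`.
    Returns None if the line misses the circle."""
    ux, uy = math.cos(angle), math.sin(angle)
    fx, fy = point[0] - center[0], point[1] - center[1]
    # Points on the line are point + t*(ux, uy); on the circle this gives
    # t^2 + 2*b*t + c = 0 with the coefficients below.
    b = fx * ux + fy * uy
    c = fx * fx + fy * fy - radius ** 2
    disc = b * b - c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (-b - root) * (-b + root)

P, C, r = (5.0, 1.0), (0.0, 0.0), 2.0
toward_center = math.atan2(C[1] - P[1], C[0] - P[0])
products = [secant_product(P, C, r, toward_center + d) for d in (-0.3, -0.1, 0.0, 0.1, 0.3)]
```

Every entry of `products` equals `power(P, C, r)` up to rounding, whichever secant is chosen.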

In the following `Manipulate`, drag the four locators to vary the size of the circle, the position of , and the slope of the line through .

Given two circles with different centers, their *radical axis* is defined to be the line consisting of all points that have equal powers with respect to each of the two circles. Proofs of the following can be found in [10].

**Theorem 3**

If two circles intersect at two points and , then their radical axis is the common secant . If two circles are tangent at , then their radical axis is their common tangent at .

**Corollary 1**

Given three circles with noncollinear centers, the three radical axes of the circles taken in pairs are distinct concurrent lines.

**Theorem 4**

The radical axis of two circles is the locus of points from which tangents drawn to both circles have the same length.
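Equating the two powers gives a linear equation, so the radical axis can be computed directly. The following Python sketch uses helper names invented for this illustration; it builds the axis for two sample circles and compares the powers, and hence the tangent lengths, at points along it.

```python
import math

def power(point, center, radius):
    return (point[0] - center[0]) ** 2 + (point[1] - center[1]) ** 2 - radius ** 2

def radical_axis(c1, r1, c2, r2):
    """Coefficients (a, b, c) of the radical axis written as a*x + b*y = c."""
    a = 2 * (c2[0] - c1[0])
    b = 2 * (c2[1] - c1[1])
    c = (c2[0] ** 2 + c2[1] ** 2 - r2 ** 2) - (c1[0] ** 2 + c1[1] ** 2 - r1 ** 2)
    return a, b, c

def point_on_axis(axis, s):
    """Point of the axis at parameter s, measured from the foot of the normal."""
    a, b, c = axis
    n2 = a * a + b * b
    return (a * c / n2 - b * s, b * c / n2 + a * s)

c1, r1 = (0.0, 0.0), 1.0
c2, r2 = (4.0, 1.0), 2.0
axis = radical_axis(c1, r1, c2, r2)

samples = [point_on_axis(axis, s) for s in (-1.0, 0.0, 0.3, 2.0)]
power_pairs = [(power(p, c1, r1), power(p, c2, r2)) for p in samples]

# Outside both circles the common power is positive and its square root is
# the common tangent length, as in theorem 4.
tangent_lengths = [math.sqrt(p1) for p1, _ in power_pairs if p1 > 0]
```

Each pair in `power_pairs` contains two equal numbers, so the tangents drawn from any sampled point to the two circles have the same length.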

The following `Manipulate` shows two circles; one is fixed, and you can vary the center and size of the other one by dragging the locator or changing its radius with the slider. You can use the other slider to move the red point on the radical axis to illustrate theorem 4.

The following `Manipulate` illustrates two generalizations.

**Property 28**

The inscribed circles tangent to the radical axis of the side arcs and the top arc and either of the arcs of the generalized arbelos have the same radius.

Let be the length of the *gap* between the bases (so that the diameter of the top arc is ) and let be the abscissa of the intersection of the radical axis with the axis, assuming the origin is at the leftmost point of the arbelos [10].

**Theorem 5**

With the help of this theorem, we compute the value of .

We can assume without loss of generality that , , and ( can be negative). Let the inscribed circles be and . The values of these parameters are obtained as follows.

Then, although some centers can be disregarded, the radius is the same in all cases.

Finally, here are three more properties of the arbelos. See if you can guess what property is involved by experimenting with the controls [24, 25].

This first `Manipulate` lets you move the side arcs in a systematic way.

This second `Manipulate` lets you rotate a line around the point of tangency of the side arcs.

Finally, the third `Manipulate` shows an infinite family of twins.

[1] | F. van Lamoen. “Online Catalogue of Archimedean Circles.” (Jan 22, 2014) home.planet.nl/~lamoen/wiskunde/arbelos/Catalogue.htm. |

[2] | S. Garcia Diethelm. “Planar Stress Rotation” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/PlanarStressRotation. |

[3] | C. W. Dodge, T. Schoch, P. Y. Woo, and P. Yiu, “Those Ubiquitous Archimedean Circles,” Mathematics Magazine, 72(3), 1999 pp. 202-213. www.jstor.org/stable/2690883. |

[4] | H. P. Boas, “Reflection on the Arbelos,” American Mathematical Monthly, 113(3), 2006 pp. 236-249. |

[5] | H. D. Dörrie, 100 Great Problems of Elementary Mathematics: Their History and Solution (D. Antin, trans.), New York: Dover Publications, 1965. |

[6] | J. Rangel-Mondragón. “Recursive Exercises II: A Paradox” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/RecursiveExercisesIIAParadox. |

[7] | R. B. Nelsen, “Proof without Words: The Area of an Arbelos,” Mathematics Magazine, 75(2), 2002 p. 144. |

[8] | A. Gadalla. “Area of the Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/AreaOfTheArbelos. |

[9] | J. Rangel-Mondragón, “Selected Themes in Computational Non-Euclidean Geometry. Part 1. Basic Properties of Inversive Geometry,” The Mathematica Journal, 2013. www.mathematica-journal.com/2013/07/selected-themes-in-computational-non-euclidean-geometry-part-1. |

[10] | D. Pedoe, Geometry: A Comprehensive Course, New York: Dover, 1970. |

[11] | M. Schreiber. “Orthogonal Circle Inversion” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/OrthogonalCircleInversion. |

[12] | M. G. Welch, “The Arbelos,” Master’s thesis, Department of Mathematics, University of Kansas, 1949. |

[13] | L. Bankoff, “The Marvelous Arbelos,” The Lighter Side of Mathematics (R. K. Guy and R. E. Woodrow, eds.), Washington, DC: Mathematical Association of America, 1994. |

[14] | G. L. Alexanderson, “A Conversation with Leon Bankoff,” The College Mathematics Journal, 23(2), 1992 pp. 98-117. |

[15] | S. Kabai. “Tangent Circle and Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TangentCircleAndArbelos. |

[16] | G. Markowsky and C. Wolfram. “Theorem of the Owl’s Eyes” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TheoremOfTheOwlsEyes. |

[17] | P. Y. Woo, “Simple Constructions of the Incircle of an Arbelos,” Forum Geometricorum, 1, 2001 pp. 133-136. forumgeom.fau.edu/FG2001volume1/FG200119.pdf. |

[18] | B. Alpert. “Archimedes’ Twin Circles in an Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/ArchimedesTwinCirclesInAnArbelos. |

[19] | J. Rangel-Mondragón. “Twins of Arbelos and Circles of a Triangle” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TwinsOfArbelosAndCirclesOfATriangle. |

[20] | H. Okumura, “More on Twin Circles of the Skewed Arbelos,” Forum Geometricorum, 11, 2011 pp. 139-144. forumgeom.fau.edu/FG2011volume11/FG201114.pdf. |

[21] | E. W. Weisstein. “Point-Line Distance—2-Dimensional” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/Point-LineDistance2-Dimensional.html. |

[22] | L. Bankoff, “Are the Twin Circles of Archimedes Really Twins?,” Mathematics Magazine, 47(4), 1974 pp. 214-218. |

[23] | F. Power, “Some More Archimedean Circles in the Arbelos,” Forum Geometricorum, 5, 2005 pp. 133-134. forumgeom.fau.edu/FG2005volume5/FG200517.pdf. |

[24] | A. V. Akopyan, Geometry in Figures, CreateSpace Independent Publishing Platform, 2011. |

[25] | H. Okumura and M. Watanabe, “Characterizations of an Infinite Set of Archimedean Circles,” Forum Geometricorum, 7, 2007 pp. 121-123. forumgeom.fau.edu/FG2007volume7/FG200716.pdf. |

J. Rangel-Mondragón, “The Arbelos,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-5. |

Jaime Rangel-Mondragón received M.Sc. and Ph.D. degrees in applied mathematics and computation from the University College of North Wales in Bangor, UK. He has been a visiting scholar at Wolfram Research, Inc. and has held positions in the Faculty of Informatics at UCNW, the College of Mexico, the Center for Research and Advanced Studies, the Monterrey Institute of Technology, the Queretaro Institute of Technology, and the University of Queretaro in Mexico, where he is presently a member of the Faculty of Informatics. His current research includes combinatorics, the theory of computing, computational geometry, urban traffic, and recreational mathematics.

**Jaime Rangel-Mondragón**

*UAQ, Facultad de Informatica
Queretaro, Qro. Mexico*

Generally, to carry out a regression procedure one needs to have a model , an error definition , and the probability density function of the error . Considering the set as measurement points, the maximum likelihood approach aims at finding the parameter vector that maximizes the likelihood of the joint error distribution. Assuming that the measurement errors are independent, we should maximize (see, e.g., [1])

(1) |

Instead of maximizing this objective, we minimize

(2) |

Consider the Gaussian-type error distribution as ; then our estimator is

(3) |

In our case the model is a line,

(4) |

It can be seen that (in the case of Gaussian-type measurement noise) only the type of the error model determines the parameter values, since we should always minimize the least squares of the errors. There are different error models, which can be applied to fitting a line in a least-squares sense. The error model frequently employed, assuming an error-free independent variable , is the ordinary least squares model ()

(5) |

Similarly, one may consider an error-free dependent variable. Then the error model is

(6) |

Together, these two approaches are called the *algebraic approach*.

Another error model considers the geometrical distance between the data point and the line to be fitted. This type of fitting is also known as *orthogonal regression*, since the distances of the sample points from the line are evaluated by computing the orthogonal projection of the measurements on the line itself. The error in this case [2] is

(7) |
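As a minimal sketch of orthogonal regression (written in Python rather than the article's *Mathematica*, and using made-up sample data), the slope can be obtained in closed form from the centered second moments. The sketch assumes the line is written as `y = m*x + b` and that the error is the perpendicular distance `(y_i - m*x_i - b) / sqrt(1 + m^2)`; the function names `tls_line` and `perp_sse` are hypothetical.

```python
import math

def tls_line(xs, ys):
    """Fit y = m*x + b by minimizing the sum of squared perpendicular distances."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("sxy = 0: the best line is horizontal or vertical")
    # Stationarity in m gives sxy*m^2 + (sxx - syy)*m - sxy = 0;
    # the root with the same sign as sxy is the minimizer.
    m = ((syy - sxx) + math.hypot(syy - sxx, 2 * sxy)) / (2 * sxy)
    b = ybar - m * xbar  # the fitted line passes through the centroid
    return m, b

def perp_sse(m, b, xs, ys):
    """Sum of squared perpendicular distances from the points to the line."""
    return sum((y - m * x - b) ** 2 for x, y in zip(xs, ys)) / (1 + m * m)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative data, not from the article
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
m, b = tls_line(xs, ys)
print(m, b, perp_sse(m, b, xs, ys))
```

The stationarity condition is quadratic in the slope, and its two roots correspond to perpendicular directions; only one of them minimizes the distances.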

This *geometrical approach* or *total least squares* (TLS) approach can also be considered as an optimization problem with constraints; namely, one should minimize the errors in both variables [3]:

(8) |

under the conditions

(9) |

In addition, one can also combine the error models (5) and (6) to construct a new error model. The first possibility is to consider the geometric mean of these two types of errors,

(10) |

These error models are illustrated in Figure 1.

**Figure 1.** The different error models in the case of fitting a straight line.

The model in equation (10) is also called the *least geometric mean deviation* approach (see [4]). As a second possibility, one may consider the errors (5) and (6) as competing functions of the parameters and find their Pareto front, which represents a set of optimal solutions for the parameters. Since this multi-objective problem is convex, the objective can be expressed as a linear combination of these error functions, namely

(11) |

where the weight is a parameter between 0 and 1, and the set of optimal solutions of the parameters belonging to the Pareto front is obtained as the weight runs over this interval. You can choose the value of the weight depending on your trade-off preference between the two error models [5].
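The weighted objective can be sketched in Python under stated assumptions: the line is `y = m*x + b`, the two competing errors are the vertical residual `e_y = y - m*x - b` and the horizontal residual `e_x = x - (y - b)/m`, and the weight is called `lam` (all names here are hypothetical). Because `e_x = -e_y/m`, the optimal intercept passes through the centroid for every slope, and the slope satisfies a quartic equation, which the sketch solves by plain bisection between the two ordinary least-squares slopes.

```python
def pareto_line(xs, ys, lam):
    """Minimize lam*sum(e_y^2) + (1 - lam)*sum(e_x^2) for the line y = m*x + b."""
    if not 0 < lam < 1:
        raise ValueError("lam must lie strictly between 0 and 1")
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("sxy = 0: the slope is not determined")

    def quartic(m):
        # Stationarity condition in m after eliminating b = ybar - m*xbar.
        return (lam * sxx * m**4 - lam * sxy * m**3
                + (1 - lam) * sxy * m - (1 - lam) * syy)

    # The quartic changes sign between the two ordinary least-squares slopes.
    lo, hi = sxy / sxx, syy / sxy
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if quartic(lo) * quartic(mid) <= 0:
            hi = mid
        else:
            lo = mid
    m = 0.5 * (lo + hi)
    return m, ybar - m * xbar

def objective(m, b, xs, ys, lam):
    """Weighted sum of squared vertical and horizontal residuals."""
    ey = [y - m * x - b for x, y in zip(xs, ys)]
    ex = [x - (y - b) / m for x, y in zip(xs, ys)]
    return lam * sum(e * e for e in ey) + (1 - lam) * sum(e * e for e in ex)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative data, not from the article
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
m, b = pareto_line(xs, ys, 0.5)
print(m, b, objective(m, b, xs, ys, 0.5))
```

Sweeping `lam` over the open interval (0, 1) traces out candidate lines between the two ordinary least-squares fits.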

Symbolic computation can be used to avoid direct minimization and to get an explicit formula for the estimated parameters. We apply the *Mathematica* function `SuperLog` developed in [6], which uses pattern matching that enhances *Mathematica*’s ability to simplify expressions involving the natural logarithm of a product of algebraic terms.

Let us activate this function.

Then this is the ML estimator for Gaussian-type noise.
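The reduction from maximum likelihood to least squares can be written out explicitly. The following is a sketch in generic notation, assuming independent errors $e_i$, $i = 1, \dots, N$, with zero mean and common standard deviation $\sigma$; these symbols are chosen here for illustration and may differ from the article's.

```latex
\begin{aligned}
L &= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\!\left(-\frac{e_i^2}{2\sigma^2}\right),\\
-\log L &= N \log\!\left(\sqrt{2\pi}\,\sigma\right)
     + \frac{1}{2\sigma^2} \sum_{i=1}^{N} e_i^2 .
\end{aligned}
```

The first term does not depend on the parameters of the line, so minimizing $-\log L$ is equivalent to minimizing $\sum_i e_i^2$, whichever error model defines $e_i$.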

Now let us consider the problem.

Here are the necessary conditions for the optimum.

Let us introduce the following constants:

(12) |

(13) |

(14) |

(15) |

(16) |

In those terms, here are the necessary conditions for the optimum.

Then this is the optimal solution of the parameters.
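For the ordinary least squares model the necessary conditions are linear, so the solution can be written down directly. The Python sketch below assumes the line is `y = m*x + b`; the sums it accumulates play a role similar to the constants introduced above, although the article's exact definitions are not reproduced here, and the function name `ols_line` is hypothetical.

```python
def ols_line(xs, ys):
    """Fit y = m*x + b by minimizing the sum of squared vertical residuals."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: m*sxx + b*sx = sxy  and  m*sx + b*n = sy.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values coincide; the slope is undefined")
    m = (n * sxy - sx * sy) / det
    b = (sy * sxx - sx * sxy) / det
    return m, b

print(ols_line([0, 1, 2, 3], [1, 3, 5, 7]))  # exact data on y = 2x + 1 gives (2.0, 1.0)
```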

Although the equation system for the parameters of is linear, for other error models we get a multivariable algebraic system. Now consider the problem. Here is the maximum likelihood function.

Therefore here is the equation system to be solved.

Since , the conditions are as follows.

A Gröbner basis solves this system, eliminating .

Since the second equation is linear, it is reasonable to compute first, then .

The error model also leads to a second-order polynomial equation system. Now here is the ML estimator.

Consequently, here is the system to be solved for the parameters.

Assume .

Again a Gröbner basis gives a second-order system.

When is known, the other parameter can be computed.

In the case of the Pareto approach, the system is already fourth order.

Here is the system.

Here is the system in compact form.

Here is the Gröbner basis for the first parameter.

Assume that .

After solving this polynomial for , the other parameter can be solved from the second equation, which is linear in .

Consider some data on rainfall (in mm) and the resulting groundwater level changes (in cm) from a landslide along the Ohio River Valley near Cincinnati, Ohio [7].

There are 14 measurements.

This displays the measured data.

**Figure 2.** The measured data: rainfall versus water level change in dimensional form.

The five constants defined in equations (12) to (16) are needed.

This separates the data.

This transforms the data into dimensionless form.

**Figure 3.** The measured data: rainfall versus water level change in dimensionless form.

Now the constants can be computed.

Here are the estimated parameters employing the explicit solutions.

This checks the result.

Figure 4 shows the estimated line with the sample points.

**Figure 4.** The sample points with the line estimated with .

Here are the first and second parameters.

Here is a check of this result on the basis of the definition. Equation (8) gives the objective function.

The constraints are .

The unknown variables are not only the parameters, but the adjustments as well.

This uses a built-in global optimization method. (This takes a long time to compute.)

The estimation gives a result quite different from the model; see Figure 5.

**Figure 5.** The lines estimated with the (red) and (green) models.

Since the constraints are linear, the optimization can be written in unconstrained form, reducing the original number of variables to .

Now here is the first parameter.

This uses the result.

Here is a numerical check of the objective.

Figure 6 shows this result together with the and models.

**Figure 6.** The lines estimated with the (red), (green), and (blue) models.

The first parameter is a fourth-order polynomial.

The best trade-off between and is to let .

This is the real positive solution.

Using this value gives the second parameter.

We compute the solution using direct global minimization. Here is the objective.

This gives the result.

Figure 7 shows this solution with the results of the other models.

**Figure 7.** The lines estimated with the (red), (green), and (blue) models, and the Pareto approach with (magenta).

The numerical computations show that the formulas developed by an ML estimator via symbolic computation to determine the parameters of a straight line to be fitted provide correct results and require considerably less computation time than the direct methods based on global minimization of the residuals. Our examples also illustrate that the , , and Pareto approaches give more realistic solutions than the traditional , since Figure 7 shows there are at least two outliers in the sample set.

[1] | W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed., Cambridge: Cambridge University Press, 1992. |

[2] | M. Zuliani. “RANSAC for Dummies.” (Jan 10, 2014) vision.ece.ucsb.edu/~zuliani/Research/RANSAC/docs/RANSAC4Dummies.pdf. |

[3] | B. Schaffrin, “A Note on Constrained Total Least-Squares Estimation,” Linear Algebra and Its Applications, 417(1), 2006 pp. 245-258. doi:10.1016/j.laa.2006.03.044. |

[4] | C. Tofallis, “Model Fitting for Multiple Variables by Minimising the Geometric Mean Deviation,” in Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications (S. Van Huffel and P. Lemmerling, eds.), Dordrecht: Kluwer, 2002. |

[5] | B. Paláncz and J. L. Awange, “Application of Pareto Optimality to Linear Models with Errors-in-All-Variables,” Journal of Geodesy, 86(7), 2012 pp. 531-545. doi:10.1007/s00190-011-0536-1. |

[6] | C. Rose and M. D. Smith, “Symbolic Maximum Likelihood Estimation with Mathematica,” The Statistician, 49(2), 2000 pp. 229-240. www.jstor.org/stable/2680972. |

[7] | W. C. Haneberg, Computational Geosciences with Mathematica, Berlin: Springer, 2004. |

B. Paláncz, “Fitting Data with Different Error Models,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-4. |

Béla Paláncz received his D.Sc. degree in 1993 from the Hungarian Academy of Sciences and has wide-ranging experience in teaching and research (RWTH Aachen, Imperial College London, DLR Köln, and Wolfram Research). His main research fields are mathematical modeling and symbolic-numeric computation.

**Béla Paláncz**

*Department of Photogrammetry and Geoinformatics,
Budapest University of Technology and Economics
1521 Budapest, Hungary*

The Karush-Kuhn-Tucker equations (under suitable assumptions) provide necessary and sufficient conditions for the solution of the problem of maximizing (minimizing) a concave (convex) function.

For an excellent reference, see the tutorial in [2]. Here we modify the code of [1] by correcting minor typos, simplifying, and letting the user specify restrictions on the exogenous parameters of the model.

The inputs of `KT` are the objective function to be maximized, the list of constraints, and the list of choice variables. Here is an example from consumer choice theory: maximize a utility function, subject to a budget constraint.
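For orientation, the Kuhn-Tucker system for a consumer problem of this kind can be written as follows. This is a generic sketch: the utility function $u$, the prices $p_x$ and $p_y$, the income $m$, and the multiplier names $\lambda$, $\mu_x$, $\mu_y$ are notation assumed here, not taken from the article's code.

```latex
\begin{aligned}
&\max_{x,\,y}\; u(x,y) \quad \text{subject to} \quad
  p_x x + p_y y \le m,\; x \ge 0,\; y \ge 0,\\
&\mathcal{L} = u(x,y) + \lambda\,(m - p_x x - p_y y) + \mu_x x + \mu_y y,\\
&\frac{\partial \mathcal{L}}{\partial x} = 0,\qquad
  \frac{\partial \mathcal{L}}{\partial y} = 0,\\
&\lambda \ge 0,\quad \lambda\,(m - p_x x - p_y y) = 0,\qquad
  \mu_x \ge 0,\quad \mu_x x = 0,\qquad
  \mu_y \ge 0,\quad \mu_y y = 0 .
\end{aligned}
```

Together with the original constraints, these equalities and inequalities form the kind of mixed system that a general-purpose solver has to reduce.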

Several of the solutions do not make economic sense, because they do not use the fact that the income and the prices of the two goods are all positive. However, `KT` lets the user specify restrictions on the exogenous parameters of the model.

An important advantage of `KT` over other optimization functions (such as `Maximize` or `Minimize`) is that `KT` returns the value of the Kuhn-Tucker multipliers. These multipliers have an important economic interpretation: they are shadow prices for the constrained resources. In the above example, for instance, the value of the multiplier on the budget constraint is the “infinitesimal” increment in the consumer’s utility that is generated when the budget constraint is relaxed by increasing the consumer’s income by an “infinitesimal” amount.
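The shadow-price interpretation can be checked numerically. The Python sketch below assumes a Cobb-Douglas utility `u(x, y) = x**alpha * y**(1 - alpha)`, which is chosen here only for illustration and is not necessarily the utility used in the article; the parameter values and function names are likewise hypothetical. It compares the multiplier from the first-order condition with a finite-difference estimate of the utility gained per extra unit of income.

```python
def demand(alpha, px, py, income):
    """Utility-maximizing bundle for u(x, y) = x**alpha * y**(1 - alpha)."""
    return alpha * income / px, (1 - alpha) * income / py

def indirect_utility(alpha, px, py, income):
    """Utility attained at the optimal bundle."""
    x, y = demand(alpha, px, py, income)
    return x**alpha * y ** (1 - alpha)

alpha, px, py, income = 0.3, 2.0, 5.0, 100.0
x, y = demand(alpha, px, py, income)

# Multiplier from the first-order condition du/dx = lam * px.
lam = alpha * x ** (alpha - 1) * y ** (1 - alpha) / px

# Central finite difference of the attained utility with respect to income.
h = 1e-4
slope = (indirect_utility(alpha, px, py, income + h)
         - indirect_utility(alpha, px, py, income - h)) / (2 * h)

print(lam, slope)  # the two numbers agree up to rounding
```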

Nash equilibrium is the main solution concept in game theory. It is a crucial tool for economics and political science models. Essentially, a Nash equilibrium is a profile of strategies (one strategy for each player), such that if a player takes the choices of the others as given (i.e. as parameters), then the player’s strategy must maximize his or her payoff.

The function `Nash` takes as input the payoff function of player 1, the payoff function of player 2, and the actions available to players 1 and 2. It returns the entire set of Nash equilibria.

There are many versions of Colonel Blotto’s game; this is a simple one taken from [3]. General A (row player) has three divisions to defend a city; she has to choose how many divisions to place at the north road and how many at the south road. General B (column player) has two divisions to try to invade the city; he also has to choose how many divisions to assign to the north road and how many to the south road. If General A has at least as many divisions as General B at a given road, General A wins the battle there (defense is favored in the case of a tie). To win the game, however, A must defeat B on both battlefields. Thus, A has four possible strategies and B has three strategies. The table below summarizes the players’ strategies and payoffs for the whole campaign. For example, the entry in the first row and first column means that A won and B lost: A chose three divisions for the north road and none for the south road, while B chose two for the north and none for the south. Because 3 ≥ 2 and 0 ≥ 0, A won both battles.

A Nash equilibrium for this game is a probability distribution over strategies; use `P` for the probabilities chosen by General A and `Q` for the probabilities chosen by General B.

The game has many Nash equilibria, but we still can make predictions: General B is never going to spread his forces evenly (the probability of his second strategy is zero in any equilibrium, ); with probability , B’s two divisions are placed at the north road () and with probability , they are placed at the south road (). As for General A, the probability that she places all of her three divisions on one front is less than half (i.e. and ). Also, the probability that General A places two or more divisions at the north (or south) is always equal to half (i.e. and ).
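The structure of this equilibrium set can be checked with a short Python sketch. It rebuilds the payoff matrix from the rules stated above, assuming (as an illustration) that A's payoff is +1 for a win and -1 for a loss and that the game is zero-sum; the strategy ordering and all names are choices made here. A mixed profile is an equilibrium of a zero-sum game exactly when neither player has a profitable pure deviation.

```python
from fractions import Fraction as F

A_STRATS = [(3, 0), (2, 1), (1, 2), (0, 3)]  # (north, south) for General A
B_STRATS = [(2, 0), (1, 1), (0, 2)]          # (north, south) for General B

def payoff_a(a, b):
    """+1 if A holds both roads (ties favor the defender), otherwise -1."""
    return 1 if a[0] >= b[0] and a[1] >= b[1] else -1

M = [[payoff_a(a, b) for b in B_STRATS] for a in A_STRATS]

def is_equilibrium(P, Q):
    """Zero-sum check: no pure deviation improves on the mixed profile."""
    value = sum(P[i] * M[i][j] * Q[j] for i in range(4) for j in range(3))
    best_a = max(sum(M[i][j] * Q[j] for j in range(3)) for i in range(4))
    best_b = min(sum(P[i] * M[i][j] for i in range(4)) for j in range(3))
    return best_a == value == best_b

Q = [F(1, 2), F(0), F(1, 2)]  # B never splits his forces
print(M)
print(is_equilibrium([F(0), F(1, 2), F(1, 2), F(0)], Q))        # True
print(is_equilibrium([F(1, 4), F(1, 4), F(1, 4), F(1, 4)], Q))  # True
print(is_equilibrium([F(1, 2), F(0), F(0), F(1, 2)], Q))        # False
```

Under these assumptions, the two accepted profiles both put probability one half on "two or more divisions at the north," while the rejected profile puts all of A's weight on single-front deployments.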

This game is also borrowed from [3]. A deck has two cards, one high and one low. Each player places one dollar into the pot. Player 1 gets one card from the deck. Player 2 does not see Player 1’s card. Player 1 decides whether to raise (by placing another dollar in the pot) or not raise. Player 2 observes 1’s action and then has to decide whether to match the bet or fold. If Player 2 folds, then Player 1 wins the contents of the pot. However, if Player 2 matches, Player 2 places another dollar into the pot if Player 1 had previously raised. Player 1 reveals her card. If it is the high card, Player 1 wins the pot; otherwise, Player 2 wins it.

See Figure 1 for the corresponding game tree. We introduce a fictitious player, Nature, who randomly decides if the card is high or low. We depict the bimatrix representation of the game. Player 1 has four strategies: always raise (RR), always not raise (NN), raise if the card is high and not otherwise (RN), and not raise if the card is high and raise otherwise (NR). Player 2 also has four strategies: always match (MM), always fold (FF), match only if Player 1 raised (MF), and fold only if Player 1 raised (FM). For simplicity, in the bimatrix representation, we write the expected payoffs of Player 1 and omit Player 2’s payoffs (this is without loss of generality in zero-sum games).

**Figure 1.** Game tree of the card game.

In this case, the Nash equilibrium delivers a sharp prediction. When Player 1 has the high card, she always raises (), but when she has the low card, she bluffs with probability (the probability of RR is ). When Player 1 does not raise, Player 2 always matches (). If Player 1 raises, Player 2 still may match, but with probability (the probability of always matching MM is ).
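The bimatrix and the equilibrium can be reproduced from the rules of the game with a small Python sketch. It assumes payoffs are Player 1's net winnings in dollars and that the card is high or low with equal probability; the helper names are hypothetical. Under these assumptions, the profile in which Player 1 mixes RR and RN with probabilities 1/3 and 2/3 and Player 2 mixes MM and FM with probabilities 2/3 and 1/3 passes the best-response check.

```python
from fractions import Fraction as F

P1 = ["RR", "NN", "RN", "NR"]  # (action with high card, action with low card)
P2 = ["MM", "FF", "MF", "FM"]  # MF: match only after a raise; FM: fold only after a raise

def response(s2, raised):
    """Player 2's move given whether Player 1 raised."""
    if s2 in ("MM", "FF"):
        return s2[0]
    if s2 == "MF":
        return "M" if raised else "F"
    return "F" if raised else "M"  # FM

def outcome(s1, s2, high):
    """Player 1's net winnings in dollars for one deal."""
    raised = (s1[0] if high else s1[1]) == "R"
    if response(s2, raised) == "F":
        return 1                       # Player 1 takes Player 2's ante
    stake = 2 if raised else 1
    return stake if high else -stake   # showdown

# Expected payoff of Player 1, averaging over the two equally likely cards.
M = [[F(outcome(a, b, True) + outcome(a, b, False), 2) for b in P2] for a in P1]

p = [F(1, 3), F(0), F(2, 3), F(0)]  # weights on RR, NN, RN, NR
q = [F(2, 3), F(0), F(0), F(1, 3)]  # weights on MM, FF, MF, FM
value = sum(p[i] * M[i][j] * q[j] for i in range(4) for j in range(4))
row_best = max(sum(M[i][j] * q[j] for j in range(4)) for i in range(4))
col_best = min(sum(p[i] * M[i][j] for i in range(4)) for j in range(4))
print(value, row_best, col_best)  # prints 1/3 1/3 1/3
```

The three printed numbers coincide, which is the zero-sum equilibrium condition: neither player can gain by switching to any pure strategy.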

We extended the code of [1] to solve for Kuhn-Tucker conditions with additional assumptions on parameters and, more importantly, using the Kuhn-Tucker equations we provide a program to compute all the Nash equilibria of finite bimatrix games.

We presented a program to compute the set of all Nash equilibria in finite bimatrix games. It is intended as a classroom tool for students and instructors, and the code is not efficient. For larger inputs (say bimatrix games with five or more actions per player), `Reduce` often fails to solve the system of Kuhn-Tucker equations. For more efficient algorithms, we suggest [4]. Nevertheless, with continuous improvement of hardware and algorithms for solving semialgebraic systems (see [5]), these methods may become useful for research applications sooner than we think. Finally, as algorithmic game theory courses become more popular in computer science departments, it seems that the time to bring computational methods and algorithms to economics departments is already overdue.

[1] | F. J. Kampas, “Tricks of Using Reduce to Solve Kuhn-Tucker Equations,” The Mathematica Journal, 9(4), 2005 pp. 686-689. www.mathematica-journal.com/issue/v9i4/contents/Tricks9-4/Tricks9-4_2.html. |

[2] | M. J. Osborne. “Optimization: The Kuhn-Tucker Conditions for Problems with Inequality Constraints,” from Mathematical Methods for Economic Theory: A Tutorial. (Jan 8, 2014) www.economics.utoronto.ca/osborne/MathTutorial/MOIF.HTM. |

[3] | M. Osborne, An Introduction to Game Theory, New York: Oxford University Press, 2004. |

[4] | R. D. McKelvey, A. M. McLennan, and T. L. Turocy. “Gambit: Software Tools for Game Theory.” (Jan 8, 2014) www.gambit-project.org. |

[5] | Wolfram Research, “Real Polynomial Systems” from Wolfram Mathematica Documentation Center—A Wolfram Web Resource.reference.wolfram.com/mathematica/tutorial/RealPolynomialSystems.html. |

S. O. Parreiras, “Using Reduce to Compute Nash Equilibria,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-3. |

Sérgio O. Parreiras is an associate professor at the UNC-Chapel Hill Department of Economics. His research focus is on game theory and its applications to auctions, mechanism design, and contests. He is also interested in computational economics, general equilibrium theory, algorithmic game theory, and evolutionary anthropology.

**Sérgio O. Parreiras**

*UNC, Department of Economics
Gardner Hall, 200B
Chapel Hill, N.C. 27599-3305*

The infinite Fibonacci word,

is certainly one of the most studied words in the field of combinatorics on words [1-4]. It is the archetype of a Sturmian word [5]. The word can be associated with a fractal curve with combinatorial properties [6, 7].

This article implements *Mathematica* programs to generate curves from and a set of drawing rules. These rules are similar to those used in L-systems.

The outline of this article is as follows. Section 2 recalls some definitions and ideas of combinatorics on words. Section 3 introduces the Fibonacci word, its fractal curve, and a family of words whose limit is the Fibonacci word fractal. Finally, Section 4 generalizes the Fibonacci word and its Fibonacci word fractal.

The terminology and notation are mainly those of [5] and [8]. Let be a finite alphabet, whose elements are called symbols. A word over is a finite sequence of symbols from . The set of all words over , that is, the free monoid generated by , is denoted by . The identity element of is called the empty word. For any word , denotes its length, that is, the number of symbols occurring in . The length of is taken to be zero. If and , then denotes the number of occurrences of in .

For two words and in , denote by the concatenation of the two words, that is, . If , then ; moreover, by denote the word ( times). A word is a subword (or factor) of if there exist such that . If , then and is called a prefix of ; if , then and is called a suffix of .

The reversal of a word is the word and . A word is a palindrome if .

An infinite word over is a map , written as . The set of all infinite words over is denoted by .

**Example 1**

The word , where if is a prime number and otherwise, is an example of an infinite word. The word is called the characteristic sequence of the prime numbers. Here are the first 50 terms of .
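A minimal Python sketch of this example follows. It assumes the word is indexed from 1, so that the first symbol corresponds to the integer 1; the function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_word(length):
    """First `length` symbols of the characteristic sequence of the primes."""
    return "".join("1" if is_prime(n) else "0" for n in range(1, length + 1))

print(prime_word(50))
```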

**Definition 1**

There is a special class of words with many remarkable properties, the so-called Sturmian words. These words admit several equivalent definitions (see, e.g. [5], [8]).

**Definition 2**

The complexity function of an infinite word is the map that counts, for every integer n ≥ 0, the number of distinct subwords of length n in the word. An infinite word is a Sturmian word if its complexity function equals n + 1 for every integer n ≥ 0.

For example, .

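Since any Sturmian word has exactly two distinct subwords of length 1, Sturmian words have to be over two symbols. The word in Example 1 is not a Sturmian word because it has four distinct subwords of length 2 (00, 01, 10, and 11) instead of three.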
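The complexity function can be explored on finite prefixes. The Python sketch below counts distinct subwords of a given length and applies the count to a long prefix of the Fibonacci word, generated here by iterating the substitution 0 → 01, 1 → 0 (a standard construction, assumed for this illustration). A finite prefix can only undercount, so the check is indicative rather than a proof.

```python
def fibonacci_prefix(length):
    """Prefix of the infinite Fibonacci word, built by iterating 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]

def complexity(word, n):
    """Number of distinct subwords (factors) of length n in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

w = fibonacci_prefix(2000)
print([complexity(w, n) for n in range(1, 11)])  # n + 1 for each n
print(complexity("0110101000", 2))               # prefix of the prime word: 4
```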

Given two real numbers , with irrational and , , define the infinite word as . The numbers and are the slope and the intercept, respectively. This word is called mechanical. The mechanical words are equivalent to Sturmian words [5]. As a special case, gives the characteristic words.
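As an illustration of a characteristic word, the Python sketch below evaluates `c(n) = floor((n + 1)*alpha) - floor(n*alpha)` for `n = 1, 2, ...`, which is the usual textbook form assumed here since the formula is not reproduced above. The slope is fixed to `alpha = (3 - sqrt(5))/2`, the reciprocal of the squared golden ratio, and the floor is computed with exact integer arithmetic to avoid rounding issues; the function names are hypothetical.

```python
from math import isqrt

def floor_mult_alpha(n):
    """Exact floor(n * alpha) for alpha = (3 - sqrt(5)) / 2 and integer n >= 0."""
    if n == 0:
        return 0
    k = isqrt(5 * n * n)        # floor(n * sqrt(5)); n*sqrt(5) is never an integer
    return (3 * n - k - 1) // 2

def characteristic_word(length):
    """Symbols c(n) = floor((n + 1)*alpha) - floor(n*alpha) for n = 1..length."""
    return "".join(str(floor_mult_alpha(n + 1) - floor_mult_alpha(n))
                   for n in range(1, length + 1))

print(characteristic_word(30))
```

With this slope the output begins `0100101001001`, matching the start of the Fibonacci word discussed in the next section.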

**Definition 3**

On the other hand, note that every irrational has a unique continued fraction expansion

where each is a positive integer. Let be an irrational number with and for . To the directive sequence , associate a sequence of words defined by , , , .

Such a sequence of words is called a standard sequence. This sequence is related to characteristic words in the following way. Observe that, for any , is a prefix of , which gives meaning to as an infinite word. In fact, one can prove that each is a prefix of for all and [5].

**Definition 4**

Fibonacci words are words over defined inductively as follows: , , and , for . The words are referred to as the finite Fibonacci words. The limit

(1) |

It is clear that , where is the Fibonacci number, recalling that the Fibonacci number is defined by the recurrence relation for all integer and with initial values . The infinite Fibonacci word is a Sturmian word [5]; exactly, , where is the golden ratio.

Here are the first 50 terms of .

**Definition 5**

The Fibonacci word satisfies and for all .

Here are the first nine finite Fibonacci words.
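The finite Fibonacci words can be generated with a few lines of Python. The sketch assumes the initial words `f_0 = 1` and `f_1 = 0` and the recurrence `f_n = f_{n-1} f_{n-2}`, which is consistent with the properties listed below but is stated here as an assumption because the defining formulas are not reproduced in this text.

```python
def fibonacci_words(count):
    """Return [f_0, ..., f_{count-1}] with f_0 = '1', f_1 = '0', f_n = f_{n-1} + f_{n-2}."""
    words = ["1", "0"]
    while len(words) < count:
        words.append(words[-1] + words[-2])
    return words[:count]

for n, w in enumerate(fibonacci_words(9)):
    print(n, len(w), w)
```

The printed lengths follow the Fibonacci recurrence, and each word from `f_1` onward is a prefix of the next one.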

**Definition 6**

The following proposition summarizes some basic properties about the Fibonacci word.

**Proposition 1**

- The words 11 and 000 are not subwords of the Fibonacci word.
- Let be the last two symbols of . For , if is even and if is odd.
- The concatenation of two successive Fibonacci words is almost commutative; that is, and have a common prefix of length , for all .
- is a palindrome for all .
- For all , , where ; that is, exchanges the two last symbols of .
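These properties are easy to test on the first few finite words. The Python sketch below assumes `f_0 = 1`, `f_1 = 0`, and `f_n = f_{n-1} f_{n-2}`, reads the common prefix length as the length of `f_n` minus 2, and reads the last item as `f_n = f_{n-3} f_{n-3} f_{n-6} l_{n-3} l_{n-3}` for `n >= 6`, where `l_n` is `f_n` with its last two symbols exchanged. These readings are reconstructions and should be compared against [7].

```python
def fib_words(count):
    """Finite Fibonacci words f_0, ..., f_{count-1} (assumed f_0 = '1', f_1 = '0')."""
    words = ["1", "0"]
    while len(words) < count:
        words.append(words[-1] + words[-2])
    return words

def phi(w):
    """Delete the last two symbols of a word."""
    return w[:-2]

def swap_tail(w):
    """Exchange the last two symbols of a word."""
    return w[:-2] + w[-1] + w[-2]

f = fib_words(16)
for n in range(2, 16):
    w = f[n]
    assert "11" not in w and "000" not in w
    assert w[-2:] == ("01" if n % 2 == 0 else "10")
    common = len(w) - 2
    assert (f[n - 1] + f[n - 2])[:common] == (f[n - 2] + f[n - 1])[:common]
    assert phi(w) == phi(w)[::-1]            # palindrome after deleting two symbols
    if n >= 6:
        l = swap_tail(f[n - 3])
        assert w == f[n - 3] * 2 + f[n - 6] + l * 2
print("all checks passed")
```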

The Fibonacci word can be associated with a curve using a drawing rule: a particular action is carried out for each symbol read (this is the same idea as that used in L-systems [9]). In this case, the drawing rule is called the *odd-even drawing rule* [7].
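A drawing rule of this kind can be sketched in Python as a walk on the integer grid. The version below assumes the following reading of the odd-even rule: every symbol draws a unit segment; after a 0, the heading turns by a right angle, to the left if the symbol's position is even and to the right if it is odd; after a 1, the heading is unchanged. Positions are counted from 1 and the walk starts heading upward; both conventions are assumptions that only affect the orientation of the picture.

```python
def odd_even_curve(word):
    """Vertices of the curve obtained by applying the assumed odd-even rule to `word`."""
    x, y = 0, 0
    dx, dy = 0, 1                       # initial heading: up
    points = [(x, y)]
    for k, symbol in enumerate(word, start=1):
        x, y = x + dx, y + dy           # draw a unit segment
        points.append((x, y))
        if symbol == "0":
            if k % 2 == 0:
                dx, dy = -dy, dx        # quarter turn to the left
            else:
                dx, dy = dy, -dx        # quarter turn to the right
    return points

print(odd_even_curve("0100101001001"))
```

The returned vertex list can be passed to any plotting routine that draws a polyline.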

**Definition 7**

The Fibonacci curve, denoted by , is the result of applying the odd-even drawing rule to the word . The Fibonacci word fractal is defined as

The program `LShow` is adapted from [10] to generate L-systems.

Figure 1 shows an L-system interpretation of the odd-even drawing rule.

**Figure 1.** Interpretation of the odd-even drawing rule.

Here are the curves for .

The next proposition about properties of the curves and comes directly from the properties of the Fibonacci word from Proposition 1. More properties can be found in [7].

**Proposition 2**

- is composed only of segments of length 1 or 2.
- The number of turns in the curve is the Fibonacci number .
- The curve is similar to the curve .
- The curve is symmetric.
- The curve is composed of five curves: , where is the result of applying the odd-even drawing rule to the word .

The next figure shows the curve and the five curves; here .

The Fibonacci word and other words can be derived from the dense Fibonacci word, which was introduced in [7].

**Definition 8**

(2) |

Given a drawing rule, the global angle is the sum of the successive angles generated by the word through the rule. With the natural drawing rule, , , , then .

For a drawing rule, the resulting angle of a word is the function that gives the global angle. A morphism preserves the resulting angle if for any word , ; moreover, a morphism inverts the resulting angle if for any word , .

The dense Fibonacci word is strongly linked to the Fibonacci word fractal because can generate a whole family of curves whose limit is the Fibonacci word fractal [7]. All that is needed is to apply a morphism to that preserves or inverts the resulting angle.

Here are some examples.

Here are some examples with other angles.

This section introduces a generalization of the Fibonacci word and the Fibonacci word fractal [11].

**Definition 9**

The 2-Fibonacci word is the classical Fibonacci word. Here are the first six -Fibonacci words.
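The generalized words can be generated in the same way as the classical ones. The Python sketch below assumes the definition `f_0 = 0`, `f_1 = 0^(i-1) 1`, and `f_n = f_{n-1} f_{n-2}` for a positive integer parameter `i`; this is one reading of [11] and is labeled as an assumption because the definition itself is not reproduced in this text. The parameter name `i` and the function name are hypothetical.

```python
def i_fibonacci_words(i, count):
    """[f_0, ..., f_{count-1}] with f_0 = '0', f_1 = '0'*(i-1) + '1', f_n = f_{n-1} + f_{n-2}."""
    words = ["0", "0" * (i - 1) + "1"]
    while len(words) < count:
        words.append(words[-1] + words[-2])
    return words[:count]

for i in (1, 2, 3):
    print(i, i_fibonacci_words(i, 6))
```

With `i = 2` the words are prefixes of the classical Fibonacci word, in agreement with the statement above.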

The following proposition relates the Fibonacci word to .

**Proposition 3**

(3) |

**Definition 10**

The -Fibonacci numbers are the Fibonacci numbers and the -Fibonacci numbers are the Fibonacci numbers shifted by one. The following table shows the first terms in the sequences and their reference numbers in the On-Line Encyclopedia of Integer Sequences (OEIS) [12].

**Proposition 4**

- The word 11 is not a subword of the -Fibonacci word, .
- Let be the last two symbols of . For , if is even and if is odd, .
- The concatenation of two successive -Fibonacci words is almost commutative; that is, and have a common prefix of length for all and .
- is a palindrome for all .
- For all , , where .

**Theorem 1**

For the proof, see [11]. This theorem implies that -Fibonacci words are Sturmian words.

Note that

where is the golden ratio.

**Definition 11**

The Fibonacci curve, denoted by , is the result of applying the odd-even drawing rule to the word . The -Fibonacci word fractal is defined as

Here are the curves for .

**Proposition 5**

- The Fibonacci fractal is composed only of segments of length 1 or 2.
- The curve is similar to the curve .
- The curve is composed of five curves: .
- The curve is symmetric.
- The scale factor between and is .

This section applies the above ideas to generate new curves from characteristic words (see Definition 3).

**Conjecture 1**

Here are seven examples.

The first author was partially supported by Universidad Sergio Arboleda under grant number USA-II-2012-14. The authors would like to thank Borut Jurčič-Zlobec from Ljubljana University for his help during the development of this article.

[1] | J. Cassaigne, “On Extremal Properties of the Fibonacci Word,” RAIRO—Theoretical Informatics and Applications, 42(4), 2008 pp. 701-715. doi:10.1051/ita:2008003. |

[2] | W. Chuan, “Fibonacci Words,” Fibonacci Quarterly, 30(1), 1992 pp. 68-76. www.fq.math.ca/Scanned/30-1/chuan.pdf. |

[3] | W. Chuan, “Generating Fibonacci Words,” Fibonacci Quarterly, 33(2), 1995 pp. 104-112. www.fq.math.ca/Scanned/33-2/chuan1.pdf. |

[4] | F. Mignosi and G. Pirillo, “Repetitions in the Fibonacci Infinite Word,” RAIRO—Theoretical Informatics and Applications, 26(3), 1992 pp. 199-204. |

[5] | M. Lothaire, Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications), Cambridge: Cambridge University Press, 2005. |

[6] | A. Blondin Massé, S. Brlek, A. Garon, and S. Labbé, “Two Infinite Families of Polyominoes That Tile the Plane by Translation in Two Distinct Ways,” Theoretical Computer Science, 412(36), 2011 pp. 4778-4786. doi:10.1016/j.tcs.2010.12.034. |

[7] | A. Monnerot-Dumaine, “The Fibonacci Word Fractal,” preprint, 2009. hal.archives-ouvertes.fr/hal-00367972/fr. |

[8] | J.-P. Allouche and J. Shallit, Automatic Sequences: Theory, Applications, Generalizations, Cambridge: Cambridge University Press, 2003. |

[9] | P. Prusinkiewicz and A. Lindenmayer, The Algorithmic Beauty of Plants, New York: Springer-Verlag, 1990. |

[10] | E. Weisstein. “Lindenmayer System” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/LindenmayerSystem.html. |

[11] | J. Ramírez, G. Rubiano, and R. de Castro, “A Generalization of the Fibonacci Word Fractal and the Fibonacci Snowflake,” 2013. arXiv:1212.1368v2. |

[12] | OEIS Foundation, Inc. “The On-Line Encyclopedia of Integer Sequences.” (Aug 9, 2013) oeis.org. |

J. L. Ramírez and G. N. Rubiano, “Properties and Generalizations of the Fibonacci Word Fractal,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-2. |

**José L. Ramírez**

*Instituto de Matemáticas y sus Aplicaciones
Universidad Sergio Arboleda
Calle 74 no. 14 – 14 Bogotá, Colombia*

**Gustavo N. Rubiano**

*Departamento de Matemáticas
Universidad Nacional de Colombia
AA 14490, Bogotá, Colombia*