A range of explicit and approximate solutions exists for the cubic polynomial Rayleigh equation for the speed of surface waves across an elastic half-space. This article presents an alternative approach that uses Padé approximants to estimate the Rayleigh wave speed, with five different approximations derived for two expansions about different points. Maximum relative absolute errors of between 0.34% and 0.00011% occur for the full range of the Poisson ratio from −1 to 0.5. Even smaller errors occur when the Poisson ratio is restricted to the range 0 to 0.5. For higher-order approximants, the derived expressions for the Rayleigh wave speed are more accurate than previously published solutions, at the cost of a few extra arithmetic operations, depending on the desired accuracy.

In 1885 Lord Rayleigh published his paper “On Waves Propagated along the Plane Surface of an Elastic Solid” [1] and observed that:

It is proposed to investigate the behavior of waves upon the plane surface of an infinite homogeneous isotropic elastic solid, their character being such that the disturbance is confined to a superficial region, of thickness comparable with the wave-length. …

It is not improbable that the surface waves here investigated play an important part in earthquakes, and in the collision of elastic solids. Diverging in two dimensions only, they must acquire at a great distance from the source a continually increasing preponderance.

Rayleigh’s suggestion above that these waves play an important part in earthquakes is a supreme understatement, given the ensuing history of seismology. In any case, Rayleigh proved the theoretical existence of surface waves on an elastic half-space and showed that the speed of such waves may be calculated from the real roots of a cubic polynomial whose coefficients are all real and depend on the ratio of the S-wave velocity to the P-wave velocity or, equivalently, on the Poisson ratio of the elastic half-space. He gave solutions for harmonic waves in both incompressible and compressible half-spaces. He also showed that points on the surface follow elliptical orbits as the wave passes, demonstrated that the motion is confined to within approximately one wavelength of the surface, and stated that the Poisson ratio of an elastic material may lie between 0.5 and −1. However, he did not give explicit expressions for the Rayleigh wave speed.

It appears that the first paper to publish explicit expressions for the Rayleigh wave speed for the full range of elastic material properties was by Rahman and Barber [2]. Since then, a number of authors have sought to develop alternative analytical expressions for the Rayleigh wave speed [3-11]. These solutions cannot be used indiscriminately, because care is sometimes required to choose the correct root so that the estimate of the Rayleigh wave speed is smooth and continuous [3, 5, 7]. A parallel effort has sought approximate expressions for the Rayleigh wave speed [3, 12-18].

Complete analytical derivations were provided by [2, 4, 7, 9, 11]. Others have used computer algebra to assist in their derivations [3, 5, 6]. The original approach given in [4] contains unspecified typographical errors [5, 9]. Indeed, the recent solution given in [9] appears to have been derived earlier and independently but with typographical errors [6]. It has also been shown to be identical to the solution provided in [5]. Cardano’s formula for the roots of a cubic polynomial with real coefficients [19, 20] is used by [2, 5, 7, 9, 11], although the starting point in [9] appears to be different from the other solutions. A more recent solution appears to use Cardano’s formula (but referred to as Shengjin’s formula) [11]. As an aside, an interesting history of Cardano’s formula appears in [21].

Cardano’s formula was published in 1545, so it is perhaps surprising that no explicit solution for the Rayleigh wave speed was available until the publication of Rahman and Barber [2]. It seems unlikely that Rayleigh was unaware of Cardano’s formula. In any case, as one example of earlier publication of the Rayleigh wave speed, we may refer to J. E. White’s book, *Underground Sound* [22]. White gives the solution without derivation, in the fairly standard notation often associated with Cardano’s formula. Simple algebra shows that it is identical to the solution of Rahman and Barber. Similar solutions may well have been found even earlier.

Approximate expressions for the Rayleigh wave speed have been derived by various methods. These include:

- Taylor series expansion of the Rayleigh equation [3];
- approximation of the Rayleigh equation by lower-degree polynomials using the Lanczos method [13];
- minimization of the integral of the Rayleigh equation with arbitrary coefficients using a least-squares approach [14];
- least-squares minimization [15, 18] against a known exact solution [5, 7];
- a bilinear function for the root, with its coefficients determined by least squares [16];
- an iterative method with asymptotic quadratic convergence [17].

In this article, we follow the approach given by Liner [3], but rather than using a Taylor series expansion of the Rayleigh equation, we use Padé approximants. Padé approximants of various orders are described and their accuracy compared to the exact solution given in [7]. The complexity of the various derived solutions is assessed in terms of the number of numerical operations required to calculate the Rayleigh wave speed.

Functions may be approximated in various ways. Perhaps the best known is the Taylor series expansion, whereby a function $f(x)$ can be expressed as an infinite series expanded around the point $a$ as

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}\,(x-a)^{k}, \tag{1}$$

where $f^{(k)}(a)$ is the $k$th derivative of $f$ evaluated at the point $a$. It is usual to truncate the series and use the lower-order terms as an approximation to the function.

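An alternative method to approximate a function is to use Padé approximants. Here the function is approximated as a ratio of two truncated polynomials expanded around the point $a$:

$$f(x) \approx \frac{\sum_{j=0}^{m} p_j\,(x-a)^{j}}{1 + \sum_{k=1}^{n} q_k\,(x-a)^{k}}, \tag{2}$$

where the numerator and denominator are polynomials of degree at most $m$ and at most $n$, respectively. The approximant is referred to as a Padé approximant of type $[m/n]$. The approximation given in equation (2) has $m+n+1$ parameters. These are chosen so that its Taylor series about $a$ agrees with that of $f$ through the term in $(x-a)^{m+n}$.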
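The $[m/n]$ coefficients follow from the Taylor coefficients of $f$ about $a$ by solving a small linear system. The denominator coefficients are chosen so that, after the series for $f$ is multiplied by the denominator, the terms in $(x-a)^{m+1}$ through $(x-a)^{m+n}$ vanish. The numerator coefficients then follow directly. The Python sketch below is only an illustration and is not the author’s code, which used *Mathematica* [23]. The names `pade` and `_solve` are hypothetical. The sketch assumes the denominator is normalized to a constant term of 1 and that the linear system is nonsingular. Exact `Fraction` arithmetic mirrors the exact coefficients obtained by computer algebra.

```python
from fractions import Fraction


def _solve(a, b):
    """Solve the square system a @ x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: no Padé approximant of this form")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def pade(c, m, n):
    """Return (p, q) for the [m/n] Padé approximant built from Taylor coefficients c.

    c[k] multiplies (x - a)**k; at least m + n + 1 entries are required
    (pad a polynomial with zeros). q[0] is fixed at 1.
    """
    c = [Fraction(v) for v in c]
    if len(c) < m + n + 1:
        raise ValueError("need at least m + n + 1 Taylor coefficients")

    def coef(i):
        return c[i] if 0 <= i < len(c) else Fraction(0)

    # Denominator: sum_{k=0..n} q_k c_{j-k} = 0 for j = m+1 .. m+n, with q_0 = 1.
    rows = [[coef(j - k) for k in range(1, n + 1)] for j in range(m + 1, m + n + 1)]
    rhs = [-coef(j) for j in range(m + 1, m + n + 1)]
    q = [Fraction(1)] + (_solve(rows, rhs) if n > 0 else [])
    # Numerator: p_j = sum_{k=0..min(j,n)} q_k c_{j-k} for j = 0 .. m.
    p = [sum(q[k] * coef(j - k) for k in range(min(j, n) + 1)) for j in range(m + 1)]
    return p, q
```

For example, take the Taylor coefficients 1, 1, 1/2, 1/6, 1/24 of $e^x$ about 0. Then `pade(c, 2, 2)` returns `p = [1, 1/2, 1/12]` and `q = [1, -1/2, 1/12]`. These are the coefficients of the classical approximant $(1 + x/2 + x^2/12)/(1 - x/2 + x^2/12)$.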

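The Rayleigh equation [1] for an elastic half-space is given by

$$x^{3} - 8x^{2} + 8(3 - 2\gamma)\,x - 16(1 - \gamma) = 0, \tag{3}$$

where $x = c_R^2/c_S^2$, $\gamma = c_S^2/c_P^2$, $c_R$ is the Rayleigh wave speed, and $c_P$ and $c_S$ are the P-wave and S-wave velocities of the medium, respectively. Real roots of equation (3) give the square of the normalized Rayleigh wave speed $c_R/c_S$. When there are several real roots, the smallest is taken. The Poisson ratio may then be calculated as $\nu = (1 - 2\gamma)/\bigl(2(1 - \gamma)\bigr)$.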
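A minimal Python sketch of the quantities involved follows. It assumes the Rayleigh equation takes the standard form $x^3 - 8x^2 + 8(3 - 2\gamma)x - 16(1 - \gamma) = 0$, with $x = (c_R/c_S)^2$ and $\gamma = (c_S/c_P)^2$. The function names are hypothetical.

For $x \le 0$ every term of the cubic is negative, so there are no real roots at or below zero. In addition, $f(0) = -16(1 - \gamma) < 0$ and $f(1) = 1 > 0$. Three roots in $(0, 1)$ are impossible, because the roots sum to 8. The physical root is therefore the only root in $(0, 1)$ and is also the smallest real root, so plain bisection on $(0, 1)$ recovers it.

The sketch also encodes the map between $\gamma$ and $\nu$: $\gamma = 0$, $1/2$, and $3/4$ correspond to $\nu = 0.5$, $0$, and $-1$.

```python
import math


def gamma_from_nu(nu):
    """gamma = (c_S / c_P)^2 expressed through the Poisson ratio nu."""
    return (1 - 2 * nu) / (2 * (1 - nu))


def nu_from_gamma(gamma):
    """Poisson ratio from gamma; the map has the same form in both directions."""
    return (1 - 2 * gamma) / (2 * (1 - gamma))


def rayleigh_f(x, gamma):
    """Left-hand side of the assumed form of the Rayleigh cubic."""
    return x**3 - 8 * x**2 + 8 * (3 - 2 * gamma) * x - 16 * (1 - gamma)


def rayleigh_x(gamma, tol=1e-14):
    """Squared normalized speed x = (c_R / c_S)^2 by bisection on (0, 1).

    Valid for 0 <= gamma <= 3/4 (Poisson ratio from 0.5 down to -1),
    where f(0) < 0 and f(1) = 1 > 0 bracket the physical root.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rayleigh_f(mid, gamma) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rayleigh_speed_ratio(gamma):
    """Normalized Rayleigh wave speed c_R / c_S = sqrt(x)."""
    return math.sqrt(rayleigh_x(gamma))
```

For a Poisson solid ($\nu = 0.25$, so $\gamma = 1/3$), the cubic factors and the physical root is $x = 2 - 2/\sqrt{3}$. This gives a convenient check value.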

The Padé approximant is found for the left-hand side of equation (3), using computer algebra to calculate the necessary coefficients [23]. Equating the numerator polynomial in equation (2) to zero yields the estimate of the normalized Rayleigh speed. Multiple solutions are possible, and these depend on the Padé approximant type, . Given that the Rayleigh equation is a cubic polynomial, the approximations are restricted to and . In particular, the following Padé approximant types are examined: , , , , and .
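The simplest type, $[1/1]$, shows how equating the numerator to zero works and can be done by hand. Because $f$ is a cubic, its Taylor expansion about $a$ is exact: $f(a + t) = c_0 + c_1 t + c_2 t^2 + t^3$. Matching $(p_0 + p_1 t)/(1 + q_1 t)$ through $t^2$ gives $p_0 = c_0$, $q_1 = -c_2/c_1$, and $p_1 = (c_1^2 - c_0 c_2)/c_1$. The numerator therefore vanishes at $t = -c_0 c_1/(c_1^2 - c_0 c_2)$, which coincides with one step of Halley’s method from $a$. Since $c_0$ and $c_1$ are linear in $\gamma$ and $c_2$ does not depend on $\gamma$, the estimate is a ratio of two quadratics in $\gamma$.

The Python sketch below assumes the standard form of the Rayleigh cubic. Its usage example uses a hypothetical expansion point $a = 0.8$; it is not claimed to reproduce any of the types or expansion points used in this article.

```python
import math


def taylor_coeffs(gamma, a):
    """Exact Taylor coefficients c0..c3 of the assumed Rayleigh cubic about x = a."""
    b = 8 * (3 - 2 * gamma)
    c0 = a**3 - 8 * a**2 + b * a - 16 * (1 - gamma)  # f(a)
    c1 = 3 * a**2 - 16 * a + b                      # f'(a)
    c2 = 3 * a - 8                                  # f''(a) / 2
    c3 = 1.0                                        # f'''(a) / 6
    return c0, c1, c2, c3


def pade11_root(gamma, a):
    """x where the numerator p0 + p1 (x - a) of the [1/1] Padé approximant vanishes."""
    c0, c1, c2, _ = taylor_coeffs(gamma, a)
    # p0 = c0, p1 = (c1^2 - c0 c2) / c1, so the zero is at t = -c0 c1 / (c1^2 - c0 c2).
    return a - c0 * c1 / (c1**2 - c0 * c2)


def pade11_speed_ratio(gamma, a):
    """Normalized Rayleigh wave speed c_R / c_S from the [1/1] estimate of x."""
    return math.sqrt(pade11_root(gamma, a))
```

With $a = 0.8$, `pade11_root(0.5, 0.8)` lies within $10^{-4}$ of the exact value $3 - \sqrt{5}$ for $\gamma = 1/2$ ($\nu = 0$). Toward $\gamma = 3/4$ ($\nu = -1$), however, the estimate is poor, which shows why the choice of type and expansion point matters.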

The five sets of Padé approximants obtained yield solutions for the value of in equation (2) with the following forms: for types , , and , the solutions are ratios of polynomials in of degree 2, 3, and 4, respectively. In each case, both the numerator and denominator polynomials have identical degree, and a general expression for them is

(4) |

for the type . For the types , and , the general expression is

(5) |

where for type and for type . The coefficient for all cases considered here, except in the single case of type for expansion about , in which case .

Like the Taylor series expansion, the Padé approximants are expansions about a given value. Expansions were first obtained around unity, following previous work [3], but after some trial and error, expansions around were found to be superior in terms of minimizing the relative absolute error across the full range of values for . It is worth repeating that the expansions are in the square of the ratio of the Rayleigh wave speed to the S-wave velocity, unlike the Taylor expansion in Liner [3], which is in the ratio itself. In principle, the expansion point that minimizes the error could be sought using a least-squares approach, some aspects of which are discussed below. Below are shown the results obtained for the coefficients in (4) and (5) for expansions about (suitable for the full range of Poisson’s ratio) and (suitable for non-negative Poisson’s ratio), respectively. The coefficient values were obtained using *Mathematica* 9 and are exact.

The relative absolute errors, in percent, are shown below for some of the cases considered. The errors are defined as

$$E = 100\,\frac{\lvert c_N - c_A \rvert}{c_A}, \tag{6}$$

where $c_N$ and $c_A$ denote the numerical and analytical solutions for the Rayleigh wave speed, respectively, and the analytical solution is that given in [7].
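Equation (6), read as the usual relative absolute error in percent, is easy to compute. The sketch below also takes the maximum over a grid of $\gamma$ values, which summarizes each approximation by a single maximum error like those quoted at the start of this article. The function names are hypothetical.

Because $c_R = c_S\sqrt{x}$, a small relative error $\varepsilon$ in the estimate of $x$ produces a relative error of only about $\varepsilon/2$ in the speed. It therefore matters whether errors are reported for $x$ or for $c_R/c_S$.

```python
import math


def relative_abs_error_percent(numerical, analytical):
    """Relative absolute error in percent: 100 * |numerical - analytical| / |analytical|."""
    return 100.0 * abs(numerical - analytical) / abs(analytical)


def max_error_percent(approx, exact, gammas):
    """Largest percentage error of approx(gamma) relative to exact(gamma) over a grid."""
    return max(relative_abs_error_percent(approx(g), exact(g)) for g in gammas)


def speed_error_from_x(x_numerical, x_analytical):
    """Percentage error in the normalized speed sqrt(x), given estimates of x."""
    return relative_abs_error_percent(math.sqrt(x_numerical), math.sqrt(x_analytical))
```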

Figure 1 shows a comparison of the exact analytical solution [7], the solution obtained using the Padé approximant , and the solution for a Taylor expansion of the Rayleigh equation given in equation (3). The solution based on the Padé approximant was for an expansion with , while the Taylor series solution was for an expansion with . The relative absolute errors for these two approximate solutions and the one provided by Liner [3] are shown in Figure 2. The Liner solution was a Taylor series expansion in the normalized Rayleigh wave speed, whereas the expansion in its square is shown in Figure 1 and labeled as “Taylor.” It is clear that both Taylor series expansions have increasing errors as increases. The solution based on the Padé approximant has larger errors than either of the Taylor expansion solutions for smaller values of but performs better for larger values of .

Figures 3 and 4 show the relative absolute errors for solutions based on the Padé approximants , and , expanded around the point . As expected, the errors are smaller for the solutions based on higher-order Padé approximants.

The following function gives an exact analytical expression for the Rayleigh wave speed [7, 10].
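As an independent stand-in for that function, the following Python sketch applies Cardano’s formula [19, 20] to the Rayleigh cubic. It assumes the standard form $x^3 - 8x^2 + 8(3 - 2\gamma)x - 16(1 - \gamma) = 0$ and substitutes $x = y + 8/3$ to obtain a depressed cubic $y^3 + py + q = 0$. This is not the expression of Vinh and Ogden [7], although for admissible $\gamma$ it should return the same physical root.

When the discriminant is negative there are three real roots. In that case the trigonometric form is used and the smallest root is kept, which is the root-selection care noted in [3, 5, 7]. The names `_cbrt` and `rayleigh_x_cardano` are hypothetical.

```python
import math


def _cbrt(v):
    """Real cube root that also handles negative arguments."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def rayleigh_x_cardano(gamma):
    """Smallest real root of x^3 - 8x^2 + 8(3 - 2 gamma) x - 16(1 - gamma) = 0."""
    b = 8 * (3 - 2 * gamma)
    d = 16 * (1 - gamma)
    # Depressed cubic y^3 + p y + q = 0 after x = y + 8/3.
    p = b - 64 / 3
    q = -1024 / 27 + 8 * b / 3 - d
    disc = (q / 2) ** 2 + (p / 3) ** 3
    if disc > 0:
        # One real root: Cardano's formula with real cube roots.
        s = math.sqrt(disc)
        return _cbrt(-q / 2 + s) + _cbrt(-q / 2 - s) + 8 / 3
    # Three real roots (then p < 0): trigonometric form, keep the smallest.
    r = 2 * math.sqrt(-p / 3)
    arg = (3 * q / (2 * p)) * math.sqrt(-3 / p)
    theta = math.acos(max(-1.0, min(1.0, arg)))  # clamp guards against rounding
    roots = [r * math.cos(theta / 3 - 2 * math.pi * k / 3) + 8 / 3 for k in range(3)]
    return min(roots)


def rayleigh_speed_exact(gamma):
    """Normalized Rayleigh wave speed c_R / c_S from the Cardano root."""
    return math.sqrt(rayleigh_x_cardano(gamma))
```

The two branches correspond to the two computational paths that arise with a Cardano-based exact solution, depending on the sign of the discriminant.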

The function `taylorR` gives the Taylor series expansion in the square of the ratio of Rayleigh wave velocity to the S-wave velocity. (The expansion is not in the ratio alone as in the Liner Taylor series expansion given below.)

The function `padeR` calculates the Rayleigh wave speed based on using Padé approximants. Here `mdegree` is the numerator degree, `ndegree` is the denominator degree, `avalue` is the expansion point, and `root1or2` is 1 or 2, so as to ensure the smallest root is chosen. (Normally `root1or2` is 1, and an incorrect choice is obvious when overlaying the Rayleigh velocity estimate and the exact analytical Rayleigh wave velocity.)

The function `legName` is for the labels in the legend of some of the plots. `rspeed` is used to switch to one of the functions defined for the Rayleigh wave speed; they are defined in the body of the article or in the Appendix.

`plotRayleighSpeedEstimate` is a plotting function that enables comparison of the Rayleigh wave speed obtained by using two functions given in either the body of the report or in the Appendix, and also one derived from a Padé approximant (see the function `legName`). The input parameters are `rspeed1`, `rspeed2` (for the two different functions), and the Padé approximant parameters. `max` is for the full range of Poisson’s ratio and for positive Poisson’s ratio. Note that when , ; , ; , .

**Figure 1.** The normalized Rayleigh wave speed using the Padé approximant expanded around and the Taylor series expanded around . The solid curve is the analytical solution given in Vinh and Ogden [7].

We define a function for the approximation to the Rayleigh wave speed given by Liner [3]. The approximation is based on a Taylor series expansion in the ratio of the Rayleigh wave speed to the shear wave speed.

`plot3RayleighWaveSpeedRelativeError` is a plotting function for comparing the relative absolute errors of the Rayleigh speed estimates from two functions and from a Padé approximant, with inputs similar to those described above. `ymax` sets the upper limit of the relative-error axis and is adjusted to scale the plot.

**Figure 2.** Relative absolute error for Rayleigh wave speed using the Padé approximant expanded around compared to the two Taylor series expansions, both expanded around .

`plot2RayleighWaveSpeedRelativeError` is a plotting function for comparing the relative absolute errors of the Rayleigh speed estimates obtained using two Padé approximants. Its inputs are similar to those described above, with the suffix 1 or 2 referring to the given Padé approximant.

**Figure 3.** Relative absolute error for Rayleigh wave speed using the Padé approximants and expanded around .

**Figure 4.** Relative absolute error for Rayleigh wave speed using the Padé approximants and expanded around .

It is possible to generate the expressions for equations (4) and (5) based on a Padé approximant simply by squaring the Rayleigh wave speed estimate. The square roots of the full set of results given below are good estimates of the Rayleigh wave speed for different ranges of the Poisson ratio. Simple arithmetic yields the results in the forms given in equations (4) and (5).

Here are some useful and accurate expressions for the Rayleigh wave speed squared over the full range of the Poisson ratio with .

Here are some useful and accurate expressions for the Rayleigh wave speed squared over the positive range of Poisson’s ratio with .

Explicit expressions for the Rayleigh wave speed are given in equations (4) and (5), with the respective coefficients given above. These expressions have been derived using Padé approximants around and for (Poisson ratio in the range 0.5 to −1) and (Poisson ratio in the range 0.5 to 0), respectively. The two expansion points (the point of expansion in equations (1) and (2)) were chosen by examining the area under the relative absolute error curve as the expansion point was stepped between 0.5 and 1. In principle, a least-squares solution for the expansion point producing the minimum error would be feasible for each Padé type, but it would ignore the detailed shape of the error curve across the range of values.

For example, Figure 5 shows the relative absolute error for the Rayleigh wave speed based on Padé type for expansions around and . While the error is less for the expansion around for most of the range of values, for larger values of the errors grow and exceed those for the expansion around . This basic behavior is characteristic of the errors obtained (but not shown) using the Padé types and .

It is worth noting that for non-negative Poisson’s ratio of , the errors are smaller for expansions about for the three Padé types , , and . The question arises whether expanding around another value could give smaller errors for non-negative Poisson’s ratio. Figure 6 shows the relative absolute error for the Padé type expanded about and . The error in the latter case is smaller for all values up to approximately . Beyond that it exceeds the error for the expansion about , though not significantly. The maximum error is about the same in both cases. Once more, similar behavior is observed for the other two Padé types and .
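A minimal sketch of choosing an expansion point by the area under the error curve follows. It is a grid search over candidate points between 0.5 and 1 and makes several assumptions for illustration:

- only the $[1/1]$ type is used; the article’s five types and chosen points are not reproduced here;
- the area is taken with the trapezoidal rule over $\gamma$ rather than over $\nu$;
- a bisection root stands in for the exact solution [7];
- the Rayleigh cubic takes the standard form in $x = (c_R/c_S)^2$ and $\gamma = (c_S/c_P)^2$.

The function names are hypothetical. As the text notes, a single area figure hides the shape of the error curve, so the curves themselves should still be inspected.

```python
import math


def f(x, gamma):
    """Assumed Rayleigh cubic: x^3 - 8x^2 + 8(3 - 2 gamma) x - 16(1 - gamma)."""
    return x**3 - 8 * x**2 + 8 * (3 - 2 * gamma) * x - 16 * (1 - gamma)


def reference_x(gamma, iters=60):
    """Physical root in (0, 1) by bisection; stands in for the exact solution."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid, gamma) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pade11_x(gamma, a):
    """Zero of the numerator of the [1/1] Padé approximant of f about a."""
    c0 = f(a, gamma)
    c1 = 3 * a**2 - 16 * a + 8 * (3 - 2 * gamma)
    c2 = 3 * a - 8
    return a - c0 * c1 / (c1**2 - c0 * c2)


def error_area(a, gammas, exact_speeds):
    """Trapezoidal area under the percentage speed-error curve, over gamma."""
    errs = [100 * abs(math.sqrt(pade11_x(g, a)) - s) / s
            for g, s in zip(gammas, exact_speeds)]
    return sum(0.5 * (e0 + e1) * (g1 - g0)
               for e0, e1, g0, g1 in zip(errs, errs[1:], gammas, gammas[1:]))


def best_expansion_point(gammas, candidates):
    """Return the candidate with the smallest error area, plus all areas."""
    exact_speeds = [math.sqrt(reference_x(g)) for g in gammas]
    areas = {a: error_area(a, gammas, exact_speeds) for a in candidates}
    return min(areas, key=areas.get), areas
```

For example, `gammas = [i / 100 for i in range(76)]` covers $\gamma$ from 0 to 3/4 (that is, $\nu$ from 0.5 to −1), and `candidates = [i / 100 for i in range(50, 101)]` steps the expansion point from 0.5 to 1.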

Figure 7 shows the relative absolute errors for the Rayleigh wave speed based on the Padé type expanded around and . These curves behave differently from those shown earlier: the error decreases monotonically as increases over most of its full range. The errors are one hundred times smaller than those based on the Padé type shown in Figure 6. A remarkable further fivefold reduction in relative absolute error (a 500-fold reduction in total) occurs for the Rayleigh wave speed estimate based on the Padé type expanded around and shown in Figure 8.

**Figure 5.** Relative absolute error for Rayleigh wave speed using the Padé approximants expanded around the points and .

**Figure 6.** Relative absolute error for Rayleigh wave speed using the Padé approximants expanded around the points and .

**Figure 7.** Relative absolute error for Rayleigh wave speed using the Padé approximants expanded around the points and .

`plot1RayleighWaveSpeedRelativeError` is a plotting function for the relative absolute error of the Rayleigh speed obtained using a single Padé approximant. Its inputs are similar to those described earlier.

**Figure 8.** Relative absolute error for Rayleigh wave speed using the Padé approximant expanded around .

It is worth summarizing the various error estimates arising from the Rayleigh wave speed based on the Padé approximant types studied for the two ranges of Poisson’s ratios (Table 1). These are compared to other published estimates given in Table 2. Tables 1 and 2 show this data and also the number of arithmetic operations required to compute the value of (equation (3)), from which we estimate any given Rayleigh wave speed. The latter information shows the tradeoff between the accuracy of the Rayleigh wave speed estimate and the associated computational effort. The operations included are addition, subtraction, multiplication, division, and the taking of a root or a trigonometric function; raising to a power is taken as a series of multiplications, and the minimum number of such operations is used wherever possible. It is assumed that an analytical solution is exact, that is known, and that for each approximate solution all coefficients are known and retained as integers until the calculation commences. In the case of the exact solution based on Cardano’s formula, two different expressions exist for different values of , and in this case, the number of operations is given for each path separated by a forward slash. Functions not previously introduced in the body of the text and required to calculate the estimates in Table 2 may be found in the Appendix.

The data in Tables 1 and 2 show that the computational effort is generally greatest for the three analytical solutions. For the approximate solutions based on Padé approximants, the errors decrease approximately tenfold as we move from Padé types to for expansions around . The number of arithmetic operations increases by about five for Padé types to , without a significant change in the number of operations as we move to Padé types and . For the Padé types expanded around and for non-negative Poisson’s ratio, we find a remarkable reduction in the errors. This begins with a thirtyfold reduction for the Padé type , followed by a twentyfold reduction as we move from Padé types to , and a further tenfold reduction as we move to Padé types to . The Padé type expanded around has a fifteenfold smaller error than either of the Taylor expansions, while requiring fewer operations. The methods based on least-squares solutions that minimize the area under the absolute error curves perform strongly, offering a good balance of relatively small errors and few arithmetic operations. Only with the approximation based on Padé type does the error fall below that of the least-squares solutions, and this requires more arithmetic operations.

The Rayleigh wave speed was estimated based on expansions of the Rayleigh equation by Padé approximants and equating the numerator of that representation to zero. The numerator polynomials were solved for the normalized Rayleigh wave speed and provide five distinct solutions. The solutions have varying degrees of accuracy, depending on the value about which the Rayleigh equation is expanded. Good, accurate solutions occur for the full range of Poisson’s ratio, and even more accurate solutions are found for non-negative Poisson’s ratio. It is concluded that these expressions for the Rayleigh wave speed provide a useful approximation, with a balance between accuracy and number of arithmetic operations required.

This Appendix contains a number of functions that estimate the Rayleigh wave speed, which may be used to reproduce some results for the approximate solutions found in Table 2. Other needed functions have been presented in the body of the main text.

Here is the Bergmann formula from Vinh and Malischewsky [24].

Here is the Vinh and Malischewsky [18] formula to degree 2 in Poisson’s ratio.

Here is the Vinh and Malischewsky [18] formula to degree 3 in Poisson’s ratio.

Here is the Vinh and Malischewsky [18] formula to degree 4 in Poisson’s ratio.

Here is the Rahman and Michelitsch [13] formula using the Lanczos approximation.

Here is the Li [14] formula for the full range of Poisson’s ratio.

Here is the Li [14] formula for the positive range of Poisson’s ratio.

[1] | L. Rayleigh, “On Waves Propagated along the Plane Surface of an Elastic Solid,” Proceedings of the London Mathematical Society, S1-17(1), 1885 pp. 4-11. doi:10.1112/plms/s1-17.1.4. |

[2] | M. Rahman and J. R. Barber, “Exact Expressions for the Roots of the Secular Equation for Rayleigh Waves,” Journal of Applied Mechanics, 62(1), 1995 pp. 250-252. doi:10.1115/1.2895917. |

[3] | C. L. Liner, “Rayleigh Wave Approximations,” Journal of Seismic Exploration, 3, 1994 pp. 273-281. |

[4] | D. Nkemzi, “A New Formula for the Velocity of Rayleigh Waves,” Wave Motion, 26(2), 1997 pp. 199-205. doi:10.1016/S0165-2125(97)00004-8. |

[5] | P. G. Malischewsky, “Comment to ‘A New Formula for the Velocity of Rayleigh Waves’ by D. Nkemzi [Wave Motion 26 (1997) 199-205],” Wave Motion, 31(1), 2000 pp. 93-96. doi:10.1016/S0165-2125(99)00025-6. |

[6] | H. Mechkour, “The Exact Expressions for the Roots of Rayleigh Wave Equation,” Proceedings of the 2nd International Colloquium of Mathematics in Engineering and Numerical Physics (MENP-2), Bucharest, 2002, Geometry Balkan Press, 2003 pp. 96-104. |

[7] | P. C. Vinh and R. W. Ogden, “On Formulas for the Rayleigh Wave Speed,” Wave Motion, 39(3), 2004 pp. 191-197. doi:10.1016/j.wavemoti.2003.08.004. |

[8] | P. G. Malischewsky Auning, “A Note on Rayleigh-Wave Velocities as a Function of the Material Parameters,” Geofísica internacional, 43(3), 2004 pp. 507-509. www.geofisica.unam.mx/unid_apoyo/editorial/publicaciones/investigacion/geofisica_internacional/anteriores/2004/03/Malischewsky.pdf. |

[9] | D. W. Nkemzi, “A Simple and Explicit Algebraic Expression for the Rayleigh Wave Velocity,” Mechanical Research Communications, 35(3), 2008 pp. 201-205. doi:10.1016/j.mechrescom.2007.10.005. |

[10] | P. G. Malischewsky, “Reply to Nkemzi, D. W., ‘A Simple and Explicit Algebraic Expression for the Rayleigh Wave Velocity,’ [Mechanical Research Communications (2007), doi:10.1016/j.mechrescom.2007.10.005],” Mechanical Research Communications, 35(6), 2008 p. 428. doi:10.1016/j.mechrescom.2008.01.011. |

[11] | X.-F. Liu and Y.-H. Fan, “A New Formula for the Rayleigh Wave Velocity,” Advanced Materials Research, 452-453, 2012 pp. 233-237. |

[12] | D. Royer, “A Study of the Secular Equation for Rayleigh Waves Using the Root Locus Method,” Ultrasonics, 39(3), 2001 pp. 223-225. doi:10.1016/S0041-624X(00)00063-9. |

[13] | M. Rahman and T. Michelitsch, “A Note on the Formula for the Rayleigh Wave Speed,” Wave Motion, 43(3), 2006 pp. 272-276. doi:10.1016/j.wavemoti.2005.10.002. |

[14] | X.-F. Li, “On Approximate Analytic Expressions for the Velocity of Rayleigh Waves,” Wave Motion, 44(2), 2006 pp. 120-127. doi:10.1016/j.wavemoti.2006.07.003. |

[15] | P. C. Vinh and P. G. Malischewsky, “Explanation for Malischewsky’s Approximate Expression for the Rayleigh Wave Velocity,” Ultrasonics, 45(1-4), 2006 pp. 77-81. doi:10.1016/j.ultras.2006.07.001. |

[16] | D. Royer and D. Clorennec, “An Improved Approximation for the Rayleigh Wave Equation,” Ultrasonics, 46(1), 2007 pp. 23-24. doi:10.1016/j.ultras.2006.09.006. |

[17] | A. V. Pichugin. “Approximation of the Rayleigh Wave Speed.” (Jan 10, 2008) people.brunel.ac.uk/~mastaap/draft06rayleigh.pdf. |

[18] | P. C. Vinh and P. G. Malischewsky, “Improved Approximations of the Rayleigh Wave Velocity,” Journal of Thermoplastic Composite Materials, 21(4), 2008 pp. 337-352. doi:10.1177/0892705708089479. |

[19] | M. Abramowitz and I. A. Stegun, eds., Handbook of Mathematical Functions, New York: Dover Publications, 1965. |

[20] | D. Zwillinger, ed., Standard Mathematical Tables and Formulae, 31st ed., Boca Raton: CRC Press, 2003. |

[21] | P. R. Hewitt. “Cardano’s Formulas or a Pivotal Moment in the History of Algebra.” (Apr 7, 2009) livetoad.org/Courses/Documents/bb63/Notes/cardanos_formulas.pdf. |

[22] | J. E. White, Underground Sound: Application of Seismic Waves, New York: Elsevier, 1983. |

[23] | Mathematica, Release Version 9.0, Champaign: Wolfram Research, Inc., 2013. |

[24] | P. C. Vinh and P. G. Malischewsky, “An Approach for Obtaining Approximate Formulas for the Rayleigh Wave Velocity,” Wave Motion, 44(7-8), 2007 pp. 549-562. doi:10.1016/j.wavemoti.2007.02.001. |

A. T. Spathis, “Use of Padé Approximants to Estimate the Rayleigh Wave Speed,” The Mathematica Journal, 2015. dx.doi.org/doi:10.3888/tmj.17-1. |

A. T. (Alex) Spathis works in rock mechanics and rock dynamics. He has measured the stresses in the Earth’s crust at shallow depths of tens of meters down to moderate depths of over 1000 meters. This data assists in understanding earthquakes and in the design of safe underground mines. He has formal qualifications in applied mathematics, electrical engineering, and rock mechanics, and has a Ph.D. in geophysics.

**A. T. Spathis**

*Orica Technical Centre
PO Box 196
Kurri Kurri NSW
Australia 2327*

Remarkable advances have been made in radiotherapy for cancer. Tumors inside the body that move with respiration can now be tracked during irradiation or irradiated at a specific phase of respiration [1]. Because the radiation used is invisible to the human eye, it is not possible to see whether a tumor moving with respiration is being irradiated accurately. Simulations that use an actual respiration wave to show the tumor being irradiated are therefore useful for training and for explaining the procedure to patients.

Radiotherapy equipment delivers radiation at a given dose rate, that is, dose per unit time. The time needed to deliver a given dose to a tumor inside the body is therefore determined by the dose rate. For tumors in the brain, head, and neck region, immobilizing the target area can prevent tumor movement during irradiation. With tumors of the lung or abdomen, however, organ displacement during respiration is substantial, and the tumor moves with respiration throughout the irradiation time. In conventional radiotherapy, respiratory displacement of the tumor is accommodated by using a large radiation field. However, delivering a large dose to such a large field in one session can cause severe side effects, so irradiation has to be carried out over a longer period with smaller doses.

For small tumors inside the brain, precise irradiation of a small field has been shown to allow delivery of a single large dose without adverse side effects. Attempts have been made to apply this technology to tumors of the trunk, where the most important issue is countering the respiratory movement of the tumor. One way to achieve this is to detect the respiratory displacement and control irradiation according to the tumor position. One such method irradiates only while the tumor lies within a set range of positions (gating); another tracks the respiratory displacement of the tumor during irradiation. Revolutionary advances in radiation therapy technology have enabled both methods to be used in practice [1].

Depicting the form of irradiation of a small tumor inside the body with countermeasures against respiratory movement in a three-dimensional simulation could promote better understanding of this therapeutic method. In the simulation, a target volume is set to encompass the shape of a small tumor. The tumor moves along the vertical axis of the trunk ( axis). For irradiation, either the radiation field moves in accordance with the tumor movement, or irradiation is performed only when the tumor has completely entered the radiation field in a set respiratory phase. At the same time, the respiration wave is output and the position of the tumor on that waveform is recreated.

Respiratory displacement was sampled at 25 Hz (25 samples per second) using a pressure sensor attached to the body. The sampled data was stored in a CSV file and read into *Mathematica*. The read-in data is shown in the list plot below. In the graph, the minimum value at exhalation during free respiration was defined as 0, and the maximum value at inhalation was defined as 100.

The notebook defines 3407 triplets.

This extracts the respiration wave data and plots it.
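A minimal Python sketch of the 0-to-100 normalization described above follows. It assumes that the end-exhalation minimum and end-inhalation maximum of free breathing are the smallest and largest values in the recorded trace. The layout of the CSV triplets is not described here, so the sketch takes a plain list of displacement values. `normalize_wave` and `sample_times` are hypothetical names.

```python
def normalize_wave(samples):
    """Map raw displacement so the trace minimum is 0 and the maximum is 100."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("constant signal cannot be normalized")
    return [100.0 * (s - lo) / (hi - lo) for s in samples]


def sample_times(n, rate_hz=25.0):
    """Time stamps in seconds for n consecutive samples taken at rate_hz."""
    return [i / rate_hz for i in range(n)]
```

If each of the 3407 triplets is one sample, a record sampled at 25 Hz spans 3406/25 = 136.24 s from the first sample to the last.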

A single malignant tumor cell, arising through the effects of carcinogenic factors over several stages, forms a visible tumor mass through continued monoclonal proliferation. Cell density is highest at the center of the tumor and is assumed to follow a three-dimensional normal distribution.

This defines the 3D normal distribution of tumor morphology.
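The tumor model can be sketched as an isotropic three-dimensional normal density whose center moves with the respiratory wave. In the Python sketch below, several choices are assumptions for illustration: the isotropy, the use of the third coordinate as the long axis of the body, and the parameter `max_excursion` (the displacement at an amplitude of 100). The function names are hypothetical.

```python
import math


def tumor_density(x, y, z, center=(0.0, 0.0, 0.0), sigma=1.0):
    """Isotropic 3D normal density of tumor cells (integrates to 1 over space)."""
    cx, cy, cz = center
    r2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
    return math.exp(-r2 / (2 * sigma**2)) / ((2 * math.pi) ** 1.5 * sigma**3)


def tumor_center(amplitude_percent, max_excursion, rest=(0.0, 0.0, 0.0)):
    """Tumor center displaced along the long body axis (third coordinate, assumed).

    amplitude_percent is the normalized respiratory wave (0 at end-exhalation,
    100 at end-inhalation); max_excursion is a hypothetical displacement at 100.
    """
    x0, y0, z0 = rest
    return (x0, y0, z0 + max_excursion * amplitude_percent / 100.0)
```

At sample `i` of a normalized wave, the moving tumor is then `tumor_density(x, y, z, center=tumor_center(wave[i], max_excursion), sigma=sigma)`.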

This defines the location of the tumor at a given time with a given color.

The center of the tumor moves along the axis with amplitude given by the respiratory wave and shown on the left with a red point. Think of the front of the body as the front plane (the plane parallel to the - given by ).

The patient is lying on the surface of the bed (with respect to the graph, the patient lies on the - plane with the head in the negative direction, the feet in the positive direction, and looking up in the direction). As the patient breathes, the tumor moves in the direction of the length of the body (i.e. along the axis). A tumor-tracking beam acts on the - plane set at five times the standard deviation of the tumor spread. The beam moves with the respiratory wave in the same way as the tumor.

In this situation, the beam is set to turn on when the tumor enters a set position, defined by a threshold on the respiratory wave amplitude (“Threshold ”). On/off gating of the radiation beam is easily observed on the respiratory waveform, with the % threshold value clearly indicated.
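Amplitude gating can be sketched as a comparison of the normalized respiratory wave with the threshold. The Python sketch below assumes the beam is on while the amplitude is at or below the threshold, that is, near end-exhalation. The text specifies only that the beam switches at a threshold amplitude, so this direction is an assumption. The sketch also reports the duty cycle and the beam-on intervals at the 25 Hz sampling rate. All names are hypothetical.

```python
def gate(amplitudes, threshold_percent):
    """Beam-on flags: True while the amplitude is at or below the threshold (assumed)."""
    return [a <= threshold_percent for a in amplitudes]


def duty_cycle(flags):
    """Fraction of samples during which the beam is on."""
    return sum(flags) / len(flags)


def beam_on_intervals(flags, rate_hz=25.0):
    """(start, end) times in seconds of each contiguous beam-on interval."""
    intervals, start = [], None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i
        elif not on and start is not None:
            intervals.append((start / rate_hz, i / rate_hz))
            start = None
    if start is not None:  # beam still on at the end of the record
        intervals.append((start / rate_hz, len(flags) / rate_hz))
    return intervals
```

In the tracking mode, by contrast, the beam stays on and its position simply follows the tumor, so the duty cycle is 1.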

This shows the irradiation synchronized to respiration.

Stereotactic radiotherapy was originally developed for the treatment of diseased regions inside the cranium, which do not move once fixed in position. It has been demonstrated that targeting a small lesion from multiple directions enables delivery of a large radiation dose in one session without causing side effects in the adjacent normal tissue. To apply this technology to tumors in the trunk region, which move with respiration, the tumor needs to be immobilized. One approach irradiated the tumors from multiple directions while minimizing respiratory movement, by fixing the body in position and compressing the abdomen to restrict diaphragm motion to a very small range [2, 3]. This method was shown to be safe and effective and represents a technological innovation in stereotactic irradiation of the trunk region.

Having patients hold their breath during irradiation is the simplest method and is becoming widespread; it is the equivalent of the stereotactic radiotherapy used for the cranium. Irradiation under free-breathing conditions, on the other hand, is the most physiologically natural method, but it requires a mechanism for irradiating during a phase synchronized with the respiratory wave [4, 5]. Irradiating the tumor while tracking its respiratory movement has also presented difficulties in controlling the motion of a movable section of the irradiation equipment [6, 7]. However, recent advances in irradiation equipment and control computers have now made both of these methods practical.

These complicated therapies are difficult to explain with written descriptions and static images alone. We have therefore attempted to develop a dynamic simulation model based on actual respiratory wave data. This simulation model has been well received by students and medical residents and is also effective in explaining the procedure to patients.

[1] S. B. Jiang, “Technical Aspects of Image-Guided Respiration-Gated Radiation Therapy,” Medical Dosimetry, 31(2), 2006 pp. 141-151. doi:10.1016/j.meddos.2005.12.005.

[2] V. M. Remouchamps, F. A. Vicini, M. B. Sharpe, L. L. Kestin, A. A. Martinez, and J. W. Wong, “Significant Reductions in Heart and Lung Doses Using Deep Inspiration Breath Hold with Active Breathing Control and Intensity-Modulated Radiation Therapy for Patients Treated with Locoregional Breast Irradiation,” International Journal of Radiation Oncology *Biology *Physics, 55(2), 2003 pp. 392-406. doi:10.1016/S0360-3016(02)04143-3.

[3] M. Nakamura, K. Shibuya, A. Nakamura, T. Shiinoki, Y. Matsuo, M. Nakata, A. Sawada, T. Mizowaki, and M. Hiraoka, “Interfractional Dose Variations in Intensity-Modulated Radiotherapy with Breath-Hold for Pancreatic Cancer,” International Journal of Radiation Oncology *Biology *Physics, 82(5), 2012 pp. 1619-1626. doi:10.1016/j.ijrobp.2011.01.050.

[4] Y. Otani, I. Fukuda, N. Tsukamoto, Y. Kumazaki, H. Sekine, E. Imabayashi, O. Kawaguchi, T. Nose, T. Teshima, and T. Dokiya, “A Comparison of the Respiratory Signals Acquired by Different Respiratory Monitoring Systems Used in Respiratory Gated Radiotherapy,” Medical Physics, 37, 2010 pp. 6178-6186. doi:10.1118/1.3512798.

[5] S. S. Vedam, P. J. Keall, V. R. Kini, and R. Mohan, “Determining Parameters for Respiration-Gated Radiotherapy,” Medical Physics, 28, 2001 pp. 2139-2146. doi:10.1118/1.1406524.

[6] A. Schweikard, H. Shiomi, and J. Adler, “Respiration Tracking in Radiosurgery,” Medical Physics, 31, 2004 pp. 2738-2741. doi:10.1118/1.1774132.

[7] A. Schweikard, H. Shiomi, and J. Adler, “Respiration Tracking in Radiosurgery without Fiducials,” International Journal of Medical Robotics and Computer Assisted Surgery, 1(2), 2005 pp. 19-27. doi:10.1002/rcs.38.

H. Sekine and Y. Otani, “Development of Simulation Models of Respiratory Tracking and Synchronizing for Radiotherapy,” The Mathematica Journal, 2014. doi:10.3888/tmj.16-12.

Hiroshi Sekine is a professor in the Department of Radiology, Jikei University, School of Medicine, Tokyo. His recent research interests relate to quantitative analysis of the time-dose-fractionation relationship of radiation therapy.

Yuki Otani is a physicist in the Department of Radiation Oncology, Graduate School of Osaka University, Osaka, Japan. His recent research interests relate to exact tracking of moving tumors in the body.

**Hiroshi Sekine**

*Department of Radiology
Jikei University School of Medicine
Jikei Daisan Hospital
4-11-1, Izumi-Honcho, Komae, Tokyo,
Japan. 201-8601*

**Yuki Otani**

Department of Radiation Oncology

Graduate School of Medicine, Osaka University

2-2, Yamadaoka, Suita, Osaka,

Japan. 565-0871

*y.otani@radonc.med.osaka-u.ac.jp*

This article carries out the evaluation of nuclear-electron attraction energy integrals using Gaussian-type functions with arbitrary Cartesian angular values. As an example, we calculate the corresponding matrix for the water molecule in the STO-3G basis set.

Evaluating molecular integrals has been an active field since the middle of the last century. Efficient algorithms have been developed and implemented in various programs. Detailed accounts of molecular integrals can be found in the references of [1]. In this article, the third in a series describing algorithms for evaluating molecular integrals, we detail the evaluation of the nuclear-electron attraction energy integrals from a more didactic point of view, following the approach of Rys, Dupuis, and King [2] as implemented in the OpenMol program [3].

The energy of attraction between an electron in the region described by the overlap of the orbitals , and a nucleus of charge located at is expressed by the nuclear-electron attraction integral

(1) |

in which is an unnormalized Cartesian Gaussian primitive.

Using the Gaussian product (see, for example, [1]) and defining the angular part as , we obtain

(2) |

The pole problem, that is, the singularity of the Coulomb operator at the nucleus, can be removed with the Laplace transform

(3) |

which turns the integral into

(4) |

where for now, we have ignored the factor . In the following steps, we will make certain modifications, knowing in advance that they will help simplify the expressions later on. We first reduce the upper limit of to unity by making the changes of variable (recall from the Gaussian product that ):

(5) |

and

(6) |

Replacing , , and in , we get

(7) |

We now multiply by the factor :

(8) |

Again, by inserting , we get

(9) |

Having arrived at the desired form, we reinsert the value of the angular part into the expression and separate the term enclosed by the curly brackets into three components , , and :

(10) |

Defining as the function of the component inside the bracket,

(11) |

and similarly for and , we rewrite the integral as

(12) |

We will show that the integrand in the expression for is in fact an overlap between two one-dimensional Gaussians, so that we can use the results developed in [1]. First, we expand the exponential parts of the integrand

(13) |

regrouping in terms of and , we have

(14) |

which becomes

(15) |

where and . These definitions let us compare this equation with the result of the Appendix, in which we see that equation (15) is simply

(16) |

where

(17) |

Substituting and ,

(18) |

into equation (13), we have

(19) |

Substitute this result into the definition of to get

(20) |

The integral has the same form as a one-dimensional overlap integral where the integrand is a Gaussian function centered at with an exponential coefficient .

From the observation above, we make use of the results developed for overlap integrals in [1]. For example, for ,

(21) |

In particular, we have the transfer equations

(22) |

The and functions take similar forms. The product is a polynomial in , and if we replace , then the integral in equation (12) is

(23) |

where is that polynomial. The integral is a linear combination of Boys functions (see, for example, Reference 4 of [1]),

$$F_n(T) = \int_0^1 t^{2n} e^{-T t^2}\,dt \qquad (24)$$

each of which is strictly positive and decreases monotonically with T.

Aside from the obvious choice of using *Mathematica* to evaluate the Boys function, there are several ways of evaluating the integral. In practice, most programs store values of the function pretabulated over a grid of intervals and interpolate as needed (e.g. with Chebyshev polynomials). Here we use Gauss-Chebyshev quadrature for the numerical integration [4]; for simplicity, we have adopted the F77 code in [4, p. 46] almost verbatim.
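As a rough illustration of evaluating the Boys function F_n(T) = ∫₀¹ t²ⁿ e^(−T t²) dt by quadrature, here is a Python sketch. It uses one Gauss-Chebyshev-type rule: Chebyshev nodes of the second kind combined with a sin⁴ variable transformation. We assume this rule is close in spirit to the scheme of [4], but it is not a transcription of the F77 code cited above. The names `gauss_chebyshev_sin4` and `boys_quadrature` are hypothetical.

```python
import math

def gauss_chebyshev_sin4(f, n=100):
    """Approximate the integral of f over [-1, 1] with n nodes.

    Nodes come from theta_i = i*pi/(n+1) through the mapping
    x(theta) = 1 + (2/pi)*[(1 + (2/3) sin^2 theta) sin theta cos theta - theta],
    which satisfies dx/dtheta = -(16/(3*pi)) * sin^4 theta. The trapezoidal
    rule in theta then gives the weights 16/(3(n+1)) * sin^4 theta_i.
    """
    total = 0.0
    for i in range(1, n + 1):
        theta = i * math.pi / (n + 1)
        s, c = math.sin(theta), math.cos(theta)
        x = 1.0 + (2.0 / math.pi) * ((1.0 + 2.0 * s * s / 3.0) * s * c - theta)
        total += s ** 4 * f(x)
    return 16.0 / (3.0 * (n + 1)) * total


def boys_quadrature(n_order, T, nodes=100):
    """Boys function F_n(T) = integral_0^1 t^(2n) exp(-T t^2) dt by quadrature."""
    def integrand(x):
        t = 0.5 * (1.0 + x)            # map x in [-1, 1] to t in [0, 1]
        return t ** (2 * n_order) * math.exp(-T * t * t)
    return 0.5 * gauss_chebyshev_sin4(integrand, nodes)   # dt = dx / 2


# F_0(T) has the closed form (1/2) sqrt(pi/T) erf(sqrt(T)), a convenient check.
print(boys_quadrature(0, 1.0), 0.5 * math.sqrt(math.pi) * math.erf(1.0))
```

The sin⁴ factor makes the transformed integrand vanish smoothly at both ends of [0, π]. The trapezoidal rule therefore converges quickly for smooth integrands.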

The function `Nea` evaluates the nuclear-electron attraction integral of two Gaussian primitives; here `alpha`, `beta`, `RA`, `RB`, `LA`, and `LB` are , , , , , and as defined earlier; `RR` is the nuclear position.

As in our two earlier articles [1, 5], we use the same data for the water molecule (, , the geometry optimized at the HF/STO-3G level). The molecule lies in the - plane with Cartesian coordinates in atomic units.

In the STO-3G basis set, each atomic orbital is approximated by a sum of three Gaussians; here are their unnormalized primitive contraction coefficients and orbital exponents.

Here are the basis function origins and Cartesian angular values of the orbitals, listed in the order , , , , , , and .

Specifically, the nuclear-electron attraction energy integral between the first primitive of the orbital of hydrogen atom 1, , the first primitive of the orbital of the oxygen atom, , and atom 1 () is

(25) |

We have

From the Gauss-Chebyshev quadrature, the integral in equation (23) yields . The nuclear-electron integral (25) is . This is calculated as follows.

Before evaluating the nuclear-electron energy matrix, we need the normalization factor.
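For an unnormalized Cartesian Gaussian primitive x^l y^m z^n exp(−α r²), the normalization factor has the standard closed form N = (2α/π)^(3/4) (4α)^((l+m+n)/2) / sqrt((2l−1)!! (2m−1)!! (2n−1)!!). Below is a minimal Python sketch. The function names are hypothetical, and we assume that the primitives here have exactly this form.

```python
import math

def double_factorial(k):
    """k!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def primitive_norm(alpha, l, m, n):
    """Normalization constant N of x^l y^m z^n exp(-alpha r^2).

    N is chosen so that the integral over all space of
    N^2 x^(2l) y^(2m) z^(2n) exp(-2 alpha r^2) equals 1.
    """
    total_l = l + m + n
    return ((2.0 * alpha / math.pi) ** 0.75 * (4.0 * alpha) ** (total_l / 2.0)
            / math.sqrt(double_factorial(2 * l - 1)
                        * double_factorial(2 * m - 1)
                        * double_factorial(2 * n - 1)))


# Example: an s-type and a p_x-type primitive with exponent 0.7 (arbitrary).
print(primitive_norm(0.7, 0, 0, 0), primitive_norm(0.7, 1, 0, 0))
```

For contracted STO-3G functions, these primitive factors would additionally be combined with the contraction coefficients. That step is not shown here.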

We have provided a didactic introduction to the evaluation of nuclear-electron attraction-energy integrals involving Gaussian-type basis functions by means of recurrence relations and a numerical quadrature scheme. The results are general enough that no modification of the algorithm is needed when larger basis sets with more Gaussian primitives or primitives with larger angular momenta are employed.

Consider the Gaussian product: . Combine and expand the coefficients

to get

Let and substitute in the exponent to get

The first three terms inside the second bracket factor to , and the last two can be reduced to . The original Gaussian product is thus

Here is a verification.
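The Gaussian product rule can also be checked numerically. The sketch below works in one dimension. This suffices because a three-dimensional Gaussian factorizes into its Cartesian components. It uses the standard expressions p = α + β, P = (αA + βB)/p, and K = exp(−αβ(A − B)²/p). The parameter values are arbitrary, and the function name is hypothetical.

```python
import math

def gaussian_product_1d(alpha, A, beta, B):
    """Return (K, p, P) such that
    exp(-alpha (x-A)^2) * exp(-beta (x-B)^2) = K * exp(-p (x-P)^2)."""
    p = alpha + beta
    P = (alpha * A + beta * B) / p
    K = math.exp(-alpha * beta / p * (A - B) ** 2)
    return K, p, P


alpha, A, beta, B = 0.8, -0.3, 1.7, 1.1
K, p, P = gaussian_product_1d(alpha, A, beta, B)
for x in (-2.0, 0.0, 0.45, 3.0):
    lhs = math.exp(-alpha * (x - A) ** 2) * math.exp(-beta * (x - B) ** 2)
    rhs = K * math.exp(-p * (x - P) ** 2)
    print(f"x = {x:5.2f}  lhs = {lhs:.6e}  rhs = {rhs:.6e}")
```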

If one opts to use *Mathematica*’s own incomplete gamma function , equation (24) can be defined in closed form.

Here is an example.

This is the result.

Here is a comparison with the quadrature result.

Understandably, direct use of the `Gamma` function is much faster and yields more accurate results. We thank Paul Abbott for providing the example.
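Python's standard library has no incomplete gamma function. As a stand-in for the closed-form route, the sketch below starts from F_0(T) = (1/2) sqrt(π/T) erf(sqrt(T)) and applies the upward recurrence F_(n+1)(T) = ((2n+1) F_n(T) − e^(−T))/(2T), which follows from integrating F_n by parts. This substitution is ours, not the method of the example above. Upward recursion also loses accuracy through cancellation when T is small, so the sketch is intended only for moderate T. The function name is hypothetical.

```python
import math

def boys_closed_form(n_order, T):
    """F_n(T) = integral_0^1 t^(2n) exp(-T t^2) dt via erf and upward recursion.

    F_0(T) = sqrt(pi / T) * erf(sqrt(T)) / 2, then
    F_{k+1}(T) = ((2k + 1) * F_k(T) - exp(-T)) / (2 T).
    Accuracy degrades as T -> 0 (cancellation), so T = 0 is handled exactly.
    """
    if T == 0.0:
        return 1.0 / (2 * n_order + 1)
    f = 0.5 * math.sqrt(math.pi / T) * math.erf(math.sqrt(T))
    for k in range(n_order):
        f = ((2 * k + 1) * f - math.exp(-T)) / (2.0 * T)
    return f


print(boys_closed_form(3, 2.5))
```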

[1] M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals I,” The Mathematica Journal, 14(3), 2012. doi:10.3888/tmj.14-3.

[2] J. Rys, M. Dupuis, and H. F. King, “Computation of Electron Repulsion Integrals Using the Rys Quadrature Method,” Journal of Computational Chemistry, 4(2), 1983 pp. 154-157. doi:10.1002/jcc.540040206.

[3] G. H. F. Diercksen and G. G. Hall, “Intelligent Software: The OpenMol Program,” Computers in Physics, 8(2), 1994 pp. 215-222. doi:10.1063/1.168520.

[4] J. Pérez-Jorda and E. San-Fabián, “A Simple, Efficient and More Reliable Scheme for Automatic Numerical Integration,” Computer Physics Communications, 77(1), 1993 pp. 46-56. doi:10.1016/0010-4655(93)90035-B.

[5] M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals II,” The Mathematica Journal, 15(1), 2013. doi:10.3888/tmj.15-1.

M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals,” The Mathematica Journal, 2014. doi:10.3888/tmj.16-9.

Minhhuy Hô received his Ph.D. in theoretical chemistry at Queen’s University, Kingston, Ontario, Canada, in 1998. He is currently a professor at the Centro de Investigaciones Químicas at the Universidad Autónoma del Estado de Morelos in Cuernavaca, Morelos, México.

Julio-Manuel Hernández-Pérez obtained his Ph.D. at the Universidad Autónoma del Estado de Morelos in 2008. He has been a professor of chemistry at the Facultad de Ciencias Químicas at the Benemérita Universidad Autónoma de Puebla since 2010.

**Minhhuy Hô**

*Universidad Autónoma del Estado de Morelos
Centro de Investigaciones Químicas
Ave. Universidad, No. 1001, Col. Chamilpa
Cuernavaca, Morelos, Mexico CP 92010*

**Julio-Manuel Hernández-Pérez**

Facultad de Ciencias Químicas

Ciudad Universitaria, Col. San Manuel

Puebla, Puebla, Mexico CP 72570

julio.hernandez@correo.buap.mx

This article presents a compact analytic approximation to the solution of a nonlinear partial differential equation of the diffusion type by using Bürmann’s theorem. Expanding an analytic function in powers of its derivative is shown to be a useful approach for solutions satisfying an integral relation, such as the error function and the heat integral for nonlinear heat transfer. Based on this approach, series expansions for solutions of nonlinear equations are constructed. The convergence of a Bürmann series can be enhanced by introducing basis functions depending on an additional parameter, which is determined by the boundary conditions. A nonlinear example, illustrating this enhancement, is embedded into a comprehensive presentation of Bürmann’s theorem. Besides a recursive scheme for elementary cases, a fast algorithm for multivalued Bürmann expansions and inverse functions is developed using integer partitions. The present approach facilitates the search for expansions of analytic functions superior to commonly used Taylor series and shows how to apply these expansions to nonlinear PDEs of the diffusion type.

For most nonlinear problems in physics, analytic closed-form solutions are not available. Thus the investigator initially searches for an approximate analytic solution. This approximation must be reliable enough to correctly describe the dependence of the solution on all essential parameters of the system. This article aims to show that Bürmann’s theorem can serve as a powerful tool for gaining approximations fulfilling such demands. We have chosen canonical examples [1, 2, 3] from the field of linear and nonlinear heat transfer to illustrate this technique.

A Bürmann series may be regarded as a generalized form of a Taylor series: instead of a series of powers of the independent variable , we have a series of powers of an analytic function :

Starting at an elementary level, we present a recursive calculation scheme for the coefficients of a Bürmann series. Such a recursive formulation is easily implemented in *Mathematica* and can find the Bürmann expansion for all elementary cases. For instances where we have to deal with series expansions of in terms of powers of functions of the form

that is, functions whose first derivatives vanish at some point of the complex plane, we reach the limits of the recursive approach. To calculate such expansions efficiently in *Mathematica*, we give a generalized formulation of the Bürmann coefficients in terms of the expansion coefficients of the reciprocal power of an analytic function :

This formulation avoids the time-consuming symbolic differentiation that is usually required. The calculation of the coefficients is based on finding all partitions of the index in terms of the frequencies of the parts of ,

These sets of frequencies for the partitions are tabulated by using the function `FrobeniusSolve`.

Once the coefficients are determined by using the tabulated solutions for , the calculation of the coefficients of a generalized Bürmann series is a simple task. A special case of a Bürmann series, representing a function as a series of powers of its own derivative, is of particular importance:

Expansions of this type will be applied to functions defined by integrals. For both linear and nonlinear heat transfer, these series expansions will help us find useful approximations. This is because the integral representation of the error function leads to an expansion in fractional powers of the integrand. A similar strategy also works for nonlinear cases, since their solutions obey integral equations closely related to the integral defining the error function. Finally, introducing a free parameter can improve the convergence of a Bürmann expansion; this parameter is determined by the boundary conditions. By this procedure, we find reliable analytical approximations for the heat transfer in ZnO [3] that comprise only a few terms.

The common analytic solutions to these problems use Taylor series or numerical evaluations, which do not exploit the structure revealed by the integral relation fulfilled by the exact solutions. We mention here that a similar procedure can also be applied successfully to the diffusion of metal cations in a solid solvent [4].

The article presents the formulas together with brief remarks on their origin; the details needed to derive them are given in the corresponding appendices.

Bürmann’s theorem [5] states that it is possible to find a convergent expansion of an analytic function as a sum of powers of another analytic function . The simplest form of such an expansion, assumed to be valid around some point in the complex plane, is given by

(1) |

or, in another notation that is more convenient for some purposes,

(2) |

where the functions and are called the basis functions of the Bürmann series. The functions and have to fulfill certain conditions to guarantee the convergence of the series in (1) and (2); these conditions are discussed later in this article. In their classic work *A Course of Modern Analysis* [5], Whittaker and Watson give a formula for the coefficient (Bürmann coefficient) of a Bürmann series. In the notation used in (1), their formula is

(3) |

This formula is widely cited [6, 7]. In practice, however, determining the limit of a higher-order derivative is cumbersome, as the following example shows. Expanding the function in powers of around gives the following coefficients , for which the CPU time increases dramatically for .

This section shows how to calculate the coefficients of the Bürmann series recursively, which is easier to handle than (3) and more efficient when translated to symbolic programs. If we use (1) and (2) to find convergent series representations of solutions to differential equations, it is important to simplify the algorithms necessary to determine the expansion coefficients.

For basis functions of the general form

(4) |

we get the recursion in terms of the representation used in (1),

(5) |

with the initial coefficient given by

(6) |

Hence, the nested expression for is

(7) |

The recursion (5) is more efficient to calculate than the expression in (3) and is easily implemented.
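The recursion (5) itself is not reproduced here. As an illustrative alternative, and not the formula of (5), the Bürmann coefficients can be peeled off one at a time with truncated power-series arithmetic. Take c_0 = f(a), divide f − c_0 by φ, read off the next constant term, and repeat. This requires φ(a) = 0 and φ′(a) ≠ 0, the same restriction discussed for (5). The function names are hypothetical, and exact rational arithmetic is used so that the coefficients can be compared directly.

```python
from fractions import Fraction
from math import factorial

def series_divide(a, b):
    """Truncated power-series quotient a/b (coefficient lists); needs b[0] != 0."""
    q = []
    for k in range(len(a)):
        s = a[k] - sum(b[j] * q[k - j] for j in range(1, min(k, len(b) - 1) + 1))
        q.append(s / b[0])
    return q


def burmann_coefficients(f, phi):
    """Coefficients c_n such that f(z) = sum_n c_n * phi(z)**n near z = a.

    f and phi are Taylor coefficients in powers of (z - a), of equal length N;
    N coefficients are returned. Requires phi(a) = 0 and phi'(a) != 0.
    """
    if phi[0] != 0 or phi[1] == 0:
        raise ValueError("basis function needs phi(a) = 0 and phi'(a) != 0")
    psi = [Fraction(c) for c in phi[1:]]   # phi(z) = (z - a) * psi(z)
    g = [Fraction(c) for c in f]
    coeffs = []
    while g:
        coeffs.append(g[0])                # c_n is the current constant term
        # (g - c_n) / phi = ((g - c_n) / (z - a)) / psi
        g = series_divide(g[1:], psi)
    return coeffs


# Usage: expanding z in powers of sin(z) yields the inverse function arcsin.
N = 8
sin_series = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 else Fraction(0)
              for k in range(N)]
identity = [Fraction(0), Fraction(1)] + [Fraction(0)] * (N - 2)
print(burmann_coefficients(identity, sin_series))
```

The nonzero coefficients come out as 1, 1/6, 3/40, and 5/112, which are the Taylor coefficients of arcsin. This agrees with the remark that the inverse function is the Bürmann series of z in powers of the basis function.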

The Bürmann series for up to order in is calculated with .

We now show the expansion for the same problem shown in the previous section (i.e. expanding around into powers of ). It can be easily expanded to order 25 in a reasonable amount of time. This is the explicit truncated Bürmann series.

The result is validated against a Taylor series, which shows that the error is at least of order 26.

Here are the coefficients as they are calculated with this procedure in their explicit analytic form, according to the definition given in (2).

A useful application of the recursion (5)-(7) to the case of the expansion of the inverse function of around follows immediately: we have and it follows that the inverse function is the Bürmann series of in powers of (see [8]). By writing

(8) |

we obtain

(9) |

The following program calculates the first three coefficients for the inverse function in general, which corresponds to the expression shown in [8].

As an example, the inverse of the sine function is expanded, and the result is displayed for order 11.

Compare it to the result of the *Mathematica* built-in function `InverseSeries` (see [9]).

If we choose the basis function to be equal to the first derivative of , we find, by using formula (5), the recursive expression and the first three coefficients:

(10) |

The idea of expanding an analytic function using its derivative as a basis function is fruitful for cases where the function is defined by an integral. It will be shown that solutions to linear and nonlinear problems of diffusion or heat transfer can be expressed as integrals. We get

To illustrate the advantage of this technique, we choose the expansion of the transcendental function . The function is defined by the integral

Using the results listed in (10) for the Bürmann coefficients, we arrive at once at the expansion around :

Here is the series to order 11.

It can be simplified.

Although the recursive formula (5) is more efficient in CPU time than formula (3), it is still restricted to basis functions with a nonvanishing first derivative at the expansion position , since appears in the denominator of the coefficients in (5)-(7). To overcome this limitation, we introduce an alternative, generalizable representation of the Bürmann coefficients based on a combinatorial approach [10].

Bürmann expansions are in fact closely related to the Taylor series of reciprocal powers of analytic functions, represented by

(11) |

The function is uniquely determined by the basis function . We will show that explicit expressions for the expansion of and the corresponding Bürmann coefficients as defined in (2) can be derived using the coefficients . The Bürmann expansion in this representation reads as

(12) |

The formula for the Bürmann coefficient in terms of the coefficients is thus given by

(13) |

The explicit formulation of generalized Bürmann series using powers of functions whose derivatives vanish up to the order at is given by

(14) |

and the expression for the Bürmann coefficient is

(15) |

Also, the special case of expanding the inverse function of can be derived in this general way, and one gets

(16) |

The standard case (i.e. and ) is obtained by setting in (14)-(16). By using the combinatorial approach as shown in the next subsection, one can also evaluate the coefficients in expansions resulting from the theorem of Teixeira [5], which is a generalization of Bürmann’s theorem to singular functions.

The approach is explained in more detail below and demonstrated with examples coded in *Mathematica*.

A partition of the positive integer is a sequence of positive integers with such that . For example, is a partition of 10 that is usually written as . The number of times a part occurs in a partition is its frequency . In the example, the parts occur with frequencies , so the example partition can be written as .

For a partition of , define the vector of frequencies, . Then

(17) |

for convenience, define

(18) |

In the example, and .

In *Mathematica*, gives all possible partitions of .

The frequencies of the parts, , can be found with .

However, `FrobeniusSolve` is slow for integers larger than about 30.

The function `PartitionsM`, based on `IntegerPartitions`, is significantly faster.

The function `PartitionsJ`, based on an undocumented but highly efficient function [11], is even faster.
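To make the frequency representation concrete, the following Python sketch enumerates every frequency vector (k_1, …, k_n) with k_1 + 2k_2 + … + n k_n = n. Each vector describes one partition of n, in which part j occurs k_j times. The set of vectors should correspond, up to ordering, to the solutions returned by `FrobeniusSolve[Range[n], n]`. The function name is hypothetical, and the sketch makes no attempt to match the speed of `PartitionsM` or `PartitionsJ`.

```python
def partition_frequencies(n):
    """All frequency vectors (k_1, ..., k_n) with sum_j j * k_j = n."""
    results = []

    def extend(part, remaining, freqs):
        # Choose how often `part` occurs, then recurse on the next larger part.
        if part > n:
            if remaining == 0:
                results.append(tuple(freqs))
            return
        for k in range(remaining // part + 1):
            freqs.append(k)
            extend(part + 1, remaining - k * part, freqs)
            freqs.pop()

    extend(1, n, [])
    return results


print(partition_frequencies(4))
```

The number of vectors equals the partition number p(n), for example 42 for n = 10.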

According to

(19) |

the coefficients of reciprocal powers of analytic functions can be derived explicitly on the basis of combinatorics and analysis, recapitulated in appendix A. When is an integer, the coefficients are

(20) |

The case when is rational, , is relevant for Bürmann expansion with basis functions with vanishing derivatives . In that case,

(21) |
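The combinatorial idea behind (20) and (21) can be illustrated with the standard multinomial expansion of a power of a series. If g(z) = 1 + Σ_j b_j z^j, then the coefficient of z^n in g(z)^α is the sum, over all partitions of n with frequencies k_j, of α(α − 1)⋯(α − K + 1)/Π_j k_j! · Π_j b_j^(k_j), where K = Σ_j k_j is the number of parts. This works for negative and rational α. Whether the normalization matches (20) and (21) exactly is not checked here. For simplicity, the sketch assumes g(0) = 1, and the names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod

def frequency_vectors(n, part=1, remaining=None):
    """Yield tuples (k_part, ..., k_n) with sum_j j * k_j == n."""
    if remaining is None:
        remaining = n
    if part > n:
        if remaining == 0:
            yield ()
        return
    for k in range(remaining // part + 1):
        for rest in frequency_vectors(n, part + 1, remaining - k * part):
            yield (k,) + rest


def series_power(g, alpha):
    """Taylor coefficients of g(z)**alpha, to the length of g, assuming g[0] == 1."""
    if g[0] != 1:
        raise ValueError("this sketch assumes g(0) = 1")
    alpha = Fraction(alpha)
    b = [Fraction(c) for c in g]
    out = []
    for n in range(len(g)):
        total = Fraction(0)
        for k in frequency_vectors(n):
            num_parts = sum(k)
            # falling factorial alpha (alpha - 1) ... (alpha - K + 1)
            term = prod((alpha - i for i in range(num_parts)), start=Fraction(1))
            for j, kj in enumerate(k, start=1):
                term *= b[j] ** kj / factorial(kj)
            total += term
        out.append(total)
    return out


# Reciprocal power: 1/(1 - z) = 1 + z + z^2 + ...
print(series_power([1, -1, 0, 0, 0], -1))
```

Rational exponents such as 1/2 or −1/3 are handled in the same way, because the falling factorial is evaluated in exact rational arithmetic.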

Now we show how to calculate (21) symbolically. For example, choosing the function , let us calculate .

We use the fact that the *Mathematica* functions `Times`, `Plus`, and `Total` work with empty lists.

To avoid the undefined expression , the differentiation is first performed analytically on the symbolic function . The result is then raised to the power (which can be zero), and finally the symbolic function is replaced by the function .

Here are the for the symbolic function expanded at .

Now we expand at . For convenience, define `auxf`.

While the expansion is valid for complex-valued functions, the plot shows only the real part of .

Explicit expressions for the Bürmann coefficients as defined in (2) can be given in terms of the coefficients .

Again using (see appendix A), we can formulate the general expressions for Bürmann series using functions with vanishing derivatives . For instance, series of powers of functions of this type can give convergent expansions for functions that are defined by integrals, like the error function

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \qquad (22)$$

which plays a key role in the theory of linear and nonlinear heat transfer [1]. Defining the integrand as the basis function of a Bürmann series, as explained in (10), we will find a rapidly converging series representation of .

All the results of the previous section can be applied to get a formula for the Bürmann coefficients that is efficiently implemented in a simple function. The starting point of the derivation of this expression is a formulation of the Bürmann expansion in terms of a complex contour integral, as it is given in [5]. This approach can be found in various presentations [6, 7]. The evaluation of the integral representation of the Bürmann expansion results in

(23) |

The formula for the Bürmann coefficient in terms of the coefficients is thus given by

(24) |

The function `fbür` shows how to apply (23) and (24) to the expansion of in powers of , the same example as presented in the first section. The series is expanded up to order 15, so that the error is at least of order 16.

For the special case of the inverse function of given by (12), using the transformations indicated in (8), we obtain

(25) |

So the expansion of the inverse function can be expressed as

(26) |

Equation (26) is the compact formulation of a result given by Morse and Feshbach [12]. As a result of our approach, we have developed formulas for Bürmann coefficients and for the expansion coefficients of inverse functions that reveal the close relationship of these coefficients to the coefficients for reciprocal powers of an analytic function defined in (20). In the following section, we present a generalization of Bürmann’s theorem, using the solutions of equation (17).

Inspecting formulas (23) and (26), we notice that they cannot be evaluated for cases where or vanishes. This shortcoming must be overcome, for in some cases of interest we will be forced to find Bürmann series using basis functions whose first derivatives at vanish:

(27) |

To this end, define the multivalued function

(28) |

which is cast into the form

(29) |

The function in (29) can be expanded into a Taylor series with , and hence (28) fulfills the condition violated by . Thus, instead of expanding in powers of , we expand in powers of . A reformulation of the contour integral [5] results in the generalized form of Bürmann’s theorem given in (14). Actually, the introduction of the root function (28) in (14) leads to several solution branches. For real-valued functions , the use of the `Sign` and `Abs` functions in the following code extracts the correct branch of the root function for numerical purposes. For a formal proof of the equivalence of the Bürmann series and the Taylor series for , the `Sign` and `Abs` functions can be omitted.

As an example, we calculate an expansion of up to order 15 according to (14) in powers of , a basis function with a vanishing first derivative at (i.e. ):

Use this definition for the formal proof of equivalence.

Use this definition for numerical purposes, such as for plotting.

Faster convergence can be achieved by using a basis function of the form , whose first and second derivatives vanish at the point (i.e. ).

Using formula (14), it is easy to deduce the expansion of the inverse function of an analytic function of the form

(30) |

The inverse function of comes from (14) by setting and :

(31) |

The error function

is the first example demonstrating the efficiency of Bürmann series using the first derivative as a basis function. We define the function and the basis function by

The error function will be expanded around the origin , where we find that . This expansion thus calls for the application of the generalized form of the Bürmann expansion given in (14). Hence we have to set, according to (28),

To evaluate (14), we use the following relations for the derivatives of the integrand:

The result of this calculation performed up to order nine in is

(32) |

A function calculating the expansion (32) is given below. To show in a plot that this approach is superior to an ordinary Taylor expansion, we also calculate the power series in up to order 10.

The plot shows that the series in (32) has only a small constant offset error for larger values of , whereas the Taylor expansion dramatically deviates for smaller values of , although it converges uniformly for all values of . The series in (32) converges uniformly for all and gives the exact value for the error function. The rearrangement of terms, leading to , is thus justified. Even for the lowest order, we will find a result that shows no unbounded error, unlike the Taylor series.

Due to the uniform convergence of (32), we can write:

(33) |

Using and in (33), we find . Thus, reordering the sums automatically removes the offset error at infinity. Moreover, (33) is useful in practice when only a few coefficients are kept. For example, keeping only and requiring the correct slope at , one obtains an approximation of the error function with a relative error smaller than 1.2%. Taking additional terms of (33) with meaningful conditions improves the approximation further, as also shown in the following plot (choosing and ).
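The following Python sketch checks these statements numerically. It assumes that (32) has the form erf(x) = (2/√π) sgn(x) sqrt(1 − e^(−x²)) Σ_k a_k (1 − e^(−x²))^k and that (33) is the reordered form (2/√π) sgn(x) sqrt(1 − e^(−x²)) (√π/2 + Σ_k c_k e^(−k x²)). Neither equation is reproduced above, so both forms are assumptions. The leading coefficients 1, −1/12, −7/480 were obtained here by matching the Maclaurin series of erf at x = 0. The one-coefficient choice c_1 = 1 − √π/2 enforces the slope erf′(0) = 2/√π. The pair c_1 = 31/200, c_2 = −341/8000 is the choice commonly quoted for this approximation and is likewise an assumption. All names are hypothetical.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def erf_burmann(x, coeffs):
    """Truncated form of (32): (2/sqrt(pi)) sgn(x) sqrt(y) sum_k a_k y^k, y = 1 - e^{-x^2}."""
    y = 1.0 - math.exp(-x * x)
    series = sum(a * y ** k for k, a in enumerate(coeffs))
    return math.copysign(1.0, x) * 2.0 / SQRT_PI * math.sqrt(y) * series


def erf_reordered(x, c):
    """Truncated form of (33): (2/sqrt(pi)) sgn(x) sqrt(1 - e) (sqrt(pi)/2 + sum_k c_k e^k)."""
    e = math.exp(-x * x)
    inner = SQRT_PI / 2 + sum(ck * e ** k for k, ck in enumerate(c, start=1))
    return math.copysign(1.0, x) * 2.0 / SQRT_PI * math.sqrt(1.0 - e) * inner


LEADING_TERMS = [1.0, -1.0 / 12, -7.0 / 480]   # from matching Maclaurin series
ONE_TERM = [1.0 - SQRT_PI / 2]                 # correct slope at x = 0
TWO_TERMS = [31 / 200, -341 / 8000]            # commonly quoted pair (assumed)

GRID = [i / 100 for i in range(1, 501)]        # 0.01 <= x <= 5

def max_rel_err(approx):
    """Largest relative deviation from math.erf on GRID."""
    return max(abs(approx(x) - math.erf(x)) / math.erf(x) for x in GRID)


print(f"one coefficient : {max_rel_err(lambda x: erf_reordered(x, ONE_TERM)):.4f}")
print(f"two coefficients: {max_rel_err(lambda x: erf_reordered(x, TWO_TERMS)):.4f}")
```

On this grid, the one-coefficient version stays just below the 1.2 % bound quoted above, and the two-coefficient version stays below about 0.4 %. The truncated form of (32), by contrast, tends to a constant slightly above 1 as x grows. This is the offset error that the reordering in (33) removes.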

This section applies Bürmann series to obtain solutions of nonlinear differential equations. After a short introduction, an example of diffusion type is presented.

In studying nonlinear ordinary differential equations, we cannot, in general, expect to find an exact solution expressible in terms of commonly used algebraic or transcendental functions. This difficulty is illustrated by the equations studied by Fujita, Lee, and Crank, which we will encounter later [13-17]. For the case of a general nonlinear second-order equation

(34) |

where denotes an analytic function of its arguments, one approach is to cast a solution into the form of a series of powers of the independent variable . Depending on the complexity of the right-hand side of equation (34), determining the coefficients of this expansion by collecting powers of and solving the resulting system of equations is cumbersome. Equations of the form (34) are often encountered in physics, and either their solutions can be determined numerically or their behavior is known qualitatively from experiments. Guided by this prior knowledge about the nature of , we can often construct or guess a function that is a more favorable base for a power series than the independent variable itself. We then cast the solution of equation (34) into the form

and so we expand the solution using the recursive formula (5). The code below gives an expansion of the first four terms.

The free parameter occurring in will then be determined by the boundary conditions for . In the following, this kind of expansion with suitable basis functions will be applied to the solution of a problem of nonlinear heat transfer, and its convergence will be treated as far as relevant for this special case. For all the cases investigated in this article, we apply the recursive approach (7), since the structure of the chosen basis functions is relatively simple. For more sophisticated basis functions, the combinatorial formulas (12) and (14) have to be implemented in order to reduce CPU time. This may be done in future investigations.

As a canonical example, we study the partial differential equation of transient nonlinear heat transfer with temperature-dependent thermal conductivity . We demonstrate the application of Bürmann series to the practical problem of heat transfer in ZnO ceramics. The half-space is filled with this material, which has a constant initial temperature , and the temperature as . At , the surface temperature at is instantaneously raised to a constant temperature . Using measurements of the thermal conductivity of ZnO [3], we can formulate the problem as

(35) |

Using the transformation

Kirchhoff’s transformation

and Boltzmann’s transformation

we find a nonlinear ordinary differential equation that has been extensively studied by Fujita [13-15], Lee [16, 17], and Crank [2]:

(36) |

While the first boundary condition in (35) is regular (i.e. and ), the second one is given in terms of the asymptotic expression . The equivalent value of the derivative must therefore be estimated; we do this by calculating the time evolution of the total thermal energy of the half-space [18]. The approximate value of this energy integral (as an algebraic expression) is determined in appendix B, where we use the Bürmann series (10) to approximate the energy integral. The explicit expression is given in appendix B, equation (56), which describes the dependence of on the parameters with a relative accuracy of 0.37%.

To apply the methods developed in the preceding sections, we establish a convergent iteration scheme for equation 1 of (36). To this end, we transform this equation into an integral equation:

(37) |

The solutions of equation 1 of (36) are strictly decreasing functions of , which implies that for all positive values of we have

From this relation, we conclude that we can define a system of functions , , , … of the form

(38) |

Taking the first term of the expansion (32) of the error function, we have

(39) |

Since the system (39) converges toward the solution , a possible choice for a basis function would be . Instead of , we prefer to introduce a less complicated function that simplifies the calculations. According to (39), this function has to fulfill the following conditions:

The function can be constructed in such a way as to guarantee that all essential boundary conditions are fulfilled by :

The parameter is as yet undetermined. A useful basis for an expansion is obtained by taking

(40) |

which will lead, according to (6) and (7), to the Bürmann series calculated as follows.

Expressed in terms of , this is

(41) |

If we consider in (40) and (41) as a free parameter, we have to investigate how its choice influences the convergence of the corresponding Bürmann series. We write, using (37) and omitting the index for ,

All integrands are representable by uniformly convergent series expansions in powers of , and thus the argument is similar to the one used in proving the convergence of the system (39). Furthermore, we observe that the expansion of the exponential function converges more quickly for higher values of . Thus, as long as can be chosen to guarantee the condition

for some value , an iterative system of functions can be constructed that converges to the solution of equation 1 of (36). This fact can be exploited by determining in such a way as to assure that the approximation

assumes the correct value at infinity. So we have

(42) |

resulting in an algebraic equation of the order in with roots ,

(43) |

where satisfies the condition

(44) |

If there is a solution to equation (43) simultaneously fulfilling condition (44), we get an approximation that converges to the correct value of as . For the third-order approximation , we get from (41) and (43) the cubic equation

(45) |

The relevant real solution to this equation is

(46) |

with

(47) |
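Equations (46) and (47) give the relevant real root of the cubic (45) in closed form. Because the coefficients of (45) are not reproduced in this excerpt, the following Python sketch (our own illustration; the article works in *Mathematica*) shows only the generic technique: Cardano's formula when the cubic has one real root, and the trigonometric form when it has three. The function name `real_cubic_roots` is hypothetical, and selecting the root that satisfies condition (44) is left to the caller.

```python
import math

def real_cubic_roots(a, b, c, d):
    """Real roots of a*x^3 + b*x^2 + c*x + d = 0 (a != 0), in ascending order."""
    shift = b / (3 * a)
    # Depressed cubic t^3 + p t + q = 0, with x = t - shift.
    p = (3 * a * c - b * b) / (3 * a * a)
    q = (2 * b ** 3 - 9 * a * b * c + 27 * a * a * d) / (27 * a ** 3)

    def cbrt(v):
        # Real cube root, also for negative arguments.
        return math.copysign(abs(v) ** (1.0 / 3.0), v)

    if abs(p) < 1e-14:
        ts = [cbrt(-q)]
    else:
        disc = (q / 2) ** 2 + (p / 3) ** 3
        if disc > 0:
            # One real root: Cardano's formula.
            s = math.sqrt(disc)
            ts = [cbrt(-q / 2 + s) + cbrt(-q / 2 - s)]
        else:
            # Three real roots (here p < 0): trigonometric form.
            m = 2 * math.sqrt(-p / 3)
            arg = 3 * q / (2 * p) * math.sqrt(-3 / p)
            phi = math.acos(max(-1.0, min(1.0, arg)))  # clamp rounding noise
            ts = [m * math.cos(phi / 3 - 2 * math.pi * k / 3) for k in range(3)]
    return sorted(t - shift for t in ts)
```

For example, `real_cubic_roots(1, -6, 11, -6)` returns values close to 1, 2, and 3, while `real_cubic_roots(1, 0, 1, 2)` returns a single root close to -1.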

We display the explicit expression for the third-order approximation below:

(48) |

In the next two plots, we compare the approximation (48) with the exact solution found by applying `NDSolve` to equation 1 of (36). The value for is calculated by using the approximation given by equation 2 of (57). Note that (48) can also be inverted exactly by using common algebraic and transcendental functions. The corresponding procedure is listed below for the same parameters as given in the previous section.

This defines the cubic approximation according to (45)-(48).

For the numerically exact solution, calculated by using `NDSolve`, we impose the condition (i.e. ). This condition is equivalent to selecting a slope at the surface . Bürmann’s theorem is used a second time to find an approximation for the slope by calculating the Bürmann expansion (56) of the energy integral (see appendix B).

The plot shows the numerically exact temperature profiles (colored curves) and the third-order approximation (48) (dotted lines) in the range at .

Additionally, the relative error of the third-order approximation (48) is displayed for the same profiles.

The goal of this work is to give a comprehensive presentation of Bürmann’s theorem and its application to linear and nonlinear DEs and PDEs of heat transfer, using single-valued and multivalued basis functions.

To this end, the formulas for the expansion coefficients of Bürmann series are reformulated from a combinatorial viewpoint. This reformulation yields an algorithm that computes the expansion coefficients faster than standard methods. Using this approach, the expansion of transcendental functions in powers of their derivative is applied to the error integral, to the solution of nonlinear differential equations, and to the evaluation of the heat integral. By combining these methods, we show that approximate solutions of nonlinear heat-transfer problems can be given in terms of Bürmann expansions. Finally, we show that introducing an additional parameter in the basis function can significantly improve the convergence of a Bürmann series. The value of this parameter is found by solving algebraic equations that result from the boundary conditions of the problem.
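As a small numerical illustration of expanding a transcendental function in powers of its derivative, the Python sketch below evaluates the widely quoted two-term Bürmann expansion of the error function in powers of $\sqrt{1-e^{-x^2}}$. Its leading term, $\operatorname{sgn}(x)\sqrt{1-e^{-x^2}}$, is the kind of first term used in (39). The coefficients $31/200$ and $-341/8000$ are taken from that commonly cited expansion rather than recomputed here. We do not claim that this is exactly the form of (32), and the name `erf_burmann` is hypothetical.

```python
import math

def erf_burmann(x):
    """Two-term Bürmann approximation of erf(x) in powers of sqrt(1 - exp(-x^2))."""
    e = math.exp(-x * x)
    # Together with the factor (2/sqrt(pi)) * (sqrt(pi)/2) = 1, this is the first term.
    leading = math.copysign(1.0, x) * math.sqrt(1.0 - e)
    correction = math.sqrt(math.pi) / 2 + 31 / 200 * e - 341 / 8000 * e * e
    return 2 / math.sqrt(math.pi) * leading * correction

for x in (0.5, 1.0, 1.5, 2.0):
    print(x, erf_burmann(x), math.erf(x))
```

On $-4 \le x \le 4$, the deviation from `math.erf` stays at a few thousandths.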

The coefficients , defined by

result from elementary considerations. We write

(49) |

The power of is given by

(50) |

Rearranging equation 1 of (50) in increasing powers , we have two conditions

(51) |

Thus, collecting all powers of over all the contributions arising from all with is equivalent to imposing one single condition, replacing the conditions 1 and 2 of (51):

Since , we define :

(52) |

Using equations 1 of (49), 1 of (50), and (52) we finally arrive at the expansion

A similar result, displayed in (21), is obtained for .
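Appendix A obtains the coefficients of a power of a series combinatorially. To check such coefficients numerically, a compact alternative is the classical J. C. P. Miller recurrence, $b_0 = a_0^p$ and $b_k = \frac{1}{k a_0}\sum_{j=1}^{k}\big((p+1)j - k\big)\,a_j\,b_{k-j}$. The Python sketch below uses it with exact rational arithmetic. This is a different, recursive technique from the combinatorial formula (52). It assumes the leading coefficient $a_0$ is nonzero, and the name `series_power` is hypothetical.

```python
from fractions import Fraction

def series_power(a, p, order):
    """Coefficients b_0..b_order of (a_0 + a_1 x + a_2 x^2 + ...)^p, with a_0 != 0.

    J. C. P. Miller recurrence:
        b_0 = a_0**p,
        b_k = 1/(k a_0) * sum_{j=1..k} ((p + 1) j - k) a_j b_{k-j}.
    """
    a = list(a) + [0] * (order + 1 - len(a))  # pad missing coefficients with 0
    # Keep exact arithmetic in the common case a_0 = 1 (e.g. for fractional p).
    b = [a[0] if a[0] == 1 else a[0] ** p]
    for k in range(1, order + 1):
        s = sum(((p + 1) * j - k) * a[j] * b[k - j] for j in range(1, k + 1))
        b.append(s / (k * a[0]))
    return b

# Example: sqrt(1 + x) = 1 + x/2 - x^2/8 + x^3/16 - ...
print(series_power([Fraction(1), Fraction(1)], Fraction(1, 2), 3))
```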

We define the temperature difference and the differential equation it obeys:

(53) |

By integrating equation 2 of (53) over , we find

(54) |

Performing the transformations of Kirchhoff and Boltzmann, we arrive at the solution for :

(55) |

On the other hand, using the definition given in (54), we expand in a Bürmann series in powers of according to (10). This leads to

(56) |

Combining (55) and (56) up to order three gives

According to the transformations in (36), we have

which leads, after some manipulations, to a quartic algebraic equation for and its solution , given by

(57) |

An approximation of (56) up to order six would lead to a sextic equation that is reducible to a cubic equation, and hence to an algebraic expression for . The approximation obtained from the first equation of (57) for is , which shows a maximum relative error of 0.37% compared to the exact value found by numerical methods (i.e. `NDSolve`) using the parameters listed in (35).

The authors wish to thank Prof. Dieter Messner, Lienz, and Dr. Hans Riedler, Graz, for their encouragement and helpful advice. Special thanks go to the editors and reviewers of *The Mathematica Journal* for their constructive suggestions and support.

[1] | H. S. Carslaw and J. C. Jaeger, Conduction of Heat in Solids, 2nd ed., Oxford: Clarendon Press, 1959 pp. 482-484. |

[2] | J. Crank, The Mathematics of Diffusion, 2nd ed., Oxford: Clarendon Press, 1975 pp. 107-110, 119-121. |

[3] | T. Olorunyolemi, A. Birnboim, Y. Carmel, O. C. Wilson Jr., I. K. Loyd, S. Smith, and R. Campbell, “Thermal Conductivity of Zinc Oxide: From Green to Sintered State,” Journal of the American Ceramic Society, 85(5), 2002 pp. 1249-1253. doi:10.1111/j.1151-2916.2002.tb00253.x. |

[4] | C. Wagner, “Diffusion of Lead Chloride Dissolved in Solid Silver Chloride,” Journal of Chemical Physics, 18, 1950 pp. 1227-1230. doi:10.1063/1.1747915. |

[5] | E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, 4th ed., Cambridge: Cambridge University Press, 1927 pp. 128-132. |

[6] | E. W. Weisstein, “Bürmann’s Theorem” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/BuermannsTheorem.html. |

[7] | Wikipedia. “Lagrange Inversion Theorem.” (Oct 17, 2014) en.wikipedia.org/wiki/Lagrange_inversion_theorem. |

[8] | H. Stenlund, “Inversion Formula.” arxiv.org/abs/1008.0183. |

[9] | E. W. Weisstein, “Series Reversion” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/SeriesReversion.html. |

[10] | L. Comtet, Advanced Combinatorics: The Art of Finite and Infinite Expansions (J. W. Nienhuys, trans.), Dordrecht: D. Reidel, 1974. |

[11] | E. W. Weisstein, “Faà di Bruno’s Formula” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/FaadiBrunosFormula.html. |

[12] | P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Vol. I, New York: McGraw-Hill, 1953 pp. 411-413. |

[13] | H. Fujita, “The Exact Pattern of a Concentration-Dependent Diffusion in a Semi-infinite Medium, Part I,” Textile Research Journal, 22(11), 1952 pp. 757-760. doi:10.1177/004051755202201106. |

[14] | H. Fujita, “The Exact Pattern of a Concentration-Dependent Diffusion in a Semi-infinite Medium, Part II,” Textile Research Journal, 22(12), 1952 pp. 823-827. doi:10.1177/004051755202201209. |

[15] | H. Fujita, “The Exact Pattern of a Concentration-Dependent Diffusion in a Semi-infinite Medium, Part III,” Textile Research Journal, 24(3), 1954 pp. 234-240. doi:10.1177/004051755402400304. |

[16] | C. F. Lee, “On the Solution of Some Diffusion Equations with Concentration-Dependent Diffusion Coefficients—I,” Journal of the Institute of Mathematics and Its Applications (now IMA Journal of Applied Mathematics), 8(2), 1971 pp. 251-259. doi:10.1093/imamat/8.2.251. |

[17] | C. F. Lee, “On the Solution of Some Diffusion Equations with Concentration-Dependent Diffusion Coefficients—II,” Journal of the Institute of Mathematics and Its Applications (now IMA Journal of Applied Mathematics), 10(2), 1972 pp. 129-133. doi:10.1093/imamat/10.2.129. |

[18] | T. R. Goodman, “Application of Integral Methods for Transient Nonlinear Heat Transfer,” in Advances in Heat Transfer, Vol. I (T. F. Irvine, Jr. and J. P. Hartnett, eds.), New York: Academic Press, 1964 pp. 51-122. |

H. M. Schöpf and P. H. Supancic, “On Bürmann’s Theorem and Its Application to Problems of Linear and Nonlinear Heat Transfer and Diffusion,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-11. |

Harald Markus Schöpf was born in Lienz, Austria on May 5, 1965. He studied solid-state physics and technical physics in Graz. After two years as a research assistant at the Graz University of Technology, Austria, he moved to Siemens Matsushita (now EPCOS) in Deutschlandsberg, Austria. Now he works as a teacher in Lienz.

Peter Hans Supancic was born in Graz, Austria on September 7, 1967. He studied theoretical physics in Graz and graduated in materials science at the University of Leoben, Austria. After completing his habilitation on functional ceramics, he became an associate professor (ao. Prof.) at the University of Leoben.

**Harald Markus Schöpf**

*Schillerstrasse 4
A-9900 Lienz*

**Peter Hans Supancic**

Institut für Struktur- und Funktionskeramik/Montanuniversitaet Leoben

*Peter Tunner Strasse 5
A-8700 Leoben*

When my son brought home a paper from school with a grid of numbers on it, I was immediately interested. The goal: cover the puzzle with all the dominoes from the “bone pile,” making sure that each number of the puzzle is covered by the same number on a domino. Many similar puzzles can be found online and in puzzle collections: see [1, 2, 3, 4, 5] for several online resources, which are the source of some of the examples considered here.

**Figure 1.** A partially solved domino grid, with almost half of the 28 dominoes placed on the underlying puzzle grid.

Our first task is to represent the board.

Next, we need the bone pile, the list of available dominoes. In this case, the bone pile consists of all 28 dominoes from double zero to double six, but the definition is valid for any non-negative integer $n$, giving a total of $(n+1)(n+2)/2$ dominoes.
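Here is a minimal Python analogue of the bone-pile definition. The article's own code is in *Mathematica*, and the name `bone_pile` is ours. It lists each domino once as a (low, high) pair, which gives $(n+1)(n+2)/2$ pieces for a double-$n$ set.

```python
def bone_pile(n):
    """All dominoes of a double-n set, as (low, high) pairs from (0, 0) to (n, n)."""
    return [(low, high) for low in range(n + 1) for high in range(low, n + 1)]

double_six = bone_pile(6)
print(len(double_six), double_six[:3])  # 28 [(0, 0), (0, 1), (0, 2)]
```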

Find possible locations for a given piece .

This is the workhorse of the entire solution. It first splits each row of the puzzle into overlapping pairs of adjacent entries and looks for matches to the given pair. It then repeats the process on the transposed matrix (i.e. along the columns of the original grid) and records the locations of any matches found. The position of a pair in the partition gives the location of the first half of the domino in the original grid, and adding the appropriate offset gives the location of the second half. Both are included as one domino location in the list of locations found.
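The search just described can be sketched in Python as follows. This is an illustration, not the article's `find` function. Indices are 0-based rather than 1-based, the column pass uses `zip(*grid)` as the transpose, and entries such as `""` (used later for gaps) simply never match.

```python
def find_locations(grid, piece):
    """All locations ((i1, j1), (i2, j2)) where piece matches two adjacent squares."""
    targets = {tuple(piece), tuple(piece)[::-1]}  # either orientation matches
    found = []
    # Adjacent pairs along each row: squares (i, j) and (i, j + 1).
    for i, row in enumerate(grid):
        for j, pair in enumerate(zip(row, row[1:])):
            if pair in targets:
                found.append(((i, j), (i, j + 1)))
    # The same on the transposed grid, i.e. along the columns of the original.
    for j, column in enumerate(zip(*grid)):
        for i, pair in enumerate(zip(column, column[1:])):
            if pair in targets:
                found.append(((i, j), (i + 1, j)))
    return found

print(find_locations([[0, 1], [1, 0]], (0, 1)))
```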

Now for functions to highlight the dominoes within a puzzle. The function `frameDomino` generates the options to include in the `Frame` option of `Grid`.

The function `displayPuzzle` accepts a puzzle grid (a matrix) and a domino list (a list of location pairs) and displays the puzzle grid with frames around the dominoes indicated in the list.

For example, there are two possible locations for the domino in the `m9` puzzle.

A `puzzle` object takes three arguments.

- The matrix `m` contains the puzzle to be solved, a 2D array of integers.
- The filled-locations list `filled` is a list of coordinate pairs, one pair per placed domino. The two squares of each pair lie either in the same row and adjacent columns or in the same column and adjacent rows.
- The bone pile `bones`, the list of unplayed dominoes, is a list of pairs of integers.

The `Format` command defines how to format a puzzle: the puzzle matrix has its filled list of dominoes framed, and a tooltip shows the bone pile, if any.

This section shows examples of various puzzles; mouse over a puzzle to see the bone pile. In this puzzle, no dominoes have been played yet.

Here two pieces have been removed from the bone pile and placed on the board.

To ensure that the squares filled by already placed pieces are no longer included, make a version of the board with the affected squares blanked out.

This function finds the forced locations; only one piece can possibly go into a forced location.

Find the forced locations after two particular dominoes have been played.

The forced locations are shown empty.

- Select the pieces that fit in forced locations.
- Use `find` to return a list of all possible locations for playable pieces, and select the pieces that have only one possible location:

In this artificial case, there are two forced locations: in each, only one piece can be placed.
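The bookkeeping for forced locations and forced pieces can be sketched in Python as below. We interpret a *location* as an unfilled square and call it forced when exactly one (piece, position) option can cover it. This interpretation, the use of `None` for blanked squares, and all function names (`blank_out`, `placements`, `forced_locations`, `forced_pieces`) are our assumptions, not the article's definitions.

```python
def blank_out(grid, filled):
    """Copy of grid with the squares of already placed dominoes set to None."""
    board = [list(row) for row in grid]
    for location in filled:
        for i, j in location:
            board[i][j] = None
    return board

def placements(board, piece):
    """All ((i, j), (k, l)) where piece fits on two adjacent unblanked squares."""
    targets = {piece, piece[::-1]}
    rows, cols = len(board), len(board[0])
    found = []
    for i in range(rows):
        for j in range(cols):
            for k, l in ((i, j + 1), (i + 1, j)):  # right neighbor, lower neighbor
                if k < rows and l < cols and (board[i][j], board[k][l]) in targets:
                    found.append(((i, j), (k, l)))
    return found

def forced_pieces(board, bones):
    """Pieces that fit in exactly one location, mapped to that location."""
    result = {}
    for piece in bones:
        locs = placements(board, piece)
        if len(locs) == 1:
            result[piece] = locs[0]
    return result

def forced_locations(board, bones):
    """Squares that exactly one (piece, location) option can cover.

    Squares that no option can cover (a dead end) do not appear at all.
    """
    cover = {}
    for piece in bones:
        for loc in placements(board, piece):
            for cell in loc:
                cover.setdefault(cell, []).append((piece, loc))
    return {cell: opts[0] for cell, opts in cover.items() if len(opts) == 1}
```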

The function `step` finds the forced locations and fills them in with the appropriate dominoes taken from the bone pile. Mouse over to see that these dominoes have been removed from the bone pile.

At the beginning, there are no forced locations, but there are four forced pieces (pieces that can be placed in only one location in the puzzle): , , , and . The `step` function plays all four at once.

We are ready to solve the whole puzzle. The next command prints the current state, takes one step, and repeats until the bone pile is empty.

Along the way, multiple partial solutions had to be considered when no forced locations or forced pieces were found, but in the end all but one were dropped as inconsistent. The comments were left on to show the forced locations or forced pieces at each step; now we turn them off.

We can also make a prettier display function that shows the dominoes with their customary pips (or dots), rather than only the grid numbers. We can represent the pip positions by matrices, some of which are easily created by built-in matrix commands. Since the pip positions of double-9 and double-6 domino sets are consistent, let us build the larger set here. (A double-12 set would require adjusting the pip positions.)

The other matrices could be built by hand or using `SparseArray` or `Table` with appropriate criteria, but it is easy to create them by addition and subtraction.

A pip will be placed on a half-domino square wherever the matrix had a 1.
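The addition-and-subtraction construction of the pip matrices can be sketched in Python with plain nested lists. The 3×3 layout below (for example, 2 and 3 drawn on the anti-diagonal, and 6 as two columns of three) is one common convention for double-nine sets and is our assumption. The checks below verify only the pip counts.

```python
def zeros():
    return [[0] * 3 for _ in range(3)]

def combine(*terms):
    """Entrywise sum of (coefficient, matrix) pairs."""
    out = zeros()
    for coeff, m in terms:
        for i in range(3):
            for j in range(3):
                out[i][j] += coeff * m[i][j]
    return out

IDENT = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
ANTI = [[1 if i + j == 2 else 0 for j in range(3)] for i in range(3)]
CENTER = [[1 if i == j == 1 else 0 for j in range(3)] for i in range(3)]
MIDROW = [[1 if i == 1 else 0 for j in range(3)] for i in range(3)]
ONES = [[1] * 3 for _ in range(3)]

corners = combine((1, IDENT), (1, ANTI), (-2, CENTER))  # center counted twice, removed
PIPS = {
    0: zeros(),
    1: CENTER,
    2: combine((1, ANTI), (-1, CENTER)),
    3: ANTI,
    4: corners,
    5: combine((1, corners), (1, CENTER)),
    6: combine((1, corners), (1, MIDROW), (-1, CENTER)),
    7: combine((1, corners), (1, MIDROW)),
    8: combine((1, ONES), (-1, CENTER)),
    9: ONES,
}
```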

The function `displayDottedPuzzle` creates a graphical display of the puzzle, optionally replacing numbers by half-domino faces for any locations listed in the “filled list,” outlining any placed dominoes in a way similar to `displayPuzzle`.

The method described here can be thought of as “human-type,” since it uses intelligently chosen criteria for deciding which step to perform and which option to try next. The criteria used can be summarized as follows:

- Seek forced locations: if any locations can take none of the available dominoes, abandon the partial solution currently being constructed; if any locations can take exactly one available domino (and not the same one), fill all of these “forced locations.”
- Else seek forced dominoes: if any of the available dominoes cannot be placed on the board, abandon the partial solution currently being constructed; if any of the available dominoes can only be placed in one location on the board, play all of these “forced dominoes.”
- Else for a minimal case, place one domino in all possible locations, making separate copies of the puzzle for each case.
- Repeat until no further changes occur.
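The four criteria can be combined into a compact recursive solver. The Python sketch below is our own illustration of the strategy, not a transcription of the article's `solvePuzzle`. It plays one forced move per call instead of all forced moves at once. When nothing is forced, it branches on the domino with the fewest possible locations. It also assumes the grid has exactly two squares per domino, with gaps marked `""`.

```python
def solve_puzzle(grid, bones):
    """All solutions (sorted lists of domino locations) of a domino grid."""
    rows, cols = len(grid), len(grid[0])

    def placements(free, piece):
        # Locations of piece on free squares, pairing each square with its
        # right or lower neighbor.
        targets = {piece, piece[::-1]}
        found = []
        for i, j in sorted(free):
            for k, l in ((i, j + 1), (i + 1, j)):
                if (k, l) in free and (grid[i][j], grid[k][l]) in targets:
                    found.append(((i, j), (k, l)))
        return found

    solutions = []

    def search(free, remaining, placed):
        if not remaining:
            solutions.append(sorted(placed))
            return
        options = {p: placements(free, p) for p in remaining}
        if any(not locs for locs in options.values()):
            return  # some domino fits nowhere: abandon this partial solution
        cover = {cell: [] for cell in free}
        for p, locs in options.items():
            for loc in locs:
                for cell in loc:
                    cover[cell].append((p, loc))
        if any(not opts for opts in cover.values()):
            return  # some square can take no domino: abandon
        # 1. a forced location, 2. a forced domino, 3. branch on the least flexible domino
        moves = next((opts for opts in cover.values() if len(opts) == 1), None)
        if moves is None:
            moves = next(([(p, locs[0])] for p, locs in options.items()
                          if len(locs) == 1), None)
        if moves is None:
            p = min(options, key=lambda q: (len(options[q]), q))
            moves = [(p, loc) for loc in options[p]]
        for p, loc in moves:
            search(free - set(loc), remaining - {p}, placed + [loc])

    free = frozenset((i, j) for i in range(rows) for j in range(cols)
                     if grid[i][j] != "")
    search(free, frozenset(bones), [])
    return solutions
```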

A human can make more complicated arguments that eliminate some options; for examples, see the explanations at the sites [1, 2, 3, 4, 5]. (But not all suggested solving strategies turn out to be useful. One common idea, placing the “double” dominoes first, can easily be defeated by a clever puzzle designer.) The order of the criteria is arbitrary and could be modified, but even as it stands the method is far faster than the simpler brute-force method presented in the following section.

Here is a list of all possible locations of all dominoes in our original puzzle.

The number of options for the pieces varies wildly.

(You can easily verify that in this puzzle, all the double dominoes have between four and eight possible placement options, making “place doubles first” a poor strategy in this case.) Taking all possible options for all the pieces gives a very large number.

Too many cases to consider! But this method would work, theoretically: use `Outer` to get all combinations of choices of these options, and then use `Select` to keep those that have no overlapping dominoes. Here we do only the first three dominoes.
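The `Outer`/`Select` idea corresponds to a Cartesian product followed by a filter. In Python it can be sketched with `itertools.product`. Like the article's version, this enumerates every combination and is practical only for a handful of pieces. The name `brute_force` and the layout of `options` (one list of locations per piece) are our assumptions.

```python
from itertools import product

def brute_force(options):
    """Keep every combination of one location per piece with no shared squares."""
    solutions = []
    for combo in product(*options):
        cells = [cell for location in combo for cell in location]
        if len(cells) == len(set(cells)):  # no square used twice
            solutions.append(combo)
    return solutions

options = [[((0, 0), (0, 1)), ((0, 0), (1, 0))],
           [((1, 0), (1, 1)), ((0, 1), (1, 1))]]
print(brute_force(options))
```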

Using the first three dominoes, there are possibilities, reduced to 19 after elimination of conflicts. Placing the first 13 dominoes involves considering 653184 cases, of which only four have no conflicts.

So the following code should work, but will take an unreasonable amount of time and memory. It is beautifully simple and short, but do not run it, as it probably would not finish in our lifetimes!

“To a hammer, everything looks like a nail.” A few years ago, I worked out an exhaustive search-and-collision-detection algorithm based on the idea of a generalized odometer, and since then I have seen applications for it everywhere. It works here, too.

Create a 28-digit generalized odometer whose $i$th digit records which option we are trying for the $i$th domino. All digits start at 1. In general, the odometer is incremented not at the right end, but at the first digit (from the left) whose domino placement conflicts with that of any previous domino. A digit “rolls over” when it is incremented past its maximum value and must be reset to 1. Whenever a digit rolls over, the digit to its left is also incremented, just as in a real odometer. Each odometer digit has its own maximum, determined by the number of options available for that domino. When the first digit finally rolls over, all solutions have been found. We also speed up the procedure by sorting the domino option list by increasing length.
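The odometer procedure can be sketched in Python as follows. This illustration uses our own conventions (0-based digits, locations as pairs of squares, and the hypothetical name `odometer_solve`) and is not the article's code. One detail is made explicit here: when a digit is advanced because of a conflict, every digit to its right is reset, since all readings that share the conflicting prefix can be skipped.

```python
def odometer_solve(options):
    """Every choice of one location per piece with no overlapping squares.

    options[k] lists the possible locations (pairs of squares) of piece k;
    solutions are returned as lists of locations in the original piece order.
    """
    n = len(options)
    if n == 0 or any(not opts for opts in options):
        return []
    order = sorted(range(n), key=lambda k: len(options[k]))  # fewest options first
    opts = [options[k] for k in order]
    digits = [0] * n  # 0-based digits (the article's digits start at 1)
    solutions = []

    def first_conflict():
        # Leftmost digit whose location overlaps a location chosen to its left.
        used = set()
        for k in range(n):
            location = opts[k][digits[k]]
            if used.intersection(location):
                return k
            used.update(location)
        return None

    def advance(k):
        # Zero the digits right of k, then increment digit k, carrying left.
        for m in range(k + 1, n):
            digits[m] = 0
        while k >= 0:
            digits[k] += 1
            if digits[k] < len(opts[k]):
                return True
            digits[k] = 0  # roll over and carry into the digit on the left
            k -= 1
        return False  # the leftmost digit rolled over: search finished

    while True:
        k = first_conflict()
        if k is None:
            solution = [None] * n
            for m, original in enumerate(order):
                solution[original] = opts[m][digits[m]]
            solutions.append(solution)
            k = n - 1
        if not advance(k):
            return solutions
```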

Notice that the first four odometer digits can only be 1; each starts at 1 and has a maximum of 1.

To see or use the parts of options specified by the odometer, we use the function `MapThread`.

Here is the program that more or less immediately returns the answer(s).

As expected, there is only one odometer reading that works; that is, only one choice of domino placements solves the puzzle. The generalized odometer method works best for situations with a large number of variables taking on values that can be calculated in advance, particularly if the possible values are the same for all variables or vary in a way that can be easily specified. Here the options have to be recomputed for each new puzzle, making it less efficient than the previous method.

A “quadrilles” puzzle [5], an idea credited to French mathematician Edouard Lucas, can be divided into blocks, each containing the same number. Since the following figure does not completely fill a rectangular array, we add empty strings.

This particular quadrille has only one solution. At each step there are a large number of forced locations or pieces, and all 28 dominoes are placed in only four iterations.

Now for a puzzle with so many different ways to solve it that one feels that almost anything will work [5]!

If a puzzle is nonrectangular or has intentional gaps in it, such as the one shown below [4], simply embed it in a larger rectangle, and indicate the gaps by empty strings.

It seems likely that the online or downloadable domino puzzle generators effectively lay out the dominoes to create a grid that is guaranteed to be solvable. But even if all puzzles presented can be solved, a number of questions spring to mind:

For given grid dimensions, how many different solutions are there? (The three methods derived above solve individual puzzles, but what if the numbers are rearranged in a given grid in all possible ways?)

For given grid dimensions, what fraction of the possible puzzles has only one solution, and in general, for all , what fraction of the puzzles has solutions? What is the largest number of solutions possible?

Bear in mind that, in the sense of the functions developed here, a “solution” is merely a list of domino locations. Different puzzles of the same dimensions can therefore have the same solution just by permuting the underlying grid numbers or rearranging them in other valid ways. For clarity, define a *solution schema* as a layout of dominoes face-down on a board. Now we can talk about the number of possible distinct schemas for a given puzzle grid.

What about writing a program that generates all solution schemas for a given board, ignoring the numbers? This could be done by modifying either the function `solvePuzzle` or the function `odometerSolve`, neither of which can quite do the job as written. (Yes, I did try them on a board filled with 0 entries, but they would need to be tweaked to expect a bone pile of double-zero dominoes.)
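Counting solution schemas (that is, domino tilings of the board regardless of the numbers) needs only a small backtracking routine. The first empty square in reading order must be covered by a domino that extends either right or down. The Python sketch below is our own, `count_schemas` is a hypothetical name, and a set of blocked squares can be passed for gaps.

```python
def count_schemas(rows, cols, blocked=()):
    """Number of ways to cover all unblocked squares with face-down dominoes."""
    blocked = set(blocked)
    covered = [[(i, j) in blocked for j in range(cols)] for i in range(rows)]

    def first_empty():
        for i in range(rows):
            for j in range(cols):
                if not covered[i][j]:
                    return i, j
        return None

    def count():
        cell = first_empty()
        if cell is None:
            return 1  # every square is covered: one complete schema
        i, j = cell
        total = 0
        covered[i][j] = True
        # The first empty square can only pair with its right or lower neighbor.
        for k, l in ((i, j + 1), (i + 1, j)):
            if k < rows and l < cols and not covered[k][l]:
                covered[k][l] = True
                total += count()
                covered[k][l] = False
        covered[i][j] = False
        return total

    return count()
```

For a 4×4 board it gives 36, the known number of domino tilings of that board.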

Finally, it is interesting that the first solution method worked so well, basically following how a human would decide which domino to play next. The code for the brute-force method is the simplest, but it is impractical without massive parallel processing. The odometer method works well, but here it is not as fast as the “human” method, and in any case it may not be as transparent to the reader. There is more than one way to solve a puzzle! And if you spend much time thinking about a puzzle, other methods and other questions will probably occur to you.

I thank my colleagues at Southern Adventist University who have encouraged me, the folks at Wolfram Research who have occasionally helped me, and Claryce, who has put up with me in all my most puzzling moods.

[1] | E. W. Weisstein. “Domino Tiling” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/DominoTiling.html. |

[2] | Domino-Games.com. “Domino Puzzles.” (Sep 4, 2014) www.domino-games.com/domino-puzzles.html. |

[3] | “Dominosa.” (Sep 4, 2014) www.puzzle-dominosa.com. |

[4] | Yoogi Games. “Domino Puzzle Puzzles.” (Sep 4, 2014) syndicate.yoogi.com/domino. |

[5] | J. Köller. “Domino Puzzles.” (Sep 4, 2014) www.mathematische-basteleien.de/dominos.htm. |

K. E. Caviness, “Three Ways to Solve Domino Grids,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-10. |

Ken Caviness teaches at Southern Adventist University, a liberal arts university near Chattanooga. Since obtaining a Ph.D. in physics (relativity and nuclear physics) from the University of Massachusetts Lowell, he has taught math and physics in Rwanda, Texas, and Tennessee. His interests include both computer and human languages (including Esperanto), and he has used *Mathematica* since Version 1, both professionally and recreationally.

**Kenneth E. Caviness**

*Department of Physics & Engineering
Southern Adventist University
PO Box 370
Collegedale, TN 37315-0370*

Evaluating molecular integrals has been an active field since the middle of the last century. Efficient algorithms have been developed and implemented in various programs. Detailed accounts of molecular integrals can be found in the references of [1]. In this article, the third in a series describing algorithms for evaluating molecular integrals, we detail the evaluation of the nuclear-electron attraction energy integrals from a more didactic point of view, following the approach of Rys, Dupuis, and King [2] as implemented in the OpenMol program [3].

The energy caused by the attraction between an electron in the region described by the overlap of the orbitals , and a nucleus of charge located at is expressed by the nuclear-electron attraction integral

(1) |

in which is an unnormalized Cartesian Gaussian primitive.

Using the Gaussian product (see, for example, [1]) and defining the angular part as , we obtain

(2) |

The pole problem, that is, the singularity of the Coulomb operator at the nuclear position, can be removed by using the Laplace transform

(3) |

which turns the integral into

(4) |

where, for now, we have ignored the factor . In the following steps, we make certain modifications that we know in advance will simplify the expressions later on. We first reduce the upper limit of to unity by the change of variables (recall from the Gaussian product that ):

(5) |

and

(6) |
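The Laplace-transform step rests on the standard identity $\frac{1}{r} = \frac{2}{\sqrt{\pi}}\int_0^\infty e^{-r^2 t^2}\,dt$, which we assume is what (3) expresses. The short Python check below is our own, and its names are hypothetical. It evaluates the right-hand side with Simpson's rule on a truncated interval and compares the result with $1/r$.

```python
import math

def inverse_distance_via_gaussians(r, n=2000, cutoff=8.0):
    """(2/sqrt(pi)) * integral_0^inf exp(-r^2 t^2) dt, by composite Simpson's rule.

    The integral is truncated at t = cutoff / r, where the integrand is
    below exp(-cutoff^2); n must be even.
    """
    upper = cutoff / r
    h = upper / n
    total = 1.0 + math.exp(-(r * upper) ** 2)  # endpoint values f(0) and f(upper)
    for k in range(1, n):
        weight = 4 if k % 2 else 2
        total += weight * math.exp(-(r * k * h) ** 2)
    return 2.0 / math.sqrt(math.pi) * total * h / 3.0

for r in (0.5, 1.0, 3.7):
    print(r, inverse_distance_via_gaussians(r), 1.0 / r)
```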

Replacing , , and in , we get

(7) |

We now multiply by the factor :

(8) |

Again, by inserting , we get

(9) |

Having arrived at the desired form, we reinsert the value of the angular part into the expression and separate the term enclosed by the curly brackets into three components , , and :

(10) |

Defining as the function of the component inside the bracket,

(11) |

and similarly for and , we rewrite the integral as

(12) |

We will show that the integrand in the expression for is in fact an overlap between two one-dimensional Gaussians, so that we may use the results developed in [1]. First, we expand the exponential parts of the integrand:

(13) |

regrouping in terms of and , we have

(14) |

which becomes

(15) |

where and . These definitions let us compare this equation with the result of the Appendix, in which we see that equation (15) is simply

(16) |

where

(17) |

Substituting and ,

(18) |

into equation (13), we have

(19) |

Substitute this result into the definition of to get

(20) |

The integral has the same form as a one-dimensional overlap integral where the integrand is a Gaussian function centered at with an exponential coefficient .

From the observation above, we make use of the results developed for overlap integrals in [1]. For example, for ,

(21) |

In particular, we have the transfer equations

(22) |

The and functions take similar forms. The product is a polynomial in , and if we replace , then the integral in equation (12) is

(23) |

where is that polynomial. The integral is a linear combination of values of the Boys function (see, for example, Reference 4 of [1])

(24) |

a strictly positive and decreasing function.

Aside from the obvious choice of using *Mathematica* to evaluate the Boys function, there are several ways of evaluating the integral. In practice, most programs tabulate the function in advance on a grid of points and interpolate as needed (e.g. with Chebyshev polynomials). Here we use Gauss-Chebyshev quadrature [4]; for simplicity, we have adopted the F77 code in [4, p. 46] almost verbatim.
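The quadrature idea can be illustrated with a plain, fixed-order Gauss-Chebyshev rule of the second kind applied to the Boys function, assuming (24) uses the usual definition $F_n(x) = \int_0^1 t^{2n} e^{-x t^2}\,dt$. The Python sketch below is not the adaptive Pérez-Jordá/San-Fabián scheme of [4]. It maps $[0,1]$ onto $[-1,1]$, divides out the weight $\sqrt{1-z^2}$, and uses many points to compensate. For $n = 0$, the result is compared with the closed form $F_0(x) = \tfrac12\sqrt{\pi/x}\,\operatorname{erf}(\sqrt{x})$. The function names are hypothetical.

```python
import math

def boys_gauss_chebyshev(n, x, points=2000):
    """Boys function F_n(x) = integral_0^1 t^(2n) exp(-x t^2) dt.

    Gauss-Chebyshev quadrature of the second kind on (-1, 1): nodes
    z_i = cos(theta_i), theta_i = i pi/(N+1), weights pi/(N+1) sin^2(theta_i).
    The rule integrates g(z) sqrt(1 - z^2), so we divide by sqrt(1 - z^2).
    """
    total = 0.0
    for i in range(1, points + 1):
        theta = i * math.pi / (points + 1)
        z = math.cos(theta)
        weight = math.pi / (points + 1) * math.sin(theta) ** 2
        t = 0.5 * (z + 1.0)  # map (-1, 1) onto (0, 1)
        integrand = 0.5 * t ** (2 * n) * math.exp(-x * t * t)  # 0.5 = dt/dz
        total += weight * integrand / math.sin(theta)  # sqrt(1 - z^2) = sin(theta)
    return total

def boys_f0_closed_form(x):
    """F_0(x) = (1/2) sqrt(pi/x) erf(sqrt(x)), with F_0(0) = 1."""
    if x == 0.0:
        return 1.0
    return 0.5 * math.sqrt(math.pi / x) * math.erf(math.sqrt(x))

for x in (0.0, 0.5, 2.0, 10.0):
    print(x, boys_gauss_chebyshev(0, x), boys_f0_closed_form(x))
```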

The function `Nea` evaluates the nuclear-electron attraction integral of two Gaussian primitives; here `alpha`, `beta`, `RA`, `RB`, `LA`, and `LB` are , , , , , and as defined earlier; `RR` is the nuclear position.
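For the simplest case of two $s$-type primitives ($l = 0$ on both centers), the polynomial in (23) reduces to a constant. The whole integral then collapses to a single Boys function value, $\frac{2\pi}{p}\, e^{-\alpha\beta|A-B|^2/p}\, F_0(p\,|P-C|^2)$, with $p = \alpha+\beta$ and $P = (\alpha A + \beta B)/p$. The Python sketch below evaluates this standard closed form. It is not the article's `Nea`, it omits the nuclear-charge factor and the normalization, and the name `nuclear_attraction_ss` is hypothetical.

```python
import math

def boys_f0(x):
    """F_0(x) = (1/2) sqrt(pi/x) erf(sqrt(x)); first-order Taylor series near 0."""
    if x < 1e-12:
        return 1.0 - x / 3.0
    return 0.5 * math.sqrt(math.pi / x) * math.erf(math.sqrt(x))

def nuclear_attraction_ss(alpha, beta, RA, RB, RC):
    """Integral of exp(-alpha|r-A|^2) exp(-beta|r-B|^2) / |r-C| over all space."""
    p = alpha + beta
    P = [(alpha * a + beta * b) / p for a, b in zip(RA, RB)]  # product center
    AB2 = sum((a - b) ** 2 for a, b in zip(RA, RB))
    PC2 = sum((q - c) ** 2 for q, c in zip(P, RC))
    K = math.exp(-alpha * beta / p * AB2)  # Gaussian product prefactor
    return 2.0 * math.pi / p * K * boys_f0(p * PC2)
```

Far from the charge, the result approaches the overlap $(\pi/p)^{3/2}K$ divided by the distance $|P-C|$, as expected for a point charge.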

As in our two earlier articles [1, 5], we use the same data for the water molecule (, , the geometry optimized at the HF/STO-3G level). The molecule lies in the - plane with Cartesian coordinates in atomic units.

In the STO-3G basis set, each atomic orbital is approximated by a sum of three Gaussians; here are their unnormalized primitive contraction coefficients and orbital exponents.

Here are the basis function origins and Cartesian angular values of the orbitals, listed in the order , , , , , , and .

Specifically, the nuclear-electron attraction energy integral between the first primitive of the orbital of hydrogen atom 1, , the first primitive of the orbital of the oxygen atom, , and atom 1 () is

(25) |

We have

From the Gauss-Chebyshev quadrature, the integral in equation (23) yields . The nuclear-electron integral (25) is . This is calculated as follows.

We first need the normalization factors before evaluating the nuclear-electron energy matrix.

We have provided a didactic introduction to the evaluation of nuclear-electron attraction-energy integrals involving Gaussian-type basis functions by use of recurrence relations and a numerical quadrature scheme. The results are general enough that no modification of the algorithm is needed when larger basis sets with more Gaussian primitives or primitives with larger angular momenta are employed.

Consider the Gaussian product: . Combine and expand the coefficients to get

Let and substitute in the exponent to get

The first three terms inside the second bracket factor to , and the last two can be reduced to . The original Gaussian product is thus

Here is a verification.
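A numerical version of this verification can be sketched in Python for one Cartesian coordinate; the three-dimensional product factorizes into such terms. The sketch compares both sides of the Gaussian product rule on a grid of points. The parameter values and function names are arbitrary choices of ours.

```python
import math

def gaussian_product(alpha, A, beta, B):
    """(p, P, K) with exp(-alpha(x-A)^2) exp(-beta(x-B)^2) = K exp(-p(x-P)^2)."""
    p = alpha + beta
    P = (alpha * A + beta * B) / p
    K = math.exp(-alpha * beta / p * (A - B) ** 2)
    return p, P, K

def max_mismatch(alpha, A, beta, B, xs):
    """Largest relative difference between both sides of the product rule on xs."""
    p, P, K = gaussian_product(alpha, A, beta, B)
    worst = 0.0
    for x in xs:
        lhs = math.exp(-alpha * (x - A) ** 2) * math.exp(-beta * (x - B) ** 2)
        rhs = K * math.exp(-p * (x - P) ** 2)
        worst = max(worst, abs(lhs - rhs) / lhs)
    return worst

print(max_mismatch(0.7, -0.4, 1.3, 0.9, [k / 10 for k in range(-20, 21)]))
```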

[1] | M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals I,” The Mathematica Journal, 14(3), 2012. doi:10.3888/tmj.14-3. |

[2] | J. Rys, M. Dupuis, and H. F. King, “Computation of Electron Repulsion Integrals Using the Rys Quadrature Method,” Journal of Computational Chemistry, 4(2), 1983 pp. 154-157. doi:10.1002/jcc.540040206. |

[3] | G. H. F. Diercksen and G. G. Hall, “Intelligent Software: The OpenMol Program,” Computers in Physics, 8(2), 1994 pp. 215-222. doi:10.1063/1.168520. |

[4] | J. Pérez-Jorda and E. San-Fabián, “A Simple, Efficient and More Reliable Scheme for Automatic Numerical Integration,” Computer Physics Communications, 77(1), 1993 pp. 46-56. doi:10.1016/0010-4655(93)90035-B. |

[5] | M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals II,” The Mathematica Journal, 15(1), 2013. doi:10.3888/tmj.15-1. |

M. Hô and J. M. Hernández-Pérez, “Evaluation of Gaussian Molecular Integrals,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-9. |

Minhhuy Hô received his Ph.D. in theoretical chemistry at Queen’s University, Kingston, Ontario, Canada, in 1998. He is currently a professor at the Centro de Investigaciones Químicas at the Universidad Autónoma del Estado de Morelos in Cuernavaca, Morelos, México.

Julio-Manuel Hernández-Pérez obtained his Ph.D. at the Universidad Autónoma del Estado de Morelos in 2008. He has been a professor of chemistry at the Facultad de Ciencias Químicas at the Benemérita Universidad Autónoma de Puebla since 2010.

**Minhhuy Hô**

*Universidad Autónoma del Estado de Morelos
Centro de Investigaciones Químicas
Ave. Universidad, No. 1001, Col. Chamilpa
Cuernavaca, Morelos, Mexico CP 92010*

**Julio-Manuel Hernández-Pérez**

Facultad de Ciencias Químicas

Ciudad Universitaria, Col. San Manuel

Puebla, Puebla, Mexico CP 72570

A cellular automaton (CA) is a dynamical system that can display arbitrarily complex global behavior despite being governed by very simple local rules [1]. To better understand how such complex behavior emerges, many studies have explored the computational power implicit in CA rules. For instance, classical benchmark problems have been used for this purpose, including the density classification task [2, 3] and the parity problem [4]. The density classification task asks a CA to determine the most frequent bit in the initial configuration of the lattice; the parity problem asks it to determine the parity of the number of 1s in the initial configuration. One approach in these contexts is to evaluate every possible CA of a given family in terms of its capability to solve the target problem. This is possible in small CA families, like the elementary space (composed of 256 CAs), but is not feasible in larger families, like the one-dimensional binary CA family with radius 3, composed of $2^{128}$ rules.
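To make the density task concrete, the Python sketch below implements a binary radius-$r$ CA on a cyclic lattice using Wolfram's rule numbering. It tests whether a given run classifies density, that is, whether it ends in all 1s when 1s are the majority and in all 0s otherwise. It also computes the family size $k^{k^{2r+1}}$. The names, the odd lattice size, and the default of $2N$ update steps are illustrative assumptions. Rule 232 (local majority) is used only as an example; it does not solve the task in general.

```python
def rule_space_size(k, r):
    """Number of one-dimensional CA rules with k states and radius r."""
    return k ** (k ** (2 * r + 1))

def step(config, rule, r=1):
    """One synchronous update of a binary CA on a cyclic lattice.

    The neighborhood is read left to right as a binary number (Wolfram's
    convention), and bit `index` of `rule` gives the new state.
    """
    n = len(config)
    new = []
    for i in range(n):
        index = 0
        for offset in range(-r, r + 1):
            index = 2 * index + config[(i + offset) % n]
        new.append((rule >> index) & 1)
    return new

def classifies_density(config, rule, r=1, steps=None):
    """True if the run ends in all 1s (majority 1) or all 0s (majority 0)."""
    n = len(config)  # use odd n so that a strict majority always exists
    target = 1 if 2 * sum(config) > n else 0
    state = list(config)
    for _ in range(2 * n if steps is None else steps):
        state = step(state, rule, r)
    return all(cell == target for cell in state)

print(rule_space_size(2, 1), rule_space_size(2, 3) == 2 ** 128)  # 256 True
```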

As a strategy to search for CAs in large rule families, evolutionary computation has been used extensively, relying on measures of properties of the candidate rules, such as their degree of internal symmetry, to discard or keep candidates according to these property values. This was a key aspect, for instance, in finding WdO, currently the best one-dimensional radius-3 rule for the density classification task [5].

An alternative is to constrain the search space to only the CAs that are known to present specific properties. The challenge here is how to constrain the space without the need to enumerate the entire subspace of interest. Here, we introduce the concept of a CA template as a possible way to achieve this goal. A CA template is a data structure, associated with the rule tables of the members of a CA family, that relies on the use of variables. The introduction of these variables makes it possible for a CA template to represent a set of rules, unlike the standard $k$-ary rule table representation, which can only represent one individual CA. By making use of *Mathematica*’s built-in equation-solving capabilities and algorithms that allow finding equality relations among CAs with a given property, we are able to create templates that represent number-conserving CAs (those that, in a sense, preserve the number of states of the initial configuration; more details below), as well as those with maximal internal symmetry (those displaying invariance under some transformations in their rule tables; also to be explained below). These two cases are given here as examples of the applicability of the template idea, but other properties can also be accounted for.

In the following section, basic notions about CAs are given, followed by a section that presents details about important properties related to the density classification task. Section 4 explains the notion of *template* and presents the implemented algorithms. Section 5 concludes the text, with a discussion on the advantages and limitations of using templates, and gives some ideas for future work.

Cellular automata constitute a class of decentralized dynamical systems, usually discrete in space, time, and states [1]. As systems governed by relatively simple rules, CAs represent a meaningful model for tackling the issue of how interaction among simple components can lead to the solution of global problems.

CAs are composed of a regular lattice of cells whose states change through time according to a local rule. The lattice can be deployed in any number of dimensions (most commonly one, two, or three) and may have an infinite or a fixed, finite number of cells. Cells’ states are commonly represented by numbers or colors out of $k$ possibilities, ranging from 0 to $k-1$. The local rule of the CA acts on the neighborhood of every cell, which is the set of neighboring cells meant to influence its subsequent states. The neighborhood is usually expressed by its radius (or range) $r$, meaning the number of cells on each side that affect the cell in question. By defining values for these two parameters, a CA rule space or family is defined. The values $k = 2$ and $r = 1$ in the one-dimensional case (i.e. a neighborhood has three cells; a cell has two possible states) give rise to the elementary rule space, which is the best-studied family, due to its small size of only 256 rules but extremely rich phenomenology [1].
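The size of a rule space follows directly from $k$ and $r$: a neighborhood has $2r+1$ cells, hence $k^{2r+1}$ possible configurations, and each configuration can be mapped to any of the $k$ states, giving $k^{k^{2r+1}}$ rules. The short Python sketch below evaluates this count (Python is used for illustration only, because the article's own *Mathematica* code is not reproduced here; `rule_space_size` is a name of our choosing).

```python
def rule_space_size(k: int, r: int) -> int:
    """Number of one-dimensional CA rules with k states and radius r."""
    neighbourhoods = k ** (2 * r + 1)  # distinct neighbourhood configurations
    return k ** neighbourhoods         # one output state chosen per configuration


if __name__ == "__main__":
    print(rule_space_size(2, 1))  # elementary space
    print(rule_space_size(2, 2))  # binary, radius 2
    print(rule_space_size(2, 3))  # binary, radius 3
```

For $k = 2$ this gives 256 rules for $r = 1$, 4,294,967,296 for $r = 2$, and $2^{128}$ for $r = 3$.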

For present purposes, whenever we refer to cellular automata, we mean one-dimensional, binary ($k = 2$) CAs, with a fixed number of cells in the lattice and periodic boundary conditions (i.e. the lattice is closed at its ends, like a ring).

Every CA is governed by a rule that relates the neighborhood of a cell to the state it takes on at the next time step. Its most common representation is the rule table, which is an explicit listing of every possible state configuration of the neighborhoods, lexicographically ordered, and a corresponding cell state for each. Here we use Wolfram’s lexicographical ordering, where the leftmost neighborhood is the configuration in which all cells are in the $k-1$ state, all the way down to the rightmost neighborhood with all cells in the 0 state.

As an illustration, this is the rule table of the elementary CA for rule 184.

This is the ordered set of output cell states from that rule table, called the $k$-ary form.

By reading the sequence of digits that defines the $k$-ary form as a base-$k$ number and converting it to decimal, one obtains the CA rule number, which serves as a unique identifier of a CA in a given rule space [1].

In order to handle operations on rule tables, various *Mathematica* functions are defined. Given a rule table in its $k$-ary form, the function `RuleTableFromkAry` transforms it to its classical representation.

The function `kAryFromRuleTable` reverses the process.

Given a CA’s rule number, `RuleTableFromRuleNumber` determines its rule table.

The inverse function `RuleNumberFromRuleTable` yields the rule number from the rule table.

`WellFormedRuleTableQ` is a predicate that checks whether a rule table in $k$-ary form is valid according to its values of $k$ and $r$.

`RuleOutputFromNeighbourhood` is a utility function to get the output corresponding to a particular neighborhood in a rule table.

Finally, `AllNeighbourhoods` is a utility function giving all possible neighborhoods of a certain rule space.

All these functions are handy to perform rule table manipulation and are used throughout this article.
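The article's *Mathematica* implementations of these helpers are not shown here. The following Python sketch suggests what they compute. The snake_case names are hypothetical counterparts. A rule table is modeled as a dictionary from neighborhood tuples to output states. The $k$-ary form is modeled as the list of outputs in Wolfram's order, with the all-$(k-1)$ neighborhood first.

```python
from itertools import product


def all_neighbourhoods(k, r):
    """All neighbourhoods of the (k, r) space in Wolfram's order."""
    return list(product(range(k - 1, -1, -1), repeat=2 * r + 1))


def rule_table_from_kary(kary, k, r):
    return dict(zip(all_neighbourhoods(k, r), kary))


def kary_from_rule_table(table, k, r):
    return [table[nb] for nb in all_neighbourhoods(k, r)]


def kary_from_rule_number(number, k, r):
    digits = []
    for _ in range(k ** (2 * r + 1)):
        number, d = divmod(number, k)  # least significant base-k digit first
        digits.append(d)
    return digits[::-1]  # most significant digit belongs to the all-(k-1) neighbourhood


def rule_number_from_kary(kary, k):
    number = 0
    for d in kary:
        number = number * k + d
    return number


def rule_table_from_rule_number(number, k, r):
    return rule_table_from_kary(kary_from_rule_number(number, k, r), k, r)


def rule_number_from_rule_table(table, k, r):
    return rule_number_from_kary(kary_from_rule_table(table, k, r), k)


def well_formed_rule_table_q(kary, k, r):
    """Right length, and every output is an integer state in 0..k-1."""
    return len(kary) == k ** (2 * r + 1) and all(
        isinstance(d, int) and 0 <= d < k for d in kary)


def rule_output_from_neighbourhood(table, neighbourhood):
    return table[tuple(neighbourhood)]
```

For example, `kary_from_rule_number(184, 2, 1)` returns `[1, 0, 1, 1, 1, 0, 0, 0]`, the $k$-ary form of rule 184.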

In the one-dimensional case, it is possible to visualize the system’s evolution using a space-time diagram, in which time goes from top to bottom, and cell states are represented by colors. For binary CAs, white cells are in the 0 state and black cells in the 1 state. In order to obtain and plot the space-time diagram resulting from a rule execution on a given lattice, one can use *Mathematica*’s built-in functions `CellularAutomaton` and `ArrayPlot`.
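A space-time diagram is simply the list of successive lattice configurations. Below is a minimal Python sketch, assuming periodic boundaries and Wolfram rule numbering, that evolves a ring. Instead of calling `ArrayPlot`, it prints the diagram as text, with `#` for state 1 and `.` for state 0. The function names are our own.

```python
def step(config, rule, k=2, r=1):
    """One synchronous update of a ring (periodic boundary) under `rule`."""
    n = len(config)
    new = []
    for i in range(n):
        value = 0
        for j in range(-r, r + 1):  # read the neighbourhood left to right
            value = value * k + config[(i + j) % n]
        new.append((rule // k ** value) % k)  # base-k digit of the rule number
    return new


def evolve(config, rule, steps, k=2, r=1):
    """Initial configuration followed by `steps` successive updates."""
    history = [list(config)]
    for _ in range(steps):
        history.append(step(history[-1], rule, k, r))
    return history


def space_time_text(history):
    """Binary space-time diagram, time running from top to bottom."""
    return "\n".join("".join("#" if c else "." for c in row) for row in history)


if __name__ == "__main__":
    print(space_time_text(evolve([1, 1, 0, 1, 0, 0, 0, 1, 0, 0], 184, 6)))
```

Running it with rule 184 shows the 1s drifting to the right, like cars in traffic, with their count unchanged from row to row.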

In order to better understand the computational power implicit in a CA rule, benchmark problems have been defined for it to tackle; among them, the most common is the density classification task (DCT). In the classical definition of DCT, a one-dimensional binary CA has to lead an arbitrary initial odd-sized configuration into a fixed-point state of all blacks, if the initial condition has a larger number of black cells, or into a fixed-point state of all whites otherwise.
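Here is a Python sketch that makes the task concrete for a single initial configuration. It evolves an odd-sized ring and reports whether the ring reaches the uniform state of the majority bit. Elementary rule 232 (local majority) is used purely as an illustration. The function names and the step limit are assumptions.

```python
def step(config, rule, r=1):
    """One synchronous update of a binary ring under Wolfram rule numbering."""
    n = len(config)
    new = []
    for i in range(n):
        value = 0
        for j in range(-r, r + 1):
            value = 2 * value + config[(i + j) % n]
        new.append((rule >> value) & 1)
    return new


def classifies_density(rule, config, max_steps=200, r=1):
    """True if `rule` drives the odd-sized `config` to the correct uniform state."""
    if len(config) % 2 == 0:
        raise ValueError("the classical DCT uses odd-sized lattices")
    target = 1 if 2 * sum(config) > len(config) else 0
    state = list(config)
    for _ in range(max_steps):
        if all(c == target for c in state):
            return True
        nxt = step(state, rule, r)
        if nxt == state:  # stuck in a fixed point that is not the right answer
            return False
        state = nxt
    return all(c == target for c in state)
```

With rule 232, the configuration `1111110` is classified correctly. The configuration `1110001`, however, freezes into a non-uniform fixed point, one reason why local majority does not solve the DCT.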

It has been proved that in order to solve the DCT perfectly, a CA would need to be number conserving, that is, it should not change the number of cells in each state from any given initial condition [6]. This contradicts the classical definition of the DCT, since evolving to an all-black or all-white configuration generally requires changing the number of cells in each state over time. This means that the DCT is unsolvable when formulated according to its classical definition [2, 3].

Currently, the best imperfect DCT solver (known as WdO) was found in [5], by means of a sophisticated evolutionary algorithm that used, among other important properties, the internal symmetry of a rule in its fitness function. Consistent with the fact that a perfect DCT solver would need to be number conserving, WdO and other good DCT solvers are known to have a very small Hamming distance from number-conserving rules of the same rule space [7].

All in all, number conservation and internal symmetry are two important properties when determining the ability of a CA to solve the DCT, and serve as good examples for the notion of CA templates. Both are described in detail in the following subsections. But notice, upfront, that these two properties are amenable to being addressed in templates, since they derive from well-established relations among state transitions.

Number conservation is a property presented by some CAs, in which the sum of the states of the individual cells in any initial configuration does not change during the space-time evolution; in particular, for binary CAs, this means that the number of 1s always remains the same. This kind of CA is useful, for instance, to model systems like car traffic, in which a car cannot appear or disappear as time goes by [7]. Elementary CA 184 is an example of a number-conserving CA.

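In order for a one-dimensional CA rule to be number conserving, it is established in [8] that the local rule $f$ with neighborhood size $n$ must respect the following necessary and sufficient condition for every state transition $f(x_1, x_2, \ldots, x_n)$:

$$f(x_1, x_2, \ldots, x_n) = x_1 + \sum_{j=1}^{n-1} \Big( f(0_j, x_2, x_3, \ldots, x_{n-j+1}) - f(0_j, x_1, x_2, \ldots, x_{n-j}) \Big),$$

where $0_j$ corresponds to a sequence of 0s of length $j$.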

A simplification of the original algorithm from [8] is provided in [9]. Basically, it was shown that for any given rule, it suffices to analyze the state transitions associated with the neighborhood made up of only 0s and the neighborhoods not starting with 0. For $k$ states and neighborhood size $n$, this is therefore a total of $(k-1)k^{n-1} + 1$ neighborhoods instead of the $k^n$ required in [8]. This is the condition we employ to obtain templates that represent number-conserving CAs, as will be shown below.
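The condition of Boccara and Fukś can be checked mechanically. The Python sketch below assumes binary rules with Wolfram numbering and neighborhood size $n$, and the function names are ours. It tests either every neighborhood or only the reduced set from [9]: the all-zeros neighborhood plus those not starting with 0. For a neighborhood that begins with 0, the sum in the condition telescopes to $f(x) - f(0, \ldots, 0)$, so the condition there reduces to $f(0, \ldots, 0) = 0$. This is why the reduced set suffices.

```python
from itertools import product


def output(rule, neighbourhood):
    """Output of a binary rule for a neighbourhood, using Wolfram numbering."""
    value = 0
    for bit in neighbourhood:
        value = 2 * value + bit
    return (rule >> value) & 1


def bf_condition(rule, x):
    """Boccara-Fuks condition at neighbourhood x (a tuple of n bits)."""
    n = len(x)
    rhs = x[0]
    for j in range(1, n):
        zeros = (0,) * j
        rhs += output(rule, zeros + x[1:n - j + 1])  # f(0_j, x2, ..., x_{n-j+1})
        rhs -= output(rule, zeros + x[:n - j])       # f(0_j, x1, ..., x_{n-j})
    return output(rule, x) == rhs


def is_number_conserving(rule, n=3, reduced=False):
    """Check all 2**n neighbourhoods, or only the reduced set from [9]."""
    neighbourhoods = list(product((0, 1), repeat=n))
    if reduced:
        neighbourhoods = [x for x in neighbourhoods if x[0] == 1 or not any(x)]
    return all(bf_condition(rule, x) for x in neighbourhoods)
```

Over the 256 elementary rules, both variants accept exactly rules 170, 184, 204, 226, and 240. The reduced variant inspects 5 neighborhoods instead of 8.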

Apart from number conservation, a rule’s internal symmetry also plays an important role in solving the DCT. In order to fully understand how this property works, an explanation about rule transformations and dynamically equivalent rules is required; the presentation is restricted to binary rules, even though this notion extends to the arbitrary $k$-ary case.

Given the rule table of a CA, one can apply three types of transformations on it that will result in dynamically equivalent rules. For the binary case, `BlackWhiteTransform` is obtained by switching the state of all cells in a rule table. The second type of transformation, `LeftRightTransform`, is obtained by reversing the bits of the neighborhoods in a rule table and reordering the set of state transitions. The composition in either order of the latter two transformations (they commute) yields the third type, `LeftRightBlackWhiteTransform` or `BlackWhiteLeftRightTransform`.

Here is how they work on rule 110.

This checks the first one, `BlackWhiteTransform`.

With these transformations, it becomes straightforward to see which CAs in a given space have equivalent dynamical behavior. For instance, applying the black-white, left-right, and composite transformations to elementary rule 110 yields elementary rules 137, 124, and 193, respectively. These four rules are said to be in the same dynamical equivalence class. It is easy to see why by looking at their space-time diagrams.

By comparing the rule table of a CA with the one that resulted from its equivalent rule obtained out of a given transform, it is possible to count the number of state transitions they share. In a sense, this provides a measure of the amount of *internal symmetry* of a CA with respect to that transformation, whichever it is. For instance, elementary CA 110 has an internal symmetry value of 2 with respect to the black-white transformation, since it shares two state transitions with its black-white symmetrical rule, which is elementary rule 137.

Repeating this process with rule 150, on the other hand, yields a different result. Rule 150 has an internal symmetry value of 8 according to the black-white transformation, the maximum possible value of this measure for elementary CAs. This is expected, since the black-white transformation of rule 150 is rule 150 itself. In fact, any of the three transformations applied to rule 150 yields rule 150 itself, indicating that it has the maximum internal symmetry value according to all three transformations.
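The three transformations and the internal symmetry measure are easy to express on rule numbers. The Python sketch below covers elementary rules only. Its names mirror the article's *Mathematica* functions but are not those functions. It computes the transformed rule numbers and counts shared transitions as 8 minus the number of differing output bits.

```python
def output(rule, a, b, c):
    """Output of elementary rule `rule` for neighbourhood (a, b, c)."""
    return (rule >> (4 * a + 2 * b + c)) & 1


def from_outputs(f):
    """Rule number of the elementary rule whose local function is f(a, b, c)."""
    return sum(f((v >> 2) & 1, (v >> 1) & 1, v & 1) << v for v in range(8))


def black_white(rule):
    # swap 0 and 1 everywhere: f'(a, b, c) = 1 - f(1 - a, 1 - b, 1 - c)
    return from_outputs(lambda a, b, c: 1 - output(rule, 1 - a, 1 - b, 1 - c))


def left_right(rule):
    # mirror the neighbourhood: f'(a, b, c) = f(c, b, a)
    return from_outputs(lambda a, b, c: output(rule, c, b, a))


def left_right_black_white(rule):
    return black_white(left_right(rule))


def internal_symmetry(rule, transform):
    """Number of state transitions shared by `rule` and its transformed rule."""
    return 8 - bin(rule ^ transform(rule)).count("1")
```

The sketch reproduces the figures quoted in the text. Rule 110 maps to 137, 124, and 193 and has symmetry value 2 with respect to the black-white transformation. Rule 150 has value 8 under all three transformations.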

The degree of internal symmetry of a rule can be a relevant measure in any context where a property is shared among all members of a class of dynamical equivalence. In [5] and [7], for instance, rules with maximal internal symmetry with the composite transformation were key for their findings related to DCT.

A CA template is an enhancement over the rule table representation, obtained by allowing it to have variables in the place of simple cell states as its results. As a consequence, a CA template has the power to represent whole subsets of CA rule spaces, instead of only a single rule.

As a simple example, consider the template . It represents the subset of the elementary CAs with fixed bits at positions 1, 3, 5, 6, and 8 in the list, free variables at positions 2 and 4, and complement bits at positions 2 and 7.

Using *Mathematica*’s built-in transformation rules, one can obtain the four CAs represented by this template, as well as their corresponding rule numbers.

The function `RuleTemplateVars` lists the variables in a template.

By extracting the variables from a template and assigning a value to each of them, the template is transformed into one of the rule tables it represents. A template with $m$ variables admits $k^m$ possible substitutions; however, as will be seen later, some of them may not be valid.

The function `ExpandTemplate` performs this operation by applying values to each variable of a given template. It may receive as an optional argument an integer called `ithSubstitution`, ranging from `0` to one less than the number of possible substitutions, that selects which substitution should be made. If omitted, it performs all the possible substitutions for a given template.

After the expansion, one can obtain the list of valid rules represented by the template by using the function `RuleNumbersFromTemplate`.
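Template expansion can be mimicked outside *Mathematica* by storing each template entry as an affine expression in the template variables. The Python sketch below is a hypothetical analogue of `RuleTemplateVars`, `ExpandTemplate`, and `RuleNumbersFromTemplate`. The example template `[1, x1, 0, x2, 1, 0, 1 - x1, 0]` is an illustration of our own, not necessarily the one used in the article.

```python
from itertools import islice, product

# Hypothetical representation: each template entry is an affine expression,
# a dict {variable name: integer coefficient}, with key "1" for the constant.


def template_vars(template):
    """Sorted names of the variables occurring in a template."""
    return sorted({v for entry in template for v in entry if v != "1"})


def expand_template(template, k=2, ith_substitution=None):
    """All substitutions of the variables by 0..k-1, or only the i-th one."""
    names = template_vars(template)
    substitutions = product(range(k), repeat=len(names))
    if ith_substitution is not None:
        substitutions = islice(substitutions, ith_substitution, ith_substitution + 1)
    expanded = []
    for values in substitutions:
        env = dict(zip(names, values))
        env["1"] = 1
        expanded.append([sum(c * env[v] for v, c in entry.items()) for entry in template])
    return expanded


def rule_numbers_from_template(template, k=2):
    """Rule numbers of the valid expansions (every state within 0..k-1)."""
    numbers = []
    for kary in expand_template(template, k):
        if all(0 <= d < k for d in kary):
            number = 0
            for d in kary:
                number = number * k + d
            numbers.append(number)
    return sorted(numbers)


# Illustrative template [1, x1, 0, x2, 1, 0, 1 - x1, 0] (our own example).
example = [{"1": 1}, {"x1": 1}, {}, {"x2": 1}, {"1": 1}, {}, {"1": 1, "x1": -1}, {}]

if __name__ == "__main__":
    print(template_vars(example))
    print(rule_numbers_from_template(example))
```

Expanding the example gives rules 138, 154, 200, and 216. Expanding the fully variable template `[x0, ..., x7]` gives all 256 elementary rules.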

With *Mathematica*’s built-in symbolic computation features, it is easy to create templates that represent a whole space. The space of elementary CAs would be represented by the following template.

In [4], the authors analytically found which transitions needed to be fixed, variable, or dependent on other transitions in a CA rule table, in order to have a chance to solve the parity problem perfectly. By fixing those transitions, they restrained the rule space of one-dimensional, binary, radius-2 CAs, composed of 4,294,967,296 rules, to only 16 candidates for perfect parity solvers. Although they used the de Bruijn graph as the primary structure to represent this rule space subset, it could have been easily represented with CA templates.

Empowered by *Mathematica*’s built-in equation-solving capabilities, algorithms can be developed that find the fixed, variable, and dependent state transitions on a rule table, thus leading to templates that are representatives of CAs that share the properties of number conservation and maximal internal symmetry; these are shown below.

In [8], Boccara and Fukś established necessary and sufficient conditions that a CA rule table must meet in order to be conservative (which is another way to say number conserving). These conditions can be translated into an algorithm `BFConservationTemplate` that finds a set of equations that, when solved by *Mathematica*, yields the equivalent of a template that represents all conservative CAs of a determined space.

By running this function for the elementary space, the following template is obtained.

When expanded, the latter yields the following representations.

However, it is clear that not all $k$-ary representations above are valid, since some of them rely on state values outside the range $\{0, 1\}$, namely, the states 2 and $-1$. Hence, by discarding those three, we get the complete set of five number-conserving rules of the elementary space.
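The invalid expansions can be traced by applying the Boccara and Fukś condition by hand to the elementary space. The condition forces $f(000) = 0$ and $f(111) = 1$ and leaves $a = f(001)$, $b = f(010)$, and $c = f(011)$ free, giving the template $\{1, 1 + b - c, 1 - b, 1 - a - b, c, b, a, 0\}$. This hand derivation may use different free variables from the template *Mathematica* returns, but its valid expansions are the same rules. The Python sketch below expands it and separates valid from invalid substitutions.

```python
from itertools import product


def bf_template(a, b, c):
    """Hand-derived number-conserving template for the elementary space.

    Outputs are listed for neighbourhoods 111, 110, ..., 000, with the free
    variables a = f(001), b = f(010), c = f(011); f(000) = 0 is forced.
    """
    return [1, 1 + b - c, 1 - b, 1 - a - b, c, b, a, 0]


expansions = [bf_template(*values) for values in product((0, 1), repeat=3)]
valid = [e for e in expansions if all(d in (0, 1) for d in e)]
invalid = [e for e in expansions if e not in valid]

# Read each valid k-ary form as a binary number to get its rule number.
rule_numbers = sorted(int("".join(map(str, e)), 2) for e in valid)
out_of_range = sorted({d for e in invalid for d in e if d not in (0, 1)})

if __name__ == "__main__":
    print(rule_numbers)
    print(out_of_range)
```

Three of the eight substitutions contain the out-of-range states 2 or $-1$. The remaining five are rules 170, 184, 204, 226, and 240.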

It is important to notice that this kind of strategy can only be employed on properties that derive directly from the CA rule table.

As the internal symmetry of a CA is also a property that derives directly from its rule table, it is a valid candidate to be generalized into a template. By listing a CA rule table along with its respective transformations, it is possible to establish equality relations between them that, when solved by *Mathematica*, yield a template that represents all CAs that have the maximal possible value of internal symmetry, according to any subset of the three transformations.

By establishing that all of the results of the rule tables have to be the same in both the CA and its transformed counterpart, the following function `MaxSymmTemplate` achieves the goal of finding a template that represents all CAs of a given space that present the *maximum* value of internal symmetry, according to a list of transformations received as arguments.

In order to find a template that represents all elementary CAs with maximum symmetry according to the black-white transformation, it suffices to run `MaxSymmTemplate`, then expand the template to generate the rule numbers.

The verification of this result can be achieved by guaranteeing that all these rule numbers yield the same rule tables when transformed.

We can analogously obtain a template representing all CAs with maximum symmetry according to all transformations, from which their expansions also lead to the corresponding rule numbers.

And again, their validity can be checked.
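For transformations that only pair up rule-table positions, maximal symmetry means each output must equal either its partner's output or that output's complement. The Python sketch below builds a template for this case with a union-find structure with parity. It is not the article's `MaxSymmTemplate`, which relies on equation solving. Each position ends up as a free variable or the complement of one, and expansion enumerates the free variables. The sketch covers only the elementary space.

```python
from itertools import product

# Neighbourhoods of the elementary space in Wolfram order (111 first).
NEIGHBOURHOODS = [((v >> 2) & 1, (v >> 1) & 1, v & 1) for v in range(7, -1, -1)]
POSITION = {nb: i for i, nb in enumerate(NEIGHBOURHOODS)}


# Each transformation returns the partner neighbourhood and a flag: maximal
# symmetry requires f(x) == f(partner) (flag 0) or f(x) == 1 - f(partner) (flag 1).
def black_white(nb):
    return tuple(1 - x for x in nb), 1


def left_right(nb):
    return tuple(reversed(nb)), 0


def left_right_black_white(nb):
    return tuple(1 - x for x in reversed(nb)), 1


def max_symm_template(transforms):
    """Template as a list of (variable, complemented) pairs, or None if empty."""
    parent, parity = list(range(8)), [0] * 8

    def find(i):  # root of i and parity of f(i) relative to f(root)
        if parent[i] == i:
            return i, 0
        root, p = find(parent[i])
        parent[i], parity[i] = root, parity[i] ^ p
        return root, parity[i]

    for transform in transforms:
        for i, nb in enumerate(NEIGHBOURHOODS):
            partner, flip = transform(nb)
            (ri, pi), (rj, pj) = find(i), find(POSITION[partner])
            if ri == rj:
                if pi ^ pj != flip:
                    return None  # contradictory constraints
            else:
                parent[ri], parity[ri] = rj, pi ^ pj ^ flip
    return [find(i) for i in range(8)]


def expand(template):
    """Rule numbers obtained by assigning 0 or 1 to every free variable."""
    roots = sorted({root for root, _ in template})
    rules = []
    for values in product((0, 1), repeat=len(roots)):
        value = dict(zip(roots, values))
        bits = [value[root] ^ flipped for root, flipped in template]
        rules.append(int("".join(map(str, bits)), 2))
    return sorted(rules)
```

The sketch yields 16 rules with maximal black-white symmetry. It also yields the 8 rules (23, 51, 77, 105, 150, 178, 204, 232) that are maximally symmetric under all three transformations.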

Both the `BFConservationTemplate` and the `MaxSymmTemplate` functions can take another template as an optional argument, which is meant to be used as the starting point of the algorithms. This is the current way to compose the intersection of templates that share a common structure. For instance, in order to generate all the elementary conservative CAs with maximum internal symmetry values according to the black-white transformation, it becomes straightforward to use the template for number-conserving rules of the elementary space as the starting point of `MaxSymmTemplate`. This leads to a template that, once again, can be expanded so as to yield the target rule numbers.

Alternatively, the template with maximal internal symmetry could be used as the starting point of the `BFConservationTemplate` algorithm to obtain the same result.
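Template composition can be sketched with a small linear-equation solver in place of *Mathematica*'s. The Python sketch below is a rough stand-in for `BFConservationTemplate` and `MaxSymmTemplate`, restricted to the elementary space and to black-white symmetry. Each template entry is an affine expression with rational coefficients, and imposing a property means adding its equations and eliminating variables one at a time. Starting from either property's template and imposing the other gives the same rules.

```python
from fractions import Fraction
from itertools import product

# A template is a list of affine expressions, one per neighbourhood (111 first);
# each is a dict {variable: coefficient}, with key "1" for the constant term.
NBS = [((v >> 2) & 1, (v >> 1) & 1, v & 1) for v in range(7, -1, -1)]
IDX = {nb: i for i, nb in enumerate(NBS)}


def add(e1, e2, scale=1):
    """Return e1 + scale * e2 with zero coefficients dropped."""
    out = dict(e1)
    for v, c in e2.items():
        out[v] = out.get(v, 0) + scale * c
    return {v: c for v, c in out.items() if c != 0}


def substitute(expr, var, value):
    """Replace `var` in `expr` by the affine expression `value`."""
    if var not in expr:
        return expr
    rest = {v: c for v, c in expr.items() if v != var}
    return add(rest, value, expr[var])


def impose(template, equations):
    """Eliminate variables so that every equation (expr == 0) holds."""
    template, pending = list(template), list(equations)
    while pending:
        eq = pending.pop()
        free = [v for v in eq if v != "1"]
        if not free:
            if eq.get("1", 0) != 0:
                return None  # inconsistent constraints: empty template
            continue
        var, coeff = free[0], Fraction(eq[free[0]])
        value = {v: -Fraction(c) / coeff for v, c in eq.items() if v != var}
        template = [substitute(e, var, value) for e in template]
        pending = [substitute(e, var, value) for e in pending]
    return template


def generic_template():
    return [{f"x{i}": 1} for i in range(8)]


def bw_equations(t):
    # maximal black-white symmetry: f(x) + f(not x) - 1 == 0
    return [add(add(t[i], t[IDX[tuple(1 - b for b in nb)]]), {"1": 1}, -1)
            for i, nb in enumerate(NBS)]


def bf_equations(t):
    # Boccara-Fuks: f(x) - x1 - sum_j [f(0_j, x2, ...) - f(0_j, x1, ...)] == 0
    eqs = []
    for x in NBS:
        eq = add(t[IDX[x]], {"1": x[0]}, -1)
        for j in (1, 2):
            eq = add(eq, t[IDX[(0,) * j + x[1:4 - j]]], -1)
            eq = add(eq, t[IDX[(0,) * j + x[:3 - j]]], 1)
        eqs.append(eq)
    return eqs


def expand(template, k=2):
    """Rule numbers of the valid substitutions of a template."""
    names = sorted({v for e in template for v in e if v != "1"})
    rules = []
    for values in product(range(k), repeat=len(names)):
        env = dict(zip(names, values))
        env["1"] = 1
        digits = [sum(Fraction(c) * env[v] for v, c in e.items()) for e in template]
        if all(d.denominator == 1 and 0 <= d < k for d in digits):
            number = 0
            for d in digits:
                number = number * k + int(d)
            rules.append(number)
    return sorted(rules)


conserving = impose(generic_template(), bf_equations(generic_template()))
symmetric = impose(generic_template(), bw_equations(generic_template()))
both = impose(conserving, bw_equations(conserving))
both_other_order = impose(symmetric, bf_equations(symmetric))
```

Both orders give rules 170, 204, and 240, the elementary number-conserving rules with maximal black-white symmetry. Invalid substitutions, such as those producing states 2 or $-1$, are filtered out during expansion.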

We have introduced the concept of CA templates, a rule table enhancement capable of representing a subset of a CA rule space whose rules share a common property. Although the examples used for illustration only referred to one-dimensional, binary rules (the elementary space), the idea seems readily applicable to larger CAs with a larger number of states and more dimensions.

We have shown some of the operations applicable to CA templates, as well as some cases of use, in the form of *Mathematica* functions that yield templates representing subsets of the elementary space of CAs with properties related to number conservation and maximum internal symmetry. With respect to the latter, templates can be derived for any subset of the three symmetry-related transformations.

Templates for the rules in the same dynamical class in the elementary space have appeared previously in the CA literature, such as in [10]. But in these cases, the notion was not at all couched in the conceptual framework we have put forward, which allows templates to be effectively defined for rules having maximal internal symmetry value, let alone the possibility of representing further CA properties.

The properties used as examples here can be couched in terms of well-established relations among the state transitions of the CA, which are a necessary condition for a property to be addressed in the form of templates. As a counterpoint, the notion of reversibility of one-dimensional rules does not seem to be, at least in principle, amenable to template representation, since it is currently not known how to characterize reversibility in terms of the rule table of a CA.

It stands as future work to find new algorithms that would allow template representations of other properties, as well as the enhancement of the current algorithm related to internal symmetry templates, so as to extend the current constraint of only generating maximal internal symmetry toward also allowing the generation of templates with specific values of internal symmetry, not necessarily maximal.

Currently, because of computational demands, template expansion does not scale up well to very big templates; this should also be addressed in a follow-up. In particular, it might be worth defining operations of union and intersection of templates, which might be used to preprocess a template before the operation of template expansion.

Pedro de Oliveira thanks FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo). Maurício Verardo is thankful for a fellowship provided by CAPES, the Brazilian agency of its Ministry of Education.

[1] S. Wolfram, A New Kind of Science, Champaign, IL: Wolfram Media Inc., 2002.

[2] P. P. B. de Oliveira, “Conceptual Connections around Density Determination in Cellular Automata,” Cellular Automata and Discrete Complex Systems (Lecture Notes in Computer Science), 8155, 2013 pp. 1-14. doi:10.1007/978-3-642-40867-0_1.

[3] P. P. B. de Oliveira, “On Density Determination with Cellular Automata: Results, Constructions and Directions,” Journal of Cellular Automata, forthcoming.

[4] H. Betel, P. P. B. de Oliveira, and P. Flocchini, “Solving the Parity Problem in One-Dimensional Cellular Automata,” Natural Computing, 12(3), 2013 pp. 323-337. doi:10.1007/s11047-013-9374-9.

[5] D. Wolz and P. P. B. de Oliveira, “Very Effective Evolutionary Techniques for Searching Cellular Automata Rule Spaces,” Journal of Cellular Automata, 3(4), 2008 pp. 289-312.

[6] H. Fukś, “A Class of Cellular Automata Equivalent to Deterministic Particle Systems,” in Hydrodynamic Limits and Related Topics (S. Feng, A. T. Lawniczak, and S. R. S. Varadhan, eds.), Providence, RI: American Mathematical Society, 2000 pp. 57-69.

[7] J. Kari and B. Le Gloannec, “Modified Traffic Cellular Automaton for the Density Classification Task,” Fundamenta Informaticae, 116(1-4), 2012 pp. 141-156. doi:10.3233/FI-2012-675.

[8] N. Boccara and H. Fukś, “Number-Conserving Cellular Automaton Rules,” Fundamenta Informaticae, 52(1-3), 2002 pp. 1-13.

[9] A. Schranko and P. P. B. de Oliveira, “Towards the Definition of Conservation Degree for One-Dimensional Cellular Automata Rules,” Journal of Cellular Automata, 5(4-5), 2010 pp. 383-401.

[10] W. Li and N. Packard, “The Structure of the Elementary Cellular Automata Rule Space,” Complex Systems, 4(3), 1990 pp. 281-297. www.complex-systems.com/pdf/04-3-3.pdf.

P. P. B. de Oliveira and M. Verardo, “Representing Families of Cellular Automata Rules,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-8.

Pedro de Oliveira has been a faculty member of the School of Computing and Informatics and of the Postgraduate Program in Electrical Engineering at Mackenzie Presbyterian University, São Paulo, Brazil, since 2001. His research interests are cellular automata, evolutionary computation, and cellular multi-agent systems. Pedro is an alumnus of the 2003 NKS Summer School.

Maurício Verardo is a post-graduate student in Electrical Engineering at Mackenzie Presbyterian University, working with cellular automata ever since his undergraduate senior project for his computer science degree from Mackenzie. Maurício is an alumnus of the 2011 NKS Summer School.

**Pedro P. B. de Oliveira**

*Faculdade de Computação e Informática & Pós-Graduação em Engenharia Elétrica*
*Universidade Presbiteriana Mackenzie*
*Rua da Consolação, 930*
*São Paulo, 01302-907 – Brazil*

**Maurício Verardo**

*Pós-Graduação em Engenharia Elétrica*

Universidade Presbiteriana Mackenzie

Rua da Consolação, 930

São Paulo, 01302-907 – Brazil

*mauricio.verardo@gmail.com*

Among its many interpretations, the term reliability most commonly refers to the ability of a device or system to perform a task successfully when required. More formally, it is described as the probability of functioning properly at a given time and under specified operating conditions [1]. Mathematically, the reliability function is defined by

$$R(t) = \Pr(T > t),$$

where $T$ is a nonnegative random variable representing the device or system lifetime.

For a system composed of at least two components, the system reliability is determined by the reliability of the individual components and the relationships among them. These relationships can be depicted using a reliability block diagram (RBD).

Simple systems are usually represented by RBDs with components in either a series or parallel configuration. In a series system, all components must function satisfactorily in order for the system to operate. For a parallel system to operate, at least one component must function correctly. Systems can also contain components arranged in both series and parallel configurations. If an RBD cannot be reduced to a series, parallel, or series-parallel configuration, then it is considered a complex system.

This article deals with the generation of an exact analytical expression for the reliability of a complex system. The demonstrated method relies on finding all paths between the source and target vertices in a directed acyclic graph (i.e., RBD), as well as the inclusion-exclusion principle for probability.

**A Note on Timings**

The timings reported in this article were measured on a custom workstation PC using the built-in function `Timing`. The system consists of an Intel® Core i7 CPU 950 @ 4 GHz and 24 GB of DDR3 memory. It runs Microsoft® Windows 7 Professional (64-bit) and scores 1.32 on the *MathematicaMark9* benchmark.

We begin by considering a directed graph that consists of a finite set of vertices together with a finite set of ordered pairs of vertices called directed edges. The built-in function `Graph` can be used to construct a graph from explicit lists of vertices and edges.

This two-dimensional grid graph can be constructed much more efficiently with the built-in function `GridGraph`. Throughout this section, we use it to illustrate our functions.

Now, for a vertex $v$, we define the set of out-neighbors as

$$N^{+}(v) = \{\, u : v \to u \,\},$$

where $v \to u$ is taken to mean a directed edge from $v$ to $u$. This is implemented in the function `VertexOutNeighbors`.

`VertexOutNeighbors` behaves similarly to the built-in function `VertexOutDegree`. That is, given a graph and a vertex, the function returns a list of out-neighbors for the specified vertex.

If, however, only the graph is specified, the function will give a list of vertex out-neighbors for all vertices in the graph.

The order in which the out-neighbors are displayed is determined by the order of vertices returned by `VertexList`.

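We can implement similar functions to obtain the set of in-neighbors by simply reversing the direction of the edges in this definition, that is, by collecting, for each vertex, the vertices that have a directed edge pointing to it.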
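As a language-neutral illustration, out- and in-neighbors can be read directly from a list of directed edges. The in-neighbor version simply swaps the roles of tail and head. The sketch below is in Python rather than *Mathematica*, and the names and the edge-list representation are ours.

```python
def vertex_out_neighbors(vertices, edges, v=None):
    """Out-neighbours of v, or a list of them for every vertex if v is None."""
    if v is None:
        return [vertex_out_neighbors(vertices, edges, w) for w in vertices]
    return [head for tail, head in edges if tail == v]


def vertex_in_neighbors(vertices, edges, v=None):
    """In-neighbours: the same scan with the edge direction reversed."""
    if v is None:
        return [vertex_in_neighbors(vertices, edges, w) for w in vertices]
    return [tail for tail, head in edges if head == v]


# A 2-by-3 grid (vertices 1 2 3 over 4 5 6) with edges pointing right and down.
vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (4, 5), (5, 6), (1, 4), (2, 5), (3, 6)]
```

For a single vertex, the neighbors appear in edge-list order. The all-vertices form follows the order of `vertices`.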

The next step toward our goal is to consider a method of traversing a graph. One common approach of systematically visiting all vertices of a graph is known as depth-first search (DFS). In its most basic form, a DFS algorithm involves visiting a vertex, marking it as “visited,” and then recursively visiting all of its neighbors [2]. The function `DepthFirstSearch` implements this algorithm for directed graphs.

Given a graph and a starting vertex, `DepthFirstSearch` returns a list of vertices in the order in which they are visited.

We compare this with the result of the built-in function `DepthFirstScan`.
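A recursive depth-first search is short in any language. The Python sketch below follows the same idea as the article's `DepthFirstSearch` but is not that function. The graph is a hypothetical dictionary of out-neighbor lists, here a 2-by-3 grid with edges pointing right and down.

```python
def depth_first_search(graph, start):
    """Vertices reachable from `start`, listed in the order they are visited.

    `graph` maps each vertex to the list of its out-neighbours.
    """
    order, visited = [], set()

    def visit(v):
        visited.add(v)  # mark v as visited before exploring its neighbours
        order.append(v)
        for u in graph.get(v, []):
            if u not in visited:
                visit(u)

    visit(start)
    return order


# A 2-by-3 grid: vertices 1 2 3 over 4 5 6, edges pointing right and down.
grid = {1: [2, 4], 2: [3, 5], 3: [6], 4: [5], 5: [6], 6: []}
```

Starting from vertex 1, the visiting order is `[1, 2, 3, 6, 5, 4]`. Vertex 4 comes last because it is reached only after the branch through vertex 2 is exhausted.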

Next, let us define the function `DirectedAcyclicGraphQ`.

If the graph is both directed and acyclic, `DirectedAcyclicGraphQ` yields `True`. Otherwise, it yields `False`.
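Acyclicity can be tested with a depth-first search that colors vertices. A vertex is gray while it is on the current recursion stack, and meeting a gray vertex again means a directed cycle. This Python sketch uses a dictionary of out-neighbor lists, so every graph it sees is directed. It is offered as an illustration, not as the article's implementation.

```python
WHITE, GREY, BLACK = 0, 1, 2


def is_directed_acyclic(graph):
    """True if the directed graph (dict of out-neighbour lists) has no cycle."""
    colour = {v: WHITE for v in graph}
    for neighbours in graph.values():
        for u in neighbours:
            colour.setdefault(u, WHITE)

    def reaches_cycle(v):
        colour[v] = GREY  # v is on the current DFS path
        for u in graph.get(v, []):
            if colour[u] == GREY:
                return True  # back edge: directed cycle
            if colour[u] == WHITE and reaches_cycle(u):
                return True
        colour[v] = BLACK  # all descendants explored, no cycle through v
        return False

    return not any(colour[v] == WHITE and reaches_cycle(v) for v in list(colour))
```

A self-loop counts as a cycle, and an empty graph is trivially acyclic.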

Finally, we consider the problem of finding all paths in a directed acyclic graph between two arbitrary vertices, which we refer to as the source and the target. A path is defined as a sequence of vertices $v_0, v_1, \ldots, v_m$, running from the source to the target, such that there is a directed edge from $v_{i-1}$ to $v_i$ for $i = 1, \ldots, m$. Since we have constrained ourselves to a directed acyclic graph, all paths are simple. That is to say, all vertices in a path are distinct.

By modifying the depth-first search algorithm, we arrive at a solution.

Like the original DFS algorithm, we visit a vertex and then recursively visit all of its neighbors. However, instead of checking if a vertex has been marked “visited,” we compare the current vertex to the target. If they do not match, we continue to traverse the graph. Otherwise, the target has been reached and we store the path for later output.

For a given directed acyclic graph, a source vertex, and a target vertex, `FindPaths` returns a list of all paths connecting the source to the target.

In this particular instance, the function takes approximately 0.85 milliseconds to return the result.

`FindPaths` works for any pair of vertices.

If no path is found, the function returns an empty list.
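The modified search described above can be sketched in Python as follows, with the graph as a dictionary of out-neighbor lists; the names are ours. Because the graph is acyclic, no visited marks are needed. The recursion stops either at the target, where the path is recorded, or at a vertex with no out-neighbors.

```python
def find_paths(graph, source, target):
    """All paths from `source` to `target` in a directed acyclic graph."""
    paths = []

    def extend(path):
        v = path[-1]
        if v == target:
            paths.append(path)  # target reached: store the path for output
            return
        for u in graph.get(v, []):
            extend(path + [u])

    extend([source])
    return paths


# A 2-by-3 grid: vertices 1 2 3 over 4 5 6, edges pointing right and down.
grid = {1: [2, 4], 2: [3, 5], 3: [6], 4: [5], 5: [6], 6: []}
```

On this grid, the paths from 1 to 6 are `[1, 2, 3, 6]`, `[1, 2, 5, 6]`, and `[1, 4, 5, 6]`. From 6 to 1 the result is an empty list.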

Up to this point, we have been working with graphs in an abstract, mathematical sense. We now make the transition from directed acyclic graph to reliability block diagram by associating vertices with components in a system and edges with relationships among them.

Consider a single component in an RBD. Let us imagine a “flow” moving from a source, through the component, to a target. The component is deemed to be functioning if the flow can pass through it unimpeded. However, if the component has failed, the flow is prevented from reaching the target.

The “flow” concept can be extended to an entire system. A system is considered to be functioning if there exists a set of functioning components that permits the flow to move from source to target. We define a path in an RBD as a set of functioning components that guarantees a functioning system. Since we have chosen to use a directed acyclic graph to represent a system’s RBD, all paths are minimal. That is to say, all components in a path are distinct.

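Once the minimal paths of a system’s RBD have been obtained, the principle of inclusion-exclusion for probability can be employed to generate an exact analytical expression for reliability. Let $P_1, P_2, \ldots, P_m$ be the minimal paths of a system, and also write $P_i$ for the event that every component of the $i$th path functions. At least one minimal path must function in order for the system to function. We can therefore write the reliability of the system as the probability of the union of all minimal paths:

$$R_S = \Pr\Big(\bigcup_{i=1}^{m} P_i\Big) = \sum_{i=1}^{m} \Pr(P_i) - \sum_{i<j} \Pr(P_i \cap P_j) + \sum_{i<j<l} \Pr(P_i \cap P_j \cap P_l) - \cdots + (-1)^{m-1} \Pr(P_1 \cap P_2 \cap \cdots \cap P_m).$$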

This is implemented in the function `SystemReliability`.

Given a system’s RBD (represented by a directed acyclic graph), a source vertex, and a target vertex, `SystemReliability` returns an exact analytical expression for the reliability.
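The same inclusion-exclusion sum can be evaluated numerically from a list of minimal paths. The Python sketch below is not the article's symbolic `SystemReliability`. It assumes independent component failures, so the probability that a group of paths all function is the product of the reliabilities of the components in their union. The number of terms grows as $2^m - 1$ for $m$ paths, so this is only practical for small systems.

```python
import math
from itertools import combinations


def system_reliability(paths, reliability):
    """Probability that at least one minimal path functions.

    paths: list of minimal paths, each an iterable of component names.
    reliability: dict mapping component -> probability of functioning
    (components are assumed to fail independently).
    """
    total = 0.0
    for size in range(1, len(paths) + 1):
        sign = 1 if size % 2 == 1 else -1
        for group in combinations(paths, size):
            # All paths in the group function iff every component in their
            # union functions; independence turns this into a product.
            components = set().union(*group)
            total += sign * math.prod(reliability[c] for c in components)
    return total
```

Consider a bridge network with minimal paths $\{A, D\}$, $\{B, E\}$, $\{A, C, E\}$, and $\{B, C, D\}$, where every component has reliability 0.9. The sketch gives approximately 0.97848, matching $2p^2 + 2p^3 - 5p^4 + 2p^5$ at $p = 0.9$.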

Consider the RBD of a simple system with four components in a series configuration.

The reliability of the system is given in terms of the reliability of its four components.

Consider the RBD of a simple system with four components in a parallel configuration.

The “start” and “end” components are not part of the actual system. They are added to ensure the RBD meets the criteria for a directed acyclic graph.

Furthermore, these nonphysical components are taken to have perfect reliability, that is, a reliability of exactly 1. Since they have no effect on the system’s reliability, they can be safely removed from the resulting analytical expression. To do so, we simply define a list of replacement rules and apply it to the result of `SystemReliability`.

The reliability of the system is given in terms of the reliability of its four components.

Next, we examine the RBDs of two simple systems with components in a series-parallel configuration.

Here, one component is in series with a second component, and both are in parallel with a third component.

As in previous examples, we use `SystemReliability` to obtain an exact analytical expression for the reliability.
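This series-parallel configuration can be checked by hand. Write $R_1$ and $R_2$ for the reliabilities of the two components in series and $R_3$ for the reliability of the component in parallel with them; these labels are our own. The system then has two minimal paths, $\{1, 2\}$ and $\{3\}$. Assuming independent components, inclusion-exclusion over them gives

$$R_S = \Pr(P_1 \cup P_2) = \Pr(P_1) + \Pr(P_2) - \Pr(P_1 \cap P_2) = R_1 R_2 + R_3 - R_1 R_2 R_3.$$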

Finally, we examine the RBDs of two complex systems.

The reliability of the system is given in terms of the reliability of its six components.

The result is returned after approximately 0.59 milliseconds.

The reliability of the system is given in terms of the reliability of its fourteen components.

The result is returned after approximately 0.33 seconds.

We now turn our attention to the derivation of a time-dependent expression for the reliability of a complex system based on information contained within its reliability block diagram.

Let us imagine that we have a generic system composed of six subsystems and we know the reliability relationships among them. In addition, the underlying statistical distributions and parameters used to model the subsystems’ reliabilities are known.

We begin by creating the system’s RBD.

In defining the RBD, we have made use of the `Property` function to store information associated with each subsystem. For instance, the custom property `"Distribution"` is used to store a parametric statistical distribution. Labels, images, and other properties can also be specified.

Next, we use `SystemReliability` to generate an exact analytical expression for the reliability.

Now, the reliability function of the $i$th subsystem is given by

$$R_i(t) = 1 - F_i(t),$$

where $F_i(t)$ is the corresponding cumulative distribution function (CDF). For each subsystem, we use `PropertyValue` to extract the symbolic distribution stored in the RBD, and then use the built-in function `CDF` to construct its reliability function.
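The construction $R_i(t) = 1 - F_i(t)$ can be mirrored numerically, as a rough analogue of what `CDF` does symbolically in the article. The sketch below is our own Python illustration, not the article's code. The two lifetime models (exponential and Weibull) and their parameter values are hypothetical choices made only for the example.

```python
import math

# Hypothetical subsystem lifetime models; the article stores symbolic
# distributions in the RBD, here two common CDFs are hard-coded instead.
def exponential_cdf(t, rate):
    """CDF F(t) = 1 - exp(-rate * t) of an exponential lifetime (t >= 0)."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0

def weibull_cdf(t, shape, scale):
    """CDF F(t) = 1 - exp(-(t / scale)**shape) of a Weibull lifetime (t >= 0)."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0

def reliability(cdf):
    """Turn a CDF F into the reliability (survival) function R(t) = 1 - F(t)."""
    return lambda t: 1.0 - cdf(t)

# Hypothetical parameters, chosen only for illustration.
R_exp = reliability(lambda t: exponential_cdf(t, rate=0.01))
R_wbl = reliability(lambda t: weibull_cdf(t, shape=2.0, scale=100.0))

for t in (0, 50, 100):
    print(t, round(R_exp(t), 4), round(R_wbl(t), 4))
```

Any distribution with a known CDF can be plugged into `reliability` in the same way.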

We extract additional information, for example, subsystem labels, from the RBD and combine it with the reliability functions to create plots for comparison.

In order to transform our static analytical expression into a time-dependent function, we first define a list of replacement rules.

Next, we apply the list of rules to the expression for system reliability.

The result is a time-dependent reliability function for the complex system described by the RBD.

Finally, we generate a plot of the system’s reliability over time.

We have demonstrated a method of generating an exact analytical expression for the reliability of a complex system using a directed acyclic graph to represent the system’s reliability block diagram. In addition, we have shown how to convert an analytical expression for system reliability into a time-dependent function based on statistical information stored in an RBD. While our focus has been on the analysis of complex systems, we have also shown that the combination of path finding and the inclusion-exclusion principle is equally applicable to simple systems in series, parallel, or series-parallel configurations.
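The combination of path finding and the inclusion-exclusion principle can be sketched numerically in a few lines of Python. This is an illustration under stated assumptions, not the article's `SystemReliability`:

- components fail independently;
- reliabilities are numbers rather than symbols;
- the RBD is a hypothetical adjacency dictionary with nodes `"start"` and `"end"` of perfect reliability.

The event "path $P$ works" has probability $\prod_{c \in P} R_c$. The intersection of several such events requires every component in the union of their paths to work.

```python
from itertools import combinations
from math import prod

def all_paths(graph, source, sink):
    """Enumerate every path from source to sink in a DAG given as an adjacency dict."""
    if source == sink:
        return [[sink]]
    paths = []
    for nxt in graph.get(source, []):
        for tail in all_paths(graph, nxt, sink):
            paths.append([source] + tail)
    return paths

def system_reliability(graph, rel, source="start", sink="end"):
    """P(at least one start-to-end path works), by inclusion-exclusion over path sets.

    rel maps each physical component to its reliability; the nonphysical
    start and end nodes are dropped, i.e., treated as perfectly reliable.
    """
    path_sets = [frozenset(p) - {source, sink} for p in all_paths(graph, source, sink)]
    total = 0.0
    for k in range(1, len(path_sets) + 1):
        sign = (-1) ** (k + 1)
        for combo in combinations(path_sets, k):
            union = frozenset().union(*combo)  # all components that must work
            total += sign * prod(rel[c] for c in union)
    return total

# Hypothetical example: A in series with B, that pair in parallel with C.
series_parallel = {"start": ["A", "C"], "A": ["B"], "B": ["end"], "C": ["end"]}
rel = {"A": 0.9, "B": 0.8, "C": 0.7}
print(system_reliability(series_parallel, rel))
```

The number of terms grows as $2^m$ for $m$ paths. This growth is why improved inclusion-exclusion identities such as those in [3] are of interest for large systems.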

Knowing the static analytical expression or time-dependent solution of a system allows us to perform a more advanced reliability analysis. For instance, we can easily calculate the Birnbaum importance

$$I_B(i) = \frac{\partial R_S}{\partial R_i}$$

of the $i$th component using the result of `SystemReliability`, where $R_S$ is the system reliability and $R_i$ is the reliability of component $i$. Similarly, we can derive the hazard function, or failure rate, from the system’s time-dependent reliability function.
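The Birnbaum importance is the partial derivative $I_B(i) = \partial R_S / \partial R_i$. With independent components, $R_S$ is affine in each $R_i$, so this derivative equals $R_S(R_i = 1) - R_S(R_i = 0)$.

The Python sketch below uses that identity on the hypothetical series-parallel system $R_S = R_A R_B + R_C - R_A R_B R_C$. It is our own illustration, not code from the article.

```python
def rs_series_parallel(r):
    """R_S for a hypothetical system: A in series with B, that pair in parallel with C."""
    return r["A"] * r["B"] + r["C"] - r["A"] * r["B"] * r["C"]

def birnbaum_importance(rs, r, component):
    """I_B = dR_S/dR_i. Because R_S is affine in each R_i (independent components),
    the derivative equals R_S with R_i = 1 minus R_S with R_i = 0."""
    up = dict(r, **{component: 1.0})
    down = dict(r, **{component: 0.0})
    return rs(up) - rs(down)

r = {"A": 0.9, "B": 0.8, "C": 0.7}  # hypothetical component reliabilities
for c in r:
    print(c, round(birnbaum_importance(rs_series_parallel, r, c), 4))
```

With these numbers, component C ranks highest. Improving the lone parallel branch helps more than improving either member of the series pair.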

There are several ways in which the functionality demonstrated in this article can be improved and expanded:

- Increase the efficiency of `SystemReliability` by implementing improvements to the classical inclusion-exclusion principle [3].
- Add functions related to common tasks in reliability analysis, for example, reliability importance, failure rate, and so on.
- Add support for $k$-out-of-$n$ structures, that is, redundancy.
- Add the ability to export and import complete RBDs.
- Add a mechanism, for example, a graphical user interface (GUI), to facilitate the construction and modification of RBDs.

Finally, the code can be combined into a user-friendly package with full documentation.

[1] W. Kuo and M. Zuo, Optimal Reliability Modeling: Principles and Applications, Hoboken, NJ: John Wiley & Sons, 2003.

[2] S. Skiena, The Algorithm Design Manual, 2nd ed., London, UK: Springer-Verlag, 2008.

[3] K. Dohmen, “Improved Inclusion-Exclusion Identities and Inequalities Based on a Particular Class of Abstract Tubes,” Electronic Journal of Probability, 4, 1999, pp. 1-12. doi:10.1214/EJP.v4-42.

T. Silvestri, “Complex System Reliability,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-7.

Todd Silvestri received his undergraduate degrees in physics and mathematics from the University of Chicago in 2001. As a graduate student, he worked briefly at the Thomas Jefferson National Accelerator Facility (TJNAF) where he helped to construct and test a neutron detector used in experiments to measure the neutron electric form factor at high momentum transfer. From 2006 to 2011, he worked as a physicist at the US Army Armament Research, Development and Engineering Center (ARDEC). During his time there, he cofounded and served as principal investigator of a small laboratory focused on improving the reliability of military systems. He is currently working on several personal projects.

**Todd Silvestri**

*New Jersey, United States*

*todd.silvestri@optimum.net*

The aim of canonical correlation analysis is to find the linear combinations of two multivariate datasets that have the largest possible correlation coefficient with each other. This is particularly useful to determine the relationship between *criterion measures* and the set of their *explanatory factors*. The technique involves two steps. First, the dimensions of the two multivariate datasets are reduced by projection. Second, the relationship between the two projections of the datasets is calculated, as measured by the correlation coefficient.

While the correlation coefficient measures the relationship between two single variables, canonical correlation analysis measures the relationship between two *sets* of variables. Although the correlation measure employed for both techniques is the same, namely

$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}, \tag{1}$$

the distinction between the two techniques must be clear. For the correlation coefficient, $X$ and $Y$ must be $n$-dimensional vectors containing realizations of the random variables. For canonical correlation analysis (CCA), $X$ has to be an $n \times p$ matrix and $Y$ an $n \times q$ matrix, with $p$ and $q$ at least 2. In the latter case:

- $n$ is the number of realizations of every random variable;
- $p$ is the number of random variables contained in the set $X$;
- $q$ is the number of random variables in the set $Y$.

This article calculates, through CCA, the relationship between stock markets of developed and developing countries and performs Bartlett’s test for the statistical significance of the canonical correlation found.

For an introduction to statistics in financial markets, see [1].

The data employed for the CCA in the present work was obtained directly from *Mathematica*’s function. The variables are divided into two groups: the ETFs representing developed nations and the ETFs representing developing countries. The first group is treated as independent variables and the second group as dependent variables. The idea here is to analyze the relationship between stock markets in these two groups of countries through ETFs traded at the New York Stock Exchange (NYSE).

Although there are several country-specific ETFs traded on the NYSE, not all of them were chosen. For each group, the idea is to select the ETFs representing countries with large stock markets according to a market capitalization criterion. The market capitalization of all stock markets was obtained from the website of the World Federation of Exchanges (www.world-exchanges.org/statistics). All countries whose stock market capitalization exceeded 500 billion US dollars in December 2012 were chosen, and only one ETF per country was selected.

These six ETFs were included in the group of developed nations: EWA (Australia), EWC (Canada), EWG (Germany), EWJ (Japan), EWU (UK), and SPY (USA).

Eight ETFs were included in the group of developing countries: EWZ (Brazil), FXI (China), EPI (India), EWW (Mexico), RSX (Russia), EWS (Singapore), EWY (South Korea), and EWT (Taiwan).

These are the monthly returns for the five-year period between March 2008 and February 2013 (60 months).

This checks the number of observations for each variable. Evaluate the previous command again if the lengths are not all 60.

This plots the data for all the variables.

This plots the price behavior of the six ETFs representing developed countries for the 60-month period.

This plots the price behavior of the eight ETFs representing developing countries for the 60-month period.

According to [2], “to use canonical correlation analysis safely for descriptive purposes requires no distributional assumptions.” However, they still state that “to test the significance of the relationships between canonical variates, (…), the data should meet the requirements of multivariate normality and homogeneity of variance” ([2], p. 339). Is the data normally distributed in this sense?

As can be seen, the null hypothesis of normality cannot be rejected for any of the variables at the 5% significance level.

In order to perform the canonical correlation analysis, it is necessary to organize the data into two groups of variables: $X$ (representing the developed countries) and $Y$ (representing the developing countries),

$$X = (X_1, \ldots, X_6), \qquad Y = (Y_1, \ldots, Y_8),$$

where $X_1$ to $X_6$ represent the developed countries’ ETFs and $Y_1$ to $Y_8$ represent the developing countries’ ETFs.

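In canonical correlation analysis, $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$, and the problem is to find the “most interesting” linear combinations

$$a^\top X \quad \text{and} \quad b^\top Y$$

for the two sets of variables, that is, the vectors $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^q$ that maximize

$$\rho(a, b) = \rho_{a^\top X,\; b^\top Y}. \tag{2}$$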

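Let $Z$ be the concatenation of the matrices $X$ and $Y$, $Z = (X, Y)$, so

$$E(Z) = \begin{pmatrix} \mu \\ \nu \end{pmatrix}, \qquad \operatorname{Var}(Z) = \Sigma = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix},$$

where $\Sigma_{XX}$ and $\Sigma_{YY}$ are the (empirical) variance-covariance matrices and $\mu$ and $\nu$ are the mean vectors of $X$ and $Y$, respectively. $\Sigma_{XY}$ represents the covariance matrix of $X$ and $Y$, and $\Sigma_{YX} = \Sigma_{XY}^\top$ is its transpose.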

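From equation (1) and from the properties

$$\operatorname{Var}(AX + c) = A \operatorname{Var}(X) A^\top, \tag{3}$$

$$\operatorname{Cov}(AX, BY) = A \operatorname{Cov}(X, Y) B^\top, \tag{4}$$

where $A$ and $B$ are conformable matrices and $c$ is a constant, it follows that

$$\rho(a, b) = \frac{a^\top \Sigma_{XY}\, b}{\left(a^\top \Sigma_{XX}\, a\right)^{1/2} \left(b^\top \Sigma_{YY}\, b\right)^{1/2}}. \tag{5}$$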

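CCA can be performed either on variance-covariance matrices or on correlation matrices. If the random variables in $X$ and $Y$ are standardized to have unit variance, the variance-covariance matrix becomes a correlation matrix. Because equation (5) is invariant under rescaling of the individual variables, the canonical correlations are the same in both cases.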

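After partitioning the variance-covariance matrix, and given equation (5), the main objective is to solve

$$\max_{a,\, b}\; a^\top \Sigma_{XY}\, b \tag{6}$$

subject to

$$a^\top \Sigma_{XX}\, a = 1, \qquad b^\top \Sigma_{YY}\, b = 1.$$

To solve this problem, define

$$K = \Sigma_{XX}^{-1/2}\, \Sigma_{XY}\, \Sigma_{YY}^{-1/2}. \tag{7}$$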

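A singular value decomposition of $K$ gives

$$K = \Gamma \Lambda \Delta^\top, \tag{8}$$

where

$$\Gamma = (\gamma_1, \ldots, \gamma_k), \tag{9}$$

$$\Delta = (\delta_1, \ldots, \delta_k), \tag{10}$$

$$\Lambda = \operatorname{diag}\left(\lambda_1^{1/2}, \ldots, \lambda_k^{1/2}\right). \tag{11}$$

$\Gamma$ and $\Delta$ are column orthonormal matrices. $\Lambda$ is a diagonal matrix with positive elements, namely, the square roots of the nonzero eigenvalues $\lambda_1 \ge \cdots \ge \lambda_k$ of $K K^\top$ (equivalently, of $K^\top K$). (For detailed information about singular value decomposition, see [3].) From the property that $\operatorname{rank}(ABC) = \operatorname{rank}(B)$ for nonsingular $A$ and $C$, and from equation (7),

$$k = \operatorname{rank}(K) = \operatorname{rank}(\Sigma_{XY}).$$

For this solution procedure, the largest element of $\Lambda$, $\lambda_1^{1/2}$, is the canonical correlation $\rho_1$ of our analysis. The vectors $a$ and $b$ can also be found through

$$a_i = \Sigma_{XX}^{-1/2}\, \gamma_i, \tag{12}$$

$$b_i = \Sigma_{YY}^{-1/2}\, \delta_i. \tag{13}$$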

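The problem in this case is to solve the following canonical equations [2, 4]:

$$\left(\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX} - \lambda I\right) a = 0 \tag{14}$$

and

$$\left(\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY} - \lambda I\right) b = 0, \tag{15}$$

where $I$ is the identity matrix and $\lambda$ is the largest eigenvalue for the characteristic equations

$$\left|\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX} - \lambda I\right| = 0 \tag{16}$$

and

$$\left|\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY} - \lambda I\right| = 0. \tag{17}$$

The largest eigenvalue of the product matrices

$$\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX} \quad \text{and} \quad \Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}$$

is the squared canonical correlation coefficient. Furthermore, it can be shown that

$$b = \frac{\Sigma_{YY}^{-1}\Sigma_{YX}\, a}{\sqrt{\lambda}} \tag{18}$$

and

$$a = \frac{\Sigma_{XX}^{-1}\Sigma_{XY}\, b}{\sqrt{\lambda}}, \tag{19}$$

which means that only one of the characteristic equations needs to be solved in order to find both $a$ and $b$.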
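The second solution procedure can be sketched in pure Python (standard library only). This is our own illustration, not the article's Mathematica code, and it makes three implementation choices:

- covariances are estimated with the $n - 1$ divisor, which cancels in the canonical correlation;
- matrices are inverted by Gauss-Jordan elimination;
- power iteration, a swapped-in method, finds the largest eigenvalue of $\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}$, whose square root is $\rho_1$.

All function names and the tiny dataset are hypothetical. In the data, `y1` is an exact linear combination of `x1` and `x2`, so $\rho_1$ must equal 1.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov_matrix(cols_a, cols_b):
    """Sample covariance matrix between two lists of columns (divisor n - 1)."""
    n = len(cols_a[0])
    ma, mb = [mean(c) for c in cols_a], [mean(c) for c in cols_b]
    return [[sum((a[t] - ma[i]) * (b[t] - mb[j]) for t in range(n)) / (n - 1)
             for j, b in enumerate(cols_b)]
            for i, a in enumerate(cols_a)]

def pearson(x, y):
    """Equation (1) for two single variables."""
    c = cov_matrix([x, y], [x, y])
    return c[0][1] / sqrt(c[0][0] * c[1][1])

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A must be nonsingular)."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def largest_eigenvalue(M, iters=1000):
    """Power iteration; adequate here because the eigenvalues are real and >= 0."""
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

def first_canonical_correlation(X_cols, Y_cols):
    """rho_1 = sqrt of the largest eigenvalue of Sxx^-1 Sxy Syy^-1 Syx (eq. (16))."""
    Sxx, Syy = cov_matrix(X_cols, X_cols), cov_matrix(Y_cols, Y_cols)
    Sxy = cov_matrix(X_cols, Y_cols)
    Syx = [list(row) for row in zip(*Sxy)]
    M = matmul(matmul(inverse(Sxx), Sxy), matmul(inverse(Syy), Syx))
    return sqrt(largest_eigenvalue(M))

# Hypothetical toy data (not the ETF returns).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
z = [5, 3, 8, 1, 9, 2, 7, 4]
y1 = [a + 2 * b for a, b in zip(x1, x2)]
print(first_canonical_correlation([x1, x2], [y1, z]))  # y1 is in span(X): 1 up to rounding
```

With one variable per set, the result reduces to the absolute value of the ordinary correlation (1). That special case is a useful sanity check.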

This transposes the data.

This checks the dimensions of `Z`; it has 60 rows (months) and 14 columns (ETFs).

There are 14 random variables (six in the first set and eight in the second). The dimensions of the submatrices are:

- 60×6 for $X$ and 60×8 for $Y$;
- 6×6 for $\Sigma_{XX}$ and 6×8 for $\Sigma_{XY}$;
- 8×6 for $\Sigma_{YX}$ and 8×8 for $\Sigma_{YY}$.

Define `M1` to be the variance-covariance matrix of `Z`. Here are the first seven columns of `M1`.

Partition `M1` into the four submatrices $\Sigma_{XX}$, $\Sigma_{XY}$, $\Sigma_{YX}$, and $\Sigma_{YY}$.

To better understand the relationship between the random variables, here is `M2`, the correlation matrix of `Z`.

This defines `K`.

This performs the singular value decomposition on `K`.

This is the largest singular value of `K`, that is, the largest diagonal element of $\Lambda$.

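This checks the result by computing the square roots of the eigenvalues of

$$\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}$$

and

$$\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}$$

according to the second solution procedure. (`Chop` replaces numbers that are close to zero by the exact integer 0.)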

Performing a spectral decomposition on $KK^\top$ and $K^\top K$ and calculating the square roots of their eigenvalues is another check of the canonical correlation coefficient.

The checks agree.

The last step in this analysis is to find the canonical correlation vectors, which maximize the correlation between the canonical variates. According to equations (12) and (13), this computes the canonical correlation vectors.

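The canonical correlation matrix `B` is computed using $\Delta$, not $\Delta^\top$. This is because the decomposition has the form $K = \Gamma \Lambda \Delta^\top$ (equation (8)), so the vectors $\delta_i$ are the columns of $\Delta$.

Given that

$$A = \Sigma_{XX}^{-1/2}\, \Gamma, \qquad B = \Sigma_{YY}^{-1/2}\, \Delta,$$

the canonical correlation vectors $a_i$ and $b_i$ are the columns of $A$ and $B$.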

In terms of the canonical correlation vectors, the canonical variates are

where, as before,

Given that

(20) |

only and are needed in order to find . Thus, the only canonical variates needed are and .

The interpretation of canonical correlation coefficients, canonical correlation vectors, and canonical variates is one of the most difficult tasks in the whole analysis. CCA is easier to understand if we relate the original data matrix to the matrix computed using the canonical correlation vectors. That matrix is simply a reduction of the data matrix through linear combinations of its columns. Seen this way, the canonical correlation coefficient is merely the ordinary Bravais-Pearson correlation between the two columns of the reduced matrix.

In principle, the highest canonical correlation coefficient found is the maximum possible correlation between the two columns of the reduced matrix. It is therefore usual to say that this coefficient represents the relationship between the two datasets, $X$ and $Y$, in the sense of a correlation measure.

Let $X$ be the matrix containing the explanatory factors and $Y$ the matrix containing the criterion measures (or criterion variables). If the canonical correlation coefficient equals 1, the explanatory factors perfectly explain the criterion variables. If it equals 0, the explanatory factors have no linear influence on the criterion variables. Any value between 0 and 1 is merely an interpolation between these extreme cases.

In the next inputs we will compute and show (partially) the reduced data matrix. In order to demonstrate the validity of the CCA theory, we also compute the correlation for the other (not so interesting for our analysis) canonical variates. We start by defining and .

The first column of our reduced data matrix is .

The first value of , for instance, refers to the linear combination between EWA, EWC, EWG, EWJ, EWU, and SPY for March 2008, such that

We can also define .

Thus, after assigning the values to the canonical variates, , , , and , we have four vectors with the values of the linear combinations of and . Now we can simply compute the Bravais-Pearson correlation between all the canonical variables.

We also verify equation (20).

The correlation between the canonical variates can be better interpreted graphically. First we show the reduced matrix computed using the canonical correlation vectors and , whose canonical correlation coefficient is .

Now we show the reduced matrix computed using the canonical correlation vectors and , whose canonical correlation coefficient is .

Finally, we compute the *canonical loadings*, that is, the correlation between every single ETF and its respective canonical variate.

We can also compute the *canonical cross-loadings*, that is, the correlation between every single ETF and its opposite canonical variate.

It might be of interest to compute the canonical loadings for the *second canonical variate*, that is, the linear combination of variables with correlation coefficient .

Finally, we compute the canonical cross-loadings for the second canonical variate, that is, the linear combination of variables with correlation coefficient .

It is possible to compute canonical loadings and cross-loadings for all the six canonical variates. However, only the first two are shown here for descriptive purposes.

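In this section we test the hypothesis of no correlation between the two sets $X$ and $Y$. An approximation for large $n$ was provided in [5]:

$$-\left(n - \frac{p + q + 3}{2}\right) \sum_{i=1}^{k} \log\left(1 - \lambda_i\right) \sim \chi^2_{pq}, \tag{21}$$

where $\lambda_i$ denotes the $i$th largest squared canonical correlation coefficient and $k$ is the number of canonical correlation coefficients.

We can also test the hypothesis that the individual canonical correlation coefficients are different from zero:

$$-\left(n - \frac{p + q + 3}{2}\right) \sum_{i=s+1}^{k} \log\left(1 - \lambda_i\right) \sim \chi^2_{(p-s)(q-s)}, \tag{22}$$

where $s$ is a parameter to select the canonical correlation coefficient to be tested.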

This defines the Bartlett variable.

This assigns values to the .

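We calculate Bartlett’s statistic (equation (21)) to test whether the two sets of variables $X$ and $Y$ are uncorrelated. Our hypotheses are:

$$H_0\colon \Sigma_{XY} = 0 \quad \text{versus} \quad H_1\colon \Sigma_{XY} \neq 0.$$

This computes the 99% quantile of the chi-square distribution with 48 ($pq = 6 \times 8$) degrees of freedom, $\chi^2_{0.99;\,48}$.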

**Test Conclusion**: The hypothesis of no correlation between the two sets must be rejected at the 1% significance level. The Bartlett statistic (here 249.415) is greater than the 99% quantile of the chi-square distribution with 48 degrees of freedom (here 73.6826).
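Given the canonical correlations, Bartlett's statistic can be reproduced in a few lines. The Python sketch below follows the form of equations (21) and (22) as written above. The function name is our own. The canonical correlations in the example are made-up values, not the ones obtained from the ETF data. The critical value 73.6826 is the one reported above for 48 degrees of freedom.

```python
from math import log

def bartlett_statistic(rhos, n, p, q, s=0):
    """Bartlett's approximate chi-square test for canonical correlations.

    Returns (statistic, degrees_of_freedom) for
    -(n - (p + q + 3) / 2) * sum_{i > s} log(1 - rho_i**2),
    which is approximately chi-square with (p - s)(q - s) degrees of freedom.
    s = 0 tests that all canonical correlations are zero.
    """
    rhos = sorted(rhos, reverse=True)
    stat = -(n - (p + q + 3) / 2) * sum(log(1 - r * r) for r in rhos[s:])
    return stat, (p - s) * (q - s)

# Hypothetical canonical correlations for n = 60 months, p = 6, q = 8.
stat, dof = bartlett_statistic([0.9, 0.6, 0.4, 0.3, 0.2, 0.1], n=60, p=6, q=8)
print(stat, dof, stat > 73.6826)  # compare with the 99% quantile for 48 d.o.f.
```

The sketch computes only the statistic. Obtaining the chi-square quantile itself needs a statistics library or a table.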

This article analyzed the relationship between two sets of variables, namely financial assets represented by NYSE-traded country-specific ETFs. The ETFs were divided into two sets: six ETFs representing developed countries and eight ETFs representing developing countries. Using monthly return data for a five-year period, canonical correlation analysis (CCA) showed that there is a significant relationship between these two sets of ETFs. The highest correlation coefficient found in the present study was . Analogously to the $R^2$ statistic in regression analysis, its squared value can be interpreted as the explanatory power of the canonical correlation analysis. In other words, the squared canonical correlation coefficient indicates the proportion of variance that the canonical variate formed from one set of observed variables linearly shares with the canonical variate formed from the other set.

[1] J. Franke, W. Härdle, and C. Hafner, Einführung in die Statistik der Finanzmärkte, Berlin: Springer Verlag, 2001.

[2] W. R. Dillon and M. Goldstein, Chap. 9 in Multivariate Analysis: Methods and Applications, New York: Wiley, 1984.

[3] K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, London: Academic Press, 1979.

[4] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd ed., New York: Wiley, 2003.

[5] M. S. Bartlett, “A Note on Tests of Significance in Multivariate Analysis,” Proceedings of the Cambridge Philosophical Society, 35(2), 1939, pp. 180-185. doi:10.1017/S0305004100020880.

R. L. Malacarne, “Canonical Correlation Analysis,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-6.

Rodrigo Loureiro Malacarne is a professor of financial mathematics and financial management at the Faculdades Integradas Espirito Santenses (FAESA). His areas of research include statistics of financial markets and financial time series analysis.

**Rodrigo Loureiro Malacarne**

Faculdades Integradas Espirito Santenses (FAESA)

Av. Vitória, 2.220 – Monte Belo

Vitória, ES, Brazil – CEP 29.053-360

malacarne@gmail.com

Motivated by the computational advantages offered by *Mathematica*, I decided some time ago to embark on collecting and implementing properties of the fascinating geometric figure called the arbelos. I have since been impressed by the large number of surprising discoveries and computational challenges that have sprung out of the growing literature concerning this remarkable object. I recall its resemblance to the lower part of the iconic canopied penny-farthing bicycle of the 1960s TV series *The Prisoner*, Punch’s jester cap (of *Punch and Judy* fame), and a yin-yang symbol with one arc inverted; see Figure 1. There is now an online specialized catalog of Archimedean circles (circles with the same radius as Archimedes’s twin circles) [1]. Arbelos-related properties have also found important applications outside the realm of mathematics and computer science [2].

Many famous names are involved in this fascinating theme, among them Archimedes (killed by a Roman soldier in 212 BC), Pappus (320 AD), Christian O. Mohr (1835-1918), Victor Thébault (1882-1960), Leon Bankoff (1908-1997), and Martin Gardner (1914-2010). Recently, they have been succeeded by Clayton Dodge, Peter Y. Woo, Thomas Schoch, Hiroshi Okumura, and Masayuki Watanabe, among others.

Leon Bankoff was the person who stimulated the extraordinary attention paid to the arbelos over the last 30 years. Schoch drew Bankoff’s attention to the arbelos in 1979 by discovering several new Archimedean circles: he sent a 20-page handwritten note to Martin Gardner, who forwarded it to Bankoff. Bankoff later gave a 10-chapter manuscript copy to Dodge in 1996. Bankoff’s death interrupted their planned joint work until Dodge reported some discoveries [3]. In 1999 Dodge said that it would take him five to ten years to sort all the material in his possession, which then filled three suitcases. This work is still forthcoming. Not surprisingly, as with Volume 4 of *The Art of Computer Programming*, important work appears to need substantial time to develop.

**Figure 1.** *The Prisoner*’s penny-farthing bicycle, Punch and Judy, and a physical arbelos.

The arbelos (“shoemaker’s knife” in Greek) is named for its resemblance to the blade of a knife used by cobblers (Figure 1). The arbelos is a plane region bounded by three semicircles sharing a common baseline (Figure 2). Archimedes appears to have been the first to study its mathematical properties, which he included in propositions 4 through 8 of his *Liber assumptorum* (or *Book of Lemmas*). This work might not be entirely by Archimedes. An Arabic translation of the *Book of Lemmas* recently revealed that the text mentions Archimedes repeatedly without fully recognizing his authorship, and some even believe this work to be spurious [4]. The *Book of Lemmas* also contained Archimedes’s famous *Problema Bovinum* [5].

This article aims at systematically enumerating selected properties of the arbelos, without attempting to be exhaustive. Our purpose is to develop a uniform computational methodology in order to tackle those properties in a pedagogical setting. A sequence of properties is arranged and subsequently verified by testing the computationally equivalent predicates. This work includes some discoveries and extensions contributed by the author.

We refer to the largest semicircle as the *top arc* and the two small ones as the left and right *side arcs*, or just the *side arcs* when there is no need to distinguish them. We use and to denote their respective radii (the top arc thus has radius ). A *segment* between two points is an undirected line segment going from one point to the other, while a *line* through two points is the infinite straight line through the two points. A traditional abuse of notation uses $AB$ both for the line segment joining the points $A$ and $B$ and for the length of that segment, depending on the context; modern usage is to write $|AB|$ for the length of the segment.

This function displays the arbelos.

This draws the basic arbelos.

**Figure 2.** The arbelos.

**Property 1**

In other words, the total length of the side arcs equals the length of the top arc. This property is related to an intriguing paradox [6].

**Property 2**

This was lemma 4 of the *Book of Lemmas* (see Figure 3) [7, 8].

These two properties are easily verified by simultaneously testing two equalities.
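A floating-point version of the same two equality tests can be sketched in Python. This is our own illustration, not the article's code. It assumes side-arc radii $a$ and $b$, a top arc of radius $a + b$, and the diameter on the $x$-axis starting at the origin, so the side arcs touch at $(2a, 0)$. These symbols and coordinates are our choices. The segment from that point up to the top arc has length $2\sqrt{ab}$ and serves as the diameter of the radical circle.

```python
from math import pi, sqrt, isclose

def arbelos_checks(a, b):
    """Numerically check properties 1 and 2 for side-arc radii a, b (top radius a + b)."""
    # Property 1: each semicircular arc has length pi * radius.
    side_lengths = pi * a + pi * b
    top_length = pi * (a + b)
    # Property 2: arbelos area = large half-disk minus the two small half-disks.
    arbelos_area = pi * (a + b) ** 2 / 2 - pi * a ** 2 / 2 - pi * b ** 2 / 2
    # The vertical segment from the tangency point (2a, 0) up to the top arc
    # (center (a + b, 0)) has length h and is the diameter of the radical circle.
    h = sqrt((a + b) ** 2 - (2 * a - (a + b)) ** 2)
    radical_area = pi * (h / 2) ** 2
    return isclose(side_lengths, top_length), isclose(arbelos_area, radical_area)

print(arbelos_checks(1.0, 2.5))
```

Algebraically, both areas equal $\pi a b$. The numeric check only confirms this for sample radii.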

The function `drawpoints` is used to display specific points as red disks.

**Figure 3.** The area of the circle of diameter (the radical circle) is equal to the area of the arbelos.

The circle in Figure 3 is called the *radical circle* of the arbelos and the line is its *radical axis* (this terminology will be clarified in Generalizations). To illustrate properties 3 through 11, 25, and 26, we draw and label points and show some coordinates, lines, and circles in Figure 4.

**Figure 4.** Labels, coordinates, lines, and circles referred to in properties 3 through 11, 25, and 26.

**Property 3**

The lines and are orthogonal and intersect the side arcs at points $P$ and $Q$; the line $PQ$ is a common tangent to the side arcs.

To verify the orthogonality of the lines and , we take the inner product of the vectors and .

We employ the following result to obtain the slopes at the points and .

**Theorem 1**

The function `PQ` finds the coordinates of the tangent points and by solving a system of four equations, which places them on the arcs and sets their tangent slopes according to theorem 1.

Besides `PQ`, other definitions in this article for points and quantities are: `VWS`, `HK`, `U`, `EF`, `IJr`, and `LM`.

The function `dSq` computes the square of the distance between two given points.

**Property 4**

As is a diameter of the radical circle, we only need to verify the equality of the distances of and to the center of the radical circle, namely the point .

**Property 5**

Let the line intersect the top arc at points and . Then and lie on a circle with center and radius .

We get the coordinates of the points and by solving a system of equations that places them on the top arc and on the line .

This verifies property 5 by checking that the distances of and to are the same as the distance from to .

**Property 6**

This is equivalent to the fact that the determinant (cross product) of the vectors and is zero.

**Property 7**

This is equivalent to the fact that the inner product of the vectors and is zero.

Let us use the notation for a circle with center and radius .

**Property 8**

The inversion of a point $P \neq O$ in the circle with center $O$ and radius $r$ is defined to be the unique point $P'$ on the ray $OP$ such that $|OP| \cdot |OP'| = r^2$ [9]. The function `inversion` implements this idea.
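The definition of inversion translates directly into coordinates. The image of $P$ is $O + \frac{r^2}{|OP|^2}(P - O)$, which lies on the ray $OP$ and satisfies $|OP| \cdot |OP'| = r^2$.

Below is a Python sketch of this formula. It mirrors the name of the article's `inversion` function but is our own code, and its argument order (point, center, radius) is an assumption.

```python
def inversion(p, center, r):
    """Invert point p in the circle with the given center and radius r.

    The image lies on the ray from center through p, at distance r**2 / |p - center|.
    """
    dx, dy = p[0] - center[0], p[1] - center[1]
    d2 = dx * dx + dy * dy
    if d2 == 0:
        raise ValueError("the center of inversion has no finite image")
    k = r * r / d2  # scale factor r^2 / |OP|^2
    return (center[0] + k * dx, center[1] + k * dy)

print(inversion((2.0, 0.0), (0.0, 0.0), 1.0))  # a point outside maps inside
```

Points on the circle of inversion are fixed. Applying the map twice returns the original point, because inversion is an involution.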

This verifies property 8, recalling the coordinates of are .

**Property 9**

Let be the circle of inversion. The points , , invert to themselves. The segment inverts to the arc and the segment inverts to the arc . The arcs and invert to themselves. The radical circle inverts to the line .

**Property 10**

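This is the same as claiming that the corresponding arcs are orthogonal to the radical circle. By property 8, these arcs pass through pairs of points that are inverse with respect to the circle with diameter , and any circle passing through a pair of inverse points is orthogonal to the circle of inversion [10, 11].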

**Property 11**

This is one of Bankoff’s surprises [12, 13, 14]. As all four points are on the radical circle, we need to verify only that bisects .

The following `Manipulate` illustrates properties 3-11. The easiest way to define the points `P`, `Q`, `H`, `K` is to copy and paste the formulas for them.

Now consider the circle tangent to the side arcs and the top arc, the *incircle* with tangent points , , and as shown in Figure 5 [15, 16]. We also consider points and at the tops of the side arcs.

**Figure 5.** The incircle and coordinates, lines, and points referred to in properties 12 through 15.

Proposition 6 of the *Book of Lemmas* gives the radius of the incircle. The function `U` calculates the coordinates of its center and its radius.

The coordinates of the tangent points , , and are obtained as the intersections of the lines joining the centers of the three arcs of the arbelos and the incircle.
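The incircle can also be located numerically from its tangency conditions. The Python sketch below is our own illustration and rests on these assumptions:

- side radii $a$ and $b$ and top radius $a + b$, with the left end of the diameter at the origin;
- the classical incircle radius $\rho = ab(a+b)/(a^2 + ab + b^2)$.

Subtracting the two external-tangency conditions gives the $x$-coordinate of the center linearly. The third residual (internal tangency to the top arc) is therefore the genuine check on the formula for $\rho$.

```python
from math import sqrt

def incircle(a, b):
    """Center and radius of the arbelos incircle (side radii a, b; top radius a + b).

    Assumed coordinates: centers of the left, right, and top arcs at
    (a, 0), (2a + b, 0), and (a + b, 0).
    """
    rho = a * b * (a + b) / (a * a + a * b + b * b)
    # (x - a)^2 - (x - 2a - b)^2 = (a + rho)^2 - (b + rho)^2 is linear in x.
    x = ((a - b) * (a + b + 2 * rho) / (a + b) + 3 * a + b) / 2
    y = sqrt((a + rho) ** 2 - (x - a) ** 2)
    return (x, y), rho

def tangency_residuals(a, b):
    """Distance errors for the three tangency conditions (all should be ~0)."""
    (x, y), rho = incircle(a, b)
    dist = lambda cx: sqrt((x - cx) ** 2 + y ** 2)
    return (dist(a) - (a + rho),              # externally tangent to the left arc
            dist(2 * a + b) - (b + rho),      # externally tangent to the right arc
            dist(a + b) - (a + b - rho))      # internally tangent to the top arc

print(incircle(1.0, 2.0), tangency_residuals(1.0, 2.0))
```

The tangent points themselves then lie on the segments joining the incircle's center to the centers of the three arcs, as the text above describes.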

**Property 12**

The points , , and are collinear. The points , , and are collinear. The lines and intersect in a point lying on the incircle.

Using the criterion of the determinant to check for collinearity, we verify the first two claims.
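The determinant criterion says that three points $p$, $q$, $r$ are collinear exactly when the $2 \times 2$ determinant of $q - p$ and $r - p$ vanishes. A minimal Python sketch follows. The function name and the absolute tolerance are our own choices. Exact symbolic arithmetic, as in the article, needs no tolerance.

```python
def collinear(p, q, r, tol=1e-12):
    """True if det [[qx - px, qy - py], [rx - px, ry - py]] is (numerically) zero."""
    det = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(det) <= tol

print(collinear((0, 0), (1, 1), (2, 2)), collinear((0, 0), (1, 0), (0, 1)))
```

The determinant is twice the signed area of the triangle $pqr$. An absolute tolerance should therefore be scaled to the size of the figure.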

Let be the point of intersection of the lines and . Confirming that its distance to is equal to verifies the third claim.

**Property 13**

The points , , , and are on a circle with center . Similarly, the points , , , and are on a circle with center .

The following `Manipulate` illustrates property 13 [17]. The option for showing the Bankoff circle as the incircle of the triangle joining the centers of the side arcs and the center of the incircle corresponds to property 23.

**Property 14**

Let be the diameter of the incircle parallel to and let be the projection of onto . The rectangle between the segments and is a square.

This property is illustrated in the next `Manipulate` and is readily verified here.

**Property 15**

Let and be the intersections of the lines and with the side arcs. Then is a square of almost the same size as the one mentioned in property 14.

First we obtain points and as the intersections of their respective lines and their respective arcs, and keep the result in the variable `replaceEF`.

We verify property 15 by setting to be equal to the vector obtained by rotating around by 90° and setting to be equal to the vector obtained by translating by .

Assuming and, the following plot compares the sizes of the two squares.

This `Manipulate` illustrates properties 14 and 15.

Consider the two gray circles in Figure 6, each tangent to the radical axis, a side arc, and the top arc. They are called *the twins*, or the *Archimedean circles*. Because of the following remarkable property, they have been studied extensively. Our list of properties collects many of their extraordinary occurrences [3, 18, 19].

**Figure 6.** The twins.

**Property 16**

The two circles tangent to the radical axis, the top arc, and one of the side arcs of an arbelos have the same radius.

This property appeared as proposition 5 in the *Book of Lemmas*. Solving the following system of six equations finds the values of the radii, verifies they are equal, and computes the centers , .

These four solutions give the centers in pairs: , , , , where and are the reflections of and in the diameter of the arbelos; only the last one is valid. The result also shows that the twins indeed have the same radius . Any circle with radius equal to the twins’ radius is called *Archimedean*.

A nice interpretation of this common radius arises when the radii of the side arcs are regarded as resistances. The twins’ radius is then the resistance obtained by connecting them in parallel, that is, its reciprocal is the sum of the reciprocals of the side-arc radii. The function `IJr` computes the value of the centers and the common value of the radius of the twins.
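The twins can be written down in closed form and checked against all four tangency conditions. The Python sketch below uses the following assumptions (the symbols and coordinates are ours, not necessarily the article's):

- side radii $a$ and $b$, top radius $a + b$, and the diameter starting at the origin, so the radical axis is the line $x = 2a$;
- twin radius $r = ab/(a+b)$, the "parallel resistance" of $a$ and $b$.

Tangency to the radical axis fixes the $x$-coordinates at $2a \mp r$. Tangency to a side arc then gives heights $2\sqrt{ar}$ and $2\sqrt{br}$.

```python
from math import sqrt

def twins(a, b):
    """Centers and common radius of Archimedes's twin circles (side radii a, b)."""
    r = a * b / (a + b)                   # 1/r = 1/a + 1/b
    left = (2 * a - r, 2 * sqrt(a * r))   # touches the radical axis x = 2a from the left
    right = (2 * a + r, 2 * sqrt(b * r))  # touches x = 2a from the right
    return left, right, r

def twin_residuals(a, b):
    """Tangency errors against the side arcs and the top arc (all should be ~0)."""
    (lx, ly), (rx, ry), r = twins(a, b)
    dist = lambda x, y, cx: sqrt((x - cx) ** 2 + y * y)
    return [dist(lx, ly, a) - (a + r),              # left twin vs left arc
            dist(lx, ly, a + b) - (a + b - r),      # left twin vs top arc
            dist(rx, ry, 2 * a + b) - (b + r),      # right twin vs right arc
            dist(rx, ry, a + b) - (a + b - r)]      # right twin vs top arc

print(twins(1.0, 2.0))
```

The common radius depends symmetrically on $a$ and $b$. This symmetry is why the two circles are congruent even when the side arcs are not.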

**Property 17**

Consider a circle tangent to both twins, with center at point and radius . Then there are two possible values of .

To find the extrema of , we set the derivative of each of the above expressions to zero and solve for .

So the centers of the smallest and largest circles tangent to the twins lie on the radical axis. Moreover, they are concentric, as this result confirms.

Thus, by using property 2, we confirm that the largest tangent circle, which is the smallest circle enclosing the twins, satisfies property 17. The following `Manipulate` shows the circles tangent to the twins as you vary the radius of the left side arc.

The following plot compares the radii of the two circles tangent to the twins with centers on the radical axis.

**Figure 7.** Labels and lines referred to in properties 18 through 24.

**Property 18**

The common tangent of the left arc and its tangent twin at passes through . Similarly, the common tangent of the right arc and its tangent twin at passes through (see Figure 7).

This computes the tangent points and .

By using theorem 1, we verify both claims.

**Property 19**

We verify both claims simultaneously.

However, the points , , and are not on a circle centered at , nor are the points , , and on a circle centered at ; otherwise, the following expression would be zero.

**Property 20**

As the length of the segment is the ordinate of and the length of the segment is the ordinate of , we only need to verify that the midpoints of those segments lie on the mentioned lines by checking slopes.

**Property 21**

Those circles are the fourth and fifth Archimedean circles discovered by Bankoff [20]. In order to verify this property, we use the following result [21]:

**Theorem 2**

This directed distance is positive if the triangle is traversed counterclockwise and negative otherwise. The function `dAB` implements this.
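A directed distance with the sign convention just described can be sketched with a cross product. For the line through $A$ and $B$, the distance of $P$ is $\frac{(B - A) \times (P - A)}{|B - A|}$, which is positive when $A \to B \to P$ turns counterclockwise.

The Python function below is our own hypothetical analogue of `dAB`. Its argument order (point first, then the two points defining the line) is an assumption. As noted for property 23, the order of the two line points determines the sign.

```python
from math import hypot

def directed_distance(p, a, b):
    """Signed distance from p to the line through a and b.

    Positive if the triangle a -> b -> p is traversed counterclockwise,
    negative if clockwise, and zero if p lies on the line.
    """
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return cross / hypot(b[0] - a[0], b[1] - a[1])

print(directed_distance((0.0, 1.0), (0.0, 0.0), (1.0, 0.0)))
```

Swapping `a` and `b` flips the sign, so a circle is tangent to a side of a counterclockwise triangle from the inside when this distance from its center equals its radius.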

Let and be the center and radius of the blue circle on the left side of point in Figure 7. Solving the following system finds the value of .

Similarly, this calculates the radius of the blue circle to the right of , which equals .

Thus, both circles are Archimedean as claimed. The following `Manipulate` shows the twins and these two other circles.

**Property 22**

Archimedes discovered the original twins; Bankoff improved on this by discovering this third circle in 1950 [22]. The coordinates of the center of the Bankoff circle are obtained by equating the distances of to the points , , and .

**Property 23**

The Bankoff circle is the incircle of the triangle formed by joining the centers of the side arcs and the center of the incircle of the arbelos.

Using theorem 2 to compute the distance of to the sides of the triangle, we verify this property (as `dAB` computes a directed distance, the order of the arguments describing the line is important).

**Property 24**

This computes the values of and .

The circle is the one where the ordinate of is positive. Note that is not on the radical axis.

**Property 25**

The circles and tangent to the radical axis, one passing through and the other passing through the point , are both Archimedean (see Figure 4).

**Property 26**

A circle with center and radius tangent to the line is such that the distance from to is

, so this equation holds:

Because the circle passes through ,

Because the circle is tangent to the top arc,

This input uses explicit expressions for , , and that satisfy these three equations.

**Property 27**

Consider the two (red) segments connecting the center of the top arc to the top points and of the left and right arcs of the arbelos. These segments have the same length and are orthogonal. The circles and , tangent to those lines at and , respectively, and also tangent to the top arc, are Archimedean (see Figure 8).

This property was discovered in the summer of 1998 [23].

**Figure 8.** The two pairs of Archimedean circles from property 27.
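Both claims of property 27 can be checked numerically. The Python sketch below assumes convenient coordinates:

- the left arc is centered at (a, 0) with radius a;
- the right arc is centered at (2a + b, 0) with radius b;
- the top arc is centered at O = (a + b, 0) with radius a + b.

Under these coordinates, the top points of the side arcs are (a, a) and (2a + b, b). A circle tangent to the line OP at P has its center on the perpendicular to OP at P, and internal tangency to the top arc then fixes its radius. The result does not depend on which side of the line the circle lies, which is consistent with the two pairs in Figure 8. The function name is hypothetical.

```python
import math


def check_property_27(a, b):
    """Lengths of the two segments, their dot product, and the tangent-circle radii."""
    o, big_r = (a + b, 0.0), a + b        # center and radius of the top arc
    tops = [(a, a), (2 * a + b, b)]       # highest points of the left and right arcs
    v1, v2 = [(x - o[0], y - o[1]) for x, y in tops]
    dot = v1[0] * v2[0] + v1[1] * v2[1]   # zero means the segments are orthogonal
    radii = []
    for v in (v1, v2):
        d2 = v[0] ** 2 + v[1] ** 2
        # Center on the perpendicular to OP at P, at distance t from P:
        # |center - O|^2 = |OP|^2 + t^2. Internal tangency to the top arc needs
        # |center - O| = R - t, so |OP|^2 + t^2 = (R - t)^2 and t = (R^2 - |OP|^2) / (2R).
        radii.append((big_r ** 2 - d2) / (2 * big_r))
    return math.hypot(*v1), math.hypot(*v2), dot, radii


print(check_property_27(1.0, 2.0), 1.0 * 2.0 / 3.0)
```

Algebraically, both segments have length √(a² + b²), and the formula for t then gives ((a + b)² − (a² + b²))/(2(a + b)) = ab/(a + b).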

We have seen that there are Archimedean circles other than the twins, namely the Bankoff circle and those mentioned in properties 21 through 27. There are also *non-Archimedean twins*, that is, pairs of circles with a common radius different from that of the twins, appearing at significant places within the arbelos.

The discovery of the *slanted twins* arose from the initial assumption that, besides being tangent to either side arc and to the top arc, the two would-be twins could be tangent to each other and not necessarily to the radical axis. Clearly there are infinitely many solutions if we do not require these circles to have equal radii. The idea was that if we started by assuming they have equal radii, we might end up discovering that they are tangent to the radical axis. This turned out not to be the case. Let us consider circles with centers at the points and with common radius . The value of can be obtained by solving a system of five equations.

These expressions involve square roots differing in sign. The ones using the plus sign diverge at and are rejected.

The other one converges.

We conclude that the slanted twins are indeed congruent and that their common radius is

The following comparison of the radii of the twins and the slanted twins shows that the difference is very small.

This gives the coordinates of the centers of the slanted twins.

The following `Manipulate` shows the slanted twins and, optionally, the twins, as you vary .

In this section we generalize the shape of an arbelos by allowing the arcs to cross and by considering a 3D version. To set the context of the first of those generalizations, we need the concept of the *radical axis of two circles*.

Let be a point and be the circle . The *power* of with respect to is defined to be the real number . The power of is positive, zero, or negative depending on whether lies outside, on, or inside [12]. Let ; if the points of satisfy the equation , then an alternative way to define the power of is to evaluate . (A similar result applies if , when the circle degenerates to a line, in which case the sign of indicates whether is above, on, or below the line.)

Here is a remarkable property of the power of a point. Given a circle and a point , choose an arbitrary line through meeting the circle at points and . Then the product depends only on ; it is independent of the choice of line through . This product is equal to the power of .
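A short numerical illustration in Python follows; the function names are ours. Write the points of a line through P as P + s·(cos θ, sin θ) and substitute into the circle's equation. This gives a quadratic in s whose roots are the signed distances PA and PB, so their product is the constant term, which is exactly the power of P (the squared distance from P to the center minus the squared radius).

The sketch computes the roots explicitly and shows that their product does not change with θ. With signed distances, the product is negative when P is inside the circle, matching the sign of the power.

```python
import math


def power(p, center, r):
    """Power of p with respect to the circle: squared distance to the center minus r^2."""
    return (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 - r * r


def secant_product(p, center, r, theta):
    """Signed product PA * PB along the line through p with direction angle theta,
    or None if the line misses the circle."""
    dx, dy = math.cos(theta), math.sin(theta)
    # |p + s*d - center|^2 - r^2 = s^2 + 2 s d.(p - center) + power(p): a monic quadratic in s
    lin = 2 * (dx * (p[0] - center[0]) + dy * (p[1] - center[1]))
    const = power(p, center, r)
    disc = lin * lin - 4 * const
    if disc < 0:
        return None
    root = math.sqrt(disc)
    s1, s2 = (-lin + root) / 2, (-lin - root) / 2  # signed distances PA and PB
    return s1 * s2


center, r = (0.0, 0.0), 2.0
for p in [(5.0, 1.0), (0.5, -0.3)]:  # one point outside, one inside the circle
    products = [secant_product(p, center, r, k * math.pi / 12) for k in range(12)]
    print(power(p, center, r), [round(x, 9) for x in products if x is not None])
```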

In the following `Manipulate`, drag the four locators to vary the size of the circle, the position of , and the slope of the line through .

Given two circles with different centers, their *radical axis* is defined to be the line consisting of all points that have equal powers with respect to each of the two circles. Proofs of the following can be found in [10].

**Theorem 3**

If two circles intersect at two points and , then their radical axis is the common secant . If two circles are tangent at , then their radical axis is their common tangent at .
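Equating the powers (x − x₁)² + (y − y₁)² − r₁² and (x − x₂)² + (y − y₂)² − r₂² cancels the quadratic terms and leaves a linear equation. This is why the radical axis is a line, perpendicular to the line of centers. The Python sketch below, with hypothetical function names, computes the line's coefficients and checks theorem 3 on an intersecting pair and on a tangent pair of circles.

```python
import math


def radical_axis(c1, r1, c2, r2):
    """Coefficients (A, B, C) of A*x + B*y + C = 0, where the two powers agree."""
    A = 2 * (c2[0] - c1[0])
    B = 2 * (c2[1] - c1[1])
    C = (c1[0] ** 2 + c1[1] ** 2 - r1 ** 2) - (c2[0] ** 2 + c2[1] ** 2 - r2 ** 2)
    return A, B, C


def circle_intersections(c1, r1, c2, r2):
    """Common points of two circles (one repeated point if tangent, [] if none)."""
    d = math.dist(c1, c2)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    # distance from c1, along the line of centers, to the foot of the common chord
    l = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    h = math.sqrt(max(r1 ** 2 - l ** 2, 0.0))  # half the chord length
    ux, uy = (c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d
    mx, my = c1[0] + l * ux, c1[1] + l * uy
    return [(mx - h * uy, my + h * ux), (mx + h * uy, my - h * ux)]


for c1, r1, c2, r2 in [((0.0, 0.0), 5.0, (6.0, 0.0), 5.0),   # two common points
                       ((0.0, 0.0), 2.0, (3.0, 0.0), 1.0)]:  # externally tangent
    A, B, C = radical_axis(c1, r1, c2, r2)
    for x, y in circle_intersections(c1, r1, c2, r2):
        print((x, y), A * x + B * y + C)  # residual should be (close to) zero
```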

**Corollary 1**

Given three circles with noncollinear centers, the three radical axes of the circles taken in pairs are distinct concurrent lines.
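Each radical axis is given by a linear equation, so the common point of the three axes (the *radical center*) can be found by solving a 2 × 2 linear system. For distinct centers, the determinant vanishes exactly when the centers are collinear, because the normals of the first two axes are the vectors joining the centers. A minimal Python sketch with hypothetical function names:

```python
def radical_axis(c1, r1, c2, r2):
    """Coefficients (A, B, C) of the radical axis A*x + B*y + C = 0."""
    return (2 * (c2[0] - c1[0]), 2 * (c2[1] - c1[1]),
            (c1[0] ** 2 + c1[1] ** 2 - r1 ** 2) - (c2[0] ** 2 + c2[1] ** 2 - r2 ** 2))


def radical_center(circles):
    """Common point of the radical axes of three circles given as ((x, y), r) pairs."""
    (c1, r1), (c2, r2), (c3, r3) = circles
    a1, b1, k1 = radical_axis(c1, r1, c2, r2)
    a2, b2, k2 = radical_axis(c2, r2, c3, r3)
    det = a1 * b2 - a2 * b1  # zero exactly when the (distinct) centers are collinear
    if det == 0:
        raise ValueError("centers are collinear; the radical axes are parallel")
    # Cramer's rule for a1*x + b1*y = -k1, a2*x + b2*y = -k2
    return ((-k1 * b2 + b1 * k2) / det, (-a1 * k2 + a2 * k1) / det)


circles = [((0.0, 0.0), 1.0), ((4.0, 0.0), 2.0), ((0.0, 3.0), 1.5)]
x, y = radical_center(circles)
# the point has the same power with respect to all three circles
print((x, y), [(x - c[0]) ** 2 + (y - c[1]) ** 2 - r * r for c, r in circles])
```

The point lies on the third axis (circles 1 and 3) automatically, since its powers with respect to circles 1 and 3 are both equal to its power with respect to circle 2.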

**Theorem 4**

The radical axis of two circles is the locus of points from which tangents drawn to both circles have the same length.
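For a point P outside a circle, the Pythagorean theorem shows that the length of a tangent segment from P to the circle is the square root of the power of P. Theorem 4 therefore restates the definition of the radical axis for points outside both circles. The Python sketch below (hypothetical names) samples points along the radical axis of two disjoint circles and compares the two tangent lengths.

```python
import math


def tangent_length(p, center, r):
    """Length of a tangent segment from p to the circle: the square root of the power."""
    pw = (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 - r * r
    if pw < 0:
        raise ValueError("point lies inside the circle; no real tangent")
    return math.sqrt(pw)


def point_on_radical_axis(c1, r1, c2, r2, s):
    """Point at signed arc-length parameter s along the radical axis A*x + B*y + C = 0."""
    A = 2 * (c2[0] - c1[0])
    B = 2 * (c2[1] - c1[1])
    C = (c1[0] ** 2 + c1[1] ** 2 - r1 ** 2) - (c2[0] ** 2 + c2[1] ** 2 - r2 ** 2)
    n2 = A * A + B * B
    foot = (-A * C / n2, -B * C / n2)  # point of the axis closest to the origin
    norm = math.sqrt(n2)
    return (foot[0] - s * B / norm, foot[1] + s * A / norm)  # move along (-B, A)


c1, r1, c2, r2 = (0.0, 0.0), 1.0, (5.0, 0.0), 2.0
for s in (-3.0, 0.0, 1.5):
    p = point_on_radical_axis(c1, r1, c2, r2, s)
    print(p, tangent_length(p, c1, r1), tangent_length(p, c2, r2))
```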

The following `Manipulate` shows two circles; one is fixed, and you can vary the center and size of the other one by dragging the locator or changing its radius with the slider. You can use the other slider to move the red point on the radical axis to illustrate theorem 4.

The following `Manipulate` illustrates two generalizations.

**Property 28**

The inscribed circles tangent to the radical axis of the side arcs, to the top arc, and to either side arc of the generalized arbelos have the same radius.

Let be the length of the *gap* between the bases (so that the diameter of the top arc is ) and let be the abscissa of the intersection of the radical axis with the x axis, assuming the origin is at the leftmost point of the arbelos [10].

**Theorem 5**

With the help of this theorem, we compute the value of .

We can assume without loss of generality that , , and ( can be negative). Let the inscribed circles be and . The values of these parameters are obtained as follows.

Then, although some centers can be disregarded, the radius is the same in all cases.

Here are three more properties of the arbelos. See if you can guess which property each one illustrates by experimenting with the controls [24, 25].

This first `Manipulate` lets you move the side arcs in a systematic way.

This second `Manipulate` lets you rotate a line around the point of tangency of the side arcs.

Finally, the third `Manipulate` shows an infinite family of twins.

[1] | F. van Lamoen. “Online Catalogue of Archimedean Circles.” (Jan 22, 2014) home.planet.nl/~lamoen/wiskunde/arbelos/Catalogue.htm. |

[2] | S. Garcia Diethelm. “Planar Stress Rotation” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/PlanarStressRotation. |

[3] | C. W. Dodge, T. Schoch, P. Y. Woo, and P. Yiu, “Those Ubiquitous Archimedean Circles,” Mathematics Magazine, 72(3), 1999 pp. 202-213. www.jstor.org/stable/2690883. |

[4] | H. P. Boas, “Reflection on the Arbelos,” American Mathematical Monthly, 113(3), 2006 pp. 236-249. |

[5] | H. D. Dörrie, 100 Great Problems of Elementary Mathematics: Their History and Solution (D. Antin, trans.), New York: Dover Publications, 1965. |

[6] | J. Rangel-Mondragón. “Recursive Exercises II: A Paradox” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/RecursiveExercisesIIAParadox. |

[7] | R. B. Nelsen, “Proof without Words: The Area of an Arbelos,” Mathematics Magazine, 75(2), 2002 p. 144. |

[8] | A. Gadalla. “Area of the Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/AreaOfTheArbelos. |

[9] | J. Rangel-Mondragón, “Selected Themes in Computational Non-Euclidean Geometry. Part 1. Basic Properties of Inversive Geometry,” The Mathematica Journal, 2013. www.mathematica-journal.com/2013/07/selected-themes-in-computational-non-euclidean-geometry-part-1. |

[10] | D. Pedoe, Geometry: A Comprehensive Course, New York: Dover, 1970. |

[11] | M. Schreiber. “Orthogonal Circle Inversion” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/OrthogonalCircleInversion. |

[12] | M. G. Welch, “The Arbelos,” Master’s thesis, Department of Mathematics, University of Kansas, 1949. |

[13] | L. Bankoff, “The Marvelous Arbelos,” The Lighter Side of Mathematics (R. K. Guy and R. E. Woodrow, eds.), Washington, DC: Mathematical Association of America, 1994. |

[14] | G. L. Alexanderson, “A Conversation with Leon Bankoff,” The College Mathematics Journal, 23(2), 1992 pp. 98-117. |

[15] | S. Kabai. “Tangent Circle and Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TangentCircleAndArbelos. |

[16] | G. Markowsky and C. Wolfram. “Theorem of the Owl’s Eyes” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TheoremOfTheOwlsEyes. |

[17] | P. Y. Woo, “Simple Constructions of the Incircle of an Arbelos,” Forum Geometricorum, 1, 2001 pp. 133-136. forumgeom.fau.edu/FG2001volume1/FG200119.pdf. |

[18] | B. Alpert. “Archimedes’ Twin Circles in an Arbelos” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/ArchimedesTwinCirclesInAnArbelos. |

[19] | J. Rangel-Mondragón. “Twins of Arbelos and Circles of a Triangle” from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/TwinsOfArbelosAndCirclesOfATriangle. |

[20] | H. Okumura, “More on Twin Circles of the Skewed Arbelos,” Forum Geometricorum, 11, 2011 pp. 139-144. forumgeom.fau.edu/FG2011volume11/FG201114.pdf. |

[21] | E. W. Weisstein. “Point-Line Distance—2-Dimensional” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/Point-LineDistance2-Dimensional.html. |

[22] | L. Bankoff, “Are the Twin Circles of Archimedes Really Twins?,” Mathematics Magazine, 47(4), 1974 pp. 214-218. |

[23] | F. Power, “Some More Archimedean Circles in the Arbelos,” Forum Geometricorum, 5, 2005 pp. 133-134. forumgeom.fau.edu/FG2005volume5/FG200517.pdf. |

[24] | A. V. Akopyan, Geometry in Figures, CreateSpace Independent Publishing Platform, 2011. |

[25] | H. Okumura and M. Watanabe, “Characterizations of an Infinite Set of Archimedean Circles,” Forum Geometricorum, 7, 2007 pp. 121-123. forumgeom.fau.edu/FG2007volume7/FG200716.pdf. |

J. Rangel-Mondragón, “The Arbelos,” The Mathematica Journal, 2014. dx.doi.org/doi:10.3888/tmj.16-5. |

Jaime Rangel-Mondragón received M.Sc. and Ph.D. degrees in applied mathematics and computation from the University College of North Wales in Bangor, UK. He has been a visiting scholar at Wolfram Research, Inc. and has held positions in the Faculty of Informatics at UCNW, the College of Mexico, the Center for Research and Advanced Studies, the Monterrey Institute of Technology, the Queretaro Institute of Technology, and the University of Queretaro in Mexico, where he is presently a member of the Faculty of Informatics. His current research includes combinatorics, the theory of computing, computational geometry, urban traffic, and recreational mathematics.

**Jaime Rangel-Mondragón**

*UAQ, Facultad de Informatica
Queretaro, Qro. Mexico*