The metric structure on a Riemannian or pseudo-Riemannian manifold is entirely determined by its metric tensor, which has a matrix representation in any given chart. Encoded in this metric is the sectional curvature, which is often of interest to mathematical physicists, differential geometers and geometric group theorists alike. In this article, we provide a function to compute the sectional curvature for a Riemannian manifold given its metric tensor. We also define a function to obtain the Ricci tensor, a closely related object.

A *Riemannian manifold* is a differentiable manifold together with a Riemannian metric tensor that takes any point in the manifold to a positive-definite inner product function on its *tangent space*, which is a vector space representing geodesic directions from that point [1]. We can treat this tensor as a symmetric matrix with entries denoted by $g_{ij}$, representing the relationship between tangent vectors at a point in the manifold, once a system of local coordinates has been chosen [2, 3]. In the case of a parameterized surface, we can use the parameters to compute the full metric tensor.

A classical parameterization of a surface is the standard parameterization of the sphere. We compute the metric tensor of the standard sphere below.
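The article's Mathematica input was not preserved in this copy; as a hedged sketch of the same computation, the metric tensor (first fundamental form) of the unit sphere can be computed in Python with SymPy (the function and symbol names here are our own):

```python
import sympy as sp

theta, phi = sp.symbols('theta phi', positive=True)

# Standard parameterization of the unit sphere.
X = sp.Matrix([sp.sin(theta) * sp.cos(phi),
               sp.sin(theta) * sp.sin(phi),
               sp.cos(theta)])

params = [theta, phi]
# The metric is J^T J, whose entries are the pairwise dot products
# of the coordinate tangent vectors (columns of the Jacobian J).
J = X.jacobian(params)
g = sp.simplify(J.T * J)
print(g)  # Matrix([[1, 0], [0, sin(theta)**2]])
```

The result, $\operatorname{diag}(1, \sin^2\theta)$, is the familiar round metric on the sphere.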

This also works for more complicated surfaces. The following is an example taken from [4].

Denoting the coordinates by $x^1, \dots, x^n$, we can then define the line element $ds^2 = g_{ij}\,dx^i\,dx^j$, where the $g_{ij}$ are functions of the coordinates $x^i$; this definition uses Einstein notation, which will also apply wherever applicable in the following. From this surprisingly dense description of distance, we can extract many properties of a given Riemannian manifold, including *sectional curvature*, which will be given an explicit formula later. In particular, two-dimensional manifolds, also called *surfaces*, carry a value that measures at any given point how far they are from being flat. This value can be positive, negative or zero. For intuition, we give examples of each of these types of behavior.

The sphere is the prototypical example of a surface of positive curvature.

Any convex subspace of Euclidean space has zero curvature everywhere.

The monkey saddle is an example of a two-dimensional figure with negative curvature.

Sectional curvature is a locally defined value that gives the curvature of a special type of two-dimensional subspace at a point, where the two dimensions defining the surface are input as tangent vectors. Manifolds may have points that admit sections of both negative and positive curvature simultaneously, as is the case for the Schwarzschild metric discussed in the section “Applications in Physics.” An important property of sectional curvature is that on a Riemannian manifold it varies smoothly with respect to both the point in the manifold being considered and the choice of tangent vectors.

Sectional curvature is given by

$$K(u, v) = \frac{\langle R(u, v)v, u \rangle}{|u|^2\,|v|^2 - \langle u, v \rangle^2},$$

where $\langle \cdot, \cdot \rangle$ and $|\cdot|$ are the inner product and norm determined by the metric tensor.

In this formula, $R$ represents the purely covariant Riemannian curvature tensor, a function on tangent vectors that is completely determined by the $g_{ij}$. Both $R$ and the $g_{ij}$ are treated more thoroughly in the following section, as well as in [1]. Some immediate properties of the curvature formula are that $K$ is symmetric in its two entries, is undefined if the vectors $u$ and $v$ are linearly dependent, and does not change when either vector is scaled. Moreover, any two tangent vectors that define the same subspace of the tangent space give the same value. This is important because curvature should only depend on the embedded surface itself and not how it was determined.

While we are primarily concerned with Riemannian manifolds, it is worth noting that all calculations are valid for pseudo-Riemannian manifolds, in which the assumption that the metric tensor is positive-definite is dropped. This generalization is especially important in areas such as general relativity, where the metric tensors that represent spacetime have a different signature than that of traditional Riemannian manifolds. We explore this connection more in the section “Applications in Physics.”

For a differentiable manifold, an *atlas* is a collection of homeomorphisms, called *charts*, from open sets in Euclidean space to the manifold, such that overlapping charts can be made compatible by a differentiable transition map between them. Via these homeomorphisms, we can define coordinates in an open set around any point by adopting the coordinates in the corresponding Euclidean neighborhood. By convention, these coordinates are labelled $x^1, \dots, x^n$, and unless important, we omit the point giving rise to the coordinates. In some cases of interest, it is possible to adopt a coordinate system that is valid over the whole manifold.

From such a coordinate system, whether local or global, we can define a basis for the tangent space using a *coordinate frame* [5]. This will be the basis consisting of the partial derivative operators in each of the coordinate directions, that is, $\{\partial/\partial x^1, \dots, \partial/\partial x^n\}$. Considering the tangent space as a vector space, this set is sometimes referred to in mathematical physics as a *holonomic basis* for the manifold. We use this expression then to define the symmetric matrix $(g_{ij})$ by the following expression for $g_{ij}$:

$$g_{ij} = \left\langle \frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j} \right\rangle.$$

From here, we define one more tensor of interest for the purposes of calculating curvature. Using Einstein notation, the Riemannian curvature tensor is

$$R^\ell_{ijk} = \partial_i \Gamma^\ell_{jk} - \partial_j \Gamma^\ell_{ik} + \Gamma^\ell_{i\lambda}\Gamma^\lambda_{jk} - \Gamma^\ell_{j\lambda}\Gamma^\lambda_{ik}.$$

The various $\Gamma^\ell_{ij}$ are the *Christoffel symbols*, for which code is presented in the next section. In light of these definitions, we recall sectional curvature once again from the introduction as the following, now considering the special case of the tangent vectors being chosen in coordinate directions:

$$K\!\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right) = \frac{\left\langle R\!\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right)\frac{\partial}{\partial x^j},\ \frac{\partial}{\partial x^i}\right\rangle}{\left\|\frac{\partial}{\partial x^i}\right\|^2 \left\|\frac{\partial}{\partial x^j}\right\|^2 - g_{ij}^2}.$$

The norm in the denominator is the norm of the tangent vector associated to that partial derivative in the holonomic basis, which is induced by the inner product associated with $g$.

We now create functions to compute these tensors and sectional curvature itself. These values depend on a set of coordinates and a Riemannian metric tensor, so that will be the information that serves as the input for these functions. The coordinates should be given as a list of coordinate names, and the metric should be a square symmetric matrix whose size matches the length of the coordinate list. Some not inconsiderable inspiration for the first half of this code was taken from Professor Leonard Parker’s Mathematica notebook “Curvature and the Einstein Equation,” which is available online as a supplement to [6].

We can now define a function for the Christoffel symbols from the previous section. This calculation consists of taking partial derivatives of the metric tensor components and one tensor operation. In Mathematica, the dot product, typically used for vectors and matrices, is also able to take tensors and contract indices.
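For readers outside Mathematica, here is a hedged SymPy sketch of the same Christoffel computation; the helper name and the index layout `Gamma[k][i][j]` for $\Gamma^k_{ij}$ are our own choices:

```python
import sympy as sp

def christoffel(g, coords):
    """Christoffel symbols Gamma[k][i][j] = (1/2) g^{kl} (d_i g_{jl}
    + d_j g_{il} - d_l g_{ij}) for a metric matrix g."""
    n = len(coords)
    ginv = g.inv()
    return [[[sp.simplify(sum(ginv[k, l] * (sp.diff(g[j, l], coords[i])
                                            + sp.diff(g[i, l], coords[j])
                                            - sp.diff(g[i, j], coords[l]))
                              for l in range(n)) / 2)
              for j in range(n)] for i in range(n)] for k in range(n)]

# Example: round metric on the unit sphere.
theta, phi = sp.symbols('theta phi', positive=True)
g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])
Gamma = christoffel(g, [theta, phi])
print(Gamma[0][1][1])  # Gamma^theta_{phi phi}, equal to -sin(theta)*cos(theta)
```

The classical sphere values $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$ come out as expected.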

We can now use the formulas stated in the second section to define both the covariant and contravariant forms of the Riemannian curvature tensor.

We perform one more tensor operation using the dot product to transform our partially contravariant tensor into one that is purely covariant. Both of these will be called at various points later.
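A hedged SymPy sketch of both forms of the curvature tensor, under our own index conventions (`R[l][k][i][j]` stands for $R^\ell_{kij}$; lowering the first index with the metric gives the purely covariant form):

```python
import sympy as sp

def christoffel(g, coords):
    n = len(coords)
    ginv = g.inv()
    return [[[sp.simplify(sum(ginv[k, l] * (sp.diff(g[j, l], coords[i])
                                            + sp.diff(g[i, l], coords[j])
                                            - sp.diff(g[i, j], coords[l]))
                              for l in range(n)) / 2)
              for j in range(n)] for i in range(n)] for k in range(n)]

def riemann(g, coords):
    """Return (R, Rlow): contravariant R[l][k][i][j] = R^l_{kij} and the
    purely covariant tensor Rlow[a][b][c][d] = g_{al} R^l_{bcd}."""
    n = len(coords)
    Gamma = christoffel(g, coords)
    R = [[[[sp.simplify(sp.diff(Gamma[l][j][k], coords[i])
                        - sp.diff(Gamma[l][i][k], coords[j])
                        + sum(Gamma[l][i][m] * Gamma[m][j][k]
                              - Gamma[l][j][m] * Gamma[m][i][k]
                              for m in range(n)))
            for j in range(n)] for i in range(n)] for k in range(n)]
         for l in range(n)]
    Rlow = [[[[sp.simplify(sum(g[a, l] * R[l][b][c][d] for l in range(n)))
               for d in range(n)] for c in range(n)] for b in range(n)]
            for a in range(n)]
    return R, Rlow

# Sanity check on the unit sphere: R_{theta phi theta phi} = sin(theta)^2.
theta, phi = sp.symbols('theta phi', positive=True)
g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])
R, Rlow = riemann(g, [theta, phi])
print(Rlow[0][1][0][1])  # equals sin(theta)**2
```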

The full function to return the sectional curvatures consists of computing a scaled version of the covariant Riemannian metric tensor.
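As a cross-check, a self-contained SymPy sketch of the full sectional curvature matrix, using the coordinate-direction formula from the previous section (the helper name is ours, and the diagonal is left at 0 to reflect that those entries are undefined):

```python
import sympy as sp

def sectional_curvature(g, coords):
    """Matrix whose (i, j) entry is the sectional curvature of the plane
    spanned by the coordinate directions i and j (diagonal left at 0)."""
    n, ginv = len(coords), g.inv()
    Gam = [[[sum(ginv[k, l] * (sp.diff(g[j, l], coords[i])
                               + sp.diff(g[i, l], coords[j])
                               - sp.diff(g[i, j], coords[l]))
                 for l in range(n)) / 2
             for j in range(n)] for i in range(n)] for k in range(n)]

    def Rlow(a, b, c, d):  # purely covariant curvature R_{abcd}
        return sum(g[a, l] * (sp.diff(Gam[l][d][b], coords[c])
                              - sp.diff(Gam[l][c][b], coords[d])
                              + sum(Gam[l][c][m] * Gam[m][d][b]
                                    - Gam[l][d][m] * Gam[m][c][b]
                                    for m in range(n)))
                   for l in range(n))

    K = sp.zeros(n, n)
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = sp.simplify(
                Rlow(i, j, i, j) / (g[i, i] * g[j, j] - g[i, j]**2))
    return K

# Unit sphere: constant sectional curvature +1.
theta, phi = sp.symbols('theta phi', positive=True)
K = sectional_curvature(sp.Matrix([[1, 0], [0, sp.sin(theta)**2]]),
                        [theta, phi])
print(K)  # Matrix([[0, 1], [1, 0]])
```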

The output consists of a symmetric matrix with zero diagonal entries representing curvatures in the coordinate directions. These diagonal values should not be taken literally, as curvature is undefined given two linearly dependent directions. While this of course does not give all possible sectional curvatures, one may perform a linear transformation on the basis in order to obtain a new metric tensor with arbitrary (linearly independent) vectors as basis elements. From here, the new tensor may be used for computation.

Here is an example with diagonal entries that are functions of the last coordinate.

Any good computation in mathematics must stand up to scrutiny against known cases, so we evaluate our function with the input of hyperbolic 3-space. The 2 in each exponent should be read as the square of the exponential function.
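Using the same coordinate-direction formula, a self-contained SymPy sketch of this check; we assume the metric $\operatorname{diag}(e^{2t}, e^{2t}, 1)$, matching the description of diagonal entries that are functions of the last coordinate (helper name ours):

```python
import sympy as sp

def sectional(g, coords):
    # coordinate-direction sectional curvatures, as in the text
    n, ginv = len(coords), g.inv()
    Gam = [[[sum(ginv[k, l] * (sp.diff(g[j, l], coords[i])
                               + sp.diff(g[i, l], coords[j])
                               - sp.diff(g[i, j], coords[l]))
                 for l in range(n)) / 2
             for j in range(n)] for i in range(n)] for k in range(n)]
    def Rlow(a, b, c, d):
        return sum(g[a, l] * (sp.diff(Gam[l][d][b], coords[c])
                              - sp.diff(Gam[l][c][b], coords[d])
                              + sum(Gam[l][c][m] * Gam[m][d][b]
                                    - Gam[l][d][m] * Gam[m][c][b]
                                    for m in range(n)))
                   for l in range(n))
    K = sp.zeros(n, n)
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = sp.simplify(
                Rlow(i, j, i, j) / (g[i, i] * g[j, j] - g[i, j]**2))
    return K

x, y, t = sp.symbols('x y t', real=True)
g = sp.diag(sp.exp(2 * t), sp.exp(2 * t), 1)  # assumed form of the metric
K = sectional(g, [x, y, t])
print(K)  # Matrix([[0, -1, -1], [-1, 0, -1], [-1, -1, 0]])
```

All off-diagonal entries come out as $-1$, the constant curvature of hyperbolic 3-space.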

Checking with [7] verifies that this is indeed a global metric tensor for hyperbolic 3-space. As such, we know that it has constant sectional curvature of $-1$ (recall the diagonal entries do not represent any curvature information).

Continuing with the hyperbolic space metric tensor, it is a well-known result in hyperbolic geometry that one is able to scale these first two dimensions to vary the curvature and produce a *pinched curvature* manifold.

If we allow for new positive real constant coefficients in the exponents, then we should see explicit bounds on the curvatures.
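With assumed exponent coefficients $a$ and $b$ (the names are ours), a SymPy sketch of this pinched-curvature computation for the metric $\operatorname{diag}(e^{2at}, e^{2bt}, 1)$:

```python
import sympy as sp

def sectional(g, coords):
    # coordinate-direction sectional curvatures, as in the text
    n, ginv = len(coords), g.inv()
    Gam = [[[sum(ginv[k, l] * (sp.diff(g[j, l], coords[i])
                               + sp.diff(g[i, l], coords[j])
                               - sp.diff(g[i, j], coords[l]))
                 for l in range(n)) / 2
             for j in range(n)] for i in range(n)] for k in range(n)]
    def Rlow(a_, b_, c_, d_):
        return sum(g[a_, l] * (sp.diff(Gam[l][d_][b_], coords[c_])
                               - sp.diff(Gam[l][c_][b_], coords[d_])
                               + sum(Gam[l][c_][m] * Gam[m][d_][b_]
                                     - Gam[l][d_][m] * Gam[m][c_][b_]
                                     for m in range(n)))
                   for l in range(n))
    K = sp.zeros(n, n)
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = sp.simplify(
                Rlow(i, j, i, j) / (g[i, i] * g[j, j] - g[i, j]**2))
    return K

x, y, t = sp.symbols('x y t', real=True)
a, b = sp.symbols('a b', positive=True)       # assumed coefficient names
g = sp.diag(sp.exp(2 * a * t), sp.exp(2 * b * t), 1)
K = sectional(g, [x, y, t])
print(K[0, 1], K[0, 2], K[1, 2])  # -a*b, -a**2, -b**2
```

The curvatures $-ab$, $-a^2$ and $-b^2$ all lie between $-\max(a, b)^2$ and $-\min(a, b)^2$, the pinching bounds.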

In this vein, the Riemannian structure for *complex hyperbolic space* is similar to the real case, except for a modification to allow for complex variables.

In this setting, a formula for the metric tensor valid over the entire manifold is available from [8], among other places.

One can verify that, although not constant, the entries in the upper-left block always lie between $-4$ and $-1$. This result agrees with sectional curvature in complex hyperbolic space, and so serves as an example of sectional curvature computation where the underlying tensor is not diagonal. A careful review of [8] reminds us that this metric is only well-defined up to rescaling, which can change the values of the sectional curvature. What does not change, however, is the ratio of the largest and smallest curvatures, which is always exactly 4. The introduction in [9] takes considerable care to remind us that definitions and normalizations of this curvature range vary in the literature.

Perhaps the most interesting applications of differentiable manifolds and curvature to physics lie in the area of relativity. This discipline uses the idea of a *Lorentzian manifold*, which is defined as a manifold equipped with a Lorentzian metric that has signature $(3, 1)$ instead of the signature $(4, 0)$ for four-dimensional Riemannian manifolds. As noted in the introduction, however, this has no impact on the computations of sectional curvature. Examples of such Lorentzian metrics include the *Minkowski flat spacetime metric* $\operatorname{diag}(-c^2, 1, 1, 1)$; here $c$ is the familiar constant speed of light.

Justifying the name of *flat* spacetime, our curvature calculation guarantees all sectional curvatures are identically zero.
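A quick SymPy confirmation for the Minkowski metric $\operatorname{diag}(-c^2, 1, 1, 1)$ (helper ours; since the metric is constant, all Christoffel symbols vanish and the curvature is identically zero):

```python
import sympy as sp

def sectional(g, coords):
    # coordinate-direction sectional curvatures, as in the text
    n, ginv = len(coords), g.inv()
    Gam = [[[sum(ginv[k, l] * (sp.diff(g[j, l], coords[i])
                               + sp.diff(g[i, l], coords[j])
                               - sp.diff(g[i, j], coords[l]))
                 for l in range(n)) / 2
             for j in range(n)] for i in range(n)] for k in range(n)]
    def Rlow(a, b, c, d):
        return sum(g[a, l] * (sp.diff(Gam[l][d][b], coords[c])
                              - sp.diff(Gam[l][c][b], coords[d])
                              + sum(Gam[l][c][m] * Gam[m][d][b]
                                    - Gam[l][d][m] * Gam[m][c][b]
                                    for m in range(n)))
                   for l in range(n))
    K = sp.zeros(n, n)
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = sp.simplify(
                Rlow(i, j, i, j) / (g[i, i] * g[j, j] - g[i, j]**2))
    return K

c = sp.symbols('c', positive=True)
t, x, y, z = sp.symbols('t x y z', real=True)
K = sectional(sp.diag(-c**2, 1, 1, 1), [t, x, y, z])
print(K)  # the 4 x 4 zero matrix: flat spacetime
```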

More generic Lorentzian manifolds may have nonzero curvature. To this end, we examine the *Schwarzschild metric*, which describes spacetime outside a spherical mass such that the gravitational field outside the mass satisfies Einstein’s field equations. This most commonly is viewed in the context of a black hole and how spacetime behaves nearby. More details on the following tensor can be found in [10].

In the following, $r$, $\theta$ and $\phi$ are standard spherical coordinates for three-dimensional space and $t$ represents time. With this setup, we can calculate the sectional curvature of spacetime for areas outside such a spherical mass.
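A hedged SymPy sketch of this computation in geometric units $G = c = 1$ (an assumption on our part; the article's own units may differ). Every sectional curvature comes out as a constant multiple of $M/r^3$:

```python
import sympy as sp

def sectional(g, coords):
    # coordinate-direction sectional curvatures, as in the text
    n, ginv = len(coords), g.inv()
    Gam = [[[sum(ginv[k, l] * (sp.diff(g[j, l], coords[i])
                               + sp.diff(g[i, l], coords[j])
                               - sp.diff(g[i, j], coords[l]))
                 for l in range(n)) / 2
             for j in range(n)] for i in range(n)] for k in range(n)]
    def Rlow(a, b, c, d):
        return sum(g[a, l] * (sp.diff(Gam[l][d][b], coords[c])
                              - sp.diff(Gam[l][c][b], coords[d])
                              + sum(Gam[l][c][m] * Gam[m][d][b]
                                    - Gam[l][d][m] * Gam[m][c][b]
                                    for m in range(n)))
                   for l in range(n))
    K = sp.zeros(n, n)
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = sp.simplify(
                Rlow(i, j, i, j) / (g[i, i] * g[j, j] - g[i, j]**2))
    return K

t, r, theta, phi, M = sp.symbols('t r theta phi M', positive=True)
f = 1 - 2 * M / r                       # Schwarzschild factor, G = c = 1
g = sp.diag(-f, 1 / f, r**2, r**2 * sp.sin(theta)**2)
K = sectional(g, [t, r, theta, phi])
print(sp.simplify(K[2, 3]))  # 2*M/r**3
```

Multiplying any entry by $r^3$ leaves a constant multiple of $M$, exhibiting the $M/r^3$ behavior discussed below.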

This result indicates that the sectional curvature is directly proportional to the mass and inversely proportional to the cube of the distance from the object. In particular, there is a singularity at $r = 0$, indicating that curvature “blows up” near the center of the mass. Indeed, these results are in line with Flamm’s paraboloid, the graphical representation of a constant-time equatorial slice of the Schwarzschild metric, whose details can be found in [11].

In fact, the calculations we have done already allow us to compute one further object of interest for a Riemannian or pseudo-Riemannian manifold: the Ricci curvature. The Ricci curvature is a tensor that contracts the curvature tensor and is computable when one has the contravariant Riemannian curvature tensor. Below we use a built-in function for tensors to contract the first and third indices of the contravariant Riemannian curvature tensor to obtain a matrix containing condensed curvature information (see [12] for more information).
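A hedged SymPy sketch of this contraction (the helper is ours), checked on the unit sphere, where the Ricci tensor equals $(n - 1)g = g$:

```python
import sympy as sp

def ricci(g, coords):
    """Ricci tensor Ric_{jk} = R^i_{jik}, contracting the contravariant
    Riemann tensor on its first (upper) and third indices."""
    n = len(coords)
    ginv = g.inv()
    Gam = [[[sum(ginv[k, l] * (sp.diff(g[j, l], coords[i])
                               + sp.diff(g[i, l], coords[j])
                               - sp.diff(g[i, j], coords[l]))
                 for l in range(n)) / 2
             for j in range(n)] for i in range(n)] for k in range(n)]

    def R(l, k, i, j):  # contravariant curvature R^l_{kij}
        return (sp.diff(Gam[l][j][k], coords[i])
                - sp.diff(Gam[l][i][k], coords[j])
                + sum(Gam[l][i][m] * Gam[m][j][k]
                      - Gam[l][j][m] * Gam[m][i][k] for m in range(n)))

    return sp.Matrix(n, n,
                     lambda j, k: sp.simplify(sum(R(i, j, i, k)
                                                  for i in range(n))))

# Unit sphere: Ric = (n - 1) g = g for n = 2.
theta, phi = sp.symbols('theta phi', positive=True)
g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])
Ric = ricci(g, [theta, phi])
print(Ric)  # Matrix([[1, 0], [0, sin(theta)**2]])
```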

The values 1 and 3 above refer to the dimensions we are contracting. In general, the corresponding indices must vary over sets of the same size; here all dimensions have indices that vary over a set whose size is the number of coordinates. We compute the Ricci curvature for some of the previous examples.

The fact that the Ricci curvature vanishes for the above solution to the Einstein field equation is a consequence of its types of symmetries. In general, the Ricci curvature for other solutions is nonzero. Notice that for the examples above (and trivially for the flat metric), all information from the Ricci tensor is contained in the diagonal elements. This is always the case for a diagonal metric tensor [12]. As such, we may sometimes be interested only in these values, so we take the diagonal in such a case.

The supervising author would like to thank Dr. Nicolas Robles for suggesting the submission of this article to *The Mathematica Journal*. We would also like to thank Leonard Parker, who authored the notebook file available at [6], which greatly illuminated some of the calculations. We are also very grateful to the referee and especially the editor, whose contributions have made this article much more accurate, legible and efficient.

[1] M. do Carmo, Differential Geometry of Curves & Surfaces, Mineola, NY: Dover Publications, Inc., 2018.

[2] J. M. Lee, Introduction to Smooth Manifolds, Graduate Texts in Mathematics, 218, New York: Springer, 2003.

[3] C. Stover and E. W. Weisstein, “Metric Tensor” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/MetricTensor.html.

[4] “ParametricPlot3D,” ParametricPlot3D from Wolfram Language & System Documentation Center—A Wolfram Web Resource. reference.wolfram.com/language/ref/ParametricPlot3D.html.

[5] F. Catoni, D. Boccaletti, R. Cannata, V. Catoni, E. Nichelatti and P. Zampetti, The Mathematics of Minkowski Space-Time, Frontiers in Mathematics, Basel: Birkhäuser Verlag, 2008.

[6] J. B. Hartle, Gravity: An Introduction to Einstein’s General Relativity, San Francisco: Addison-Wesley, 2003. web.physics.ucsb.edu/~gravitybook/math/curvature.pdf.

[7] J. G. Ratcliffe, Foundations of Hyperbolic Manifolds, 2nd ed., Graduate Texts in Mathematics, 149, New York: Springer, 2006.

[8] J. Parker, “Notes on Complex Hyperbolic Geometry” (Jan 10, 2020). maths.dur.ac.uk/~dma0jrp/img/NCHG.pdf.

[9] W. M. Goldman, Complex Hyperbolic Geometry, Oxford Mathematical Monographs, Oxford Science Publications, New York: Oxford University Press, 1999.

[10] R. Adler, M. Bazin and M. Schiffer, Introduction to General Relativity, New York: McGraw-Hill, 1965.

[11] R. T. Eufrasio, N. A. Mecholsky and L. Resca, “Curved Space, Curved Time, and Curved Space-Time in Schwarzschild Geodetic Geometry,” General Relativity and Gravitation, 50(159), 2018. doi:10.1007/s10714-018-2481-2.

[12] L. A. Sidorov, “Ricci Tensor,” Encyclopedia of Mathematics (M. Hazewinkel, ed.), Netherlands: Springer, 1990. www.encyclopediaofmath.org/index.php/Ricci_tensor.

E. Fairchild, F. Owen and B. Burns Healy, “Sectional Curvature in Riemannian Manifolds,” The Mathematica Journal, 2020. https://doi.org/10.3888/tmj.22-1.

Elliott Fairchild is a high-school student at Cedarburg High School. He particularly enjoys problems in analysis, and is always looking for more research opportunities.

Francis Owen is an undergraduate student at the University of Wisconsin-Milwaukee. His major is Applied Mathematics and Computer Science, and he is eager to find new programming opportunities.

Brendan Burns Healy is a Visiting Assistant Professor at the University of Wisconsin-Milwaukee. Though a geometric group theorist and low-dimensional topologist by training, he also enjoys problems of computation and coding.

**Elliott Fairchild**

*Department of Mathematical Sciences
University of Wisconsin-Milwaukee
3200 N. Cramer St.
Milwaukee, WI 53211*

**Francis Owen**

*Department of Mathematical Sciences
University of Wisconsin-Milwaukee
3200 N. Cramer St.
Milwaukee, WI 53211*

**Brendan Burns Healy, PhD**

Department of Mathematical Sciences

University of Wisconsin-Milwaukee

3200 N. Cramer St.

Milwaukee, WI 53211

*www.burnshealy.com*

We study the distribution of eigenspectra for operators of the form $L = -\frac{d^2}{dx^2} + q(x)$ with self-adjoint boundary conditions on both bounded and unbounded interval domains. With integrable potentials $q$, we explore computational methods for calculating spectral density functions involving cases of discrete and continuous spectra where discrete eigenvalue distributions approach a continuous limit as the domain becomes unbounded. We develop methods from classic texts in ODE analysis and spectral theory in a concrete, visually oriented way as a supplement to introductory literature on spectral analysis. As a main result of this study, we develop a routine for computing eigenvalues as an alternative to the built-in eigenvalue solver, resulting in fast approximations to implement in our demonstrations of spectral distribution.

We follow methods of the texts by Coddington and Levinson [1] and by Titchmarsh [2] (both publicly available online via archive.org) in our study of the operator $L$ and the associated problem

$-y''(x) + q(x)\,y(x) = \lambda y(x) \qquad (1)$

where $x$ lies in the interval $[0, \infty)$, with real parameter $\lambda$ and boundary condition

$y(0)\cos\alpha + y'(0)\sin\alpha = 0 \qquad (2)$

for fixed $\alpha$, where $0 \le \alpha < \pi$. For continuous $q \in L^1(0, \infty)$ (the set of absolutely integrable functions on $(0, \infty)$), we study the spectral function $\rho$ associated with (1) and (2) using two main methods: First, following [1], we approximate $\rho$ by step functions associated with related eigenvalue problems on finite intervals $[0, b]$ for some sufficiently large positive $b$; then, we apply asymptotic solution estimates along with an explicit formula for spectral density [2]. For some motivation and clarification of terms, we recall a major application: For certain solutions $\phi(x, \lambda)$ of (1) and (2) and for any $f \in L^2(0, \infty)$ (the set of square-integrable functions on $(0, \infty)$), a corresponding solution to (1) may take the form

$$f(x) = \int_{-\infty}^{\infty} \phi(x, \lambda)\,g(\lambda)\,d\rho(\lambda),$$

where

$$g(\lambda) = \int_0^\infty f(x)\,\phi(x, \lambda)\,dx$$

(in a sense described in Theorem 3.1 of Chapter 9 [1]); here, $g$ is said to be a spectral transform of $f$. By way of such spectral transforms, the differential operator $L$ may be represented alternatively in the integral form

where $\rho$ induces a measure by which $g \in L^2(d\rho)$ (roughly, the set of functions that are square-integrable when integrated against $d\rho$) and by which Parseval’s equality holds. Typical examples are the complete set of orthogonal eigenfunctions for a problem on a finite interval and the corresponding Fourier sine transform in the limiting case (cf. Chapter 9, Section 1 [1]).

For a fixed, large finite interval $[0, b]$, we consider the problem (1), (2) along with the boundary condition

$y(b)\cos\beta + y'(b)\sin\beta = 0 \qquad (3)$

(for fixed $\beta$), which together admit an eigensystem with correspondence

where the eigenvalues $\lambda_n$ satisfy $\lambda_0 < \lambda_1 < \lambda_2 < \cdots$ and where the eigenfunctions form a complete basis for $L^2(0, b)$. Since the associated spectral function is a step function with jumps at the various $\lambda_n$, we first estimate these by way of a related equation arising from Prüfer (phase-space) variables and compute the corresponding jumps.

Then, we use interpolation to approximate the continuous spectral function using data from a case of large $b$ at the points $\lambda_n$ and using

(4)

imposing the condition $\rho(\lambda) = 0$ for all $\lambda \le 0$.

We compare our results with those of a well-known formula [2] appropriate to our case on $[0, \infty)$, which we outline as follows: For fixed $\lambda > 0$, let $\phi$ be the solution to (1) with boundary values

for which the asymptotic formula

(5)

holds as $x \to \infty$. Then we have

(6)

from Section 3.5 [2].

Finally, in the last section, we apply the above techniques to extend our study to operators on large domains $[-b, b]$ and on $(-\infty, \infty)$, where spectral matrices take the place of spectral functions as a matrix analog of spectral transforms on these types of intervals (cf. equation (5.5) [1]). The techniques are described in detail below, but it is of particular interest that our computations uncover an interesting pattern in a discrete-spectrum case, as we are forced to reformulate our approach according to certain eigen-subspaces involved: our desired spectral approximations are resolved by way of an averaging procedure in forming Riemann sums.

Various sections of Chapters 7–9 [1] (see also [3] and related articles) present useful introductory discussion applied to material presented in this article; yet, with our focus on equations (1)–(6), one may proceed given basic understanding of Riemann–Stieltjes integration along with knowledge of ordinary differential equations and linear algebra, commensurate with (say) the use of standard numerical ODE solvers and eigenvalue routines.

We compute eigenvalues by first computing solutions on $[0, b]$ to the following, arising from Prüfer variables (equation 2.4, Chapter 8 [1]):

$\theta'(x) = \cos^2\theta(x) + \bigl(\lambda - q(x)\bigr)\sin^2\theta(x) \qquad (7)$

Here, $\tan\theta = y/y'$, where $y$ is a nontrivial solution to (1), (2) and (3), and $\theta$ satisfies

$\theta(b; \lambda_n) = \beta + n\pi \qquad (8)$

for positive integers $n$. We interpolate to approximate such solutions as an efficient means to invert (8) in the variable $\lambda$. And we use the following function on (7) throughout this article.
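The article's function is written in Mathematica; as a hedged Python sketch of the same idea using SciPy (Dirichlet data $\theta(0) = 0$ and $\beta = 0$ are assumed here, so (8) reads $\theta(b; \lambda_n) = n\pi$):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

def prufer_angle(lam, q, b, theta0=0.0):
    """Integrate the Pruefer phase equation (7),
    theta' = cos(theta)^2 + (lam - q(x)) sin(theta)^2, over [0, b]."""
    rhs = lambda x, th: np.cos(th)**2 + (lam - q(x)) * np.sin(th)**2
    sol = solve_ivp(rhs, (0.0, b), [theta0], rtol=1e-10, atol=1e-10)
    return sol.y[0, -1]

def eigenvalue(n, q, b, bracket):
    """n-th Dirichlet eigenvalue: the root in lam of theta(b; lam) = n*pi,
    found by bisection-type root finding over the given bracket."""
    return brentq(lambda lam: prufer_angle(lam, q, b) - n * np.pi, *bracket)

# Check against the classical case q = 0 on [0, pi]: eigenvalues are n^2.
q = lambda x: 0.0
print(eigenvalue(1, q, np.pi, (0.5, 2.0)))  # close to 1
print(eigenvalue(2, q, np.pi, (2.0, 6.0)))  # close to 4
```

Since $\theta(b; \lambda)$ is strictly increasing in $\lambda$, the bracketed root is unique, which is what makes the inversion of (8) well posed.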

Consider an example with fixed interval and boundary data, together with a potential depending on a parameter, in a particular case.

We create an interpolation approximation for the eigenvalues $\lambda_n$.

It is instructive to graphically demonstrate the theory behind this method. Here, we consider the eigenvalues as those values of $\lambda$ where the graph of $\theta(b; \lambda)$ intersects the various horizontal lines $\beta + n\pi$, which we use to find our maximum index, depending on the range of $\lambda$.

We choose these boundary conditions so that we may compare our results with those of the built-in eigenvalue solver applied to the corresponding problem (1) and (2).

We now compare and contrast the methods in this case. The percent differences of the corresponding eigenvalues are all less than 0.2%, within our limits of accuracy.

In contrast, our interpolation method allows some direct control of which eigenvalues are to be computed, whereas the built-in solver (in its default setting) outputs a list of up to 39 values, starting from the first. Moreover, our method admits nonhomogeneous boundary conditions, where the built-in solver admits only homogeneous conditions, Dirichlet or Neumann.

We proceed to build our approximate spectral density function for the problem (1) and (2) on $[0, \infty)$ with the same potential as above. We compute eigenvalues likewise but now on a larger interval and with nonhomogeneous boundary conditions, say given by fixed $\alpha$ and $\beta$ (albeit $\rho$ does not depend on $\beta$).

We compute eigenvalues via our interpolation method and compute a minimum as well as a maximum index so as to admit only positive eigenvalues; the spectral function is supported on $[0, \infty)$ and negative eigenvalues result in dubious approximations.

We now compute the jumps at the eigenvalues.

We now apply the method of [2] as outlined in equation (6). We include data from an interval near the endpoint that covers at least one half-period of the fitting functions.

The function may return non-numerical results among the first few entries, in which case we recommend that the fitting parameters be readjusted or that the minimum index be set large enough to disregard such results.

We now compare our results of the discrete and continuous (asymptotic fit) spectral density approximations.

We compare the results by plotting percent differences, all being less than 0.1%.

We chose the potential as above because, in part, the solutions can be computed in terms of well-known (modified Bessel) functions. Replacing $\lambda$ by $-s^2$, for $s > 0$, the solutions are linear combinations of

(9)

From asymptotic estimates (cf. equation 9.6.7 [4]), we see that the former is dominant and the latter is recessive as $x \to \infty$ when $s > 0$. Then, from Chapter 9 [1], equation 2.13 and Theorem 3.1, we obtain the density function by computing

$\rho'(\lambda) = \frac{1}{\pi} \lim_{\varepsilon \to 0^+} \operatorname{Im} m(\lambda + i\varepsilon) \qquad (10)$

where $\phi$ is a solution as above and $\psi$ is a second solution with prescribed boundary values. (Here, $m$ is commonly known as the Titchmarsh–Weyl $m$-function.) In the following code, we produce the density function in exact form by replacing the functions from (9), the dominant by 1 and the recessive by 0, to compute the inside limit and thereafter simply allowing $\lambda$ to be real.

We likewise compare the exact formula for the continuous spectrum with the discrete results, noting that the exact graph appears to essentially be the same as that obtained by our asymptotic fitting method (not generally expecting the fits to be accurate for small $\lambda$!).

For the operator $L$ we now extend our study to large domains $[-b, b]$ in the discrete-spectrum case and to the domain $(-\infty, \infty)$ in the continuous-limit case. We choose an odd function potential defined in terms of two positive constants. We focus on the spectral density associated with specific boundary values at $x = 0$ and an associated pair of solutions to (1): namely, we consider expansions in a pair of solutions normalized at $x = 0$ such that

$\phi_1(0, \lambda) = 1, \quad \phi_1'(0, \lambda) = 0; \qquad \phi_2(0, \lambda) = 0, \quad \phi_2'(0, \lambda) = 1 \qquad (11)$

We apply the above computational methods to the analytical constructs from Chapter 5 [1] in both the discrete and continuous cases. First, for the discrete case, we compute spectral matrices associated with self-adjoint boundary-value problems and the pair as in (11): We estimate eigenvalues for an alternative two-point boundary-value problem on $[-b, b]$ for (moderately) large $b$ to compute the familiar jumps of the various matrix components. These components induce measures that appear in the following form of Parseval’s equality for square-integrable functions on $(-\infty, \infty)$ (taken in a certain limiting sense):

(real-valued case). Second, we compute the various densities as limits as $b \to \infty$ by the formulas

(12)

where the two quantities involved are certain limits of Weyl $m$-functions, related to equation (10), but for our ODE problem on the domains $[0, \infty)$ and $(-\infty, 0]$, respectively. The densities are computed by procedures more elaborate than (6), as discussed later. Then, we compare results of the discrete case like in (4), approximating

(13)

After choosing (self-adjoint) boundary conditions (of which the limits happen to be independent)

(14)

on an interval $[-b, b]$, we estimate eigenvalues and compute coefficients from the linear combinations

for the associated orthonormal (complete) set of eigenfunctions, whereby

(real-valued case). Here, these functions result by normalizing eigenfunctions satisfying (14) so that we obtain

We are ready to demonstrate. Let us choose specific (arbitrary) values for the constants. Much of the procedure follows as above, with minor modification (the next result may take around three minutes on a laptop).

We now approximate the density functions by plotting difference quotients, where

(15)

(for certain index ranges) as we compute the difference quotients at the various jumps, over even and odd indices separately, and assign the corresponding sums to the midpoints of the corresponding intervals.

We give the plots below, in comparison with those of the continuous spectra, and give a heuristic argument in the Appendix as to why this approach works.

First, we apply the asymptotic fitting method using the two solutions from (11). Here, we have to compute full complex-valued formulas for the corresponding $m$-functions (cf. Section 5.7 [2]), where a slight modification of the derivation, via a change of variables and a complex conjugation, yields the second function (see Appendix).

We now compare the result of the discrete and asymptotic fitting methods for the elements of the spectral matrix.

We have deferred some discussion on our use of message suppression, comparison of eigenvalue computations, discrete eigenspace decomposition and Weyl $m$-functions to this section.

First, we have suppressed messages warning that some solutions may not be found. From Chapter 8 [1], we expect unique solutions since the functions are strictly increasing. We have also suppressed various messages from the numerical solvers regarding small values to be expected with short-range potentials and large domains.

Second, our formulation of the difference quotients and the midpoints as in (15) arises from a decomposition of the eigenspace by even and odd indices. We motivate this decomposition by an example plot of the jump values, where the dichotomous behavior is quite pronounced, certainly for large indices.

We are thus inspired to compute the quotients over even and odd indices separately. Then, we consider, say, a relevant expression from Parseval’s equality: for appropriate Fourier coefficients associated with the respective solutions, we write

We suppose that these converge to the corresponding transforms in the limit $b \to \infty$. Of course, a rigorous argument is beyond the scope of this article.

Finally, we elaborate on the calculations of the two Weyl $m$-functions: Given the asymptotic expressions

as $x \to \infty$ and $x \to -\infty$ (resp.), we follow Section 5.7 of [2], making changes as needed, with a modification via complex conjugation for the second function to arrive at

The author would like to thank the members of MAST for helpful and motivating discussions concerning preliminary results of this work in particular and Mathematica computing in general.

[1] E. A. Coddington and N. Levinson, Theory of Ordinary Differential Equations, New York: McGraw-Hill, 1955. archive.org/details/theoryofordinary00codd.

[2] E. C. Titchmarsh, Eigenfunction Expansions Associated with Second-Order Differential Equations, 2nd ed., London: Oxford University Press, 1962. archive.org/details/eigenfunctionexp0000titc.

[3] E. W. Weisstein. “Operator Spectrum” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/OperatorSpectrum.html.

[4] M. Abramowitz and I. A. Stegun, eds., Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Wiley, 1972.

C. Winfield, “From Discrete to Continuous Spectra,” The Mathematica Journal, 2019. https://doi.org/10.3888/tmj.21-3.

C. Winfield holds an MS in physics and a PhD in mathematics and is a member of the Madison Area Science and Technology amateur science organization, based in Madison, WI.

**Christopher J. Winfield**

Madison Area Science and Technology

3783 US Hwy. 45

Conover, WI 54519

*cjwinfield2005@yahoo.com*

H. S. M. Coxeter wrote several geometry film scripts that were produced between 1965 and 1971 [1]. In 1992, Coxeter gave George Beck mimeographs of two scripts that had not been made. Beck wrote Mathematica code for the stills and animations. This material was added to the third edition of Coxeter’s *The Real Projective Plane* [2]. This article updates the Mathematica code.

The example of a thermometer makes it easy to see how the real numbers (positive, zero and negative) can be represented by the points of a straight line.

On the $x$ axis of ordinary analytic geometry, the number $a$ is represented by the point $(a, 0)$.

Given any two such numbers, $a$ and $b$, we can set up geometrical constructions for their sum, difference, product, and quotient.

However, these constructions require a scaffolding of extra points and lines. It is by no means obvious that a different choice of scaffolding would yield the same final results.

The object of the present program is to make use of a circle (or any other conic) instead of the line, so that the constructions can all be performed with a straight edge, and the only arbitrariness is in the choice of the positions of three of the numbers (for instance, 0, 1 and 2).

Although this is strictly a chapter in projective geometry, let us begin with a prologue in which the scale of abscissas on the axis is transferred to a circle by the familiar process of stereographic projection.

A circle of any radius (say 1, for convenience) rests on the axis at the origin 0, and the numbers are transferred from this axis to the circle by lines drawn through the opposite point.

That is, the point at the top. In this manner, a definite number is assigned to every point on the circle except the topmost point itself.

The positive numbers come closer and closer to this point on one side, and the negative numbers come closer and closer on the other side.

So it is natural to assign the special symbol $\infty$ (infinity) to this exceptional point: the only point for which no proper number is available.

The tangent at this exceptional point is, of course, parallel to the axis; that is, parallel to the tangent at the point 0.

Having transferred all the numbers to the circle, we can forget about the axis; but the tangent at the point infinity will play an important role in the construction of sums.

For instance, there is one point on this tangent that lies on the line joining points 1 and 2, also on the line joining 0 and 3, and on the line joining −1 and 4. We notice that these pairs of numbers all have the same sum: $1 + 2 = 0 + 3 = -1 + 4 = 3$.
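These concurrences can be checked with exact rational arithmetic; a hedged Python sketch (the parameterization of the circle is our own, derived from the stereographic projection described above, and the function names are ours):

```python
from fractions import Fraction

def circle_point(a):
    """Image of the number a on the circle of radius 1 resting on the
    x axis at the origin, projected from the top point (0, 2)."""
    a = Fraction(a)
    d = a * a + 4
    return (4 * a / d, 2 * a * a / d)

def meet_tangent_at_infinity(a, b):
    """x coordinate where the chord joining the points numbered a and b
    meets the tangent at the point infinity (the horizontal line y = 2)."""
    (x1, y1), (x2, y2) = circle_point(a), circle_point(b)
    t = (2 - y1) / (y2 - y1)   # parameter along the chord where y = 2
    return x1 + t * (x2 - x1)

# Pairs with the same sum give chords through the same point on y = 2:
print(meet_tangent_at_infinity(1, 2))    # 1 + 2 = 3
print(meet_tangent_at_infinity(0, 3))    # 0 + 3 = 3
print(meet_tangent_at_infinity(-1, 4))   # -1 + 4 = 3; all three print 4/3
```

All three chords pass through the same point of the tangent line, as the text asserts.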

Similarly, the tangent at 1 meets the tangent at infinity in a point that lies on the lines joining 0 and 2, −1 and 3, −2 and 4, in accordance with the equations $0 + 2 = -1 + 3 = -2 + 4 = 2$.

These results could all be verified by elementary analytic geometry, but there is no need to do this, because we shall see later that a general principle is involved.
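Indeed, the verification is a one-liner in coordinates. In the prologue's setting (a unit circle resting on the axis at the origin, projected from the top point (0, 2)), the number x lands at the point (4x/(x² + 4), 2x²/(x² + 4)), and the chord joining a and b meets the tangent at infinity, the line y = 2, at x = 4/(a + b), a value depending only on the sum. The following check is ours, written in Python rather than the notebook's own language:

```python
from fractions import Fraction

def proj(x):
    """Stereographic image of the number x on the unit circle resting
    on the axis at the origin, projected from the top point (0, 2)."""
    x = Fraction(x)
    d = x * x + 4
    return (4 * x / d, 2 * x * x / d)

def meet_tangent_at_infinity(a, b):
    """x-coordinate where the chord joining proj(a) and proj(b)
    meets the tangent at infinity, i.e. the horizontal line y = 2."""
    (xa, ya), (xb, yb) = proj(a), proj(b)
    t = (2 - ya) / (yb - ya)          # parameter along the chord at height y = 2
    return xa + t * (xb - xa)

# pairs with sum 3 (1+2, 0+3, (-1)+4) are concurrent at x = 4/3
print([meet_tangent_at_infinity(a, b) for a, b in [(1, 2), (0, 3), (-1, 4)]])
# -> [Fraction(4, 3), Fraction(4, 3), Fraction(4, 3)]
```

Exact rational arithmetic (`Fraction`) makes the concurrence an identity rather than an approximation.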

Having finished the Euclidean prologue, let us see how far we can go with the methods of projective geometry. Let symbols 0, 1, infinity be assigned to any three distinct points on a given conic.

There is a certain line through 0 concurrent with the tangents at infinity and 1; let this line meet the conic again in 2.

(Alternatively, if we had been given 0, 1, 2 instead of 0, 1, infinity, we could have reconstructed infinity as the point of contact of the remaining tangent from the point where the tangent at 1 meets the line 02.)

We now have the beginning of a geometrical interpretation of all the real numbers.

To obtain 3, we join 1 and 2, see where this line meets the tangent at infinity, join this point of intersection to 0, and assign the symbol 3 to the point where this line meets the conic again. Thus the line joining 0 and 3 and the line joining 1 and 2 both meet the tangent at infinity in the same point.

More generally, we define addition in such a way that two pairs of points have the same sum if their joins are concurrent with the tangent at the point infinity.

In other words, we define the sum of any two points a and b to be the remaining point of intersection of the conic with the line joining 0 to the point where the tangent at infinity meets the join of a and b.

To justify this definition, we must make sure that it agrees with our usual requirements for the addition of numbers: the commutative law a + b = b + a,

a unique solution x for every equation of the form a + x = b,

and the associative law (a + b) + c = a + (b + c).

The commutative law is satisfied immediately, as our definition for a + b involves a and b symmetrically.

The equation a + x = b is solved by choosing x so that a and x have the same sum as 0 and b.

Thus the only possible cause of trouble is the associative law; we must make sure that for any three points a, b, c (not necessarily distinct), the sum of a + b and c is the same as the sum of a and b + c.

For this purpose, we make use of a special case of Pascal’s theorem, which says that if ABCDEF is a hexagon inscribed in a conic, the pairs of opposite sides (namely AB and DE, BC and EF, CD and FA) meet in three points that lie on a line, called the Pascal line of the given hexagon.
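Pascal's theorem itself can be spot-checked numerically in homogeneous coordinates. The sketch below is ours: it takes six arbitrary points on the unit circle (a conic), forms the meets of the three pairs of opposite sides, and tests collinearity via a determinant.

```python
import math

def cross(u, v):
    """Cross product: the line through two points, or the meet of two lines,
    in homogeneous coordinates."""
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

# a hexagon ABCDEF inscribed in the unit circle
A, B, C, D, E, F = [(math.cos(t), math.sin(t), 1.0) for t in (0.3, 1.1, 2.0, 3.4, 4.2, 5.5)]

# meets of the three pairs of opposite sides: AB & DE, BC & EF, CD & FA
X = cross(cross(A, B), cross(D, E))
Y = cross(cross(B, C), cross(E, F))
Z = cross(cross(C, D), cross(F, A))

# X, Y, Z are collinear exactly when the determinant |X Y Z| vanishes
det = (X[0] * (Y[1]*Z[2] - Y[2]*Z[1])
     - X[1] * (Y[0]*Z[2] - Y[2]*Z[0])
     + X[2] * (Y[0]*Z[1] - Y[1]*Z[0]))
print(abs(det) < 1e-9)   # True: the three meets lie on the Pascal line
```

Any six distinct points in any order will do; the hexagon may freely cross itself.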

In 1639, when Blaise Pascal was sixteen years old, he discovered this theorem as a property of a circle.

He then deduced the general result by joining the circle to a point outside the plane by a cone and then considering the section of this cone by an arbitrary plane.

We do not know how he proved this property of a hexagon inscribed in a circle, because his original treatise was lost, but we do know how he might have done it, using only the first three books of Euclid’s *Elements*. In our own time, an easier proof can be found in any textbook on projective geometry.

Each hexagon has its own Pascal line. If we fix five of the six vertices and let the sixth vertex run round the conic, we see the Pascal line rotating about a fixed point.

If this fixed point is outside the conic, we can stop the motion at a stage when the Pascal line is a tangent. This is the special case that concerns us in the geometrical theory of addition.

The appropriate hexagon shows that the sum of a + b and c is equal to the sum of a and b + c.

Beginning with 0, 1 and infinity, we can now construct the remaining positive integers 3 = 2 + 1, 4 = 3 + 1, 5 = 4 + 1, and so on.

We can also construct the negative integers −1, −2, −3, …, given by −1 + 1 = 0, −2 + 1 = −1, −3 + 1 = −2, and so on.

Alternatively, we can construct the negative integers using −1 + 1 = 0, −2 + 2 = 0, −3 + 3 = 0, and so on.

By fixing a while letting x vary, we obtain a vivid picture of the transformation that adds a to every number x. The points x and x + a chase each other round the conic, irrespective of whether a happens to be positive or negative.

In our construction for the point 2, we tacitly assumed that the tangent at 1 can be regarded as the join of 1 and 1.

More generally, the join of a and b meets the tangent at infinity in a point from which the remaining tangent has, for its point of contact, a point c such that c + c = a + b, namely c = (a + b)/2, which is the arithmetic mean (or average) of a and b.

This result holds not only when a + b is even but also when a + b is odd; for instance, when a and b are consecutive integers. In this way we can interpolate 1/2 between 0 and 1, 1 1/2 between 1 and 2, and so on.

We shall find it convenient to work in the scale of 2 (or binary scale), so that the number 2 itself is written as 10, one half as 0.1, one quarter as 0.01, three quarters as 0.11 and so on.

We can now interpolate

1.1 between 1 and 10, …

1.01 between 1 and 1.1, …

… and so on to the eighths between 1 and 10.

In fact, we can construct a point for every number that can be expressed as a terminating “decimal” in the binary scale. By a limiting process, we can thus theoretically assign a position to every real number.

For instance, the square root of two, being (in the binary scale) 1.0110101000001…, is the limit of a sequence of constructible numbers: 1, 1.0, 1.01, 1.011, 1.0110, 1.01101, and so on.

Conversely, by a process of repeated bisection, we can assign a binary “decimal” to any given point but one on the conic. (The “but one” is, of course, the point to which we arbitrarily assigned the symbol infinity.)
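The repeated-bisection process is easy to mimic in a few lines. This is our generic numerical sketch of the idea, not the conic construction itself:

```python
def binary_digits(x, n):
    """First n binary digits of x in [1, 2) after the binary point,
    found by repeated bisection of the interval."""
    lo, hi, digits = 1.0, 2.0, []
    for _ in range(n):
        mid = (lo + hi) / 2
        if x >= mid:          # x lies in the upper half: emit 1
            digits.append(1)
            lo = mid
        else:                 # x lies in the lower half: emit 0
            digits.append(0)
            hi = mid
    return digits

print(binary_digits(2 ** 0.5, 13))
# -> [0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], i.e. sqrt(2) = 1.0110101000001... in binary
```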

We can now define multiplication in terms of the same three points 0, 1 and infinity.

Two pairs of points have the same product if their joins are concurrent with the line joining 0 and infinity.

The geometrical theory of projectivities is somewhat too complicated to describe here, so let us be content to remark that, if we pursued it, we could prove that our definition for addition is consistent with this definition for multiplication.

The product is positive if the point of concurrence is outside, negative if it is inside the conic.

In other words, we define the product of any two points a and b on the conic to be the remaining point of intersection of the conic with the line joining 1 to the point where the line joining 0 and infinity meets the line joining a and b.
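As with addition, the prologue's circle model lets us check this numerically. There the line joining 0 and infinity is the vertical axis, and a short calculation shows the chord joining a and b meets it at height 2ab/(ab − 4), a value depending only on the product ab. Again a sketch of ours in Python:

```python
from fractions import Fraction

def proj(x):
    """Number x on the unit circle resting on the axis, projected from (0, 2)."""
    x = Fraction(x)
    d = x * x + 4
    return (4 * x / d, 2 * x * x / d)

def meet_axis(a, b):
    """y-coordinate where the chord joining proj(a) and proj(b) meets
    the line joining 0 and infinity (the vertical axis x = 0)."""
    (xa, ya), (xb, yb) = proj(a), proj(b)
    t = -xa / (xb - xa)               # parameter along the chord at x = 0
    return ya + t * (yb - ya)

# pairs with product 6 (1*6, 2*3, (-1)*(-6)) are concurrent at y = 2*6/(6-4) = 6
print([meet_axis(a, b) for a, b in [(1, 6), (2, 3), (-1, -6)]])
# -> [Fraction(6, 1), Fraction(6, 1), Fraction(6, 1)]
```

Chords of pairs with ab = 4 come out parallel to the axis, meeting it only at an ideal point, which matches the sign rule for the point of concurrence.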

Of course, the question arises as to whether this definition agrees with our usual requirements for the multiplication of numbers:

- the commutative law ab = ba
- a unique solution x for every equation of the form ax = b (with a ≠ 0)
- the associative law (ab)c = a(bc)

The commutative law is satisfied immediately, as our definition for ab involves a and b symmetrically.

The equation ax = b is solved by choosing x so that a and x have the same product as 1 and b.

Finally, another application of Pascal’s theorem suffices to show the associative law.

That is, for any three points a, b, c, the product of ab and c is equal to the product of a and bc, as the appropriate inscribed hexagon shows.

By fixing a while letting x vary, we obtain a vivid picture of the transformation that multiplies every number x by a. If a is positive, the points x and ax chase each other round the conic.

But if is negative, they go round in opposite directions.

The familiar identity 1 × 4 = 2 × 2 is illustrated by the concurrence of the tangent at 2 with the line joining 1 and 4 and the line joining 0 and infinity.

More generally, if a and b are any two numbers having the same sign, the join of the corresponding points meets the line joining 0 to infinity in a point from which the two tangents have, for their points of contact, points c such that cc = ab, namely c = ±√(ab), where the square root √(ab) is the geometric mean of a and b.

Setting a = 1 and b = 2, we obtain a construction for the square root of two without having recourse to any limiting process. In fact, we have finite constructions for all the “quadratic” numbers commonly associated with Euclid’s straight-edge and compass.

One of the most fruitful ideas of the nineteenth century is that of one-to-one correspondence. It is well illustrated by the example of cups and saucers. Suppose we have about a hundred cups and about a hundred saucers and wish to know whether the number of cups is actually equal to the number of saucers. This can be determined, without counting, by the simple device of putting each cup on a saucer, that is, by establishing a one-to-one correspondence between the cups and saucers.

In our first application of this idea to plane geometry, the cups are points, the saucers are lines and the relation “cup on saucer” is incidence. As we know, a line is determined by any two of its points and is of unlimited extent. We say that a point and a line are “incident” if the point lies on the line; that is, if the line passes through the point. It is natural to ask whether the number of points on a line is actually equal to the number of lines through a point. In ordinary geometry both numbers are infinite, but this fact need not trouble us: if we can establish a one-to-one correspondence between the points and lines, there are equally many of each.

The set of all points on a line is called a range and the set of all lines through a point is called a pencil. If the line o and the point O are not incident, we can establish an elementary correspondence between the range and the pencil by means of the relation of incidence. Each point of the range lies on a corresponding line of the pencil. The range is a section of the pencil (namely the section by the line o) and the pencil projects the range (from the point O).

In our picture, the range is represented by a red point moving along a fixed line o (which, for convenience, is taken to be horizontal) and the pencil is represented by a green line rotating around a fixed point O.

There is evidently a green line for each position of the red point. But we must admit that for some positions of the green line the red point cannot be seen because it is too far away; in fact, when the green line is parallel to o (that is, horizontal), the red point is one of the ideal “points at infinity” that we agree to add to the ordinary plane so as to make the projective plane. Without this ideal point, our elementary correspondence would not be one-to-one: the number of points in the range would be one less than the number of lines in the pencil. In other words, the postulation of ideal points makes it possible for us to express the axioms for the projective plane in such a way that they remain valid when we consistently interchange the words “point” and “line” (and consequently also certain other pairs of words such as “join” and “meet”, “on” and “through”, “collinear” and “concurrent” and so forth). It follows that the same kind of interchange can be made in all the theorems that can be deduced from the axioms.

This principle of duality is characteristic of projective geometry. In the plane we interchange points and lines. In space, the same principle enables us to interchange points and planes, while lines remain lines.

When we regard the elementary correspondence as taking us from the point X to the line x, we write the capital before the small letter, as X ⊼ x. The inverse correspondence, from x to X, is denoted by the same sign with the small letter before the capital, as x ⊼ X. If A, B, C, … are particular positions of X, and a, b, c, … of x, we write all these letters before and after the sign, taking care to keep them in their corresponding order (which need not be the order in which they appear to occur in the figure): ABC… ⊼ abc….

This notation enables us to exhibit the principle of duality as the possibility of consistently interchanging capital and small letters.

By combining two elementary correspondences, one relating a range to a pencil and the other a pencil to a range, we obtain a perspectivity. This either relates two ranges that are different sections of one pencil, or two pencils that project one range from different centers.

In the former case, the two symbols with one bar can be abbreviated to one with two bars, ⩞; or, if we wish to specify the point O that carries the pencil, we put O above the two bars.

In the latter case (when two pencils project one range from different centers), the two symbols with one bar are again abbreviated to one with two bars, and if we wish to specify the line o that carries the range, we put o above the bars.

We can easily go on to combine three or more elementary correspondences. But then we prefer not to increase the complication of the symbols. Instead, we retain the simple symbol ⊼ (with just one bar) for the product of any number of elementary correspondences. Such a transformation is called a projectivity. Thus elementary correspondences and perspectivities are the two simplest instances of a projectivity.

The product of three elementary correspondences is the simplest instance of a correspondence relating a range to a pencil in such a way that the range is not merely a section of the pencil.

The product of four elementary correspondences, being the product of two perspectivities, shares with a simple perspectivity the property of relating a range to a range or a pencil to a pencil. Now there is the interesting possibility that the initial and final range (or pencil) may be on the same line (or through the same point). We see two moving red points X and X′ on this line, related by perspectivities from two centers through an auxiliary red point on a second line. The point where the two lines meet is clearly invariant; and when X reaches a certain other point, X′ reaches it at the same moment: we have another invariant point, and the three red points all come together.

Such a projectivity, having two distinct invariant points, is said to be hyperbolic.

On the other hand, the two lines and the join of the two centers may all meet in a single point, so that the second invariant point coincides with the first and there is only one invariant point. Such a projectivity is said to be parabolic.

A third possibility is an elliptic projectivity that has no invariant point, but this is more complicated, requiring three perspectivities (i.e., six elementary correspondences). The green lines, rotating around the centers of the three perspectivities, yield four red points. Two of the red points chase each other along the bottom line.

These two points are related by the elliptic projectivity.

However, this is not the most general elliptic projectivity.

There is a special feature arising from the fact that the three centers lie on the sides of the green triangle. When one of the two red points is at A, the other is at B, and vice versa: the projectivity interchanges A and B and is consequently called an involution. Thus we are watching an elliptic involution.

Looking closely, we see that it not only interchanges A and B but also interchanges every pair of related points. For instance, it interchanges C with D. An important theorem tells us that for any four collinear points A, B, C, D, there is just one involution that interchanges A with B and C with D.
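In coordinates on the line, this uniqueness theorem becomes concrete: an involution acts as a fractional linear map equal to its own inverse, and the two interchanged pairs pin it down. The sketch below is ours, with arbitrarily chosen coordinates for A, B, C, D:

```python
from fractions import Fraction as F

def involution(a, b, c, d):
    """The unique involution interchanging a <-> b and c <-> d, as the
    fractional linear map x -> (p x + q)/(r x - p); the trace-zero
    matrix [[p, q], [r, -p]] makes the map its own inverse."""
    a, b, c, d = map(F, (a, b, c, d))
    p = a * b - c * d
    r = a + b - c - d
    q = c * d * (a + b) - a * b * (c + d)
    return lambda x: (p * F(x) + q) / (r * F(x) - p)

g = involution(0, 1, 2, 5)        # A=0, B=1, C=2, D=5: sample coordinates
print(g(0), g(1), g(2), g(5))     # -> 1 0 5 2
print(g(g(F(7, 3))) == F(7, 3))   # -> True: an involution is its own inverse
```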

We denote it by (AB)(CD). At any instant, the two red points are a pair belonging to this involution; call them X and Y. We now have three pairs of points, AB, CD, XY, on the bottom dark blue line, all belonging to one involution. The other lines form the six sides of a complete quadrangle PQRS, which consists of four points (no three collinear) and the six lines that join them in pairs. Two sides are said to be opposite if their point of intersection is not a vertex; for instance, PQ and RS are a pair of opposite sides.

We see now that the six points named on the bottom dark blue line are sections of the six sides of the quadrangle, and that each related pair comes from a pair of opposite sides. Accordingly the six points, paired in this particular way, are said to form a quadrangular set. Here is another version of the quadrangle and the corresponding quadrangular set AB, CD, XY. As before, XY is a pair of the involution (AB)(CD).

This remains true when we move the bottom dark blue line to a new position so that X coincides with C and Y with D. Now C and D are invariant points, and we have a hyperbolic involution (AB)(CD), which still interchanges A and B.

The quadrangular set of six points has become a harmonic set of four points. We say that A and B are harmonic conjugates of each other with respect to C and D, and that the four points satisfy the relation H(CD, AB).

This means that there is a quadrangle having two opposite sides through C and two opposite sides through D, while one of the remaining two sides passes through A and the other through B.

Given C, D and A, we can construct B by drawing a triangle whose sides pass through these three points and completing the quadrangle; the remaining side meets the line in B. Of course, the hyperbolic involution can still be constructed as the product of three perspectivities.
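In coordinates, the harmonic relation has a simple test: four collinear points satisfy it exactly when their cross-ratio is −1, which also gives a direct formula for the fourth point in terms of the other three. The sketch and sample values below are ours:

```python
from fractions import Fraction

def cross_ratio(a, b, c, d):
    """Cross-ratio (A, B; C, D) of four collinear points, by coordinate."""
    a, b, c, d = map(Fraction, (a, b, c, d))
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def harmonic_conjugate(a, b, c):
    """Point d with cross-ratio (a, b; c, d) = -1, i.e. the harmonic
    conjugate of c with respect to a and b."""
    a, b, c = map(Fraction, (a, b, c))
    return (2 * a * b - c * (a + b)) / (a + b - 2 * c)

d = harmonic_conjugate(0, 2, Fraction(3, 2))
print(d, cross_ratio(0, 2, Fraction(3, 2), d))   # -> 3 -1
```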

But the invariant points C and D enable us to replace these three perspectivities by two.

Another product of two perspectivities relates ranges on two distinct lines. The fundamental theorem of projective geometry tells us that a projectivity relating ranges on two such lines is uniquely determined by any three points of the first range and the corresponding three points of the second. There are, of course, many ways to construct the projectivity as the product of two or more perspectivities, but the final result will always be the same.
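In coordinates, the fundamental theorem takes a concrete form: a projectivity between two lines acts as a fractional linear transformation, pinned down by matching cross-ratios against the three given pairs. A sketch of ours, with hypothetical sample values:

```python
from fractions import Fraction as F

def cross_ratio(a, b, c, d):
    a, b, c, d = map(F, (a, b, c, d))
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def projectivity(pairs):
    """The unique projectivity sending x1 -> y1, x2 -> y2, x3 -> y3,
    built by solving (y, y1; y2, y3) = (x, x1; x2, x3) for y."""
    (x1, y1), (x2, y2), (x3, y3) = [(F(x), F(y)) for x, y in pairs]
    def f(x):
        x = F(x)
        if x == x3:                      # cross-ratio formula degenerates here
            return y3
        r = cross_ratio(x, x1, x2, x3)
        k = r * (y1 - y2) / (y1 - y3)
        return (y2 - k * y3) / (1 - k)
    return f

f = projectivity([(0, 1), (1, 3), (2, -1)])
print(f(0), f(1), f(2))                                                # -> 1 3 -1
print(cross_ratio(f(4), f(5), f(6), f(7)) == cross_ratio(4, 5, 6, 7))  # -> True
```

The last line checks the invariance that makes the uniqueness work: any four points and their images have equal cross-ratios.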

For instance, there is a unique projectivity relating three given points A, B, C on the first line to three given points A′, B′, C′ on the second. This means that for any point X on the first line there is a definite corresponding point X′ on the second.

The simplest way to construct this projectivity is by means of two perspectivities, so that X is first related to an auxiliary point on a third line and then to X′. We can regard the three related points as the vertices of a variable triangle that run along three fixed lines, while two of its sides rotate around the two fixed centers. The third side joins the projectively related points X and X′.

This construction remains valid when the two centers are in general position, instead of lying on the lines that carry the related ranges. We then have a construction for the unique projectivity that relates X to X′.

As before, the vertices of the variable triangle run along fixed lines while two of its sides rotate around the fixed points. The possible positions for the third side include, in turn, each of the five sides of the pentagon.

Carefully watching this line , we see that it envelops a beautiful curve.

This is the same kind of curve that was constructed quite differently by Menaechmus about 340 BC. Since that time it has been known everywhere as a conic. One important property is that a conic is uniquely determined by any five of its tangents, and that these may be any five lines of which no three are concurrent.

Since the possible positions for our variable line include, in turn, each side of the pentagon, we call its envelope the conic inscribed in this pentagon.

To sum up: as a variable point runs along a diagonal of a given pentagon, the construction just described determines a line whose envelope is the inscribed conic.

For any particular position of the variable point, we see a hexagon whose six sides all touch the conic. The three lines that join pairs of opposite vertices are naturally called diagonals of the hexagon. Thus, if the diagonals of a hexagon are concurrent, the six sides all touch a conic. Conversely, if all the sides of a hexagon touch a conic, five of them can be identified with five fixed tangents. Since the given conic is the only one that touches these fixed lines, the sixth side must coincide with one of the lines that we have constructed. We thus have Brianchon’s theorem: If a hexagon is circumscribed about a conic, the three diagonals are concurrent.

All these results can, of course, be dualized. (Now all the letters that we use are lowercase, representing lines.)

Dually, for any pentagon there is a unique projectivity relating the pencil of lines through one vertex to the pencil of lines through another.

The sides of the variable triangle rotate about fixed points, while two of its vertices run along two fixed lines. The possible positions for the third vertex include, in turn, each of the five vertices of the pentagon.

Carefully watching this moving point, we see that it traces out a curve through these five fixed points (no three collinear).

What is this curve, the dual of a conic?

One of the many possible definitions for a conic exhibits it as a self-dual figure, with the interesting result that the dual of a conic (regarded as the envelope of its tangents) is again a conic (regarded as the locus of the points of contact of these tangents).

Thus the locus of the point is a conic, and this is the only conic that can be drawn through the five vertices of the pentagon.

To sum up: as a variable line rotates about the intersection of two non-adjacent sides of a given pentagon, the dual construction determines a point whose locus is the circumscribed conic.

The hexagon, which, for convenience, we rename ABCDEF, yields the dual of Brianchon’s theorem, namely Pascal’s theorem: If ABCDEF is a hexagon inscribed in a conic, the three points where pairs of opposite sides meet are collinear.

The hexagon that we see is, perhaps, unusual, because its sides cross one another. From the standpoint of projective geometry, this feature is irrelevant. A convex hexagon would serve just as well, but the “diagonal points” would be inconveniently far away. Another natural observation is that our conic looks like the familiar circle. In fact, this famous theorem was first proved for a circle in 1639, when its discoverer, Blaise Pascal, was only sixteen years old. Nobody knows just how he did it, because his original treatise has been lost.

But there is no possible doubt about how he deduced the analogous property of the general conic. He joined the circle and lines to a point outside the plane, obtaining a cone and planes. Then he took the section of this solid figure by an arbitrary plane.

We change the position of the points of the hexagon.

In this way the conic appears in one of its most ancient aspects: as the section of a circular cone by a plane of general position.


Thanks to Gregory Robbins, who sparked this update and was able to read the files from an old diskette.

[1] College Geometry Project (1963–71). (Dec. 19, 2018) archive.org/details/CollegeGeometry.

[2] H. S. M. Coxeter, The Real Projective Plane, 3rd ed., New York: Springer, 1993.

H. S. M. Coxeter and G. Beck, “The Arithmetic of Points on a Conic and Projectivities,” The Mathematica Journal, 2018. https://doi.org/10.3888/tmj.21-2.

H. S. M. Coxeter (1907–2003) was a Canadian geometer. For an extensive biography, see mathworld.wolfram.com/news/2003-04-02/coxeter.

George Beck earned a B.Sc. (Honours Math) from McGill University and an MA in math from the University of British Columbia. He has been the managing editor of *The Mathematica Journal* since 1997. He has worked for Wolfram Research, Inc. since 1993 in a variety of roles.

**George Beck**

*102-1944 Riverside Drive
Courtenay, B.C., V9N 0E5
Canada*

A comprehensive discussion is presented of the closed-form solutions for the responses of single-degree-of-freedom systems subject to swept-frequency harmonic excitation. The closed-form solutions for linear and octave swept-frequency excitation are presented and these are compared to results obtained by direct numerical integration of the equations of motion. Included is an in-depth discussion of the numerical difficulties associated with the complex error functions and incomplete gamma functions, which are part of the closed-form solutions, and how these difficulties were overcome by employing exact arithmetic. The closed-form solutions allowed the in-depth study of several interesting phenomena. These include the scalloped behavior of the peak response (with multiple discontinuities in the derivative), the significant attenuation of the peak response if the sweep frequency is started at frequencies near or above the natural frequency, and the fact that the swept-excitation response could exceed the steady-state harmonic response.
### Notation

### 1. Introduction

### 2. Equations of Motion

### 3. Closed-Form Solution: Linear Sweep

### 4. Closed-Form Solution: Octave Sweep

### 5. Challenges in Separating Real and Imaginary Parts of Closed-Form Solutions

### 6. Challenges in Numerical Evaluation of the Exact Closed-Form Solutions

### 7. Comparison of Exact and Numerical Solutions

### 8. Construction of Peak Response Curves

### 9. Peak Response Curves for Linear Sweep

#### 9.1 Discontinuities in Derivative of Peak Response Curves at Low Frequencies

### 10. Peak Response Curves for Octave Sweep

#### 10.1 Numerical Integration in the Domain

#### 10.2 Numerical Optimization to Identify Peak Response

#### 10.3 Peak Response Curves for Octave Sweep

### Conclusion

### Acknowledgments

### References

### About the Authors

complex variable

complex variable

dimensionless composite parameter

complex variable

error function

imaginary error function

linear sweep rate in Hz per minute

nonzero start frequency for an octave sweep rate

natural frequency in Hz

complex variable

complex variable

octave sweep rate in octaves per minute

complex variable

time (also used as a dummy integration variable)

upper limit of search for peak values

time at which instantaneous frequency of excitation for linear sweep equals

time at which instantaneous frequency of excitation for octave sweep equals

new independent variable for octave sweep

value at which instantaneous frequency of excitation for octave sweep equals

single-degree-of-freedom system displacement response

single-degree-of-freedom system velocity response

single-degree-of-freedom system acceleration response

initial displacement

initial velocity

complex variable

octave sweep rate in octaves per second

composite parameter for closed-form solution for linear sweep

composite parameter for closed-form solution for linear sweep

composite parameter for closed-form solution for linear sweep

composite parameter for closed-form solution for octave sweep

general phase function

composite parameter for closed-form solution for octave sweep

initial phase value

incomplete gamma function

dummy integration variable

composite variable proportional to

composite parameter for closed-form solution for linear sweep

composite parameter for closed-form solution for linear sweep

natural frequency in radians per second

linear sweep rate in radians per second per second

generalized sweep forcing function

multiplication

composite parameter for closed-form solution for octave sweep

composite parameter for closed-form solution for octave sweep

critical damping ratio

dummy integration variable

Harmonic excitation is a fact of life in systems with rotating machinery, such as liquid rocket engine turbopumps, spacecraft momentum wheels, aircraft turbojet engines, electric plant steam turbines and liquid-transport turbine compressor trains. Associated with high performance are high shaft speeds and the resulting excitation caused by imbalances in the rotating components and imperfections in the shafts and ball bearings. Furthermore, phenomena such as shaft whirl and rotor dynamic instability are critical design aspects. Although performance requirements dictate design parameters such as shaft speed, avoiding certain speeds due to dynamic interactions within the system is also a critical design consideration. Completely avoiding critical speeds may not be possible. For example, if the critical speeds are below the operational shaft speed, then at startup and shutdown, the rotation rate sweeps through them. The magnitude of the response is a function of the sweep rate, system damping and modal gains at the excitation and response locations. In addition, bearing imperfection can produce excitation above and below the operational frequency, and responses to these imperfections are also a function of the sweep rate associated with the startup and shutdown of the system. In addition to rotating machinery considerations, frequency sweep effects are a critical aspect of harmonic base shake vibration testing, as employed in the aerospace industry, for example. Therefore, it has been recognized that being able to predict the vibration response of systems to swept-frequency excitation is critical (e.g. [1–7]).

In 1932 Lewis presented the first response of a single-degree-of-freedom system to linear frequency sweep excitation [1]. He derived an expression for the envelope functions that contained the peak values. The limited quantitative results presented by Lewis were obtained by graphical integration for various levels of damping and sweep rate. Lewis concluded that the greater the sweep rate, the larger the attenuation relative to steady-state response, and the higher the instantaneous frequency of excitation would be at which the peak envelope response occurs. Fearn developed in 1967 [2] an algebraic expression for the time at which the peak displacement response of a single-degree-of-freedom system subjected to a linear frequency sweep would occur, and an approximate magnitude of the displacement response. Until Cronin’s dissertation [3], published in 1965, analytical studies were generally restricted to linear frequency sweep, and exponential sweep-excitation studies were mostly experimental in nature. Cronin did provide results for relatively slow sweep rates; his work included analog studies involving linear and exponential excitation frequency sweeps. In addition to spring-mass single-degree-of-freedom systems, work has also been done on unbalanced flexible rotors whose spin rate swept through its critical speeds, e.g., [4]. In these types of systems the modes of vibration would be a function of the spin rate and the resulting gyroscopic moments. In 1964 Hawkes [5] described an approach for obtaining the envelope function of the response of single-degree-of-freedom systems subjected to octave sweep rates. He credits the solution approach to an unpublished document written in April 1961 by T. J. Harvey. From the publication, it is unclear how all required initial conditions were obtained for the resulting differential equations that were solved by numerical integration. 
The results, however, are consistent with subsequent work published by Lollock [6], who extended the work for both linear and octave sweep rates to useful damping and natural frequency ranges.

In approaches where the envelope function is used to identify the peak response, several factors need to be considered. First, the peak of the envelope function may not coincide with the peak of the time history response; this could lead to an incorrect estimate of the instantaneous excitation frequency that coincides with the peak response. The discrepancy would be greatest for low-frequency systems and decrease as the natural frequency increases relative to the starting frequency of the sweep. Another peculiar feature of this approach is that, whereas the original equation of motion is a second-order differential equation with two initial conditions (say, on the function and its derivative), the envelope equations turn out to be two coupled second-order differential equations, each of which requires two initial conditions, and there does not appear to be any way to derive these four necessary initial conditions from the original two for the equation of motion. There are physical arguments that one could make regarding what the initial conditions ought to be, but there does not appear to be any way to mathematically derive them from the original initial conditions.

It is the purpose of this article to extend and complement previously published work by proposing explicit closed-form solutions to both linear and octave frequency-sweep excitation. This allows the computation of the peak response, not just the peak of the envelope function. The closed-form solutions involve error functions and incomplete gamma functions of complex arguments, computations of which require numerical precision exceeding that which today’s computers can provide. The approach used to overcome this will be described. The closed-form solutions are compared to solutions obtained by numerical integration of the equations of motion. Having the ability to compute closed-form solutions, studies were performed to explore the impact of the frequency separation between the start frequency of the sweeps and the natural frequency of the system. In addition, results are presented showing the fine structure of the peak response in relation to the steady-state resonance response as a function of natural frequencies and critical damping ratios. This includes some unexpected results, in that the peak response curves exhibit highly nonlinear behavior with discontinuities in the derivative.

The differential equation for the motion of a single-degree-of-freedom system driven by harmonic excitation with a linear frequency sweep is given by

(1) | ẍ + 2ζωₙẋ + ωₙ²x = sin(αt²/2)

where ζ is the critical damping ratio, ωₙ is the natural frequency, α is the sweep rate in radians per second per second, and the dots indicate differentiation with respect to time. Assume, without any loss of generality, a sweep starting frequency of zero, a force magnitude equal to the mass of the system and initial conditions of x(0) = 0 and ẋ(0) = 0. The differential equation of motion of a single-degree-of-freedom system driven by harmonic excitation with an octave frequency sweep is
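As an illustrative cross-check (a Python sketch, not the article's Mathematica implementation, and using the equation-of-motion form assumed above), the linear-sweep response can be integrated numerically and its peak located; the parameter values mirror those used later for Figure 3:

```python
import numpy as np
from scipy.integrate import solve_ivp

# assumed parameters: 5 Hz natural frequency, 1% damping, 150 Hz/min sweep
zeta = 0.01
wn = 2 * np.pi * 5.0           # natural frequency, rad/s
alpha = 2 * np.pi * 2.5        # 150 Hz/min = 2.5 Hz/s, in rad/s^2

def rhs(t, y):
    x, v = y
    forcing = np.sin(0.5 * alpha * t**2)   # phase of a linear sweep from 0 Hz
    return [v, forcing - 2 * zeta * wn * v - wn**2 * x]

t_r = wn / alpha               # time at which the sweep frequency reaches wn
t_end = 3 * t_r
# constrain the step to a fraction of a cycle at the highest frequency reached
max_step = 2 * np.pi / (alpha * t_end) / 40
sol = solve_ivp(rhs, (0.0, t_end), [0.0, 0.0], max_step=max_step,
                rtol=1e-8, atol=1e-10, dense_output=True)

t = np.linspace(0.0, t_end, 20001)
x, v = sol.sol(t)
acc = np.sin(0.5 * alpha * t**2) - 2 * zeta * wn * v - wn**2 * x
i_peak = np.argmax(np.abs(acc))
t_peak, a_peak = t[i_peak], np.abs(acc)[i_peak]
print(t_peak, a_peak)
```

The fast sweep relative to the system bandwidth means the peak falls well short of the steady-state resonant amplification of 1/(2ζ), consistent with the attenuation discussed later.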

(2) | ẍ + 2ζωₙẋ + ωₙ²x = sin((ω₁/β)(e^{βt} − 1))

where β = R ln 2, R is the octave sweep rate in octaves/sec, and ω₁ is the nonzero start frequency of the sweep. As for the linear sweep case, assume a force magnitude equal to the mass of the system and initial conditions of x(0) = 0 and ẋ(0) = 0.

It is helpful to also write both the linear sweep and the octave sweep equations in the following form:

(3) | ẍ + 2ζωₙẋ + ωₙ²x = sin(θ(t) + φ)

where θ(t) is a general phase function and φ is the initial phase. Both the linear and octave sweep equations of motion can be put into the following more general form, which will be useful for constructing closed-form solutions:

(4) |

The solution to equation (4) can be expressed as

(5) |

For linear sweep, this becomes

(6) |

If the sine terms are expanded in terms of complex exponentials, then the resulting integrals can be computed in terms of the error function, erf, and the imaginary error function, erfi, each evaluated at a complex argument. Conceptually, the process proceeds as follows:

- After converting the sine terms to complex exponentials, expand out the products of sums of exponentials, splitting the integral accordingly into a sum of several integrals of exponentials and pulling the parts of each integrand that do not depend on the integration variable outside the integral; the resulting integrals will all have a common exponential form.
- With some algebraic manipulation (completing the square in the exponent), these integrals can be put into one of two standard Gaussian forms whose parameters are, in general, complex valued.
- Choosing the completed square as the new integration variable reduces the first of these integrals to an expression involving the error function.

An identical procedure can be applied to the second of these integrals, leading to an expression involving imaginary error functions. Performing the indicated calculations (including the associated algebra) gives the following closed-form solution for the linear frequency-sweep excitation case. In the interests of compactness, it is helpful to first introduce a number of auxiliary parameters.
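The reduction described above can be checked numerically. The sketch below (a Python/mpmath illustration with arbitrarily chosen complex coefficients a, b and upper limit T, not the article's actual parameters) completes the square in ∫₀ᵀ exp(as² + bs) ds and evaluates the result with the complex error function, comparing against direct quadrature:

```python
import mpmath as mp

mp.mp.dps = 30                      # working precision in decimal digits

a = mp.mpc(-0.2, 0.5)               # illustrative complex coefficients
b = mp.mpc(0.3, -0.1)
T = mp.mpf(3)

# direct numerical quadrature of the integral
direct = mp.quad(lambda s: mp.exp(a*s**2 + b*s), [0, T])

# completing the square: a s^2 + b s = a (s + b/(2a))^2 - b^2/(4a),
# and  int exp(a u^2) du = sqrt(pi)/(2 c) * erf(c u)  with  c = sqrt(-a)
c = mp.sqrt(-a)
shift = b / (2*a)
closed = mp.exp(-b**2/(4*a)) * mp.sqrt(mp.pi)/(2*c) * (
    mp.erf(c*(T + shift)) - mp.erf(c*shift))

print(direct, closed)
```

Because erf is entire, the antiderivative identity holds for any complex coefficients; the two evaluations agree to working precision.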

Then the closed-form solution for the linear sweep case can be written as

(7) |

In order to verify that this equation does in fact satisfy the equation of motion, we make use of the fact that the derivatives of the error function and the imaginary error function are given by (2/√π)e^{−z²} and (2/√π)e^{z²}, respectively. Then substituting equation (7) and its derivatives into the equation of motion yields an expression involving all of the original erf and erfi functions, plus a number of terms that do not contain any error functions. Collecting terms with respect to the various error functions, which is relatively straightforward although algebraically tedious, verifies that the coefficient of each of the error functions is zero, and that the terms that do not contain any error functions sum to sin(αt²/2), which is the forcing function on the right-hand side of the equation. Since we are interested in the peak acceleration response, the second derivative of the solution, equation (7), is the sought-after response time history.

For the case of octave sweep, it is helpful to make a change of independent variable in equation (2) and let τ = e^{βt} (when the sweep rate R is specified in octaves per minute, β = R ln 2/60). With this change of variable, the equation of motion for octave sweep in the τ domain becomes

(8) | β²τ²x″ + (β² + 2ζωₙβ)τx′ + ωₙ²x = sin((ω₁/β)(τ − 1))

The initial conditions become x(1) = 0 and x′(1) = 0. Similarly, the expression for the second derivative with respect to time becomes (in terms of derivatives with respect to τ),

(9) | ẍ = β²(τ²x″ + τx′)

where primes denote differentiation with respect to τ. The advantage of making this change of variable, from the perspective of numerical integration, is *the absence of exponential functions of time in the forcing function* in equation (8); rather, the forcing function is a constant-frequency sine wave in τ, and the coefficients in the equation are at most quadratic in τ. This greatly improves the stability and reliability of the numerical integration.
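A quick symbolic check of this change of variable (assuming the substitution τ = e^{βt}, and using an arbitrary smooth test function in place of the actual solution) confirms that the second time derivative transforms as stated:

```python
import sympy as sp

t, beta, u = sp.symbols('t beta u', positive=True)

# arbitrary smooth test function of tau, standing in for the solution x(tau)
F = sp.sin(u) + u**2
tau = sp.exp(beta * t)
x = F.subs(u, tau)                       # x(t) = F(tau(t))

lhs = sp.diff(x, t, 2)                   # d^2 x / dt^2
Fp = sp.diff(F, u).subs(u, tau)          # dF/dtau evaluated at tau(t)
Fpp = sp.diff(F, u, 2).subs(u, tau)
rhs = beta**2 * (tau**2 * Fpp + tau * Fp)

print(sp.simplify(lhs - rhs))            # 0
```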

It is helpful to write equation (8) for the octave sweep in the following more general form:

(10) |

Using the variation of parameters method, we obtain the following expression for the solution:

(11) |

Substituting for the forcing function and then expanding the sines in terms of complex exponentials yields integrals of exponentials multiplied by powers of τ, which are readily expressed in terms of incomplete gamma functions after algebraic transformation. The (upper) incomplete gamma function is given by Γ(a, z) = ∫_z^∞ t^{a−1}e^{−t} dt. For compactness, it is helpful to first introduce a number of auxiliary parameters. Then the resulting expression for the solution reduces to
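mpmath's generalized incomplete gamma function handles the complex arguments that arise here; the following sketch (with arbitrary illustrative values of the exponent a and complex coefficient c, not the article's parameters) verifies the reduction of one such integral against direct quadrature:

```python
import mpmath as mp

mp.mp.dps = 30

a = mp.mpf('2.5')                 # illustrative exponent
c = mp.mpc(0.8, 0.6)              # illustrative complex coefficient
t1, t2 = mp.mpf(1), mp.mpf(3)

# direct quadrature of  int_{t1}^{t2} tau^(a-1) exp(-c tau) dtau
direct = mp.quad(lambda s: s**(a - 1) * mp.exp(-c * s), [t1, t2])

# substitution u = c*tau maps this to the generalized incomplete gamma
# function  gammainc(a, z1, z2) = int_{z1}^{z2} u^(a-1) exp(-u) du
closed = c**(-a) * mp.gammainc(a, c * t1, c * t2)

print(direct, closed)
```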

(12) |

Substituting τ = e^{βt} yields the corresponding solution in the time domain:

(13) |

Computing the first and second derivatives of equation (13) and substituting them into the original equation of motion, equation (2), one discovers, after some algebra and collecting terms with respect to the various incomplete gamma functions, that the resulting equation can be reduced to a single residual expression. Since we are interested in oscillatory motion, which implies ζ < 1, this residual reduces to zero, thereby showing that equation (13) does indeed satisfy equation (2).

The sought-after solutions are the real parts of equation (7) and equation (13). For the linear sweep, series expressions exist for the real and imaginary parts of both erf and erfi: functions.wolfram.com/GammaBetaErf/Erf/19 for erf and functions.wolfram.com/GammaBetaErf/Erfi/19 for erfi contain series expressions in terms of Hermite polynomials as well as hypergeometric functions. In practice, these series have very slow and highly nonmonotonic convergence properties, with the partial sums fluctuating over many orders of magnitude as successive terms are added. Furthermore, numerical evaluation of these partial sums using exact numbers as inputs is extremely slow and computation time increases nonlinearly with the number of terms, while evaluation using finite-precision numbers yields erroneous results. Since one does not know ahead of time how many terms will be needed for an accurate computation, this approach is impractical. As with the error function, there are similar numerical challenges in computing the incomplete gamma function of complex arguments. Accordingly, the closed-form solutions will be computed using equations (7) and (13) directly.

There are also numerical challenges associated with the exact solutions because of the complex arguments of the error and gamma functions. Recall that the error function is given by erf(z) = (2/√π)∫₀^z e^{−t²} dt. For real argument, the integrand has magnitude at most 1, so the integral is well behaved. However, once the argument becomes complex, the t² term in the exponent means that the real part of the exponent grows very quickly along the imaginary direction. Since the integrand is analytic in the complex plane, we can use the Cauchy integral theorem for line integrals [8] to break the integral from 0 to a complex endpoint into two parts: an integral along the real axis plus an integral parallel to the imaginary axis, where in the latter we are in effect integrating along the imaginary direction. Thus, both erf and erfi increase very quickly in magnitude off the real axis, as shown in the plots in Figures 1 and 2. In order for the combinations of erf and erfi that appear in the exact solution to sum to an oscillatory function, very precise cancellations are needed, meaning that extremely high precision is needed in order to do the numerical evaluations correctly.

**Figure 1.** Plot of the magnitude of erf over a region of the complex plane. Observe that the magnitude grows by many orders of magnitude as the imaginary part of the argument increases.

**Figure 2.** Plot of the magnitude of erfi. The behavior of erfi is similar to that of erf.
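The growth illustrated in Figures 1 and 2 is easy to reproduce. With mpmath (which, like Mathematica, supports arbitrary-precision arithmetic), erf is bounded on the real axis but its magnitude explodes as the argument moves off it:

```python
import mpmath as mp

mp.mp.dps = 50            # generous working precision

# on the real axis, erf is bounded by 1
print(mp.erf(3))          # ~0.99997791

# off the real axis the magnitude grows roughly like exp(y^2)
for y in (2, 4, 6, 10):
    print(y, abs(mp.erf(mp.mpc(0, y))))
```

Already at an imaginary part of 10 the magnitude exceeds 10^40, which is why sums of such terms that must cancel down to an order-one oscillatory result demand very high working precision.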

Because of the extremely high numerical precision requirements, Mathematica, which implements arbitrary-precision arithmetic, was chosen to compute the closed-form solutions. This made it possible to experiment with different levels of computational precision. Some results were computed with hundreds or thousands of digits of precision. Depending on the values of the input parameters (sweep rate, natural frequency, damping coefficient, etc.), it was found that different levels of precision were needed in order to get reliable results. This is not an attractive situation, since it is impossible to know ahead of time how much precision will be needed for any particular set of inputs. Fortunately, Mathematica also allows exact arithmetic (using rational and/or exact symbolic numbers as inputs), and this made it possible to use the exact analytic solutions in a computationally tractable form. More specifically, one can evaluate functions numerically using exact arithmetic by means of the following steps:

- Convert all of the inputs to integers, rational numbers or exact symbolic numbers such as π or √2, or rational multiples thereof, all of which are treated as having infinite precision.
- Set the global variable $MaxExtraPrecision, which specifies the maximum number of extra digits of precision, to Infinity. This enables as much extra precision as possible.
- Evaluate the function of interest with the desired (exact) inputs. This will, in general, yield a very complicated exact expression.
- Evaluate this exact value to the desired number of digits of precision for the output in order to get a recognizable numerical value, with the understanding that any imaginary “dust” arising from this numerical truncation will be ignored. (For the results presented later, we used 30 digits of output precision.)
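The workflow above is specific to Mathematica, but the idea carries over to other arbitrary-precision environments. The Python sketch below mimics it with fractions and mpmath (the function name and the guard-digit policy are our own, not the article's):

```python
from fractions import Fraction
import mpmath as mp

def eval_with_guard(f, x, digits_out=30, guard=30):
    """Evaluate f at the exact rational x using extra guard digits,
    then report only digits_out digits of the result."""
    with mp.workdps(digits_out + guard):
        val = f(mp.mpf(x.numerator) / mp.mpf(x.denominator))
    return mp.nstr(val, digits_out)

# example: erf evaluated at the exact rational 1, reported to 30 digits
s = eval_with_guard(mp.erf, Fraction(1, 1))
print(s)
```

The guard digits play the role of Mathematica's extra precision; the output is truncated to the desired number of digits only at the very end, mirroring the final step of the list above.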

To build confidence in the closed-form solution, the equation of motion was also solved by direct numerical integration. For the linear sweep case, the results presented herein were obtained from the closed-form solution, equation (7), as well as direct integration of the differential equation of motion, equation (1). The *closed-form* solution was evaluated by first rationalizing all of the inputs to (7) (other than integers, rational numbers and rational multiples of exact symbolic constants such as π) using Rationalize (which converts any number to rational form), and then evaluating the real part of the result (to eliminate any very small imaginary numbers) to the desired number of digits of precision (typically 30, 50 or 100) with the N function. The *numerical* solution was obtained by integrating the equation of motion (1) with NDSolve out to some desired maximum time (typically some time after the sweep frequency hits the natural frequency of the system) for the parameter values of interest.

Figure 3 shows the response time histories for a system with a natural frequency of 5 Hz and a critical damping ratio of 1%. The sweep frequency was started at zero Hz and the sweep rate was 150 Hz/min, or α = 5π rad/s². In the figure, the dashed orange line is the closed-form solution and the dotted blue line is the direct numerical integration solution. Clearly, the differences are imperceptible. Table 1 shows the numerical values for both solutions for a randomly selected subset of the time points used in plotting Figure 3. Again, it is evident that for all practical purposes, the solutions are identical.

**Figure 3.** Acceleration response time histories of a single-degree-of-freedom system (natural frequency 5 Hz, ζ = 0.01), excited by a harmonic force with a linear sweep rate of 150 Hz/min.

For the octave sweep case, the results in Table 2 were obtained from the closed-form solution, equation (13), as well as by direct integration of the differential equation of motion in the τ-domain, equation (8). The procedure for evaluating the *closed-form* solution for the octave sweep case was identical to that described for the closed-form solution in the linear sweep case. The *numerical* solution was obtained by integrating equation (8) in the τ-domain with NDSolve from τ = 1 out to some desired maximum value of τ (typically corresponding to some time beyond the time at which the sweep frequency hits the natural frequency of the system), and then using equation (9) to transform the acceleration back to the time domain. Figure 4 shows the response time histories for a system with a natural frequency of 1/4 Hz and critical damping ratio of 0.01. The sweep frequency was started at 1/8 Hz and the sweep rate was 1/2 octaves/min. The orange dashed line is the closed-form solution and the blue dotted line is the direct numerical integration solution. Again, the differences are imperceptible. Table 2 provides the numerical values for both solutions for a randomly selected subset of the time points used in plotting Figure 4; for all practical purposes, the results are identical.

**Figure 4.** Acceleration response time histories of a single-degree-of-freedom system (natural frequency 1/4 Hz, ζ = 0.01), excited by a harmonic force with an octave sweep rate of 0.5 octaves/min.

The construction of the peak response curves involved two steps. First, the times at which the peak of the absolute value of the acceleration occurred were obtained via numerical integration for the desired combinations of ζ, ωₙ and sweep rate (α for linear sweep or R for octave sweep). These times were then used as the starting points for a very fine-grained search of the exact analytical solutions in order to determine the peak acceleration in each case. Development of a generic algorithm to accomplish this was not trivial, as will be discussed. However, the effort was made easier by previously published results that indicate that the peak envelope values, which would contain the peak response values, would occur *after* the instantaneous frequency of excitation was equal to the natural frequency of the system. Hence, the search for the peaks was started at the point in the response time history where the instantaneous frequency of excitation was equal to the circular natural frequency of the system. For the linear sweep excitation, the time t_r was computed as

(14) | t_r = ωₙ/α

and for the octave sweep excitation, the corresponding time was computed as

(15) | t_r = ln(ωₙ/ω₁)/β

In the case of the numerical approach, we sorted the list of computed acceleration values generated via integration, starting at t_r, in order to find an initial approximation to the peak acceleration, and then did a more refined local search around this peak using standard local optimization techniques. In the case of the analytical approach, much smaller increments in ωₙ were used in order to get a sharper picture of some unusual phenomena that emerge at low frequencies. Accordingly, interpolations were generated of the times at which the *numerically generated* peak responses occurred, as a function of ωₙ for combinations of ζ and sweep rate. Thus, for any value of ωₙ, we could use this interpolated time value as the starting point for a refined numerical search that involved evaluating the *exact analytical solution* at very closely spaced time points in a neighborhood of this time. For this, we chose time points that were equally spaced in the phase of the forcing function, that is, at 0.25° phase increments, which provided precise, although computationally intensive, results. In addition, care was taken to search sufficiently far past the start of the search, given by equation (14) or equation (15), to guarantee that the global maximum peak had been found.
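For the linear sweep, with phase θ(t) = αt²/2, time points equally spaced in phase are obtained by inverting the phase function. A minimal sketch (one cycle either side of an assumed peak-time estimate, rather than the ±60 cycles used in the study):

```python
import numpy as np

alpha = 2 * np.pi * 2.5          # illustrative sweep rate, rad/s^2
t0 = 2.0                         # illustrative peak-time estimate

dtheta = np.deg2rad(0.25)        # 0.25 degree phase increments
theta0 = 0.5 * alpha * t0**2
k = np.arange(-1440, 1441)       # +/- one cycle at 1440 increments per cycle
times = np.sqrt(2 * (theta0 + k * dtheta) / alpha)

# the resulting grid is exactly uniform in phase, not in time
phases = 0.5 * alpha * times**2
print(times[0], times[-1])
```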

Associated with the question of where in a time history to start the search for the peak value, that is, t_r or τ_r, is the question of how far past this point the search should be conducted to guarantee that the global peak has been identified. Unfortunately, the only way we found to reliably accomplish this was through trial and error. For linear sweep, we found experimentally that it was very helpful to divide the natural-frequency range into a low-frequency part and a high-frequency part.

For relatively low natural frequencies, it was found experimentally that evaluating the solution out to a generous multiple of t_r gives reliable results in most cases, with the peak response typically occurring about 20% of the way out to the end of the search interval. At low values of ωₙ, however, the peak response sometimes occurred about 45 to 50% of the way out. In hindsight, we could have obtained the peak response without going out so far in time, but we wanted to be sure that the peak response found was in fact the true global peak response. We observed that in some cases, what looked like a global peak value eventually got “dethroned” by a peak that occurred quite a few cycles later, due to the beating of the frequencies involved. Thus, all of the low-frequency responses, as well as a subset of the high-frequency responses, were monitored graphically, and if any peak responses were found at times more than 50% or so of the way out to the end of the search interval, then the interval was lengthened accordingly. For higher natural frequencies, fixed multiples of t_r were found experimentally that gave reliable results for high sweep rates (~150–200 Hz/min) and for lower sweep rates (~10–20 Hz/min).

In view of the oscillatory nature of the system, it was important to constrain the maximum integration step size to be at most a small fraction of a cycle. Based on prior experience with similar computations, we chose the maximum step size to be 1/40 of a cycle of the largest frequency of interest, which was the sweep frequency at the end of the search interval described previously. For simplicity, we deliberately chose to constrain the maximum step size based on this largest frequency of interest, rather than attempt to change the maximum step size as the frequency changed. For the low sweep frequencies encountered in the early parts of a sweep, this step size was much smaller than 1/40 of a cycle, but this did not create any problems. The numerical integrator used (NDSolve) employs an adaptive algorithm that adjusts the step size as needed, subject to any user-prescribed constraints. In addition, we used fifth-order interpolation in the numerical integrator so that the acceleration would be a third-order interpolating function. Finally, in view of the progressive increase of sweep frequency with time, we found it useful to specify a maximum of 100,000,000 integration time steps (considerably more than the integrator’s default value), as in some cases a smaller maximum number of time steps (such as 10,000,000) did not allow the adaptive integrator to reach the global peak response.

It was also necessary to evaluate the closed-form solution at very closely spaced time increments in order to reliably find the peak acceleration. This strategy leveraged the previously computed numerical solutions (that is, the times at which the numerically obtained peak values occurred) to do a very fine-grained search with the closed-form solution in the neighborhood of the numerically computed peak value. Although a global list of search points could have been generated in other ways without the use of the numerical solution, using the points generated by the numerical integrator seemed like the most efficient approach. The strategy then was to use the numerically generated estimate of when the peak acceleration occurs and search within plus or minus some number of cycles of this time, at equally spaced increments in the forcing function phase. We found that searching within ±60 cycles with 1,440 phase increments per cycle (i.e. at 0.25° phase increments) yielded reliable results.

Figure 5 shows the peak response (from the exact solution) normalized by 1/(2ζ), the steady-state resonant response when the excitation frequency is equal to the undamped natural frequency of the system, plotted against the natural frequency of the system for three linear excitation sweep rates. The system has a critical damping ratio of ζ = 0.01 and its natural frequency was varied from 0.25 Hz to 10 Hz in steps of 0.01 Hz. Each of the (almost 1,000) peak response values on each of these curves was computed via the process for computing peak acceleration (from the exact solution) described in Sections 7 and 8, that is, searching within ±60 cycles of the numerically generated estimate of when the peak acceleration occurs, with 1,440 phase increments per cycle. As can be seen, the attenuation of the peak response relative to the resonant steady-state response is significant for systems with low natural frequencies. As the natural frequency increases, which allows a greater number of response cycles during any given excitation frequency range, the attenuation decreases. These results are consistent with those published by others [6]. What is not consistent is the scalloped behavior of the peak curves at the lower frequencies. This behavior was obtained with both the numerically integrated results and the closed-form solution. Figure 6 shows an expanded close-up view of the lower-frequency range of Figure 5, generated by simply changing the horizontal plot range. The details visible in Figure 6 will be discussed in more detail later.

**Figure 5.** Normalized peak response plotted against natural frequency for several linear excitation sweep rates. Left-to-right curves correspond to top to bottom in key.

**Figure 6.** Close-up of the low-frequency range of Figure 5. Left-to-right curves correspond to top to bottom in key.

Another observation is that the peak response during a frequency sweep can exceed the steady-state resonant response. This is shown in Figure 7, which presents the normalized peak responses for two sweep rates (Figure 7 was obtained from Figure 5 by simply adjusting the vertical plot range to focus on the overshoot portion of the response). This might seem counterintuitive, since the frequency of excitation is sweeping through the natural frequency and therefore does not dwell. However, the sweep excites a response at the natural frequency of the system that decays at a rate set by the system damping. Once the sweep frequency passes the natural frequency, the total response is the response due to the excitation plus the free-decay response of the system at its natural frequency. This is what causes the beating in the response once the sweep frequency passes the natural frequency. The decaying free response plus the transient response to the swept excitation can combine to produce higher peak responses than the resonant response caused by harmonic dwell at the natural frequency. The overshoot observed here is consistent with the overshoot observed by Cronin [3].

**Figure 7.** Close-up of overshoot phenomenon observed in Figure 5. Left-to-right curves correspond to top to bottom in key.

Figure 8 shows the normalized peak response for various sweep rates plotted against the natural frequency squared divided by the linear sweep rate; this normalization allows comparison to results presented in the literature. The critical damping ratio is the same as in Figure 5. The data used in Figure 8 is the same as the data used in Figure 5, only plotted differently. Observe that the curves merge into one, as explained by Hawkes [5].

**Figure 8.** Normalized peak response for several linear sweep rates plotted against the natural frequency squared divided by the sweep rate.

In Figures 5 through 8, one observes periodic discontinuities in the derivative of the peak response curve. Moreover, the curve does not increase monotonically; sometimes it starts to dip down before hitting a discontinuity in slope and resuming its upward trend. One also observes that at very low frequencies the discontinuities in the derivative are not very regular, but as the natural frequency is gradually increased, they take on a much more regular nature. These discontinuities are best understood in terms of what we will call the “competing peaks” phenomenon, which can be most clearly explained by taking several observations into account:

- The peak response always occurs some time after the sweep frequency reaches the natural frequency of the system.
- As the natural frequency of the system is increased, the time at which the sweep frequency reaches the natural frequency occurs later and later, since for these problems the sweep frequency always started at 1/8 Hz.
- Thus the time at which the peak acceleration occurs can be expected to increase as the natural frequency is increased.
- In the array of plots shown in Figure 9, which show the response time histories for several very closely spaced values of ωₙ, one observes that as the time at which the peak acceleration is reached increases, the “dominant” peak (i.e. the largest peak) is eventually overtaken (from one value of ωₙ to the next, i.e. from one plot to the next) by the secondary peak (i.e. the second-largest peak), which has been increasing all along. When this happens, the rate of change of the global peak suddenly changes, since it is now associated with a different peak, and thus there is a discontinuity in the slope of the peak response curve. These peak responses as a function of frequency are summarized in the plot inset at the lower-right corner of Figure 10.

**Figure 9.** Evolution of peak acceleration as natural frequency is increased (left to right, then top to bottom).

**Figure 10.** Evolution of peak acceleration as secondary peak overtakes the dominant peak. The first six points come from the preceding plots.
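The “competing peaks” mechanism can be illustrated on a synthetic signal: as a parameter slides, the second-largest local maximum overtakes the largest, and the location of the global peak jumps. A minimal sketch using scipy (the two-bump signal is entirely illustrative, not the article's response):

```python
import numpy as np
from scipy.signal import find_peaks

t = np.linspace(0, 20, 4001)

def global_peak(lam):
    # two competing response peaks whose relative heights depend on lam
    y = (1.0 - lam) * np.exp(-(t - 6.0)**2) + lam * np.exp(-(t - 14.0)**2)
    idx, _ = find_peaks(y)                 # all local maxima
    best = idx[np.argmax(y[idx])]          # the dominant one
    return t[best], y[best]

lams = np.linspace(0.2, 0.8, 61)
t_peaks = np.array([global_peak(l)[0] for l in lams])
# the location of the global peak jumps when the secondary peak takes over
print(t_peaks.min(), t_peaks.max())
```

The jump in peak location is what produces the discontinuity in the slope of the peak-versus-parameter curve described above.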

In principle, there are actually three possible types of behavior that can lead to discontinuities in the derivative of the peak response curve, and all can be understood in terms of the preceding logic:

- A decreasing peak is overtaken by an increasing peak (this is the case described in the preceding).
- An increasing peak is overtaken by a more rapidly increasing peak.
- A decreasing peak passes a more slowly decreasing peak so that the more slowly decreasing peak is now the dominant peak (possible in principle, but not observed in this example).

The later the peak (i.e. the larger the natural frequency of the system), the longer the system has to build up to a steady-state-like response, so that successive peak accelerations (corresponding to successively higher natural frequencies) attain higher and higher values, hence the overall upward general trend of the peak response curve. For this same reason, at high natural frequencies successive peaks in the response versus time curve all have very similar amplitudes, so that when the natural frequency is changed slightly and one peak overtakes another, the difference in the rates at which the dominant and secondary peaks are increasing is extremely small and barely noticeable. Thus the peak response curve appears to be smooth at high frequencies.

As described earlier, in the octave sweep case, it is extremely helpful to first make a change of independent variable by letting τ = e^{βt}. The resulting differential equation then has a constant-frequency forcing term in the τ domain (at the expense of coefficients in the equation that are at most quadratic in τ). The resulting differential equation, equation (8), was solved both analytically (equation (13)) and numerically, and then transformed back to the time domain.

The time at which the sweep frequency equals the oscillator’s resonant frequency is given by equation (15). However, since the integration is being done in the τ domain, the corresponding expression for the τ value at which the system’s resonant frequency is reached becomes

(16) | τ_r = e^{βt_r} = ωₙ/ω₁

Since in the τ domain the forcing function is a constant-frequency sine wave, we found experimentally that in most cases it was sufficient to integrate to a maximum τ value of 1.5τ_r, although occasionally it was necessary to go up to 3 or 4 times τ_r. In some cases the value of 1.5τ_r can be less than 1, and so we also imposed a lower bound of 1.05 on the maximum value of τ. We again used fifth-order interpolation for computing derivatives in τ and again allowed the integration to go for a maximum of 100,000,000 steps: recall that for octave sweep we used the substitution τ = e^{βt}, so τ increases exponentially with t, and thus the number of steps in the τ domain can become much larger than the number of time steps in the t domain.
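A sketch of this τ-domain integration in Python (assuming the τ-domain form of the equation of motion given above, with parameters mirroring Figure 4; not the article's NDSolve implementation):

```python
import numpy as np
from scipy.integrate import solve_ivp

# illustrative parameters: fn = 1/4 Hz, zeta = 0.01,
# start frequency 1/8 Hz, sweep rate 0.5 octaves/min
zeta = 0.01
wn = 2 * np.pi * 0.25
w1 = 2 * np.pi * 0.125
beta = 0.5 * np.log(2) / 60.0          # octaves/min converted to 1/s
c = w1 / beta                          # constant phase rate in the tau domain

def rhs(tau, y):
    # assumed tau-domain form:
    # beta^2 tau^2 x'' + (beta^2 + 2 zeta wn beta) tau x' + wn^2 x = sin(c(tau-1))
    x, xp = y
    xpp = (np.sin(c * (tau - 1)) - (beta**2 + 2*zeta*wn*beta) * tau * xp
           - wn**2 * x) / (beta**2 * tau**2)
    return [xp, xpp]

tau_r = wn / w1                        # tau at which sweep frequency reaches wn
tau_end = 1.5 * tau_r
sol = solve_ivp(rhs, (1.0, tau_end), [0.0, 0.0],
                max_step=2*np.pi/max(c, wn/beta)/40, rtol=1e-8, atol=1e-10,
                dense_output=True)

tau = np.linspace(1.0, tau_end, 20001)
x, xp = sol.sol(tau)
# acceleration in the time domain: forcing minus damping minus stiffness
acc = np.sin(c * (tau - 1)) - 2*zeta*wn*beta*tau*xp - wn**2 * x
print(tau[np.argmax(np.abs(acc))], np.abs(acc).max())
```

Note that the forcing really is a fixed-frequency sine in τ, which is what makes the integration well behaved despite the exponential frequency sweep in t.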

Once the differential equation (for a given set of ζ, ωₙ, ω₁ and sweep rate values) had been solved, the following procedure for finding the peak response was followed:

- Create a list of the τ values generated via numerical integration.
- Use equation (9) to evaluate the acceleration (in the time domain) at each τ value and then from this list select the largest response.
- Use the data from steps (1) and (2) to also create an interpolating function for the acceleration as a function of τ.
- Having found this initial estimate of the peak value, use the interpolating function to do a local optimization (via Mathematica’s FindMaximum function) around this initial peak, using it as a starting point.
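The last two steps have a direct scipy analog: build an interpolant through the sampled response, then refine the coarse argmax with a bounded local optimizer. A generic sketch on a synthetic signal with a known peak location (illustrative, not the article's Mathematica code):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize_scalar

# synthetic sampled response with a known peak location at t_true
t_true = 1.234
t_samples = np.arange(0.0, 3.0, 0.05)
resp = np.cos(5.0 * (t_samples - t_true)) * np.exp(-(t_samples - t_true)**2)

# steps (1)-(2): coarse argmax over the sampled values
i0 = np.argmax(resp)
t0 = t_samples[i0]

# steps (3)-(4): interpolate, then bounded local optimization around t0
spline = CubicSpline(t_samples, resp)
res = minimize_scalar(lambda s: -spline(s), bounds=(t0 - 0.05, t0 + 0.05),
                      method='bounded')
print(t0, res.x)
```

The refined estimate recovers the true peak location to well within one sample spacing, which is the point of the interpolate-then-optimize step.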

Figure 11 shows the normalized peak response to various octave sweep rates. Each of the (almost 1,000) peak response values on each of these curves was computed via the process for computing peak acceleration (from the exact solution) described in Sections 7 and 8, that is, searching within ±60 cycles of the numerically generated estimate of when the peak acceleration occurs, with 1,440 phase increments per cycle. As expected, the slower the sweep rate, the lower the attenuation. In addition, the scalloped behavior in the peak response curves that was observed for the linear sweeps is also present here, although not as pronounced. This is because the octave sweep increases in frequency more rapidly than the linear sweep.

**Figure 11.** Normalized peak responses for several values of octave sweep rate (octaves/minute). At low natural frequencies, the peak response was computed in increments of 0.002 Hz. Left-to-right curves correspond to top to bottom in key.

Figure 12 shows an expanded view of Figure 11 corresponding to the lower frequency systems so that the scalloped behavior can be better seen. Figure 12 was obtained from Figure 11 by simply adjusting the vertical and horizontal plot ranges.

**Figure 12.** Expanded view of peak response curves at low natural frequencies for various octave sweep rates (octaves/minute). Left-to-right curves correspond to top to bottom in key.

Figure 13 shows the results from Figure 11 normalized by the octave sweep rate, as suggested by Hawkes [5]. The data used in Figure 13 is the same as the data used in Figure 11, only plotted differently. As in the case with the linear sweep rate and its normalization factor, the octave sweep rate results also merge into a single curve for systems with the same critical damping ratio.

**Figure 13.** Normalized peak response curves for various octave sweep rates plotted against the natural frequency squared divided by the sweep rate (natural frequency in Hz, sweep rate in octaves/minute).

Figure 14 shows results comparable to those in Figure 13 for systems with a different critical damping ratio.

**Figure 14.** Normalized peak response curves for various octave sweep rates at a different critical damping ratio, plotted against the natural frequency squared divided by the sweep rate (natural frequency in Hz, sweep rate in octaves/minute).

Figures 15 and 16 show the severe attenuation that occurs when the start frequency of the sweep is close to the natural frequency. In both figures, the sweeps were started at 1 Hz. As can be ascertained, the attenuation is significant for systems with natural frequencies close to or below 1 Hz, as would be expected. Hence, the attenuation is not only a function of the natural frequency, damping and sweep rate, but also of the proximity of the start frequency of the sweep to the natural frequency. As with Figure 11, each of the peak response values on each of the curves in Figures 14–16 was computed via the process for computing peak acceleration (from the exact solution) described in Sections 7 and 8.

**Figure 15.** Normalized peak response curves for octave sweep with a start frequency of 1 Hz (instead of 1/8 Hz). Left-to-right curves correspond to top to bottom in key.

**Figure 16.** Normalized peak response curves for octave sweep with the sweep started at 1 Hz (instead of 1/8 Hz). Left-to-right curves correspond to top to bottom in key.

The derivation of closed-form solutions for the responses of single-degree-of-freedom systems subject to linear and octave swept-frequency harmonic excitation was presented. The closed-form solutions were compared to results obtained by direct numerical integration of the equations of motion with excellent agreement obtained. In addition, an in-depth discussion was presented on the numerical difficulties associated with the gamma and error functions of complex arguments that are part of the closed-form solutions, and how these difficulties were overcome by employing exact arithmetic with infinite-precision numbers, that is, rational and/or exact symbolic numbers. This included a study of precision requirements by performing computations with numerical precision exceeding what is available on today’s computers. The closed-form solutions allowed the in-depth study of several interesting phenomena including: (a) computation of the peak response instead of the peak of the envelope function; (b) scalloped behavior of the peak response with frequent discontinuities in the derivative; (c) the significant attenuation of the peak response if the sweep frequency is started at frequencies near or above the natural frequency; and (d) the fact that the swept-excitation response could exceed the steady-state harmonic response when the system is excited at its natural frequency.

We are grateful to Luke Titus of Wolfram Research for his valuable suggestions on exact numerical computation.

This work was supported by contract # FA8802-14-C-0001.

[1] F. M. Lewis, “Vibration during Acceleration through a Critical Speed,” Transactions of the American Society of Mechanical Engineers, 54(1), 1932 pp. 253–261.

[2] R. L. Fearn and K. Millsaps, “Constant Acceleration of an Undamped Simple Vibrator through Resonance,” The Aeronautical Journal, 71(680), August 1967 pp. 567–569. https://doi.org/10.1017/S0001924000055007.

[3] D. L. Cronin, Response of Linear, Viscous Damped Systems to Excitations Having Time-Varying Frequency, Ph.D. thesis, Dynamics Laboratory, California Institute of Technology, Pasadena, California, 1965. https://authors.library.caltech.edu/26518.

[4] R. Gasch, R. Markert and H. Pfutzner, “Acceleration of Unbalanced Flexible Rotors through the Critical Speeds,” Journal of Sound and Vibration, 63(3), 1979 pp. 393–409. https://doi.org/10.1016/0022-460X(79)90682-5.

[5] P. E. Hawkes, “Response of a Single-Degree-of-Freedom System to Exponential Sweep Rates,” Shock, Vibration and Associated Environments, Part II, Bulletin No. 33, February 1964 pp. 296–304. https://apps.dtic.mil/dtic/tr/fulltext/u2/432931.pdf.

[6] J. A. Lollock, “The Effect of Swept Sinusoidal Excitation on the Response of a Single-Degree-of-Freedom Oscillator,” in 43rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Denver, CO, 2002. https://doi.org/10.2514/6.2002-1230.

[7] R. Markert and M. Seidler, “Analytically Based Estimation of the Maximum Amplitude during Passage through Resonance,” International Journal of Solids and Structures, 38(10–13), 2001 pp. 1975–1992. https://doi.org/10.1016/S0020-7683(00)00147-5.

[8] L. Ahlfors, Complex Analysis: An Introduction to the Theory of Analytic Functions of One Complex Variable, New York: McGraw-Hill, 2000.

C. C. Reed and A. M. Kabe, “Peak Response of Single-Degree-of-Freedom Systems to Swept-Frequency Excitation,” The Mathematica Journal, 2018. https://doi.org/10.3888/tmj.21-1.

Dr. Chris Reed is a Senior Engineering Specialist in the Structures Department at The Aerospace Corporation. As an applied mathematician, his work has encompassed mechanical vibrations, structural deformation, space-based sensor system performance, satellite system design optimization, flight termination system interference, fluid sloshing, electrostatic discharges, dielectric degradation on satellites and queueing systems. He has two patents and received a Wolfram Innovator award in 2017. His B.S. is from the California Institute of Technology and his M.S. and Ph.D. degrees are from Cornell University.

Dr. Alvar M. Kabe is the Principal Director of the Structural Mechanics Subdivision of The Aerospace Corporation. He has made notable contributions to the state of the art of launch vehicle and spacecraft structural dynamics. He has published numerous papers, is an Associate Fellow of the AIAA, and has received The Aerospace Corporation’s Trustees’ Distinguished Achievement Award and the Aerospace President’s Achievement Award. His B.S., M.S. and Ph.D. degrees are from UCLA.

**C. Christopher Reed**

*Senior Engineering Specialist
Structures Department
M4-912
The Aerospace Corporation
P.O. Box 92957
Los Angeles, CA 90009-2957*

**Alvar M. Kabe**

*Principal Director
Structural Mechanics Subdivision
M4-899
The Aerospace Corporation
P.O. Box 92957
Los Angeles, CA 90009-2957*

https://doi.org/10.3888/tmj.20-8

This article presents a numerical pseudo-dynamic approach to solve a nonlinear stationary partial differential equation (PDE) with bifurcations by passing from to a pseudo-time-dependent PDE . The equation is constructed so that the desired nontrivial solution of represents a fixed point of . The numeric solution of is then obtained as the solution of at a high enough value of the pseudo-time.

### 1. Introduction: Soft Bifurcation of a Stationary Nonlinear PDE

### 2. Numerical Description of a Soft Bifurcation: A Problem and a Workaround

#### 2.1. A Pseudo-Dynamic Equation

#### 2.2. A Critical Slowing Down

### 3. Example: A 1D Ginzburg–Landau Equation

### 4. Numerical Solution of the Ginzburg–Landau Equation

#### 4.1. Pseudo-Time-Dependent Equation

#### 4.2. Solution within a Finite Domain

#### 4.3. The Solution Norm and the Convergence Control

#### 4.4. The Critical Slowing Down in the Numeric Process

#### 4.5. In Search of the Bifurcation Point

#### 4.6. Varying the Boundary

### 5. Discussion

#### 5.1. Nonzero Boundary Conditions

#### 5.2. Dimensionality

#### 5.3. A Supercritical (Soft) versus a Subcritical (Hard) Bifurcation

### 6. Summary

### References

### About the Author


The method described here can be applied to solve PDEs coming from different domains. However, it was initially developed to get the numerical solution of a stationary nonlinear PDE with a *bifurcation*. The method’s application to a broader class of equations is briefly discussed at the end of the article.

The term “bifurcation” describes a phenomenon that occurs in some nonlinear equations that depend on one or several parameters. These equations can be algebraic, differential, integral or integro-differential. At some values of a parameter, such an equation may exhibit a fixed number of solutions. However, as soon as the parameter exceeds a critical value (referred to as the *bifurcation point*), the number of solutions changes and either new solutions emerge or some old ones disappear. To be specific, we discuss the case of dependence on a single parameter .

The new solutions can emerge continuously at the bifurcation point. The norm of the solution exhibits a continuous though nonsmooth dependence on the parameter at the bifurcation point (left, Figure 1). An explicit example is in Section 4.5. A bifurcation at which the solution is continuous at the bifurcation point is referred to as *supercritical* or *soft*.

The behavior of the solution in the case of a *subcritical* or *hard* bifurcation is different: the norm of the solution is finite at the bifurcation point but has a jump discontinuity there (right, Figure 1).

**Figure 1.** Soft versus hard bifurcation. In the case of the soft bifurcation, the solution has a continuous dependence of the solution norm on the control parameter , with a kink at the bifurcation point, . In contrast, in the case of a hard bifurcation, the solution is discontinuous at the bifurcation point.

In this article, we focus only on the case of a nonlinear PDE with soft bifurcations; some peculiarities of hard bifurcations are briefly discussed in Section 5.3.

In the most general form, a nonlinear PDE can be written as:

(1)

Here so that (1) indicates a system of nonlinear PDEs; is an -dimensional vector representing the dependent variable. The subscript indicates that is the solution of a stationary equation. Further, is a -dimensional vector. Finally, is a real numerical parameter. The system of equations (1) is analyzed in a domain subject to zero Dirichlet boundary conditions:

(2)

Also assume that

(3)

and thus represents a trivial solution of (1, 2).

It is convenient to separate out the linear part of the operator (1), which is often (though not always) representable in the form and to write it down in the following form:

(4)

Here is a linear differential operator (such as, for example, the Laplace operator). Further, is the nonlinear part of the operator . The assumption that solves equation (1) implies that .

In its explicit form, we use the representation (4) only in Section 2.2, where we derive the critical slowing-down phenomenon. In all other cases, a general form of the dependence of equation (4) on is valid: and . Nevertheless, we stick to the form (4) for simplicity, while the generalization is straightforward.

Let us also consider an auxiliary equation

(5)

that yields the linear part of the nonlinear equation (4). Equation (5) represents the eigenvalue problem, where the are its eigenfunctions and the are its eigenvalues, indexed by the discrete variable , provided the discrete spectrum of (5) exists. Let us assume that at least a part of the spectrum of (5) is discrete. We assume here that starts from zero: . The state with is referred to as the *ground state*.

Without proofs, we recall a few facts from bifurcation theory [1] valid for soft bifurcations of such equations.

Assume that the trivial solution is stable for some values of . As soon as the parameter becomes equal to the smallest discrete eigenvalue of the auxiliary equation (5), this solution becomes unstable. As a result, a nontrivial solution branches off from the trivial one. In the close vicinity of the bifurcation point , this solution has the asymptotics

(6)

where is the set of eigenfunctions of the equation (5) belonging to the eigenvalue . The vector is the set of amplitudes. The scalar product stands for the expression . Here the index (where ) enumerates the eigenfunctions in the -dimensional subspace of the functional space where (5) has a nonzero solution. The exponent exceeds unity: .

There are a few methods available to determine . Listing them is beyond the scope of this article. However, the simplest of these methods can be applied if there exists a generating functional enabling one to obtain the system of equations (1) as its minimum condition:

(7)

where is the variational derivative. This functional we refer to as *energy* in analogy with physics. Substituting the representation (6) into the energy functional and integrating out the spatial coordinates, one finds the energy as a function of the amplitudes and parameter . Minimizing the energy with respect to the amplitudes yields the system of equations for the amplitudes, referred to as the *ramification equation*:

(8)

Their solution is only accurate close to the bifurcation point . Assuming that the bifurcation takes place with decreasing (as is the case in the following example), one finds the typical solution for the amplitudes,

(9)

where and are real numbers to be determined using the original equation. One of the methods to analytically find these parameters is discussed in Section 3. Further analytical methods may be found in [1]. This article focuses on finding these parameters numerically (Section 4.5).

All theorems and proofs for the preceding statements, along with more general methods of the derivation of the ramification equation, can be found in [1].

The bifurcation theory formulated so far is quite general: equation (1) can be differential, integral or integro-differential [1]. In what follows, we focus only on a more specific class of nonlinear partial differential equations.

The solution of the spectral system of equations (5) yields the bifurcation point ; the solutions (6) and (9) are only valid very close to this point. With increasing , the solution soon deviates from the correct behavior quantitatively, and the solution often fails to resemble (6) even qualitatively. For this reason, to get the solution at some finite that would be correct both qualitatively and quantitatively, one needs to solve (1) numerically.

In the case of a hard bifurcation, none of the machinery of the theory of soft bifurcations described so far works. Studying the bifurcation numerically often becomes the only possibility.

However, the direct numerical solution of nonlinear equations like (1) and (4) with some nonlinear solvers only returns the trivial solution for equation (4), even at the values of the parameter at which the trivial solution is unstable and a stable nontrivial solution already exists.

A plausible reason may be as follows: the solver starts to construct the PDE solution from the boundary. Here, however, the boundary condition is already part of the trivial solution. Thus the solver appears to be placed at the true solution of the equation and is then unable to climb down from it.

To find a nontrivial solution, one needs to use a method that would start from some initial approximation that, even if rough, should be quite different from the trivial solution. Furthermore, this method should converge to the nontrivial solution by a chain of successive steps.

One can do this with the pseudo-dynamic approach formulated in the present article.

Let us introduce pseudo-time . The word “pseudo” indicates that is not real time. It just represents a technical trick that helps with the simulation. Assume now that the dependent variable is a function of both the set of spatial coordinates and the pseudo-time: . Instead of the stationary equation (1), let us study the behavior of the pseudo-time-dependent equation:

(10)

One solves equation (10) with a suitable nonzero initial condition . Let us stress that the solution of the time-dependent equations (10) is not the same as the solution of the stationary equation (1).

One could also construct the pseudo-time-dependent equation as follows: , that is, with a minus sign in front of . The idea of such an extension is that either or exhibits a fixed point, so that , while the other diverges as . By trial and error, one chooses the equation whose solution converges to the fixed point .

The operator has not yet been specified; for definiteness let us assume that the fixed point at takes place for equation (10), that is, with the plus sign in front of .

The convergence of the solution of the dynamic equation to the fixed point enables one to apply the following strategy. Instead of the static equation (1), which is difficult to solve numerically, one simulates the pseudo-dynamic equation (10) using a suitable time-stepping algorithm.

The advantage of this approach is in the possibility of starting the simulation from an arbitrary distribution chosen as the initial condition, provided it agrees with the boundary conditions. From the very beginning, such a choice takes one away from the trivial solution. The time-stepping process takes the initial condition for each step from the previous solution. The solution starting from any function gradually converges to with time if belongs to its attraction basin.

After having obtained the solution of the pseudo-time-dependent equation, one approximates the function , as at a large enough value of the pseudo-time . The meaning of the words “large enough” is clarified in Section 4.3.
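Because the article's symbols are not reproduced in this copy, the strategy can be illustrated with a minimal scalar stand-in of our own (not the article's code): the stationary problem eps*u - u**3 = 0 has the trivial root u = 0 and, for eps > 0, the nontrivial roots +/- sqrt(eps); explicit Euler stepping of the pseudo-dynamic analogue du/dt = eps*u - u**3 from any nonzero start finds the nontrivial root.

```python
def pseudo_time_fixed_point(eps, u0=0.5, dt=0.01, t_max=200.0):
    """Relax the toy pseudo-dynamic equation du/dt = eps*u - u**3
    (a scalar stand-in for equation (10)) by explicit Euler stepping.

    The stationary problem eps*u - u**3 = 0 has the trivial root u = 0
    and, for eps > 0, the nontrivial roots u = +/- sqrt(eps)."""
    u = u0                          # nonzero start, off the trivial solution
    for _ in range(int(t_max / dt)):
        u += dt * (eps * u - u**3)
    return u
```

Started from u0 = 0.5, the flow converges to sqrt(eps) when eps > 0 and to the trivial solution when eps < 0, mirroring the bifurcation picture.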

The approach can be given a pictorial interpretation (Figure 2). In the infinite-dimensional functional space, let be an infinite set of basis functions. Then the function can be represented as

(11)

**Figure 2.** Schematic view of the 3D projection of the infinite-dimensional functional space with a trajectory from the initial state (blue dot) to the fixed point (red dot).

The trajectory in this space goes from the initial state to the final state , as shown by the two dots.

The time derivative represents the velocity of the motion of a point through this space, while can be regarded as a force driving this point. Thus equation (10) can be interpreted as describing a driven motion of a massless point particle with viscous friction through the functional space. In these terms, the condition (1) means that the driving force is equal to zero at some point of the space, which is the location of the fixed point of the nonlinear equation (10).

If the energy functional for equation (1) exists, one can make one further step in the interpretation (Figure 3).

**Figure 3.** Schematic view of the energy functional as the function of the coordinate in the functional space (A) above and (B) below the bifurcation point. The cross section of the infinite-dimensional space along a single coordinate is shown. The points show initial positions of the particle, while the arrows indicate its motion to the nearest minimum of the potential well.

Indeed, according to the definition given, equation (1) delivers a minimum to the energy functional. In this case, one can regard the dynamic equation (10) as describing a viscous motion of the massless point particle along a hypersurface in the -dimensional space, , the surface forming a potential well. The motion goes from some initial position to the minimum of the potential well as shown schematically in Figure 3. Above the bifurcation, this minimum only corresponds to the trivial solution (A) situated at . Below the bifurcation, the energy hypersurface exhibits a new configuration with new minima, while the previous minima vanish. As a result, below the bifurcation, the point particle moves from the initial position (shown by dots in Figure 3) to one of the newly formed minima (as the red and green arrows show in B). The functional space has infinite dimension, and essential features of the numeric process may involve several dimensions. The 1D representation displayed in Figure 3 is therefore oversimplified and only partially represents the bifurcation phenomenon.
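This gradient-flow reading can be checked on a toy model. The sketch below uses an assumed quartic energy E(u) = -eps*u**2/2 + u**4/4 (our stand-in, not the article's functional (20)); Euler integration of du/dt = -dE/du then lowers the energy monotonically until the particle settles in a minimum of the well.

```python
def energy(u, eps=1.0):
    # assumed double-well energy E(u) = -eps*u^2/2 + u^4/4 (illustrative only)
    return -eps * u**2 / 2 + u**4 / 4

def gradient_flow(u0=0.2, eps=1.0, dt=0.01, steps=2000):
    """Euler-integrate the gradient flow du/dt = -dE/du = eps*u - u**3
    and record the energy along the trajectory."""
    u, energies = u0, []
    for _ in range(steps):
        u += dt * (eps * u - u**3)
        energies.append(energy(u, eps))
    return u, energies

u_final, es = gradient_flow()
# E(u(t)) decreases monotonically toward the well minimum E(1) = -1/4
```

The recorded energies never increase, which is exactly the "viscous motion downhill" described above.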

Equation (10) can be rewritten as:

(12)

Though lacking a stationary nonlinear solver at present, Mathematica offers the option , efficiently applicable to dynamic equations like (12). This method is applied everywhere in the rest of this article.

The evident penalty of this approach is that the computation time can become large, especially in the vicinity of the bifurcation point; this peculiarity is discussed next.

Close to the critical point , the relaxation of the solution to the fixed point dramatically slows down. This is referred to as *critical slowing down*. Its origin is illustrated in Section 4. To simplify the argument, let us consider a single equation with the one-component dependent variable that still depends on the D-dimensional coordinate . The generalization for a system of equations is straightforward, though a bit cumbersome.

According to (6), close to the bifurcation point, one can look for the solution of equation (12) in the form:

(13)

Ignore the higher-order terms, assuming that is small. Substitute (13) into the first equation (12) and linearize it. Here one should distinguish between the case at , where the linearization should be done around , and that at , where one linearizes with the center at (the second line of equation 9). In the former case, one finds

Making use of (5), one finally obtains the dynamic equation for at :

(14)

implying that , and the relaxation time has the form .

At , analogous but somewhat lengthier arguments give a characteristic time half as large as that above the critical point. One comes to the relation:

(15)

One can see that the relaxation time diverges with from both sides. From the practical point of view, this suggests increasing the simulation time according to (15) near the critical point.

The result (15) is valid for equation (12), in which the linear part of the pseudo-dynamic equation has the form . That is, the parameter enters this equation only linearly, in the form of the product . In the general case , one still finds diverging relaxation time , though the factors (such as above, and below the bifurcation point) may be different.
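The divergence of the relaxation time can be seen in a scalar sketch of our own (with the decay rate standing in for the distance to the bifurcation point): for du/dt = -rate*u, the pseudo-time needed to fall below a fixed threshold scales as 1/rate, so halving the distance to the critical point doubles the required simulation time.

```python
def decay_time(rate, threshold=1e-3, dt=1e-3):
    """Pseudo-time for du/dt = -rate*u, u(0) = 1, to fall below threshold.

    With rate playing the role of the distance to the bifurcation point,
    this time grows as ln(1/threshold)/rate: the critical slowing down
    of equation (15)."""
    u, t = 1.0, 0.0
    while u > threshold:
        u += dt * (-rate * u)   # explicit Euler step
        t += dt
    return t
```

For example, decay_time(0.1) is almost exactly twice decay_time(0.2).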

The phenomenon of critical slowing down was first discussed in the framework of the kinetics of phase transitions [2].

As an example, let us study the 1D PDE:

(16)

where is the dependent variable of the single coordinate . This equation exhibits a cubic nonlinearity . A classical Ginzburg–Landau equation only has constant coefficients for the terms and . In contrast, equation (16) possesses the inhomogeneity with

(17)

shown by the solid line in Figure 4. It thus represents a nonhomogeneous version of the Ginzburg–Landau equation. One can see that (16) has the trivial solution .

**Figure 4.** The potential from equation (17) (solid, red) and the solution of the auxiliary equation (18) (dashed, blue).

Equations (16) and (17) play an important role in the theory of the transformation of types of domain walls into one another [3].

The auxiliary equation (5) in this case takes the following form:

(18)

where enumerates the eigenvalues and eigenfunctions belonging to the discrete spectrum. One can see that equation (18) represents the Schrödinger equation [4] with potential well (17) and energy .

The exact solution of the auxiliary equation (18) is known [3, 4]. It has two discrete eigenvalues when and , and the ground-state () solution has the form

(19)

which can be easily checked by direct substitution.
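The potential (17) and the solution (19) are not legible in this copy, so as a stand-in we check the analogous classic case: the Pöschl–Teller well V(x) = -6 sech²(x), which likewise has exactly two bound states, with ground state ψ₀(x) = sech²(x) and eigenvalue E₀ = -4. The direct-substitution check can be done numerically with a central difference:

```python
import math

def psi(x):
    # candidate ground state of the Poschl-Teller well: psi(x) = sech(x)**2
    return 1.0 / math.cosh(x) ** 2

def schrodinger_residual(x, E=-4.0, h=1e-3):
    """Residual of -psi'' + V*psi - E*psi with V(x) = -6*sech(x)**2,
    using a second-order central difference for psi''."""
    d2 = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / h**2
    V = -6.0 / math.cosh(x) ** 2
    return -d2 + V * psi(x) - E * psi(x)
```

The residual is at the level of the h² discretization error at every sample point, confirming the eigenpair.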

The energy functional generating the Ginzburg–Landau equation (16, 17) has the form:

(20)

Equation (6) can be written as . Substituting that into equation (20) for the energy, eliminating the term with the derivative using equation (18) and applying the Gauss theorem, one finds the energy as a function of the amplitude :

(21)

The *ramification* equation takes the form :

(22)

with the following solution for the amplitude:

(23)

Let us now look for the numerical solution of equation (16). The problem to be solved is to find the point of bifurcation and the overcritical solution at . The pseudo-time-dependent equation can be written as:

(24)

The choice of the initial condition is not critical, provided it is nonzero. The method of lines employed in the following is relatively insensitive to whether or not the initial condition precisely matches the boundary conditions. We demonstrate its solution with three initial conditions in the next section.

The method of lines is applied here since it can solve nonlinear PDEs, provided these equations are dynamic, which is exactly the case within the pseudo-time-dependent approach.

To address the problem numerically, let us place the boundary conditions at a finite distance, rather than at infinity. The distance must be greater than the characteristic dimension of the equation, which is the distance over which exhibits a considerable variation. For the Ginzburg–Landau equation (16), the characteristic dimension is defined by the width of the potential for (17), which is about 1. That is, let us start with the boundaries at with . We check the quality of the result obtained with such a boundary later.

To obtain a precise enough solution, one needs to make a spatial discretization providing a step comparable to the characteristic dimension of the equation, which we just saw is of the order of . Therefore, a step that is small enough can be a few times . The value appears to be enough.

The following code solves the equation. To keep the discretization with the step comparable to the characteristic equation dimension, we chose .

To avoid conflicts with variables that may have been previously set, this notebook has the setting Evaluation ▶ Notebook’s Default Context ▶ Unique to This Notebook.
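Since the Mathematica code is not reproduced in this copy, the following Python sketch shows the same pseudo-dynamic, method-of-lines computation on an assumed stand-in equation, u_t = u_xx + (6 sech²(x) - eps)u - u³ with zero Dirichlet conditions (our choice of inhomogeneity, not (17); its linearization is a Pöschl–Teller Schrödinger problem with ground-state eigenvalue 4, so the trivial solution destabilizes as eps decreases through eps0 = 4):

```python
import math

def solve_gl(eps, L=8.0, n=161, t_max=30.0):
    """Relax u_t = u_xx + (6*sech(x)**2 - eps)*u - u**3 on [-L, L]
    with zero Dirichlet conditions by the method of lines plus explicit
    Euler pseudo-time stepping; return the Hilbert norm of u, as in (25)."""
    dx = 2.0 * L / (n - 1)
    dt = 0.4 * dx * dx                      # within the explicit stability limit
    x = [-L + i * dx for i in range(n)]
    V = [6.0 / math.cosh(xi) ** 2 for xi in x]
    u = [math.exp(-xi * xi) for xi in x]    # nonzero initial condition
    u[0] = u[-1] = 0.0
    for _ in range(int(t_max / dt)):
        new = [0.0] * n
        for i in range(1, n - 1):
            lap = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx**2
            new[i] = u[i] + dt * (lap + (V[i] - eps) * u[i] - u[i] ** 3)
        u = new
    return math.sqrt(sum(ui * ui for ui in u) * dx)
```

Below the (stand-in) bifurcation point the norm is finite and grows as eps decreases; above it the pseudo-dynamics relaxes to the trivial solution.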

According to Section 2, the time-dependent solution obtained converges to the solution of the stationary problem . In practice, however, one can instead take some finite value, provided that it is large enough.

We solve the pseudo-dynamic equation (24) with each of the three initial conditions stated before.

Further, to give a feel for the method, we visualize and animate the solution, varying as well as the initial conditions. This requires a few comments. As discussed in Section 2.2, the maximum simulation time strongly depends on . This is accounted for by introducing according to (15), where was chosen by trial so that the simulation does not last too long, but also so that the value of always ensures convergence for any combination of and initial condition.

In the simulations, you can observe two essential features of the present method.

First, near the fixed point, the solution converges more slowly and the curve gradually appears to stop changing.

Second, near the critical point, close to , critical slowing down (see Section 2.2) sets in, and approaching the fixed point takes considerably more time; in the animation, the curve evolves much more slowly at and .

In the , choose one of the three initial conditions and a value of . Click the button with the arrow to start the animation. The value of the current time is shown at the top-left corner. The distribution shown by the blue curve at corresponds to the initial condition, while at the animation shows its further evolution.

For each of the three initial conditions, the solution converges to the same bell-shaped curve. One can make sure that for low , the solution is nonzero. However, for greater than about 0.5, the solution is trivial.

To get an accurate solution, one needs to control the convergence as the pseudo-time increases. Here we control the convergence by analyzing the behavior of the integral

(25)

(the norm of the solution in Hilbert space) at a fixed value of the parameter as a function of . The norm is zero above the bifurcation but nonzero below it.

We show how depends on the time limit at three fixed values of the control parameter : , and , which are all below the bifurcation point .

The following code makes a nested list containing three sublists corresponding to the three values. Each sublist consists of pairs at different values of the simulation time , which increases from 10 to approximately 3000. The exponential rate of increase is chosen so as to make the plot on a semilogarithmic scale look equally spaced (Figure 5).

**Figure 5.** Semilogarithmic plots of the Hilbert norm of the solution for (disks), (squares) and (diamonds) depending on the simulation time, .

There is convergence for all three values of . However, the value of for which the convergence is satisfactory depends on . For example, at the solution at slightly exceeding 100 is already near convergence. Thus, with , one can be sure that the solution is satisfactory. We use this in Section 4.4 to determine the expression for accounting for the critical slowing down.

In contrast, the solution for shows some evolution even at .

As we showed in Section 2.2, the value that gives satisfactory convergence depends on . To get an accurate solution, must considerably exceed the relaxation time . For example, in the calculation of the result shown in Figure 5, substituting and into (15), one finds , while the convergence only becomes good enough at , which is eight times greater than . This implies that to find an accurate solution in the close vicinity of the bifurcation point, one has to define depending on by

(26)

where is the regularization parameter.

The bifurcation point can be found by analyzing the same integral calculated at in (26). Let us denote . This time we study the integral as a function of the parameter .

The transition from to occurs at the bifurcation point. Accordingly, the integral at this point changes from to .

To find the critical point, bifurcation theory (23) predicts the norm to be expressed in the form:

(27)

We find the constant parameters and by fitting.

We now find the numerical solution of the equation (16) as a function of the control parameter ; the norm obtained from this solution depends on . We vary from 0.45 to to create a list consisting of pairs . The most critical region for dependence is close to the critical point, so the points there are taken to be about 10 times more dense. This list is fitted to the function (27). The list is plotted with the analytic function obtained by fitting (Figure 6).

**Figure 6.** Behavior of the Hilbert norm of the solution in the vicinity of the bifurcation point. Dots show the integrals (25), while the solid line indicates the result of fitting with the relation (27), yielding .

The values of the integrals at various are shown by the red dots in Figure 6, while its fitting curve is shown by the solid blue curve. The fitted value of the bifurcation point is and .

We used equation (26) for the used in the solution. However, this equation depends on the spectral value . In the present case, the value was known, which considerably simplifies the task. In general, the value of is only established in the course of the fitting procedure, requiring an iterative approach. For the first simulation, we fix some large enough value of independent of and obtain a fit. This fit gives the first guess for , which can then be used for the simulation with the equation (26). This procedure can be repeated until a satisfactory is achieved.
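The fitting step can be sketched as follows, using synthetic data of our own in place of the measured integrals (25): scalar pitchfork norms N = sqrt(eps0 - eps) with an assumed eps0 = 0.5. (In the article's procedure, eps0 itself is also a fit parameter; here it is fixed for brevity.)

```python
import math

eps0 = 0.5                                   # assumed bifurcation point
eps_values = [0.45, 0.40, 0.35, 0.30, 0.25, 0.20]
norms = [math.sqrt(eps0 - e) for e in eps_values]   # synthetic "Hilbert norms"

# fit norms = C * (eps0 - eps)**kappa by least squares in log-log coordinates
xs = [math.log(eps0 - e) for e in eps_values]
ys = [math.log(nrm) for nrm in norms]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
kappa = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs)
C = math.exp(my - kappa * mx)
# exact pitchfork data recovers the mean-field exponent kappa = 1/2 and C = 1
```

The same log-log regression applied to the measured norms yields the fitted bifurcation point and amplitude reported with Figure 6.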

To check how the choice of the boundary affects the results, we solve the problem by gradually increasing (Figure 7). (This takes some time.)

**Figure 7.** A double-logarithmic plot showing the convergence of the bifurcation point with increasing .

Figure 7 displays the error in the spectral value obtained by the numerical process. As one could have expected, with the increase of , it decreases from to about .

The preceding example has shown the application of the pseudo-dynamic approach for solving a 1D nonlinear PDE with zero boundary conditions that exhibits a supercritical (soft) bifurcation. That simple problem was chosen to keep the processing time as short as possible. Now possible extensions are discussed.

Recall that zero boundary conditions often (if not always) represent a problem for a nonlinear solver. Starting from along the boundary, such a solver often only returns the trivial solution, since zero is, indeed, the solution of the equation considered here. For this reason, a solution to a problem like the one discussed in this article necessarily requires some specific approach that can converge to a nontrivial solution. It is for this type of equation that the approach presented here has been developed.

One should, however, make two comments.

First, there are numerous problems where the bifurcation takes place from a solution that is nonzero. The boundary condition in this case has the form . A trivial observation shows that one comes back to the original problem by the shift .

Second, the approach formulated here can be applied to nonlinear equations with no bifurcation. These equations can have boundary conditions that are either zero or nonzero. Indeed, such equations can often be solved by a nonlinear solver if one is available. Among other approaches, the present one can be applied; the nonzero boundary conditions are not an obstacle for the transition to the pseudo-time-dependent equation.

Though the present approach takes longer, in certain cases it is preferable: for example, when nonlinear solvers fail due to a strong nonlinearity. The solver moves along the pseudo-time parameter in small steps from to , gradually passing from the initial condition to the final solution. Such slow ramping can be stable.

The space dimensionality does not limit the application of our approach (for 2D examples, see [5, 6]).

In the case of a soft bifurcation, the energy can have only one type of minimum, as shown in Figure 2 describing the convergence either to the trivial or the nontrivial solution. The trajectory always flows into the minimum along the steepest slope of . The minimum is a fixed point.

An essentially different situation occurs for a hard bifurcation, when the hypersurface may have multiple minima. Figure 8 (A) shows a schematic cross section of the infinite-dimensional functional along the plane, leaving out all other dimensions. This cross section shows a situation with minima of different types. One of these minima is more pronounced than the others. The arrows schematically indicate trajectories in the functional space. These start from the initial conditions displayed by the dots in Figure 8 (A, B) and converge to the minima (Figure 8 A). The green arrow shows convergence to the principal minimum, while the red one shows convergence to a secondary minimum.

**Figure 8.** Schematic view of the energy functional along a direction of the functional space, where it exhibits a metastable minimum (A). The green point schematically indicates the initial condition starting from which the solution converges to the one corresponding to the principal energy minimum (green arrow), while the red dot shows the initial condition leading to the convergence to the secondary minimum. (B) The trajectory ends at an inflection point.

As a result, depending on the choice of initial condition, some solution trajectories may end up at a fixed point that is a secondary minimum rather than in the main one.

Also, keep in mind that the functional space is infinite-dimensional and can contain many unobvious secondary minima.

There can also be inflection and saddle points of the energy hypersurface (Figure 8 B). The trajectory completely stops at such a point.

It is a fundamental question whether such secondary fixed points, as well as the inflection points, belong to the problem under study. The answer is not straightforward; one should seek it in the origin of the equation.

Let us also mention possible gently sloping valleys in the energy relief. In this case, the motion along such a shallow slope may appear practically indistinguishable from an asymptotic falling into a fixed point during the numerical process.

This article offers an approach to solve nonlinear stationary partial differential equations numerically. It is especially useful in the case of equations with zero boundary conditions that have both a trivial solution and nontrivial solutions. The approach is based on solving a pseudo-time-dependent equation instead of the stationary one, the initial condition being different from zero. Then the solver can avoid sticking to the trivial solution and is able to converge to a nontrivial solution. However, the penalty is increased simulation time.

[1] | M. M. Vainberg and V. A. Trenogin, Theory of Branching of Solutions of Non-linear Equations, Leyden, Netherlands: Noordhoff International Publishing, 1974. |

[2] | E. M. Lifshitz and L. P. Pitaevskii, Physical Kinetics: Course of Theoretical Physics, Vol. 10, Oxford, UK: Pergamon, 1981, Chapter 101. |

[3] | A. A. Bullbich and Yu. M. Gufan, “Phase Transitions in Domain Walls,” Ferroelectrics, 98(1), 1989, pp. 277–290. doi:10.1080/00150198908217589. |

[4] | L. D. Landau and E. M. Lifshitz, Quantum Mechanics: Course of Theoretical Physics, Vol. 3, 3rd ed., Oxford, UK: Butterworth-Heinemann, 2003. |

[5] | A. Boulbitch and A. L. Korzhenevskii, “Field-Theoretical Description of the Formation of a Crack Tip Process Zone,” European Physical Journal B, 89(261), 2016, pp. 1–18. doi:10.1140/epjb/e2016-70426-6. |

[6] | A. Boulbitch, Yu. M. Gufan and A. L. Korzhenevskii, “Crack-Tip Process Zone as a Bifurcation Problem,” Physical Review E, 96(013005), 2017, pp. 1–19. doi:10.1103/PhysRevE.96.013005. |

A. Boulbitch, “Pseudo-Dynamic Approach to the Numerical Solution of Nonlinear Stationary Partial Differential Equations,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-8.

Alexei Boulbitch graduated from Rostov University (USSR) in 1980 and obtained his Ph.D. in theoretical solid-state physics in 1988 from this university. In 1990 he moved to the University of Picardie (France) and later to the Technical University of Munich (Germany). The Technical University of Munich granted him his habilitation degree in theoretical biophysics in 2001. His areas of interest are bacteria, biomembranes, cells, defects in crystals, phase transitions, physics of fracture (currently active), polymers and sensors (currently active). He presently works in industrial physics with a focus on sensors and gives lectures at the University of Luxembourg.

**Alexei Boulbitch**

*Zum Waldeskühl 12
54298 Igel
Germany*

dx.doi.org/doi:10.3888/tmj.20-7

This article is a summary of my book *A Numerical Approach to Real Algebraic Curves with the Wolfram Language* [1].
### 1. Introduction

### 2. Lines

### 3. Important Definitions

### 4. Topology and Tracing

### 5. A Classical Interlude

### 6. Fractional Linear Transformations

### 7. Applications to Geometry

### 8. The Möbius Band Model of the Real Projective Plane

Map from affine plane to rectangular hemisphere.
Map from rectangle to Möbius band.
Plot Möbius band with infinite line.
### 9. Diamond Diagrams

### 10. Conclusion

### Acknowledgments

### References

### About the Author

The nineteenth century saw great progress in geometric (real) and analytic (complex) algebraic plane curves. In the absence of an ability to do the large number of computations for a concrete theory, the twentieth century saw the abstraction to algebraic geometry of this material. Ideas of ideals, rings, fields, varieties, divisors, characters, sheaves, schemes and many types of homology and cohomology arose. The added benefit of this approach is that it became possible to apply geometric techniques to other fields. Probably the most striking accomplishment of this abstract approach was the solution of Fermat’s problem by Wiles and Taylor at the end of the century.

The plane geometric curve theory of the nineteenth century was collateral damage. All modern books on the subject want to follow the abstract approach, which raises the bar for those who want to know this theory. In addition, little attention was given to the concrete geometric theory. One goal of my book is to rectify this problem; substituting software for the abstract theory, we can give the theory in terms the non-mathematician can follow.

Since most algebraic curves have only finitely many rational points, I work numerically. The methods are constructive, heuristic and visual rather than the traditional theorem-proof of contemporary mathematics. In fact there is a fundamental oxymoron at the heart of my approach: a numerical algebraic curve is the solution set of an equation , where is a polynomial with integer or machine-number coefficients. Evaluating this polynomial at a point with machine-number coordinates gives a machine number on the left-hand side, while the right-hand side is a symbolic number, so actual equality is impossible. So my book is not an algebraic geometry book. Having worked during my career as a mathematician in both the abstract and numerical realms, I believe that while these approaches are incompatible, they can and should coexist within mathematics.

We will generally describe an algebraic plane curve by giving a polynomial in two variables with integer or real machine-number coefficients.

From an operational point of view, with an exception noted later, for a given curve we accept the output of `NSolve[{f, g}, {x, y}]` and `FindRoot[{f, g}, {{x, x0}, {y, y0}}]`.

For example, suppose some calculation claims is a point on . (If you have set values in your session for , and so on, now is the time to store them if needed and apply to them.)

We find a random line containing and use to check the point of intersection of with .

We see the residue is not zero.

But can be reconstructed from .

It checks.

The simplest example of an algebraic plane curve is a line. The first problem for lines is to find the equation of a line through two given points. We give our solution, found at the beginning of Chapter 1 of [1], as it will give the flavor of our approach to this subject.

Let (x1, y1), (x2, y2) be the given points. The desired equation is of the form a x + b y + c = 0.

We thus consider the coefficients a, b, c as unknowns, but x1, y1, x2, y2 as coordinates of the given points. So we have two equations in the three variables a, b, c.

But this system is underdetermined. It is also not symmetric in the variables, so we use a dummy variable and add a third equation to get the system

a x1 + b y1 + c = 0,  a x2 + b y2 + c = 0,  r a + s b + u c = 1,  (1)

where r, s, u are random real numbers.

Suppose the points are and . Here are the random reals.

Then the line is constructed as follows.
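A minimal sketch of solving system (1) follows; the unknown names a, b, c, the random coefficients r, s, u, the right-hand side 1 of the third equation and the function name are our assumptions, since the original code listing is not reproduced here.

```wolfram
(* Line through two points via system (1): two incidence equations plus
   one random normalizing equation to pin down the scale of {a, b, c}. *)
lineThrough[{x1_, y1_}, {x2_, y2_}] :=
 Module[{a, b, c, r, s, u},
  {r, s, u} = RandomReal[{-1, 1}, 3];
  {a, b, c} /. First[NSolve[
     {a x1 + b y1 + c == 0,
      a x2 + b y2 + c == 0,
      r a + s b + u c == 1}, {a, b, c}]]]

lineThrough[{1, 2}, {2, 4}].{x, y, 1} == 0  (* an equation of the line through the two points *)
```

Each run produces a different scalar multiple of the same coefficient vector, which is why a later normalization step is useful.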

Perhaps this is not what you expected. But we are working with machine numbers so, particularly if this is not our final answer, we should not mind. If this does still bother us, we can always look for integers.

But system (1) gives more options. Suppose instead we were given the point and slope 2. We can change the second equation by setting , and .

Since our original line already had slope 2, we are not surprised to get the same result. Now consider the possibility that our line was given parametrically.

This time we replace the second equation in (1) with , , and again . We solve the new system.

This is the same answer, because again we have the line through the two given points.

We can put all of this into one program if we simply make the convention that a slope or direction vector is denoted by a triple with third coordinate 0. So here is our universal code for creating a line.

Our results will differ from the previous ones because we are now choosing new random numbers on each run but normalizing the output. The advantage is that each run gives the same answer up to a factor of .

Computing a point far away from by taking in our parametric equation, we get approximately .

So we can consider to be the *infinite point* on the line. But putting in our function gave us the same thing, so these infinite points are *homogeneous*; that is, they can be multiplied by a scalar getting the same infinite point. Note also that adding a coordinate 1 to a coordinate pair *homogenizes* a Cartesian point of the plane.

In Chapter 5 of [1], we find that we have invented the *projective plane*. So that we do not get confused, we will henceforth call points (pairs) of our standard Cartesian plane *affine points* and the triples *projective points*.

The method for finding equations of lines can be generalized to find curves of degree through sufficiently general points. See [2] for the code of (here stands for affine).

We do define two families of curves that are used extensively as examples in [1]. The first are *Gaussian curves*. We start with a single variable polynomial , typically with integer coefficients but possibly complex integers such as . Replace by ; after expanding, the formal real part forms a curve. Gauss used this construction in his first and fourth proofs of the fundamental theorem of algebra, published 50 years apart.
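The construction of a Gaussian curve can be sketched in a few lines; the function name gaussCurve is ours, not from [1].

```wolfram
(* Formal real part of p(x + I y) for a univariate polynomial p in z;
   ComplexExpand treats x and y as real variables. *)
gaussCurve[p_, z_, {x_, y_}] := ComplexExpand[Re[p /. z -> x + I y]]

gaussCurve[z^2 + 1, z, {x, y}]  (* 1 + x^2 - y^2 *)
```

The same definition works when p has complex integer coefficients such as 1 + I, since ComplexExpand separates the formal real part.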

For example, the following is said to be Gauss’s original example for the fourth proof. Note that it has a singular point!

A second family of curves I call Newton’s hyperbolas. Here can range from 1 to .

The *total degree* of a plane curve is an important invariant, but not quite as simple in the numerical case as it may seem. Small coefficients of the highest degrees matter little near the origin but strongly affect the asymptotic and infinite behavior of the curve. Therefore, we approach this symbolically using .

Sometimes a little care is necessary to make sure that coefficients that are the result of roundoff error only are not allowed to increase the degree; a judicious use of may be required.

Because we are often working numerically, we use a slightly stronger criterion for a plane curve to be called regular at a point on . The quantity is known as the *Jacobian determinant* of the intersection of the curve and line at . We say *is a regular point of * if the Jacobian is not numerically zero for almost all pairs , of machine numbers. In practice, this can be checked by letting , be random real numbers. For a regular point , a *tangent line* is defined as follows.
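A gradient-based tangent line at a regular point can be sketched as follows; the function name is ours.

```wolfram
(* Tangent line at a regular point {x0, y0} of the curve f == 0:
   the gradient of f is normal to the curve there. *)
tangentLine[f_, {x_, y_}, {x0_, y0_}] :=
 Module[{g = Grad[f, {x, y}] /. {x -> x0, y -> y0}},
  g.{x - x0, y - y0} == 0]

tangentLine[x^2 + y^2 - 1, {x, y}, {1, 0}]  (* 2 (x - 1) == 0, the vertical tangent *)
```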

On the other hand, a point of is called *singular* if the Jacobian is zero at for all numbers , . Again, in practice it is enough to check for a random pair , .

An alert reader may notice that since we are working constructively, *regular* and *singular* are not logical negations of each other, but a practical test does distinguish regular from singular points.

An important kind of point for [1] is a *critical point*. A point on curve is critical if it is also on the curve defined by . All real critical points of a curve can be found easily in practice by the following.
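One possible form of such a function is sketched below; the companion curve x ∂f/∂y − y ∂f/∂x = 0 is our reconstruction, chosen because its real solutions on f = 0 include the local extrema of the distance to the origin mentioned next.

```wolfram
(* Real critical points: points of f == 0 where the gradient of f is
   parallel to the position vector {x, y}. *)
criticalPoints[f_, {x_, y_}] :=
 NSolve[{f == 0, x D[f, y] - y D[f, x] == 0}, {x, y}, Reals]

criticalPoints[x^2/4 + y^2 - 1, {x, y}]  (* the four axis points (±2, 0), (0, ±1) *)
```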

Unlike the conditions regular and singular, which are invariant under transformations such as translation, being a critical point is a positional property. Among the critical points are local extrema of the distance from the origin to a point on the curve and, by our definition, singular points. The most important thing about critical points is that *every affine topological component of a plane curve contains at least one critical point*. This means that from our simple function for finding critical points, we will be able to locate all components on the curve, no matter how small—even one-point components.

Consider the following contrived example of a numerical cubic curve, which has an isolated point.

The point with coordinates is a one-point component of the curve `h1`.

The same idea allows us to find the closest point on a curve to a given point in the plane.

In this case, the closest point may be one invisible on a plot.
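The same stationarity idea gives a nearest-point sketch; the function name and the selection step are ours.

```wolfram
(* Points on f == 0 where the distance to {p, q} is stationary:
   the gradient of f is parallel to the vector from {p, q}. *)
nearestOnCurve[f_, {x_, y_}, {p_, q_}] :=
 Module[{cands},
  cands = {x, y} /. NSolve[
     {f == 0, (x - p) D[f, y] - (y - q) D[f, x] == 0}, {x, y}, Reals];
  First[MinimalBy[cands, Norm[# - {p, q}] &]]]

nearestOnCurve[x^2 + y^2 - 1, {x, y}, {3, 0}]  (* the point {1., 0.} *)
```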

We may also find the infinite points of a curve. Here is code that is slightly different from [1] but avoids subroutines. This uses a random variable so that different runs give the infinite points in possibly a different order.
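A deterministic sketch of the same computation is given below; it differs in detail from the code in [1] (in particular, it does not randomize the order) and the name is ours.

```wolfram
(* Real infinite points as unit-vector zeros of the top-degree form of f,
   homogenized with third coordinate 0. Antipodal pairs both appear. *)
infinitePoints[f_, {x_, y_}] :=
 Module[{rules = CoefficientRules[f, {x, y}], d, top},
  d = Max[Total /@ rules[[All, 1]]];
  top = FromCoefficientRules[Select[rules, Total[First[#]] == d &], {x, y}];
  Append[#, 0] & /@ ({x, y} /. NSolve[{top == 0, x^2 + y^2 == 1}, {x, y}, Reals])]

infinitePoints[x y - 1, {x, y}]  (* the directions of the two asymptotes of the hyperbola *)
```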

Here is an example using Newton hyperbola 376.

We start with an idea Gauss used in his 1849 proof of the fundamental theorem of algebra characterizing real plane curves. Given a bivariate polynomial , Gauss considered the semialgebraic set on which it is positive. *The algebraic curve is the complete topological boundary in the plane of this set*.

Among other things, this nicely solves our conundrum as to the precise meaning of the curve when is a polynomial with machine-number coefficients, as the inequality does make sense numerically.

Another consequence of this definition is that for each regular point of the curve, a line different from the tangent line intersecting the curve at this point travels from to the negative set at . We will see later in this section that a curve defined by a square-free has only finitely many singular points, so a contour plot gives a reasonable picture of the curve in a bounded region with appropriate scaling. Contour plots may miss large parts or all of the curve if the polynomial has a factor repeated an even number of times. Fortunately, if is a polynomial with integer coefficients, then the built-in function finds the repeated factors, and one can produce a square-free polynomial with the same curve. For machine-coefficient polynomials, there is a function given in Appendix 1 of [1] and in [2] that can check to see if is square free and if not, produce a square-free polynomial giving the same curve.

This last paragraph also tells us that the complement of a square-free curve is two-colored, with the curve separating the colors. In particular, an algebraic real plane curve cannot have bifurcations [3]. That is, the following cannot be a plot of an algebraic curve.

There are always an even number of *branches* going in and out of singular points, an essential idea we will use in the next section.

For now, the main use of the Gauss point of view is that a square-free curve is oriented; that is, we can specify a direction of travel along the curve. In his proof, Gauss proposed “walking along the curve” with on our right. Essentially, we are traveling around topological components clockwise. As an aside, the curve Gauss was using is our Gaussian curve of the particular complex univariate polynomial that he was proving has a zero. Thinking of points of the plane as complex numbers, Gauss showed the walker would always stumble over a zero of .

We implement this by noting that for regular points, this right-hand direction is given by the vector , so we can use the following code (`g` stands for Gauss, `T` for tangent, and `vec` for vector).
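A sketch of this direction field follows. The sign convention {-D[f, y], D[f, x]} is our reconstruction of the elided vector: it keeps the region f > 0 on the right, and so travels clockwise around positive regions.

```wolfram
(* Unit tangent direction at a regular point pt, oriented so that
   the set f > 0 stays on the walker's right. *)
gTvec[f_, {x_, y_}, pt_] :=
 Normalize[{-D[f, y], D[f, x]} /. Thread[{x, y} -> pt]]

gTvec[1 - x^2 - y^2, {x, y}, {1, 0}]  (* {0, -1}: clockwise around the disk *)
```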

Example: Consider the curve . (In the PDF and HTML versions, the graphic is not interactive.)

This leads to path tracing. In [1], we consider various methods, including using a method based on the built-in . Here, we use a very common method given by the following.

This function traces from point to point in the direction defined by with steps of size . By default, it stops after 40 steps, but that can be changed by an option. If is the wrong direction from , this fails with a warning. The direction can be changed by replacing the curve by . If there is a singular point in the path between and , then this will likely get hung up there. The key is that one can trace into a singularity, but not out. Normally, we use critical points for the endpoints , , but we may need to add points between singularities.
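A bare-bones predictor-corrector tracer consistent with this description might look like the sketch below; the name, the fixed orientation convention and the absence of the warning and option handling are simplifications relative to [1].

```wolfram
(* Trace f == 0 from numeric point p0 toward pEnd in steps of size h.
   Predictor: step along the tangent direction; corrector: project back
   onto the curve along the gradient with FindRoot. Default 40 steps. *)
pathTrace[f_, {x_, y_}, p0_, pEnd_, h_, maxSteps_: 40] :=
 Module[{p = p0, pts = {p0}, v, g, s},
  Do[
   v = Normalize[{-D[f, y], D[f, x]} /. Thread[{x, y} -> p]];  (* tangent direction *)
   p = p + h v;                                                (* predictor step *)
   g = {D[f, x], D[f, y]} /. Thread[{x, y} -> p];              (* normal direction *)
   p = p + (s /. FindRoot[f /. Thread[{x, y} -> p + s g], {s, 0}]) g;  (* corrector *)
   AppendTo[pts, p];
   If[Norm[p - pEnd] < h, Break[]], {maxSteps}];
  pts]
```

Reversing the orientation, as the text notes, amounts to tracing the curve of -f instead of f.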

The bow curve is a good example of using path tracing.

We proceed as follows, with the positive direction clockwise around the positive region, but always tracing into the singularity.

In [1], we develop a number of utility functions to make tracing easier and do many examples, particularly of Gaussian curves. But the main point we are making is that a square-free curve can be reasonably approximated by a piecewise linear curve, and the instructions to do so can be given by a graph (network) consisting of the endpoints of each trace as vertices with the direction traveled, not traced, as directed edges. Here is the graph for the previous example.

In this section, we touch base with contemporary algebraic geometry. We operate in the *real and complex projective planes* and .

Our construction follows our discussion on lines in Section 2. A point in the real (or complex) projective plane is a triple of real (or complex) numbers so that not all of , , are zero. Two such triples that differ by a nonzero real (or complex) multiple are considered the same. For example, if , then loosely speaking, is an affine point. We earlier called triples with third coordinate 0 *infinite points*; in the projective plane they are just points. Just as we added a variable for the third coefficient in the equation of a line, in the projective plane we again add a third variable for equations. We call this *homogenization*. Now we want all of our monomials to have the same total degree. The next function homogenizes a bivariate polynomial.

That is, if we are working with a polynomial of degree , a monomial is converted by . There is a 1-1 correspondence between two-variable monomials of total degree less than or equal to and three-variable monomials of degree exactly .

In particular, if , , with , then also, so being a zero is a property of the projective point . Thus *projective curves* are the zero set of homogeneous polynomials in three variables. Also in the example for the bow curve , is a point of the homogenization of , which means is an infinite point of .

The opposite of homogenization is *specialization*. We can substitute the number 1 for any of the three variables in a homogeneous polynomial and get a two-variable polynomial that is in general nonhomogeneous. For example, if we homogenize and then specialize at , we get back the original. But specializing at or produces a new polynomial.
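Homogenization and the round trip through specialization can be sketched as follows; the function name is ours.

```wolfram
(* Homogenize a bivariate polynomial f to degree d = total degree of f:
   substitute x -> x/z, y -> y/z and clear denominators with z^d. *)
homogenize[f_, {x_, y_}, z_] :=
 Module[{d = Max[Total /@ CoefficientRules[f, {x, y}][[All, 1]]]},
  Expand[z^d (f /. {x -> x/z, y -> y/z})]]

F = homogenize[y - x^2, {x, y}, z]  (* -x^2 + y z *)
F /. z -> 1                         (* specializing at z = 1 recovers the original *)
```

Specializing the same F at x = 1 or y = 1 instead produces the new polynomials discussed above.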

We say is a singular curve if any complex projective singular point exists. So may be a singular curve even though there are no affine singularities. We do this partly to be consistent with the algebraic geometers, but also because singular curves (even with infinite or complex singularities) do behave differently from regular curves.

Likewise, a curve is *reducible* if its homogenization is reducible over the complex numbers. Because homogenization preserves polynomial multiplication, the homogeneous polynomial is reducible if and only if all its specializations are reducible. It is fairly rare that a bivariate real polynomial has complex factors, but an important class of examples is the homogeneous functions in two variables. These always factor into linear factors, but some factors may be complex. Consider the next example.

This seems to be irreducible, but the plot appears to be a straight line rather than a cubic. Furthermore, it is singular at . Think of this curve as a homogenization of a polynomial of one variable and specialize at .

So gives a complex numerical factorization of ; the two complex factors are invisible on the contour plot.

Related to singular points are intersection points. Here is an example.

We say the intersection of these curves at has *multiplicity* 8. To explain what this means, particularly in the case of numerical curves, we use the formulation given in [4], which has been implemented numerically by Z. Zeng and the author. The implementation in the plane curve case is given in Appendix 1 of [1], the code and examples are in [2] and further information can be found in [5].

Intersections and singularities are connected, in that if and intersect at , then the curve has a singularity there. However, there is an important difference. If we perturb a curve with a singularity by adding some terms with very small coefficients, the singularity often goes away. But if we perturb both of the curves intersecting at , then locally we have the same multiplicity. Here is an example.

What this shows is that singularities are numerically unstable, but intersections are numerically stable. Thus in [1], which emphasizes the numerical point of view, we avoid getting deeply into singularities, but we can deal with intersections.

This leads to the most important theorem of complex projective plane algebraic geometry, Bézout’s theorem.

Given complex algebraic curves and of degrees (respectively) and with no common nonconstant factor, there are exactly complex projective points on both curves counting intersection multiplicity.

There are many proofs in the literature, and we will not give a complete proof here or in [1]. The complicating issue is when there are infinite or multiple intersection points. The typical proof involves use of the *resultant*. In the case of possibly infinite but not multiple points, one approach is to apply a random projective transformation. The resulting curves then, with high probability, will have no infinite intersection points and moreover, each intersection point will have a unique coordinate. We can find these by applying the resultant with respect to , which will then give a polynomial of degree with distinct and hence non-multiple zeros. One can easily find the coordinates of the transformed system by substituting each in either equation and solving for . Finally, transforming back will give the solutions of the original system. We will study these transformations and find infinite intersection points by transforming, solving the affine system and transforming back in the next section.

As an example, consider the following Gaussian cubic and quadratic. There is one infinite solution. Applying the random projective linear transformation with matrix

gives a system of equations that leads to polynomials with rational coefficients, no infinite solutions and unique coordinates for the affine solutions.

Pictured are the original system and the transformed system. The indicated point in the second plot corresponds to the infinite solution of the first plot. Even in this simple example with equations and transformation using one-digit integers, the resultant polynomial was a rational polynomial with numerators of 17 digits and a denominator of 21 digits!

Later in [1], Bézout’s theorem is used in the discussion of Cayley’s theorem and Harnack’s theorem. In this section, we use Bézout’s theorem to argue the *singularity theorem*:

An irreducible curve of degree has at most complex projective singular points.

In [1] I take a constructive point of view and show instead that a curve of degree with or more singular points is reducible. In the argument, we produce a polynomial of smaller degree that meets the given curve in too many points, so has a common factor with the given curve. In fact, in Appendix 1 of [1] we implement this argument with a function that factors the defining polynomial of any curve with or more singularities.

Going back to the cubic, we homogenize and then specialize at .

The resulting plot shows the infinite points of in the specialization where the dashed line is the original infinite line. The original infinite points are named , , . The first critical point becomes the infinite point in the – plane, and the other two go to the points , .

So in the projective plane, infinite points look just like affine points. We can trace projective paths just like affine paths. Thus, we can form graphs just like in the affine case; in particular, the projective graphs now have the property that every vertex is even. This gives my *fundamental theorem of real plane projective algebraic curves*, henceforth called just the *fundamental theorem*, which completely describes the topology of the projective curve.

Let be a homogeneous real plane projective algebraic curve . Then there is a finite set of points in , called vertices, and a set of edges between pairs of vertices satisfying:

- *Each edge corresponds to a continuous arc (or path) in the curve connecting the two vertices.*
- *Every singular point of the curve is a vertex.*
- *The interiors of any two arcs corresponding to edges are disjoint; that is, arcs only meet at vertices.*
- *Every point of the curve is either a vertex or an interior point of an arc.*
- *The graph is an Euler graph; that is, every vertex is even.*

In the previous example, the graph can be rendered as follows, where the vertex names refer to the original affine specialization.

Several comments are in order. First, *critical points* are not a concept in the projective plane; they come from some affine specialization. They make good vertices, but in this context are somewhat arbitrary. The same is true of the *direction* of the curve, but these graphs can be given a directed Euler graph structure. The fact that these are Euler graphs implies they can be decomposed into (not necessarily disjoint) directed circuits.

Already in his 1799 proof of the fundamental theorem of algebra, Gauss essentially calculates the infinite points of Gauss curves coming from a monic polynomial of degree as

Since the Gaussian curve already approximately intersects large circles about the origin in the affine points (and their antipodal points) given by the first two coordinates, one can infer that the graph will have edges pointing directly out from boundary points on a large circle to the appropriate infinite point. Thus by treating any two antipodal points of the curve on a large circle about the origin as the same infinite vertex, we convert the bounded graph to the projective graph.

A more interesting example of a Gaussian curve is Gauss’s example, which has two components and a singular point.

We find the critical and boundary points on a circle of radius 4 and put them in an association for labeling.

We show a contour plot and the bounded graph. Then, by treating boundary points as infinite points and identifying pairs of antipodal points (, , , , ), here is the projective graph.

We mention the Riemann–Roch theorems, whose main subject is the concept of *genus*. These theorems are the backbone of complex curve theory and even real space curve theory. However, for real plane curves the important invariant of a curve is the degree, not genus, so we do not dwell on these theorems.

An important tool in [1] is utilizing the projective linear transformations. We follow Abhyankar [6] by keeping the discussion mainly in the affine realm, where it is easier to compute, viewing these as *fractional linear transformations*.

A *fractional linear transformation* is a function defined by

(x, y) → ((a1 x + a2 y + a3)/(c1 x + c2 y + c3), (b1 x + b2 y + b3)/(c1 x + c2 y + c3)),

where the coefficients are real (or sometimes complex) numbers in the form of integers or machine numbers. Setting the common denominator c1 x + c2 y + c3 to zero defines a line, so the domain of the transformation is the affine plane minus this line.

The notation suggests describing the fractional linear transformation compactly by the 3×3 matrix whose rows hold the numerator and denominator coefficients:

    a1 a2 a3
    b1 b2 b3
    c1 c2 c3

This is more compact as well as useful, as the fractional linear transformation is actually given by a two-step procedure using matrix multiplication: first homogenize the point and multiply, (x, y, 1) → M·(x, y, 1); then dehomogenize by dividing the first two coordinates of the result by the third.

In the Wolfram Language, this becomes the function .

To the extent that we want to work completely in the affine domain, we note that the Wolfram Language also includes fractional linear transformation under the name *linear fractional transformation*. So one can also use the Wolfram Language to evaluate a fractional linear transformation.

Here is an example.
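Both routes can be sketched side by side; the sample matrix and the helper name fltPoint are ours.

```wolfram
(* Two-step procedure: homogenize the point, multiply by the matrix,
   dehomogenize by the last coordinate. *)
fltPoint[m_, pt_] := Module[{v = m.Append[pt, 1]}, Most[v]/Last[v]]

m = {{1, 2, 0}, {0, 1, 1}, {1, 0, 3}};
fltPoint[m, {1, 1}]                                          (* {3/4, 1/2} *)
fltPoint[m, {1, 1}] == LinearFractionalTransform[m][{1, 1}]  (* True *)
```

The built-in `LinearFractionalTransform` packages the same matrix as a `TransformationFunction`, so either form can be used interchangeably.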

In [1], to keep things simple we assume the matrix is invertible. Matrix multiplication corresponds to composition of transformations; in particular, since our matrices are invertible, so are our fractional linear transformations.

Somewhat uniquely to [1], our transformations act on curves as well as points.

The fractional linear transformation takes the curve (i.e. the bivariate polynomial ) to a curve such that whenever .
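One way to realize this curve map is to pull the coordinates back through the inverse matrix after homogenizing; this construction and the names in it are ours, not the exact code of [1].

```wolfram
(* Image curve under the transformation with matrix m: homogenize f,
   substitute the pulled-back coordinates Inverse[m].{x, y, z},
   specialize at z = 1 and clear denominators. *)
fltCurve[m_, f_, {x_, y_}] :=
 Module[{z, d, h, w},
  d = Max[Total /@ CoefficientRules[f, {x, y}][[All, 1]]];
  h = Expand[z^d (f /. {x -> x/z, y -> y/z})];  (* homogenization *)
  w = Inverse[m].{x, y, z};                     (* pull back coordinates *)
  Numerator[Together[Expand[(h /. Thread[{x, y, z} -> w]) /. z -> 1]]]]

fltCurve[{{1, 0, 0}, {0, 1, 0}, {0, 0, 2}}, x^2 + y^2 - 1, {x, y}]
(* -1 + 4 x^2 + 4 y^2, the image circle of radius 1/2 *)
```

If f vanishes at a point p, the output polynomial vanishes at the image of p, because the homogenization of f is homogeneous and the pulled-back coordinates differ from a zero of it only by a scalar factor.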

Here is an example using as defined before.

The relationship between and is shown by the following example; maps points to points and maps curves to curves. The image of a point under is a point of the image of the curve under .

In this case, the transformation takes the circle to a conic, a parabola. One can use the various transformations given by the Wolfram Language. We provide some additional ones in [2] (such as , which takes line to line and , the reflection about the line ) as Euclidean transformations and as an affine transformation. As an example, we give .

More importantly, we have two fractional linear transformations that act on the projective plane. The transformation takes the infinite point to the origin and the original infinite line to the axis. The transformation specializes the projective plane by removing the line from the affine plane and making it the infinite line; the new axis is the original infinite line.

As an example, we are interested in the behavior of the infinite point of the preceding curve .

The transformation puts the infinite point at the origin of this plot, which shows the infinite line as the axis. It appears that the parabola is actually tangent to the infinite line at the infinite point. To check, we can calculate the tangent line to at the origin to see it is the axis, that is, .

There are various alternate versions of the functions and to handle working projectively. For example, accepts infinite points as input or returns them as output.

The main application for is that we can now find all complex projective singular points or intersection points in one step by picking a random line that, with high probability, will not go through any of the finite number of singular or intersection points. For details, see [1].

In [1], the theory so far is applied to recover known results in lower-dimension geometry. First we consider nonsingular conics in the form

where the coefficients are integers or machine numbers. We identify them as to type (hyperbola, parabola, ellipse) and write them in standard form. We parameterize them by rational or trigonometric functions, find their foci and directrices or, conversely, construct them from arbitrary foci and another appropriate value such as the semilatus rectum.

We then discuss the numerical theory for nonsingular cubics. Unlike the number theory case, which is one of the most difficult subjects in mathematics, the numerical case is very simple. We give a function to find the numerical inflection points; then, with a choice of inflection point, we have a deterministic black-box function to calculate the Weierstrass normal form and -invariant. The -invariant almost completely classifies numerical cubics relative to fractional linear transformations, that is, relative to the real projective linear group: there are two conjugate classes for each . Under the complex projective linear group, the classification is complete.

We end with Cayley’s theorem: an irreducible curve of degree with double points has a rational parameterization. This means that with parameter , the parameterization components have the form of a polynomial in divided by another polynomial in . The coefficients are not, however, expected to be rational numbers; Cayley only promises algebraic numbers. Thus in practice, this parameterization works best with machine-number coefficients. We illustrate by parameterizing the hypocycloid.

The gap occurs because we only plotted the parameter on a closed interval; in theory it should run from . Details of how the parametric functions were calculated are in [1].

Topologists often think of the real projective plane as a Möbius band where the entire outer boundary is squashed to the affine origin. Alternatively, the Möbius band can be viewed as the real projective plane with a tiny disk about the affine origin removed, the boundary of that disk being the boundary of the Möbius band. In either case, the center line of the band is the *infinite line*.

It is common to construct a Möbius band out of a strip of paper. Here is a slightly different but useful way, shown in Figure 1 as a physical deconstruction: cut from a boundary point to the center (infinite) line, then cut around the center line.

**Figure 1.** Constructing a Möbius band.

This gives a long skinny strip that we can identify with the real projective plane shown in Figure 2. The vertical yellow lines are the negative and positive axes, and the standard quadrants of the affine plane are numbered in Roman numerals.

**Figure 2.** The real projective plane.

We implement the mappings from the projective plane to this strip, called the *rectangular hemisphere* in [1] for reasons given there, and from the strip to the Möbius band by the following functions.

A simple example is the hyperbola ; we give the construction. The infinite points are and . Unfortunately, even this simple example takes up a great deal of space, so we will just get started. We consider the part of this hyperbola in the second quadrant of the affine plane and plot it on the Möbius band. We start at the infinite point and trace to the infinite point , which is an ambiguous point.

The affine part is well known, so we inspect the obvious infinite points .

A technicality is that uses the line function, which could randomly differ by a multiple of . We need a specific choice, so we set the value of .

Again, the axis represents the infinite line for , and the origin the infinite point.

To connect this plot to the affine curve, find the points where intersects the circle of radius 1.

Now map these back to the affine plane to see that it is the first point in that is related to a point in the second quadrant.

So the part of the hyperbola in the second quadrant can be traced using two parts: the part from the intercept to the point and the image of the part from to the infinite point represented by .

We see no error messages, so we assume the tracing went correctly. Now apply .

Here we get some warning messages. We have to set the ambiguous points correctly and can then draw Figure 3.

So we have drawn this section of the hyperbola on the rectangular hemisphere.

**Figure 3.** The part of the hyperbola in the second quadrant on the rectangular hemisphere.

The reader may wish to attempt the other sections of the hyperbola; the one in the third quadrant is similar to , and the parts in the first and fourth quadrants can be done together since the intercept does not give an ambiguity (Figure 4).

**Figure 4.** The full plot of the hyperbola on the hemisphere looks like this.

Finally, we lift to the actual Möbius band using and (Figure 5).

**Figure 5.** The hyperbola on a Möbius band.

This last example was simple! From now on we just show the final output (Figure 6).

**Figure 6.** Here are two Möbius plots of lines, the first a line through the origin, and then a typical line.

Next, Figure 7 shows two affinely parallel lines meeting at an infinite point and three circles; the black one contains the origin in its interior.

**Figure 7.** Affinely parallel lines and three circles.

In Figure 8, we plot the rational function . This has two infinite points: a singular one at and a regular one at .

**Figure 8.** Plot of a rational function.

Experimenting with these plots we see, as the fundamental theorem tells us, that these curves are composed of loops, that is, simple closed curves. Draw these yourself using the pattern in our hyperbola example that stops at the rectangle, print it out (preferably in landscape orientation with a smaller aspect ratio), then cut it out, twist and tape together the two copies of the infinite line to make the Möbius band. Now cut out a loop. Two things can happen: either you get two pieces, one topologically a disk and the other not, or only one piece, as in the classic example of cutting a Möbius band along the center line.

In the first case, we call the loop an *oval*, and the complementary piece shaped like a disk is called the *interior*. In the other case, we call this a *pseudo-line*. Notice both kinds of lines have this property. One easy way to tell, without going through the trouble of constructing a physical Möbius band, is that an oval meets the infinite line (or any other line for that matter, since up to fractional linear transformations all lines are the same) in an even number of points (possibly zero). A pseudo-line meets the infinite line in an odd number of points. Again, since in the projective plane all lines are equivalent, two pseudo-lines always meet in an odd number of points, in particular, at least one.
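This parity test can be sketched numerically. The following Python sketch (the article itself works in the Wolfram Language, and the function name and input convention here are assumptions for illustration) counts the distinct real points where a curve meets the infinite line, given the coefficients of the top-degree part of its affine equation, listed from the coefficient of the pure-x term down to the pure-y term.

```python
import numpy as np

def infinite_line_crossings(top_coeffs):
    """Count distinct real points where a projective curve meets the infinite
    line z = 0.  `top_coeffs` are the coefficients of the degree-d part
    f_d(x, y) = sum c_k x^k y^(d-k), listed from x^d down to y^d.  An even
    count suggests an oval, an odd count a pseudo-line (a heuristic parity
    check, not a proof)."""
    coeffs = np.trim_zeros(np.asarray(top_coeffs, dtype=float), "f")
    crossings = set()
    if len(coeffs) < len(top_coeffs):
        crossings.add("[1:0:0]")   # x^d coefficient vanished: [1:0:0] is on the curve
    if len(coeffs) > 1:
        for r in np.roots(coeffs): # points [x:1:0] on the curve
            if abs(r.imag) < 1e-9:
                crossings.add(round(r.real, 9))
    return len(crossings)

# x^2 - y^2 - 1 = 0 (a hyperbola): top part x^2 - y^2 meets z = 0 twice,
# so the curve is an oval in the projective plane.
print(infinite_line_crossings([1, 0, -1]))
# y = x^3: top part x^3 meets z = 0 only at [0:1:0] -- a pseudo-line.
print(infinite_line_crossings([1, 0, 0, 0]))
```

The count is of distinct points; tangencies at the infinite line (like the parabola earlier) contribute one point each, preserving the parity argument.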

From Bézout’s theorem, a curve of even degree meets any line in an even number of points. A consequence is that *a nonsingular curve can contain at most one pseudo-line. Further, if the degree is even, each loop of the curve must be an oval. On the other hand, each nonsingular curve of odd degree must contain exactly one pseudo-line and possibly some ovals*.

This last paragraph is in italics because it essentially tells us the topological structure of nonsingular plane curves.

We can now find the specific topological (and even some geometrical) structure of any particular real plane curve, at least up to degree six. We concentrate on the Newton hyperbola family of curves introduced in Section 2. These are not well conditioned, so they present interesting problems. It may be necessary to go to arbitrary-precision numbers to get further with these, although for well-conditioned curves I have used the methods of [1] for curves up to degree nine.

We first consider Harnack’s theorem [7] and related problems from Hilbert’s problem, Part 1 [8]. Harnack’s first theorem states that a nonsingular curve of degree can have at most topological components in . A rigorous proof requires advanced concepts in topology, but a heuristic proof is easy from Bézout’s theorem, especially given the ideas of ovals and pseudo-lines in the last section.

As mentioned in the last section, an oval is a loop that cuts the Möbius band in two parts, one topologically a disk. That part is known as the *interior* of the oval. It is possible that the interior of an oval contains another oval. Consider the following example with fractional linear transformation given by matrix that cuts the axis out of the affine plane.

We say the smaller oval has *depth* 2. If there were another oval inside that oval, it would have depth 3, and so on. It is easy to prove that the maximal depth of an oval in an irreducible curve of degree is ; simply consider a line through a point in the interior of the deepest oval and apply Bézout’s theorem. The next example generalizes this.

Continuing this way, we can in principle construct an oval of depth using a curve of degree for even and a curve of degree for odd.

An -curve is a nonsingular curve with the maximum number of components. To best show the possible arrangements of the components of an -curve, we use *diamond diagrams*. We have two main types, first the *Descartes–Viro diagrams* (or more simply the *Viro diagrams*), which depend on the signs of coefficients of the equation of the curve [9]. These diagrams turn out to be in 1-1 correspondence with the Newton hyperbolas. We also use *Gauss diagrams*, which show the complementary positive and negative value sets of .

The code for drawing diamond diagrams is very long and explained in [1] and [2]; in this article we do not give code, only graphics.

For the Viro diagram in the first quadrant including the positive axes, the color of the dot at the point is green if the coefficient of is positive and red if it is negative. We do not allow equations with any coefficient equal to 0 for a Viro diagram. The curve then separates the red and green lattice points.

As an example, consider Newton hyperbola 413 (Figure 9).

**Figure 9.** The Viro diagram, region plot, graph and diamond diagram for the function `nh413`.

In this case, the Viro diagram and Gauss diagram (not shown) are the same, other than the color of the lattice points; orange indicates where and brown where . A graph is given using only the infinite points, which are labeled , , , . The outer boundary of the diamond represents the infinite line. The diamond diagram indicates that: (1) on the positive axis the curve crosses three times; (2) it does not cross the negative axis; and (3) it crosses the positive axis once and the negative axis twice. The Viro diagram gives the maximal number of crossings according to Descartes’s theorem on each positive and negative , and axis, viewing as a single-variable polynomial restricted to these lines. In the projective plane, the axis is the line of infinite points where infinite points in the first/third quadrant are positive and those in the second/fourth quadrant are considered negative in this context.

In this example, the crossing points are given as follows.

Let , , be the infinite points.

The Newton hyperbola 613 is more complicated (Figure 10).

**Figure 10.** The Viro diagram for the function `nh613`.

We see there are three ambiguous cells, that is, four lattice points with , one color and , the other. There are two different possible ways to connect regions given by dashed curves in the colors aqua and magenta. Without further investigation, there is no a priori way to determine the correct choice; a slight perturbation of the curve can affect this. A suggests an answer.

Checking infinite points and critical points confirms that there is nothing unexpected going on outside the region plot, so we get the Gauss diagram and graph (Figure 11).

**Figure 11.** Gauss diagram and graph.

In this case, a tiny perturbation changes the geometry and the Gauss diagram (Figure 12).

**Figure 12.** The region plot and Gauss diagram for the perturbed `nh613` are different from `nh613`.

Originally, the negative complement was connected and the positive complement had three components; after the perturbation, it is the positive complement that is connected. Luckily, we did not need to change any values of the lattice points when changing from the Viro diagram to the Gauss diagram. In general, the user will need to do that. See [1] or some later examples.

Now that we have explained our diagrams, we can show some -curves. Hilbert gave a series of -curves for each degree that are given by Viro diagrams and hence exist by the work of Viro. We simply give the diagrams here (Figure 13). For more information see [1].

**Figure 13.** Viro diagrams of -curves of degree .

Hilbert suggested other possibilities with more nesting in degree six.

Many more details on these diagrams and Hilbert’s problem [8] are given in [1].

We now have all of our tools. In [1] we illustrate more complicated examples, two of them the curve and the Newton hyperbola 336941. Both of these curves have interesting behavior at or near the infinite line, so a contour plot, even with large scale, cannot show everything.

At present, we have shown how to analyze and plot curves of degree up to six in various ways. For well-conditioned curves, these machine-number methods often work with higher degree; the author has had success with curves of degree eight and nine. To adequately deal with Newton hyperbolas of degree greater than six, one would perhaps like to rewrite some of the code to use arbitrary precision.

Our forthcoming book is a first attempt to apply numerical methods to a formerly abstract subject. There is a lot more that can be done in this area. We hope the book will be a starting point.

I want to thank the people at Wolfram Research for their help on the book project, especially Jeremy Sykes, Daniel Lichtblau and, for this article, George Beck.

[1] | B. H. Dayton, A Numerical Approach to Real Algebraic Curves with the Wolfram Language, Champaign, IL: Wolfram Media, 2018. www.wolfram-media.com/products/dayton-algebraic-curves.html. |

[2] | Global Functions. (Jul 18, 2018) barryhdayton.space/curvebook/GlobalFunctionsTMJ.nb. |

[3] | E. W. Weisstein. “Bifurcation” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/Bifurcation.html. |

[4] | F. S. Macaulay, The Algebraic Theory of Modular Systems, Cambridge: Cambridge University Press, 1916. |

[5] | B. H. Dayton, T. Y. Li and Z. Zeng, “Multiple Zeros of Nonlinear Systems,” Mathematics of Computation, 80(276), 2011 pp. 2143–2168. www.ams.org/journals/mcom/2011-80-276/S0025-5718-2011-02462-2/S0025-5718-2011-02462-2.pdf. |

[6] | S. S. Abhyankar, Algebraic Geometry for Scientists and Engineers, Providence, RI: AMS, 1990. |

[7] | E. W. Weisstein. “Harnack’s Theorems” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/HarnacksTheorems.html. |

[8] | E. W. Weisstein. “Hilbert’s Problems” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/HilbertsProblems.html. |

[9] | E. W. Weisstein. “Descartes’ Sign Rule” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/DescartesSignRule.html. |

B. H. Dayton, “A Wolfram Language Approach to Real Numerical Algebraic Plane Curves,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-7.

Barry H. Dayton is Professor Emeritus at Northeastern Illinois University, where he taught for 33 years. His Ph.D. was in the field of algebraic topology, but he has done research in a variety of fields, including algebraic geometry and numerical algebraic geometry.

**Barry H. Dayton**

*Department of Mathematics
Northeastern Illinois University
Chicago, IL 60625-4699*

dx.doi.org/doi:10.3888/tmj.20-6

An important problem in graph theory is to find the number of complete subgraphs of a given size in a graph. If the graph is very large, it is usually only possible to obtain upper bounds for these numbers based on the numbers of complete subgraphs of smaller sizes. The Kruskal–Katona bounds are often used for these calculations. We investigate these bounds in specific cases and study how they might be improved.

Graph theory has many interesting problems that lend themselves to computer investigation. Mathematica has many graph theory functions that can enable these investigations. We shall introduce the reader to Mathematica’s graph theory capability while investigating a problem in extremal graph theory. Extremal graph theory tries to find graphs satisfying certain extreme properties—for example, having the most triangles for a fixed number of edges.

We first introduce some basic graph theory concepts and show how to represent them in Mathematica. Formerly, most graph theory functions were contained in the Combinatorica package; however, this graph theory functionality has, for the most part, been absorbed by the main program, making it unnecessary to load this package.

A finite graph consists of two finite sets, and . The elements of the set are called *vertices* and the elements of the set are (unordered) pairs of vertices called *edges*. We often write . A graph is called a *subgraph* of graph if and ; that is, if each vertex in the subgraph is also a vertex in the graph and each edge of the subgraph is also an edge of the graph .

Graphs are often depicted as points (the vertices) and line segments (the edges) that join pairs of vertices in . Thus, to draw the graph consisting of the five labeled vertices and with edge set being all pairs of vertices, we enter the following command.

This graph is known as the *complete graph on five vertices* and denoted by .

In general, denotes the complete subgraph on vertices, that is, the graph with vertex set and edge set consisting of all pairs of elements of .

The *complement* of the graph is the graph having the same set of vertices and whose edges are exactly those pairs of vertices of that *do not* belong to . Thus, the graph complement of the complete graph has no edges at all.

Mathematica can also add vertices to an already existing graph. This adds two vertices labeled and to the graph .

This is how to add edges to graph . (The symbol “” can be entered from the keyboard using the Esc key; press Esc, type u and e, then press Esc again.)

Vertices and edges can be deleted with the commands and .

Another useful operation, , contracts a set of vertices into one vertex. For example, this contracts the graph by contracting the vertices labeled and into a single vertex.

An important problem in graph theory is to find the number of complete subgraphs (or cliques) of a graph. For example, find the number of cliques of a certain size in a large social network graph. Here, the people are the vertices, an edge joins two people who know each other and a clique consists of people who all know each other; that is, they form a complete subgraph.

There is a dual version of this problem, equally important, that asks for the number of independent sets of a graph. A set of vertices is *independent* in the graph if there are *no* edges of connecting the vertices in the set. (Clearly, a set of vertices of is independent if and only if they form a complete graph in the complement of .)

We consider next how to compute the number of complete subgraphs exactly for small graphs and how one might obtain useful upper bounds for larger graphs.

Given , it is easy to determine how many complete subgraphs there are having, say, four vertices; the answer is the number of ways to choose four of the seven vertices, since for each such choice, all edges between the chosen vertices are also present in the original graph. The number of ways to choose four out of seven objects is just the binomial coefficient .
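In Python (the article's own computations are in Mathematica), this count is a one-line use of the standard library: every 4-subset of the 7 vertices of the complete graph spans a complete subgraph.

```python
from math import comb

# Number of K4 subgraphs of K7: choose any 4 of the 7 vertices.
print(comb(7, 4))  # 35
```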

However, if the graph is not a complete graph, it is not so easy to determine how many complete subgraphs of a certain size it contains. We turn to this problem next.

Consider the graph previously defined.

Suppose we want to know how many subgraphs contains. We start with the list of its vertices.

Next, we form the set of all subsets of size four of this set.

Here is an example of a subset of vertices that generates a complete graph in .

We now wish to repeat this calculation for all the subsets of size four.

Finally, we count the number of times occurs in .

Let be the number of complete subgraphs with vertices contained in the graph . The following program implements .
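The counting procedure above can be sketched in Python (the article's implementation is in Mathematica). The example graph below is hypothetical, since the graph defined earlier in the article is not reproduced here: it is K5 on vertices 1–5 with a pendant path attached.

```python
from itertools import combinations

def count_complete_subgraphs(vertices, edges, k):
    """Count the complete subgraphs on k vertices: a k-subset qualifies
    exactly when every pair of its vertices is an edge."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        all(frozenset(p) in edge_set for p in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )

# Hypothetical example graph: K5 on vertices 1..5 plus a path 5-6-7.
v = [1, 2, 3, 4, 5, 6, 7]
e = [tuple(p) for p in combinations(range(1, 6), 2)] + [(5, 6), (6, 7)]
print([count_complete_subgraphs(v, e, k) for k in (3, 4, 5)])  # [10, 5, 1]
```

The brute force over all k-subsets mirrors the Mathematica approach in the text and is fine for small graphs; for large graphs one would use a dedicated clique-counting algorithm instead.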

Let us count the number of times , and occur in the graph ; that is, we calculate , and .

Suppose we know that a certain graph satisfies . Can we determine the maximum of ? Surely, can have no subgraphs; that is, , as in this example.

However, the example has been shown to have 19 subgraphs and 10 subgraphs.

In fact, we have shown in a joint paper [1] that for all graphs with , the maximum value of is 10.

The following definition, due to Bollobás [2], is useful in what follows.

**Definition**

If , then is the maximum number of subgraphs that a graph can have if the number of its subgraphs is less than or equal to .

Thus, using this notation, we have shown in [1] that .

Suppose now that the number of triangles a graph can have is fixed and we want to determine the graph with the fewest edges that can have that many triangles.

For example, suppose we want a graph having 23 triangles with the fewest possible edges. Our intuition is to look for graphs that are “tightly packed,” that is, as close to complete graphs as possible. has 20 triangles and has 35. So let us start by adding a vertex to and then add three edges from to three of the vertices of . This adds new triangles.
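This construction is easy to check by brute force. The following Python sketch (the article works in Mathematica) counts the edges and triangles of K6 plus one new vertex joined to three of its vertices.

```python
from itertools import combinations

def triangle_count(vertices, edges):
    """Count triangles by testing every 3-subset of vertices."""
    es = {frozenset(e) for e in edges}
    return sum(all(frozenset(p) in es for p in combinations(t, 2))
               for t in combinations(vertices, 3))

# K6 on vertices 1..6, plus a new vertex 7 joined to three vertices of K6.
v = list(range(1, 8))
e = [tuple(p) for p in combinations(range(1, 7), 2)] + [(7, 1), (7, 2), (7, 3)]
print(len(e), triangle_count(v, e))  # 18 edges, 20 + C(3,2) = 23 triangles
```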

That shows that a graph with 23 triangles can be obtained with just 18 edges. Is there a graph with fewer edges that has 23 subgraphs?

The next theorem is due to Erdös and Hanani (see [3]).

**Theorem**

If the number of edges is , then since , the theorem says that ; that is, the maximum number of subgraphs in a graph with 18 edges is 23.

Is there a graph with fewer edges and 23 triangles? To answer this question, we can use the Erdös–Hanani theorem to compute the various maximum numbers of triangles with fewer edges. The program gives the maximum number of triangles for a given number of edges, as determined by the Erdös–Hanani theorem; the table computes these values for edge numbers between three and 18.
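The table can be sketched in Python under the assumption that the Erdös–Hanani bound takes the standard colexicographic form: write the number of edges as C(a,2) + b with 0 ≤ b < a; then the number of triangles is at most C(a,3) + C(b,2). This reproduces the value 23 for 18 edges quoted above.

```python
from math import comb

def max_triangles(m):
    """Largest possible number of triangles in a graph with m edges,
    assuming the bound: write m = C(a,2) + b with 0 <= b < a; then the
    triangle count is at most C(a,3) + C(b,2)."""
    a = 2
    while comb(a + 1, 2) <= m:   # greedy: largest a with C(a,2) <= m
        a += 1
    b = m - comb(a, 2)
    return comb(a, 3) + comb(b, 2)

# Maximum triangles for each edge count from 3 to 18.
for m in range(3, 19):
    print(m, max_triangles(m))
```

With 17 edges the bound is 21, so no graph with fewer than 18 edges can reach 23 triangles, matching the conclusion in the text.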

We see from the table that there is no graph with fewer than 18 edges having 23 triangles. Hence the fewest edges needed to produce a graph with exactly 23 triangles is indeed 18.

If we specify the number of triangles rather than the number of edges a graph can have, computing the maximum numbers of larger complete graphs is not as simple as in the previous section. Exact maximum numbers are not known in most cases, only upper bounds.

A well-known theorem of extremal graph theory (proved independently by Kruskal [4] and Katona [5]) can provide an upper bound for , given , . In fact, the Kruskal–Katona result is for more general objects than graphs, but we will only be using it for graphs and only when ; that is, we specify how many triangles a graph can have and want to bound for some .

**Theorem**

Suppose a graph has triangles, where , where each of , , is chosen in order and to be as large as possible at the time of choosing. Then for , .

For example, if the number of triangular subgraphs of is , then the Kruskal–Katona upper bounds for and are and .

To use this theorem in Mathematica, we first need to express as the binomial sum, . Given , the function finds the numbers , , .

Next, we define the functions and , the Kruskal–Katona upper bounds for and , given that .
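Both steps can be sketched in Python, assuming the standard Kruskal–Katona shape of the bound: greedily represent the triangle count as C(n1,3) + C(n2,2) + C(n3,1), then raise each binomial's upper index by one less than the lower index of the target clique size. The function names here are illustrative only, not the article's.

```python
from math import comb

def binomial_rep(k3):
    """Greedily write k3 = C(n1,3) + C(n2,2) + C(n3,1), choosing each n_i
    in turn as large as possible."""
    n1 = 3
    while comb(n1 + 1, 3) <= k3:
        n1 += 1
    rest = k3 - comb(n1, 3)
    n2 = 2
    while comb(n2 + 1, 2) <= rest:
        n2 += 1
    n3 = rest - comb(n2, 2)
    return n1, n2, n3

def kk_bound(k3, k):
    """Kruskal-Katona upper bound on the number of K_k subgraphs of a
    graph with k3 triangles (k = 4 or 5)."""
    n1, n2, n3 = binomial_rep(k3)
    return comb(n1, k) + comb(n2, k - 1) + comb(n3, k - 2)

print(binomial_rep(19))   # (5, 4, 3): 19 = C(5,3) + C(4,2) + C(3,1)
print(kk_bound(19, 4), kk_bound(19, 5))
```

For 19 triangles this gives bounds of 12 for K4 subgraphs and 3 for K5 subgraphs; it also reproduces the bounds of 3 (for nine triangles) and 22 (for 29 triangles) used later in the article.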

Here are the results for 19 triangles.

Thus, and .

Here again is the graph with 19 triangles. It has 10 subgraphs and two subgraphs.

As mentioned, we proved in [1] that and .

Finding exactly is often very difficult and results are not known in most cases. However, if there is a graph with triangles, with (the Kruskal–Katona bound), then . Complete graphs are obvious examples. Also, as remarked in [2], if the number of triangles , we define the graph by adding to a single vertex and edges joining to vertices of ; then (the Kruskal–Katona bound). For example, suppose . We construct such a graph, .

Thus, and .

Next, we find those numbers of triangles for which this construction works; that is, the third entry in their binomial representation is 0.

Also, it is not difficult to see that if , the Kruskal–Katona bound is the same as for (since for ). So the list of the numbers of triangles for which is known can be expanded. Here are the known values for the first 100 positive integers.

This leaves the following numbers of triangles up to 100 still unknown.

We have in fact settled (see [1]) the cases (where , respectively). We add these cases to the list of the known values.

And these are the unknown values of for .

We had started listing the integer sequence (see [6]), and the preceding results can be used to add to this sequence; for example, the first term of the sequence, (when there are four triangles allowed, the graph has at most one ); the sequence continued up to . Since we now know the consecutive values up to , we can add four more consecutive values: , , , . Our conjecture in the next section implies that . Other selected values of the sequence can obviously also be obtained, given that we know so many of the first 100 cases.

We have used complete graphs in building the maximal examples. However, even if we remove an edge from a complete graph, the number of subgraphs in this new graph `gr1` is the same as the Kruskal–Katona bound based on the number of subgraphs (calculated by ), as the following computation shows. (This can also be established with a simple argument that we omit.)

In fact, even if we remove several edges from a complete graph, as long as they share a common vertex, the number of subgraphs in the resulting graph and the Kruskal–Katona bound (based on the number of subgraphs) remain the same, as the reader can easily check with Mathematica. However, if we remove two edges that do not share a common vertex, this is no longer the case, as the following computations show.

This time, the number of subgraphs is one less than the upper bound! These graphs are also Turan graphs. The Turan graph is the graph formed by partitioning a set of vertices into subsets with sizes as equal as possible (differing by at most 1) and connecting two vertices by an edge if and only if they belong to different sets of the partition. The built-in Mathematica function draws this graph. For example, partitions the vertices into the subsets , , ; the edge is omitted since it would connect vertices in the same subset, ; is omitted since it would connect vertices in the same subset .

The graphs are especially interesting to us as they are the same as the graphs , defined before. It may not be immediately apparent that is the same graph as ; however, if we choose the right for the Turan graph, it becomes rather obvious.

In addition, can always be used to check.

Therefore, for the Turan graph , the actual number of subgraphs is just one less than the Kruskal–Katona bounds! Do these graphs provide the true maximum values of subgraphs based upon the number of their triangles? We conjecture below that they do for . In addition, it is not hard to show that the number of triangles in is . (This follows because the vertex sets of are , , , , …, ; then consider all possible ways of choosing three elements from these sets without choosing two elements from the same set.) This expression can be simplified.
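The "no two vertices from the same part" count can be sketched in Python: in a complete multipartite graph, a triangle picks one vertex from each of three distinct parts. The part sizes {2, 2, 1, 1, 1} below model K7 with two disjoint edges removed, an assumption made for illustration.

```python
from itertools import combinations

def multipartite_triangles(part_sizes):
    """Triangles in a complete multipartite graph: choose three vertices,
    no two in the same part, i.e. one from each of three distinct parts."""
    return sum(a * b * c for a, b, c in combinations(part_sizes, 3))

# K7 itself is the case of seven singleton parts: C(7,3) = 35 triangles.
print(multipartite_triangles([1] * 7))
# Removing two disjoint edges from K7 leaves parts {2, 2, 1, 1, 1}.
print(multipartite_triangles([2, 2, 1, 1, 1]))  # 25
```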

The sequence of these triangle numbers for the graphs is the same (with an offset) as the sequence A000297 in [7]. (We recently added a comment to this entry that mentions this.)

**Conjecture**

If , the Turan graph has the maximum number of subgraphs for any graph with triangles.

Mathematica has a large database of graphs accessible with the function .

We wish to find maximal examples, that is, graphs that have the greatest number of subgraphs for their number of triangles. We make this precise with the following definition. A graph is a -maximal graph with respect to subgraphs if it has the greatest number of subgraphs for any graph with the same number of subgraphs (triangles) as . We believe that these -maximal graphs are “tightly packed” and thus have a relatively small number of vertices given their number of triangles. Suppose first we wish to find all the -maximal graphs with exactly nine triangles. The Kruskal–Katona bound is three.

However, the maximal number of subgraphs is really two, that is, (see [1]). We first look for examples in the set of nonisomorphic graphs with six vertices, that is, subgraphs of .

Here is the first such graph.

We apply to each entry to get the graphs themselves but suppress the large output.

Within those, we next search for graphs with nine triangular subgraphs. There is only one.

We attach labels to the vertices of this graph for later use.

This graph has two subgraphs.

Hence is a -maximal example (see [1]).
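The six-vertex search can also be replayed as a brute force over all labeled graphs on six vertices (a Python sketch; the article instead filters a list of nonisomorphic graphs in Mathematica). Among the graphs with exactly nine triangles, we look for the largest number of K4 subgraphs.

```python
from itertools import combinations

V = range(6)
PAIRS = list(combinations(V, 2))  # the 15 possible edges on six vertices

def k_count(edges, k):
    """Number of complete subgraphs on k vertices."""
    return sum(all(p in edges for p in combinations(s, 2))
               for s in combinations(V, k))

# Brute force over all 2^15 labeled graphs on six vertices: among those
# with exactly nine triangles, find the most K4 subgraphs.
best = 0
for mask in range(1 << 15):
    edges = {PAIRS[i] for i in range(15) if mask >> i & 1}
    if k_count(edges, 3) == 9:
        best = max(best, k_count(edges, 4))
print(best)
```

The maximum found is two, in agreement with the value established in [1].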

This graph embedding is also extremal in another way; see [8].

We search for examples in the set of graphs with seven vertices, using the command , which lists all non-isomorphic graphs with seven vertices. (If , only lists some of the graphs with vertices). has 1044 entries, so the result is not immediate.

There are 35 examples of graphs with seven vertices and nine triangular subgraphs.

To see the individual graphs, we use Mathematica’s built-in function , where, in addition to the graph, the number of subgraphs of the graph is listed.

We suspect that one of the -maximal graphs in is really a simple modification of the single -maximal graph with six vertices found in ; that is, the graph plus an edge. (We obviously do not want to add a triangle!) Stepping through the , graph number 22 is easily seen to be that graph.

We then add the edge to the graph found in and finally, use to verify that they are indeed isomorphic.

Next, suppose we want to find the -maximal graphs with 19 triangles. Although has triangles, if we remove even one edge from , the resulting graph has only 16 triangles! Hence, there are no subgraphs of with exactly 19 triangles. However has triangles; thus it is reasonable to search for examples.

Sometimes there are hidden edges in graph drawings in Mathematica; the graph in the middle is a case in point (the edge from the center top to center bottom vertices cannot be seen). We therefore redraw the graphs, setting the option .

We now ask for the number of subgraphs and the number of subgraphs in these graphs.

Also, all three graphs have the same number of edges, 17.

Thus, we have found three examples of graphs that illustrate our result of [1], that .

We now ask what is the maximal number of subgraphs in a graph with 25 triangles. We again search for graphs with seven vertices and 25 triangles.

The one example we have found has 16 subgraphs, and the Kruskal–Katona bound for subgraphs is 17.

We investigate this graph further.

This is the Turan graph ; edges and are missing from . This can also be seen with the function.

Since this is one of the Turan graphs and we have conjectured that they are -maximal, we believe we have found a -maximal example with 25 triangles.

Another example: Suppose we have a graph with 29 triangles. The Kruskal–Katona bound is 22.

A search in yields no graphs with 29 triangles. If we search in , we find one graph with 29 triangles. It has 16 subgraphs.

So it looks like the most subgraphs we can find for a graph with 29 triangles is 16. We can do better! For 26 triangles, since , the Kruskal–Katona bound is . Thus, if we add vertices and to and connect them to four and three vertices of , respectively, we get a graph with 29 triangles and 20 subgraphs.
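This final construction is easy to verify with a short Python sketch (the article's own computations are in Mathematica): take K6, join a new vertex 7 to four of its vertices and a new vertex 8 to three of them.

```python
from itertools import combinations

def k_count(vertices, edges, k):
    """Number of complete subgraphs on k vertices."""
    es = {frozenset(e) for e in edges}
    return sum(all(frozenset(p) in es for p in combinations(s, 2))
               for s in combinations(vertices, k))

# K6 on vertices 1..6, plus vertex 7 joined to four of them and
# vertex 8 joined to three of them (7 and 8 are not adjacent).
v = list(range(1, 9))
e = ([tuple(p) for p in combinations(range(1, 7), 2)]
     + [(7, i) for i in (1, 2, 3, 4)] + [(8, i) for i in (1, 2, 3)])
print(k_count(v, e, 3), k_count(v, e, 4))  # 29 triangles, 20 K4 subgraphs
```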

The graph has eight vertices, but it did not show up in , which has only 289 out of 12346 possible graphs. contains all 1044 graphs with seven vertices.

We have looked at examples with relatively few vertices, and these examples might give the impression that the Kruskal–Katona bounds do not differ significantly from the real values—so why expend so much effort trying to improve them? We construct a larger example to show that the difference can be quite large.

Here is another example.

The Kruskal–Katona bound, however, usually yields more than twice as many subgraphs! More extreme examples can easily be constructed.

There have been some efforts to improve the Kruskal–Katona bounds in the case of graphs (see, for example, [1] and [9]); however, these have had very limited success. We feel that not enough insight into this problem has been gained and that perhaps, by using computer experiments, conjectures can be formulated and then proved to advance our knowledge in this area. For example, if we knew that a maximal example with 19 triangles must occur in a graph with seven vertices, our search of would be sufficient to prove that the maximum number of subgraphs a graph with 19 triangles can have is 10. We succeeded in proving this result in [1] without using computers, but only with a great deal of effort.

To read more on using Mathematica’s graph theory capability to investigate other maximal problems in graph theory, see [10] and [11].

I wish to thank the editor, whose sage advice substantially improved this paper’s programming and content.

[1] | R. Cowen and W. Emerson, “On Finding ,” Graph Theory Notes, New York Academy of Sciences, 34, 1998 pp. 26–30. www.researchgate.net/publication/287991696_On_Finding_k4k3_x. |

[2] | B. Bollobás, “Relations between Sets of Complete Subgraphs,” Proceedings of the Fifth British Combinatorial Conference, Aberdeen, 1975 pp. 79–84. www.researchgate.net/publication/268543809_Relations_between_sets_of_complete_subgraphs. |

[3] | P. Erdös, “On the Number of Complete Subgraphs Contained in Certain Graphs,” Publications of the Mathematics Institute of the Hungarian Academy of Sciences, 7, 1962 pp. 459–464. |

[4] | J. Kruskal, “The Number of Simplices in a Complex,” Mathematical Optimization Techniques (R. Bellman, ed.), Berkeley: University of California Press, 1963 pp. 251–278. |

[5] | G. Katona, “A Theorem of Finite Sets,” The Theory of Graphs (P. Erdös, ed.), Budapest: Akadémia Kiadó, 1968 pp. 187–207. |

[6] | N. J. A. Sloane. The On-Line Encyclopedia of Integer Sequences. oeis.org/A020917. |

[7] | N. J. A. Sloane. The On-Line Encyclopedia of Integer Sequences. oeis.org/A000297. |

[8] | E. W. Weisstein. “Graham’s Biggest Little Hexagon” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/GrahamsBiggestLittleHexagon.html. |

[9] | A. Frohmader, “A Kruskal–Katona Type Theorem for Graphs.” arxiv.org/abs/0710.3960. |

[10] | E. W. Weisstein. “Cage Graph” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/CageGraph.html. |

[11] | E. W. Weisstein. “Degree-Diameter Problem” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/Degree-DiameterProblem.html. |

R. Cowen, “Improving the Kruskal–Katona Bounds for Complete Subgraphs of a Graph,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-6. |

Robert Cowen is Professor Emeritus in Mathematics, Queens College, CUNY. He does research in logic, combinatorics and set theory. He taught a course in Mathematica programming for many years, emphasizing discovery of mathematics, and is currently working on a text on learning Mathematica through discovery with John Kennedy. His website is sites.google.com/site/robertcowen.

**Robert Cowen**

*16422 75th Avenue
Fresh Meadows, NY 11366*

dx.doi.org/doi:10.3888/tmj.20-5

This article explores the numerical mathematics and visualization capabilities of Mathematica in the framework of quaternion algebra. In this context, we discuss computational aspects of the recently introduced Newton and Weierstrass methods for finding the roots of a quaternionic polynomial.

Since Niven proved in his pioneering work [1] that every nonconstant polynomial of the form

(1) |

has at least one zero in , thereby extending the fundamental theorem of algebra to quaternionic polynomials, the use of such polynomials has been considered by different authors and in different contexts. Quaternionic polynomials ([2]) have found a wealth of applications in a number of different areas and have motivated the design of efficient methods for numerically approximating their zeros (see e.g. [3–8]).

This article discusses two numerical methods to approximate the zeros (or roots) of polynomials of the form (1). They can be seen as the quaternionic versions of the well-known Newton and Weierstrass iterative root-finding methods and they both rely on quaternion arithmetic. Here we explain in detail how we have used Mathematica to produce the numerical results recently presented in [9–11].

All the computations in this article require the package , available for download at w3.math.uminho.pt/QuaternionAnalysis (see [12] and [13]).

We introduce the basic definitions and results needed; we refer to Part 1 of this article [2] for recalling the main aspects of the quaternion algebra and to [14] for details on quaternionic calculus.

The real vector space can be identified with by means of

where , and are Hamilton’s imaginary units. Thus, throughout the article, we do not distinguish an element in from the corresponding quaternion in , unless we need to stress the context.

Using the simplified notation for the vector part of , any arbitrary nonreal quaternion can be written as

(2) |

where is the norm of and is the quaternion

(3) |

also referred to as the sign of . In addition, since and , one can say that behaves like the complex imaginary unit, and for this reason we call (2) the complex-like form of the quaternion .
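These facts are easy to check numerically. The following self-contained Python sketch (our own tuple-based helpers, not the QuaternionAnalysis package) implements the Hamilton product, the norm and the sign (3):

```python
import math

# Quaternions represented as plain tuples (q0, q1, q2, q3); helper names are ours.
def qmul(p, q):
    a, b, c, d = p
    e, f, g, h = q
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

def qnorm(q):
    return math.sqrt(sum(x * x for x in q))

def qsign(q):
    # sign (3): omega(q) = Vec(q) / |Vec(q)|, defined for nonreal q
    v = (0.0, q[1], q[2], q[3])
    n = qnorm(v)
    return tuple(x / n for x in v)
```

For a nonreal quaternion, squaring its sign with `qmul` gives the real quaternion −1, which is the sense in which ω behaves like the complex imaginary unit.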

In what follows, we consider domains and functions that can be written in the form

(4) |

where , and and are real-valued functions. Continuity and differentiability are defined coordinate-wise.

We define on the set the so-called radial operators

where and .

We introduce the following concept.

**Definition 1**

Let be a function of the form (4), and , with . Such a function is called radially holomorphic (or radially regular) in if

**Theorem 1**

A function of the form (4) is radially holomorphic iff . In that case, we have .

It follows at once that any quaternionic polynomial of the form (1) but with is radially holomorphic and its radial derivative is

(5) |

For holomorphic complex functions of one complex variable, the well-known Newton method for finding a zero consists of approximating by means of the iterative process

(6) |

with sufficiently close to and . Identifying a real quaternion with a vector in , the problem of solving any quaternionic equation can always be transformed into the problem of solving a system of four nonlinear equations, whose solutions, in turn, can be obtained by using the multivariate version of (6):

(7) |

with sufficiently close to and a nonsingular Jacobian matrix . Not surprisingly, recent experiments performed by some of the authors of this article ([9], [10]) have shown the substantial gain in computational efficiency that can be achieved by using a direct quaternionic approach to this problem.

Newton methods in the quaternion context were formally adapted for the first time by Janovská and Opfer in [7], where the authors solved equations of the form . Later, Kalantari in [15], using algebraic-combinatorial arguments, proposed a Newton method for finding roots of special quaternionic polynomials. In [9], the equivalence between the classical multivariate Newton method (7) and quaternionic versions of Newton methods for a class of functions was established.

Due to the noncommutativity of multiplication for quaternions, the quotient of two quaternions and may be interpreted in two different ways: either as (the right quotient) or (the left quotient). This leads naturally to considering two versions of Newton iteration in the quaternionic setting:

(8) |

(9) |

The derivative in equations (8) and (9) has been considered in [9] and [10] as the radial derivative of a radially holomorphic function. In fact, in Corollary 2 of [9] it was proved that for such functions, equations (7), (8) and (9) produce, for each , the same sequence, provided that is nonsingular. Here is a more general result.

**Theorem 2** ([9], Theorem 4)

Let be a function defined on the set such that the , , are radially holomorphic functions in and the are quaternions not all zero. If is a root of such that is nonsingular and is Lipschitz continuous on a neighborhood of , then for all sufficiently close to such that commutes with all , the Newton processes

(10) |

(11) |

both produce the same sequence as (7), which converges quadratically to .

Each step of the iterative schemes (10) and (11) is implemented in the function , which has as arguments the quaternion and the indication of the version: for (10) or for (11). At each step, a test of the value of is also performed. We recall again that all the functions presented here require the package .

The -Newton methods consist of the successive application of the iterative schemes (10) or (11) through the function , using a stopping criterion based on the incremental size and on the maximum number of iterations .

Example 1

Consider the radially holomorphic polynomial , whose only roots in are the real isolated roots , and . For the concepts of isolated and spherical roots, we refer the reader to [2], Definition 4.

The use of the initial guess requires nine iterations to get an approximation to the root 0 with the required precision. The fact that both methods produce the same sequence is also confirmed.

The use of the initial guesses and requires 14 iterations to get an approximation to the roots and , respectively.

Example 2

The polynomial has a real root 0 and the sphere of zeros . Since the polynomial is radially holomorphic, both methods produce the same sequence. Here we would like to call attention to the convergence to the spherical root.

As pointed out in Example 3 of [10], the behavior of the Newton methods in case of convergence to values generating a spherical root is clear: if is the initial guess, then the Newton sequence converges to the root such that . This phenomenon can be easily seen from the preceding results or by computing the sign (3) of the vector part of the iterations.
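This behavior can be reproduced with a small self-contained numerical sketch (plain Python tuples standing in for the package's quaternion objects; the polynomial f(q) = q² + 1 is chosen because its zero set is exactly the sphere Re q = 0, |q| = 1):

```python
# Quaternions as tuples (q0, q1, q2, q3); illustrative helpers of our own.
def qmul(p, q):
    a, b, c, d = p
    e, f, g, h = q
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

def qinv(q):
    n2 = sum(x * x for x in q)
    return (q[0]/n2, -q[1]/n2, -q[2]/n2, -q[3]/n2)

def newton_right(f, df, x, steps=40):
    # right variant (8): x_{k+1} = x_k - f(x_k) * f'(x_k)^(-1)
    for _ in range(steps):
        step = qmul(f(x), qinv(df(x)))
        x = tuple(a - b for a, b in zip(x, step))
    return x

# f(q) = q^2 + 1 has the sphere of zeros {q : Re q = 0, |q| = 1}
f = lambda q: tuple(a + b for a, b in zip(qmul(q, q), (1, 0, 0, 0)))
df = lambda q: tuple(2 * a for a in q)

x0 = (0.3, 0.8, 0.1, 0.2)
root = newton_right(f, df, x0)
```

The computed limit lies on the sphere of zeros and its sign agrees with the sign of the vector part of the initial guess, which is exactly the phenomenon described above.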

Example 3

Now consider the polynomial with the three isolated roots , and (cf. [9], Example 3). This polynomial is not radially holomorphic, which means that we cannot anticipate the behavior of Newton methods unless we choose initial guesses such that Theorem 2 applies, that is, such that commutes with . In other words, must be of the form .

What happens if the assumptions of Theorem 2 are not valid? In fact, as we next illustrate, although the left and right Newton methods do not give the same sequence, we can observe convergence in both cases.

With the choice , the right version of the Newton method converges to the root , while the left version converges to .

It is interesting that the 4D Newton method (7) gives convergence to the other root , as observed in [9].

Following [9] and [10], consider a function that gives the number of iterations required for each process to converge, within a certain precision, to one of the solutions of the problem under consideration, using as the initial guess.

We now consider different initial guesses by choosing points in special regions and we show density plots of . The white regions that may appear correspond to a choice of for which the method under consideration does not reach the level of precision with iterations. The default choices of and usually lead to realistic plots that require some minutes to be produced. A smoother density can be obtained by increasing the option .

Example 4

We consider again the polynomial of Example 3, whose roots are the isolated roots , and . The following code produces the plots corresponding to the choice of in one of the following regions:

As was already pointed out, Theorem 2 can be applied only in ; this is why both methods produce the same plots in this case.

Here is the behavior of the -Newton methods in .

Here is the behavior of the -Newton methods in .

The plots produced by give information on the number of iterations required by each of the quaternionic Newton methods to converge within a certain precision to any of the roots of the polynomial under consideration. However, those plots do not give any information about the root and how the convergence occurs. This issue can be easily overcome by plotting the basins of attraction of the roots with respect to the iterative function. More precisely, we introduce a new input parameter in the function with the information of the root for which we want to compute the basin of attraction. A new function takes into account the existence of spheres of zeros. The functions and give the number of iterations needed to observe convergence to an isolated root or a spherical one, respectively. These functions return when the corresponding convergence test fails.

The functions that plot the basin of attraction of an isolated root or a spherical root have an input parameter associated with that root. The color coding used is the following: if the initial guess , chosen in a domain , causes the process to converge to a certain isolated root to which the color was associated, then the point is plotted with the color . For a sphere of zeros , all the points that converge to a point in have the color assigned to . Dark shades of a color mean fast convergence, while lighter-colored points lead to slower convergence. As before, white regions mean that the method does not converge.

Example 5

We consider once more the polynomial of Example 4, now from the perspective of the basins of attraction of each of the roots , and . We associate with these roots the colors red, blue and green, respectively, and consider the domains , and , described in Example 4. The corresponding plots can be obtained as follows (it can take some time to produce the figures).

Here are the basins of attraction in (left).

Here are the basins of attraction in (left and right).

Here are the basins of attraction in (left and right).

Example 6

This example concerns the polynomial studied in Example 2, which has an isolated root 0 (red) and a sphere of zeros (blue). The corresponding plots can be obtained as follows.

Here are the basins of attraction in (left).

Here are the basins of attraction in (left); as expected, the behavior is similar to that in , since .

Here are the basins of attraction in (left).

The Weierstrass method is one of the most popular iterative methods for obtaining simultaneously approximations to all the roots of a polynomial with complex coefficients. The method was first proposed by Weierstrass [16] in 1891 and later rediscovered and derived in different ways by Durand [17] in 1960, Dočev [18] in 1962 and Kerner [19] and Prešić [20] in 1966.

Let be a complex monic polynomial of degree with roots and let be distinct numbers. The classical Weierstrass method for approximating the roots is defined by the iterative scheme:

(12) |

If the roots are distinct and are sufficiently good initial approximations to these roots, then the method converges at a quadratic rate, as was first proved by Dočev [18]. The iteration procedure (12) computes one approximation at a time based on the already computed approximations. For this reason, it is usually referred to as the *total-step* or *parallel* mode. The convergence of the method can be accelerated by using a variant—the so-called *single-step*, *serial* or *sequential* mode—that makes use of the most recently updated approximations to the roots as soon as they are available:

(13) |
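For readers without Mathematica at hand, the classical complex total-step iteration (12) fits in a few lines of Python (a generic illustration with names of our own, not the article's quaternionic implementation):

```python
# Sketch of the classical total-step Weierstrass (Durand-Kerner) iteration (12)
# for a monic complex polynomial, using plain Python complex numbers.
def weierstrass(coeffs, guesses, steps=50):
    # coeffs = [c_{n-1}, ..., c_0] of p(z) = z^n + c_{n-1} z^{n-1} + ... + c_0
    def p(z):
        r = 1.0
        for c in coeffs:
            r = r * z + c          # Horner evaluation
        return r

    z = list(guesses)
    n = len(z)
    for _ in range(steps):
        w = []
        for i in range(n):
            denom = 1.0
            for j in range(n):
                if j != i:
                    denom *= z[i] - z[j]
            w.append(z[i] - p(z[i]) / denom)
        z = w                      # total-step: updates use old approximations
    return z
```

The single-step variant (13) is obtained by writing each corrected value back into `z` immediately instead of collecting the updates in `w`.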

In a recent article [11], we adapted the Weierstrass method to the quaternion algebra setting. We refer to [2] and references therein to recall the main concepts and properties of the ring of unilateral quaternionic polynomials. In particular, we recall the factorization of polynomials in into linear terms and the relation between zeros and factors of .

**Theorem 3—Factorization into linear terms**

Any monic polynomial of degree in admits a factorization into linear factors; that is, there exist such that

(14) |

**Theorem 4—Zeros from factors**

Consider a polynomial whose factor terms are ; that is, admits a factorization of the form (14). If the similarity classes , , are distinct, then has exactly zeros , which are given by:

(15) |

(16) |

Following the idea of the Weierstrass method in its sequential version (13), the next results show how to obtain sequences converging, at a quadratic rate, to the factor terms in (14) of a given polynomial . Moreover, by making use of Theorem 4, it is possible to construct sequences converging quadratically to the roots of .

**Theorem 5** ([11])

(17) |

(18) |

(19) |

(20) |

with denoting the characteristic polynomial of , that is, . If the initial approximations are sufficiently close to the factor terms in a factorization of in the form (14), then the sequences converge quadratically to . Moreover, the sequences defined by

(21) |
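The characteristic polynomial used in Theorem 5 is not rendered above; assuming the standard definition Ψ_s(x) = x² − 2 Re(s) x + |s|², it can be formed directly from the components of s (a one-line sketch with our own naming):

```python
def char_poly(s):
    # Coefficients (1, -2 Re s, |s|^2) of the (assumed standard) characteristic
    # polynomial Psi_s(x) = x^2 - 2 Re(s) x + |s|^2 of s = (s0, s1, s2, s3).
    # Its coefficients are real; its roots form the similarity class of s.
    return (1.0, -2.0 * s[0], float(sum(x * x for x in s)))
```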

The functions , and are implemented as the functions , and , respectively. The support file associated with [2] needs to be loaded.

The iterative functions associated with (17) and (21) are built into the function.

The quaternionic Weierstrass iterative method is implemented in the function .

The usual convergence test has been replaced in by

in order to let the function recognize a sphere of zeros. Since we also include a test on the value of , there is no risk of misidentifying an isolated root.

Example 7

We consider now the application of the Weierstrass method to the computation of the roots of the polynomial of Example 3, which we recall are , and . All of the initial approximations , and have to lie in distinct congruence classes.

Some explanation of the output is needed. The first entry indicates the convergence or divergence of the method. The second entry is the error in the approximations to the zeros. The last two entries contain approximations to the roots and factor terms. Since there are two real roots and just one nonreal root, the roots and factor terms coincide.

Example 8

Our next test example is a polynomial that also fulfills the assumptions of Theorem 5 and has simple zeros (see [11], Example 1). First, we check that the polynomial

(22) |

(23) |

The convergence to the roots is in an order different from the one given in (22) because the convergence to the factor terms also occurs in an order different from the one given in (23).

Example 9

The polynomial has an isolated root and a sphere of zeros . The assumptions of Theorem 5 do not apply to this polynomial, but we can observe convergence to the roots as we increase the precision of the computations. When a polynomial has a spherical root, two of its factor terms are in the same congruence class. Therefore, as the iteration proceeds, the values in (17) become close to zero and some care is required.

Using the usual precision, it was not possible to reach the required tolerance. However, performing the calculations with more decimal places yields fast convergence under the same assumptions.

The spherical root can be identified at once by observing that, up to the required precision, we have .

This is the second article on several computational aspects of polynomials in the ring . One can find in the literature methods for numerically approximating the zeros of quaternionic polynomials based on the use of complex techniques, but numerical methods relying on quaternion arithmetic remain scarce, with the exceptions of the Newton and Weierstrass methods discussed in this article. We developed several functions to implement those methods and we also added some visualization tools.

Research at the Centre of Mathematics (CMAT) was financed by Portuguese Funds through FCT – Fundação para a Ciência e a Tecnologia, within the Project UID/MAT/00013/2013. Research at the Economics Politics Research Unit (NIPE) was carried out within the funding with COMPETE reference number POCI-01-0145-FEDER-006683 (UID/ECO/03182/2013), with the FCT/MEC’s (Fundação para a Ciência e a Tecnologia, I.P.) financial support through national funding and by the European Regional Development Fund (ERDF) through the Operational Programme on “Competitiveness and Internationalization – COMPETE 2020” under the PT2020 Partnership Agreement.

[1] | I. Niven, “Equations in Quaternions,” The American Mathematical Monthly, 48(10), 1941 pp. 654–661. www.jstor.org/stable/2303304. |

[2] | M. I. Falcão, F. Miranda, R. Severino, and M. J. Soares, “Computational Aspects of Quaternionic Polynomials: Part 1,” The Mathematica Journal, 20(4), 2018. doi.org/10.3888/tmj.20-4. |

[3] | R. Farouki, G. Gentili, C. Giannelli, A. Sestini and C. Stoppato, “A Comprehensive Characterization of the Set of Polynomial Curves with Rational Rotation-Minimizing Frames,” Advances in Computational Mathematics, 43(1), 2017 pp. 1–24. doi.org/10.1007/s10444-016-9473-0. |

[4] | R. Pereira, P. Rocha and P. Vettori, “Algebraic Tools for the Study of Quaternionic Behavioral Systems,” Linear Algebra and Its Applications, 400, 2005 pp. 121–140. doi.org/10.1016/j.laa.2005.01.008. |

[5] | R. Serôdio, E. Pereira and J. Vitória, “Computing the Zeros of Quaternion Polynomials,” Computers and Mathematics with Applications, 42(8-9) 2001 pp. 1229–1237. doi.org/10.1016/S0898-1221(01)00235-8. |

[6] | S. De Leo, G. Ducati and V. Leonardi, “Zeros of Unilateral Quaternionic Polynomials,” The Electronic Journal of Linear Algebra, 15(1), 2006 pp. 297–313. doi.org/10.13001/1081-3810.1240. |

[7] | D. Janovská and G. Opfer, “Computing Quaternionic Roots by Newton’s Method,” Electronic Transactions on Numerical Analysis, 26, 2007 pp. 82–102. |

[8] | D. Janovská and G. Opfer, “A Note on the Computation of All Zeros of Simple Quaternionic Polynomials,” SIAM Journal on Numerical Analysis, 48(1), 2010 pp. 244–256. doi.org/10.1137/090748871. |

[9] | M. I. Falcão, “Newton Method in the Context of Quaternion Analysis,” Applied Mathematics and Computation, 236, 2014 pp. 458–470. doi.org/10.1016/j.amc.2014.03.050. |

[10] | F. Miranda and M. I. Falcão, “Modified Quaternion Newton Methods,” in Computational Science and Its Applications (ICCSA 2014), Guimarães, Portugal, Lecture Notes in Computer Science, 8579 (B. Murgante et al., eds.), Berlin, Heidelberg: Springer, 2014 pp. 146–161. doi.org/10.1007/978-3-319-09144-0_11. |

[11] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Weierstrass Method for Quaternionic Polynomial Root-Finding,” Mathematical Methods in the Applied Sciences, 2017 pp. 1–15. doi:10.1002/mma.4623. |

[12] | M. I. Falcão and F. Miranda, “Quaternions: A Mathematica Package for Quaternionic Analysis,” in Computational Science and Its Applications (ICCSA 2011), Lecture Notes in Computer Science, 6784 (B. Murgante, O. Gervasi, A. Iglesias, D. Taniar and B. O. Apduhan, eds.), Berlin, Heidelberg: Springer, 2011 pp. 200–214. doi:10.1007/978-3-642-21931-3_17. |

[13] | F. Miranda and M. I. Falcão. “QuaternionAnalysis Mathematica Package.” w3.math.uminho.pt/QuaternionAnalysis. |

[14] | K. Gürlebeck, K. Habetha and W. Sprössig, Holomorphic Functions in the Plane and ‐Dimensional Space, Basel: Birkhäuser, 2008. doi.org/10.1007/978-3-7643-8272-8. |

[15] | B. Kalantari, “Algorithms for Quaternion Polynomial Root-Finding,” Journal of Complexity, 29(3–4) 2013 pp. 302–322. doi.org/10.1016/j.jco.2013.03.001. |

[16] | K. Weierstrass, “Neuer Beweis des Satzes, dass jede ganze rationale Function einer Veränderlichen dargestellt werden kann als ein Product aus linearen Functionen derselben Veränderlichen,” in Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 1891. |

[17] | E. Durand, Solutions numériques des équations algébriques. Tome I: Equations du type F(x); racines d’un polynôme, Paris: Masson, 1960. |

[18] | K. Dočev, “A Variant of Newton’s Method for the Simultaneous Approximation of All Roots of an Algebraic Equation,” Fiziko-Matematichesko Spisanie. Bulgarska Akademiya na Naukite, 5(38), 1962 pp. 136–139. |

[19] | I. O. Kerner, “Ein Gesamtschrittverfahren zur Berechnung der Nullstellen von Polynomen,” Numerische Mathematik, 8(3), 1966 pp. 290–294. doi.org/10.1007/BF02162564. |

[20] | S. B. Prešić, “Un procédé itératif pour la factorisation des polynômes,” Comptes Rendus de l’Académie des Sciences Paris Série A, 262, 1966 pp. 862–863. |

M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Computational Aspects of Quaternionic Polynomials: Part 2,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-5. |

Available at: w3.math.uminho.pt/QuaternionAnalysis

Available at: content.wolfram.com/uploads/sites/19/2018/05/QPolynomial.m

M. Irene Falcão is an associate professor in the Department of Mathematics and Applications of the University of Minho. Her research interests are numerical analysis, hypercomplex analysis and scientific software.

Fernando Miranda is an assistant professor in the Department of Mathematics and Applications of the University of Minho. His research interests are differential equations, quaternions and related algebras and scientific software.

Ricardo Severino is an assistant professor in the Department of Mathematics and Applications of the University of Minho. His research interests are dynamical systems, quaternions and related algebras and scientific software.

M. Joana Soares is an associate professor in the Department of Mathematics and Applications of the University of Minho. Her research interests are numerical analysis, wavelets mainly in applications to economics, and quaternions and related algebras.

**M. Irene Falcão**

CMAT – Centre of Mathematics

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*mif@math.uminho.pt*

**Fernando Miranda**

CMAT – Centre of Mathematics

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*fmiranda@math.uminho.pt*

**Ricardo Severino**

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*ricardo@math.uminho.pt*

**M. Joana Soares**

NIPE – Economics Politics Research Unit

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*jsoares@math.uminho.pt*

dx.doi.org/doi:10.3888/tmj.20-4

This article discusses a recently developed Mathematica tool—a collection of functions for manipulating, evaluating and factoring quaternionic polynomials. relies on the package , which is available for download at w3.math.uminho.pt/QuaternionAnalysis.

Some years ago, the first two authors of this article extended the standard Mathematica package implementing Hamilton’s quaternion algebra—the package —endowing it with the ability, among other things, to perform numerical and symbolic operations on quaternion-valued functions [1]. Later, in response to the need for new functions providing the basic mathematical tools for dealing with quaternion-valued functions, the same authors wrote a completely new package, . Since 2014, the package and complete support files have been available for download at the Wolfram Library Archive (see also [2] for updated versions).

Over time, this package has become an important tool, especially in the work that has been developed by the authors in the area of quaternionic polynomials ([3–5]). While this work progressed, new Mathematica functions were written to appropriately deal with problems in the ring of quaternionic polynomials. The main purpose of the present article is to describe these Mathematica functions. There are two parts.

In this first part, we discuss the tool, containing several functions for treating the usual problems in the ring of quaternionic polynomials: evaluation, Euclidean division, greatest common divisor and so on. A first version of was introduced in [4] from the user’s point of view. Here, we take another perspective, giving some implementation details and describing some of the experiments performed.

The second part of the article (forthcoming) is entirely dedicated to root-finding methods.

In 1843, the Irish mathematician William Rowan Hamilton introduced the quaternions, which are numbers of the form

where the imaginary units , and satisfy the multiplication rules

This noncommutative product generates the well-known algebra of real quaternions, usually denoted by .

**Definition 1**

The standard package adds rules to , , , and the fundamental . Among others, the following quaternion functions are included: , , , , , , and . In , a quaternion is an object of the form and must have real numeric valued entries; that is, applying the function to an argument gives .

The extended version allows the use of symbolic entries, assuming that all symbols represent real numbers. The package adds functionality to the following functions: , , , , , , , , , and . We briefly illustrate some of the quaternion functions needed in the sequel. In what follows, we assume that the package has been installed.

These are the imaginary units.

These are the multiplication rules.

Here are two quaternions with symbolic entries and their product.

The product is noncommutative.

Here are some basic functions.

The function , which was extended in through the use of de Moivre’s formula for quaternions, works quite well for quaternions with numeric entries.

contains a different implementation of the power function, , which we recommend whenever a quaternion has symbolic entries.
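The de Moivre formula behind that extension can be sketched as follows (a hypothetical helper of our own, not the package function, valid for nonreal quaternions):

```python
import math

def qpow(q, n):
    # De Moivre for quaternions (illustrative sketch, nonreal q only):
    # q = |q| (cos t + omega sin t) with t = atan2(|Vec q|, Re q) implies
    # q^n = |q|^n (cos n t + omega sin n t).
    a, v = q[0], q[1:]
    vn = math.sqrt(sum(x * x for x in v))    # |Vec q|
    r = math.sqrt(a * a + vn * vn)           # |q|
    t = math.atan2(vn, a)
    omega = tuple(x / vn for x in v)         # sign of q
    c, s = math.cos(n * t), math.sin(n * t)
    return (r**n * c,) + tuple(r**n * s * w for w in omega)
```

For example, squaring the imaginary unit i (the tuple `(0, 1, 0, 0)`) returns −1 up to rounding, and squaring 1 + i + j + k returns −2 + 2i + 2j + 2k.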

We refer the reader to the package documentation for more details on the new functions included in the package.

We focus now on the polynomial in one formal variable of the form

(1) |

where the coefficients are to the left of the powers. Denote by the set of polynomials of the form (1), defining addition and multiplication as in the commutative case and assuming the variable commutes with the coefficients. This is a ring, referred to as the ring of left one-sided (or unilateral) polynomials.

When working with the functions contained in , a polynomial in is an object defined through the use of the function , which returns the simplest form of , taking into account the following rules.

The function tests if an argument is a scalar in the sense that it is not a complex number, a quaternion number or a polynomial.

For polynomials in , the rules , , and have to be defined.

■ *Addition*

■ *Product by a scalar*

■ *Multiplication*

■ *Power*

Example 1

The polynomials and can be defined using their coefficients in in descending order.

We now define three particularly important polynomials, the first two associated with a given polynomial and the last one associated with a given quaternion .

**Definition 2**

With a polynomial as in equation (1) and a quaternion, define:

The first two polynomials are constructed with the functions and .

The built-in function now accepts a quaternion argument.

Observe that is a polynomial with real coefficients. For simplicity, in this context and in what follows, we assume that a quaternion with vector part zero is real.

Example 2

Consider the polynomial of Example 1 and the quaternion .

The evaluation map at a given quaternion , defined for the polynomial given by (1), is

(2) |

It is not an algebra homomorphism, as does not lead, in general, to , as the next theorem remarks.

As usual, we say that is a zero (or root) of if . An immediate consequence of Theorem 1 is that if , then is a zero of if and only if is a zero of .

A straightforward implementation of equation (2) can be obtained through .

As in the classical (real or complex) case, the evaluation of a polynomial can also be obtained by the use of Horner’s rule [3]. The nested form of equation (2) is

and the quaternionic version of Horner’s rule can be implemented as .
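A language-neutral sketch of this nested evaluation (Python tuples as quaternions, with helper names of our own) is as follows; note that the accumulated value is multiplied by q on the right, so the coefficients stay on the left as in (1):

```python
# Quaternions as tuples (q0, q1, q2, q3); illustrative helpers, ours.
def qmul(p, q):
    a, b, c, d = p
    e, f, g, h = q
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

def horner(coeffs, q):
    # coeffs = [c_n, ..., c_0], coefficients to the LEFT of the powers:
    # p(q) = c_n q^n + ... + c_1 q + c_0 = (...(c_n q + c_{n-1}) q + ...) q + c_0
    r = coeffs[0]
    for c in coeffs[1:]:
        r = tuple(a + b for a, b in zip(qmul(r, q), c))
    return r
```

For instance, for p(x) = i x + j one gets p(k) = i k + j = −j + j = 0, showing that the order of the factors matters in the quaternionic setting.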

Example 3

Consider again the polynomial . The problem of evaluating at can be solved through one of the following (formally) equivalent expressions.

Example 4

We now illustrate some of the conclusions of Theorem 1 by considering the polynomials , and and the quaternion .

For the theoretical background of this section, we refer the reader to [6] (see also [7] where basic division algorithms in are presented). Since is a principal ideal domain, left and right division algorithms can be defined. The following theorem gives more details.

If and are polynomials in (with ), then there exist unique , , and such that

(3)

(4)

If in equation (3), , then is called a right divisor of , and if in equation (4), , is called a left divisor of . This article only presents right versions of the division functions; in both the left and right versions are implemented. The function performs the right division of two quaternionic polynomials, returning a list with the quotient and remainder of the division.
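The right division of equation (3) can be sketched in Python (an illustration with our own hypothetical names, quaternions as 4-tuples and descending coefficient lists). The divisor's leading coefficient is inverted and multiplied on the right because the quotient's coefficients sit to the left of the divisor's:

```python
def qmul(p, q):
    """Hamilton product of quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qinv(q):
    """Inverse conj(q) / |q|^2 of a nonzero quaternion."""
    a, b, c, d = q
    n = a*a + b*b + c*c + d*d
    return (a / n, -b / n, -c / n, -d / n)

def divmod_right(P, D):
    """Right division P = Q*D + R with deg R < deg D (descending
    coefficient lists, deg P >= deg D assumed)."""
    R = [tuple(c) for c in P]
    inv = qinv(D[0])
    qlen = len(P) - len(D) + 1
    Q = []
    for k in range(qlen):
        c = qmul(R[k], inv)          # cancel the current leading term
        Q.append(c)
        for j, dj in enumerate(D):
            R[k + j] = tuple(x - y for x, y in zip(R[k + j], qmul(c, dj)))
    return Q, R[qlen:]
```

A zero remainder identifies the divisor as a right divisor; the analogous left division only changes the side on which the inverse is multiplied.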

**Example 5**

Consider the polynomials and .

Since , is a right divisor of and . On the other hand, does not right-divide (but it is a left divisor).

The greatest common (right or left) divisor polynomial of two polynomials can now be computed using the Euclidean algorithm by a basic procedure similar to the one used in the complex setting. The function implements this procedure for the case of the greatest common right divisor.
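The Euclidean algorithm itself is short once right division is available. The sketch below is again our own illustrative Python (hypothetical names; quaternions as 4-tuples (a, b, c, d), descending coefficient lists) and returns the greatest common right divisor in monic form:

```python
def qmul(p, q):
    """Hamilton product of quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qinv(q):
    """Inverse conj(q) / |q|^2 of a nonzero quaternion."""
    a, b, c, d = q
    n = a*a + b*b + c*c + d*d
    return (a / n, -b / n, -c / n, -d / n)

def strip(P, eps=1e-12):
    """Drop numerically zero leading coefficients."""
    while P and max(abs(x) for x in P[0]) < eps:
        P = P[1:]
    return P

def rem_right(P, D):
    """Remainder of the right division P = Q*D + R (deg P >= deg D)."""
    R = [tuple(c) for c in P]
    inv = qinv(D[0])
    qlen = len(P) - len(D) + 1
    for k in range(qlen):
        c = qmul(R[k], inv)
        for j, dj in enumerate(D):
            R[k + j] = tuple(x - y for x, y in zip(R[k + j], qmul(c, dj)))
    return strip(R[qlen:])

def gcrd(P, D):
    """Greatest common right divisor by the Euclidean algorithm,
    normalized to be monic (leading coefficient 1)."""
    P, D = strip(list(P)), strip(list(D))
    while D:
        P, D = D, rem_right(P, D)
    inv = qinv(P[0])
    return [qmul(inv, c) for c in P]
```

For example, x^2 + 1 = (x + i)(x - i) and (x + j)(x - i) share the common right divisor x - i, and the algorithm recovers it.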

**Example 5 (continued)**

and .

Before describing the zero set of a quaternionic polynomial , we need to introduce more concepts.

**Definition 3**

We say that a quaternion is congruent (or similar) to a quaternion (and write ) if there exists a nonzero quaternion such that .

This is an equivalence relation in that partitions into congruence classes. The congruence class containing a given quaternion is denoted by . It can be shown (see, e.g. [8]) that

This result gives a simple way to test if two or more quaternions are similar, implemented with the function .

For zero or equality testing, we use the test function.

It follows that if and only if . The congruence class of a nonreal quaternion can be identified with the three-dimensional sphere in the hyperplane with center and radius .
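The characterization above, that two quaternions are similar exactly when they share real part and norm, is easy to sketch in Python (`similar` is our illustrative name; quaternions as 4-tuples (a, b, c, d)):

```python
def similar(p, q, eps=1e-12):
    """Congruence test for quaternions (a, b, c, d) = a + b i + c j + d k:
    p ~ q iff they have the same real part and the same norm."""
    norm2 = lambda x: sum(t * t for t in x)
    return abs(p[0] - q[0]) < eps and abs(norm2(p) - norm2(q)) < eps
```

In particular, the three imaginary units i, j, k are all similar, and every quaternion is similar to its conjugate, consistent with the description of the congruence class as a sphere.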

**Definition 4**

A zero of is called an isolated zero of if contains no other zeros of . Otherwise, is called a spherical zero of and is referred to as a sphere of zeros.

It can be proved that if is a zero that is not isolated, then all quaternions in are in fact zeros of (see Theorem 4); the choice of the term spherical to designate this type of zero is therefore natural. According to the definition, real zeros are always isolated zeros. Zeros can be classified by taking into account the following results.

A nonreal zero is a spherical zero of if and only if one of the following equivalent conditions holds:

3. The characteristic polynomial of is a right divisor of ; that is, there exists a polynomial such that .
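Condition 3 is straightforward to check numerically. In the illustrative Python sketch below (hypothetical names; quaternions as 4-tuples (a, b, c, d), polynomials as descending coefficient lists), the characteristic polynomial of q is the real quadratic x^2 - 2 Re(q) x + |q|^2; since its coefficients are real and it is monic, dividing it into a quaternionic polynomial requires no quaternion inversions:

```python
def char_poly(q):
    """Real quadratic x^2 - 2 Re(q) x + |q|^2 whose roots form the
    congruence class of the quaternion q = (a, b, c, d)."""
    a, b, c, d = q
    return [1.0, -2.0 * a, a*a + b*b + c*c + d*d]

def char_divides(P, q, eps=1e-9):
    """Theorem 4-3 criterion: True if the characteristic polynomial
    of q right-divides P (deg P >= 2 assumed)."""
    C = char_poly(q)
    R = [list(c) for c in P]
    for k in range(len(P) - 2):      # monic real divisor: just subtract
        lead = list(R[k])
        for j in range(3):
            R[k + j] = [x - C[j] * y for x, y in zip(R[k + j], lead)]
    return all(abs(x) < eps for row in R[len(P) - 2:] for x in row)
```

Note that this only tests divisibility; for a nonreal zero q of P, a True result means the zero is spherical, while False means it is isolated.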

**Example 6**

We are going to show that the polynomial

has a spherical zero: and an isolated one: .

We first observe that both and are zeros of .

Now we use Theorem 4-1 to conclude that the zero is spherical, while the zero is isolated.

We can reach the same conclusion from Theorem 4-3.

Taking all this into account, the verification of the nature of a zero can be done using the function .

Consider the same polynomial and quaternions again.

We now list other results needed in the next section.

Let and . Then is a zero of if and only if there exists such that .

Any nonconstant polynomial in always has a zero in .

In this section, we address the problem of factoring a polynomial . We mostly follow [4]. As in the classical case, it is always possible to write a quaternionic polynomial as a product of linear factors; however the link between these factors and the corresponding zeros is not straightforward. As an immediate consequence of Theorems 5 and 6, one has the following theorem.

Any monic polynomial of degree in factors into linear factors; that is, there exist such that

(5)

**Definition 5**

In a factorization of of the form (5), the quaternions are called factor terms of and the -tuple is called a factor terms chain associated with or simply a chain of .

If and are chains associated with the same polynomial , then we say that the chains are similar and write .

The function constructs a polynomial with a given chain, and the function checks if two given chains are similar.

Repeated use of the next result allows the construction of similar chains, when they exist.

Theorem 8 can be implemented using the function .

**Example 7**

This constructs chains similar to the chain .

Observe that , and are similar chains.

We emphasize that there are polynomials with just one chain. This issue is addressed in Theorem 12. For the moment, we just give an example of such a polynomial.

These computations lead us to the conclusion that the polynomial factors uniquely as .

The next fundamental results shed light on the relation between factor terms and zeros of a quaternionic polynomial.

Let be a chain of the polynomial . Then every zero of is similar to some factor term in the chain and conversely, every factor term is similar to some zero of .

Consider a chain of the polynomial . If the similarity classes are distinct, then has exactly zeros , which are given by:

(6)

The function determines the zeros of a polynomial with a prescribed chain in the case where no two factors in the chain are similar quaternions, giving a warning if this condition does not hold.

**Example 8**

Consider the polynomial . One of its chains is , and it follows at once that the similarity classes of the factor terms are all distinct. Therefore, we conclude from Theorem 10 that has four distinct isolated roots, which can be obtained with the following code.

On the other hand, the polynomial has as one of its chains. Since , one cannot apply Theorem 10 to find the roots of .

Observe that this does not mean that the roots of are spherical.

This issue will be revisited later in connection with the notion of the multiplicity of a zero. The following theorem indicates how, under certain conditions, one can construct a polynomial having prescribed zeros.

If are quaternions such that the similarity classes are distinct, then there is a unique polynomial of degree with zeros that can be constructed from the chain , where is the polynomial (6).

The function implements the procedure described in Theorem 11.

**Example 9**

Consider the problem of constructing a polynomial having the isolated roots . We first determine one chain associated with these zeros.

Now we determine the polynomial associated with this chain.

Check the solution.

Let be a quaternionic polynomial of degree . Then is the unique zero of if and only if admits a unique chain with the property

(7)

Moreover, if a chain associated with a polynomial has property (7), is a polynomial of degree such that is its unique zero and , then the polynomial (of degree ) has only two zeros, namely and .

We can now introduce the concept of the multiplicity of a zero and a new *kind* of zero. In this context, we have to note that several notions of multiplicity are available in the literature (see [9], [15–17]).

**Definition 6**

The multiplicity of a zero of is defined as the maximum degree of the right factors of with as their unique zero and is denoted by . The multiplicity of a sphere of zeros of , denoted by , is the largest for which divides .

**Example 10**

The polynomial has an isolated root with multiplicity and an isolated root with multiplicity .

The polynomial has an isolated root with multiplicity and a sphere of zeros with multiplicity .

The polynomial has a mixed root with multiplicity and .

Finally, one can construct a polynomial with assigned zeros by the repeated use of the following result.

A polynomial with and as its isolated zeros with multiplicities and , respectively, and a sphere of zeros with multiplicity can be constructed through the chain

An alternative syntax for the function addresses the problem of constructing a polynomial (in fact it constructs a chain) once one knows the nature and multiplicity of its roots.

**Example 11**

We reconsider here Example 6 of [4]. An example of a polynomial that has as a zero of multiplicity three, as a zero of multiplicity two and as a sphere of zeros with multiplicity two is

Of course this solution is not unique. For example, the polynomial

We confirm this using the function with the new syntax.

Here are two spherical roots corresponding to the same sphere.

Observe that the result is, of course, the same as this one.

Recall that a real root is always an isolated root, and two roots in the same congruence class cannot be isolated.

This article has discussed implementation issues related to the manipulation, evaluation and factorization of quaternionic polynomials. We recommend that interested readers download the support file to get complete access to all the implemented functions. The increasing interest in the use of quaternions in areas such as number theory, robotics, virtual reality and image processing [18] makes us believe that developing a computational tool for operating in the quaternionic framework will be useful for other researchers, especially taking into account the power of Mathematica as a symbolic language.

In the ring of quaternionic polynomials, new problems arise mainly because the structure of zero sets, as we have described, is very different from the complex case. In this article, we did not discuss the problem of computing the roots or the factor terms of a polynomial; all the results we have presented assumed that either the zeros or the factor terms of a given polynomial are known. Methods for computing the roots or factor terms of a quaternionic polynomial are considered in Part II.

Research at the Centre of Mathematics at the University of Minho was financed by Portuguese Funds through FCT – Fundação para a Ciência e a Tecnologia, within the Project UID/MAT/00013/2013. Research at the Economics Politics Research Unit was carried out within the funding with COMPETE reference number POCI-01-0145-FEDER-006683 (UID/ECO/03182/2013), with the FCT/MEC’s (Fundação para a Ciência e a Tecnologia, I.P.) financial support through national funding and by the European Regional Development Fund through the Operational Programme on “Competitiveness and Internationalization – COMPETE 2020” under the PT2020 Partnership Agreement.

[1] | M. I. Falcão and F. Miranda, “Quaternions: A Mathematica Package for Quaternionic Analysis,” in Computational Science and Its Applications (ICCSA 2011), Lecture Notes in Computer Science, 6784 (B. Murgante, O. Gervasi, A. Iglesias, D. Taniar and B. O. Apduhan, eds.), Berlin, Heidelberg: Springer, 2011 pp. 200–214. doi:10.1007/978-3-642-21931-3_17. |

[2] | F. Miranda and M. I. Falcão. “QuaternionAnalysis Mathematica Package.” w3.math.uminho.pt/QuaternionAnalysis. |

[3] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Evaluation Schemes in the Ring of Quaternionic Polynomials,” BIT Numerical Mathematics, 58(1), 2018 pp. 51–72. doi:10.1007/s10543-017-0667-8. |

[4] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Mathematica Tools for Quaternionic Polynomials,” in Computational Science and Its Applications (ICCSA 2017), Lecture Notes in Computer Science, 10405, (O. Gervasi, B. Murgante, S. Misra, G. Borruso, C. M. Torre, A. M. A. C. Rocha, D. Taniar, B. O. Apduhan, E. Stankova and A. Cuzzocrea, eds.), Berlin, Heidelberg: Springer, 2017 pp. 394–408. doi:10.1007/978-3-319-62395-5_27. |

[5] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Weierstrass Method for Quaternionic Polynomial Root-Finding,” Mathematical Methods in the Applied Sciences, 2017 pp. 1–15. doi:10.1002/mma.4623. |

[6] | N. Jacobson, The Theory of Rings (Mathematical Surveys and Monographs), New York: American Mathematical Society, 1943. |

[7] | A. Damiano, G. Gentili and D. Struppa, “Computations in the Ring of Quaternionic Polynomials,” Journal of Symbolic Computation, 45(1), 2010 pp. 38–45. doi:10.1016/j.jsc.2009.06.003. |

[8] | F. Zhang, “Quaternions and Matrices of Quaternions,” Linear Algebra and Its Applications, 251, 1997 pp. 21–57. doi:10.1016/0024-3795(95)00543-9. |

[9] | B. Beck, “Sur les équations polynomiales dans les quaternions,” L’ Enseignement Mathématique, 25, 1979 pp. 193–201. |

[10] | A. Pogorui and M. Shapiro, “On the Structure of the Set of Zeros of Quaternionic Polynomials,” Complex Variables. Theory and Application, 49(6), 2004 pp. 379–389. doi:10.1080/0278107042000220276. |

[11] | B. Gordon and T. S. Motzkin, “On the Zeros of Polynomials over Division Rings,” Transactions of the American Mathematical Society, 116, 1965 pp. 218–226. doi:10.1090/S0002-9947-1965-0195853-2. |

[12] | T.-Y. Lam, A First Course in Noncommutative Rings, New York: Springer-Verlag, 1991. |

[13] | I. Niven, “Equations in Quaternions,” The American Mathematical Monthly, 48(10), 1941 pp. 654–661. www.jstor.org/stable/2303304. |

[14] | R. Serôdio and L.-S. Siu, “Zeros of Quaternion Polynomials,” Applied Mathematics Letters, 14(2), 2001 pp. 237–239. doi:10.1016/S0893-9659(00)00142-7. |

[15] | R. Pereira, Quaternionic Polynomials and Behavioral Systems, Ph.D. thesis, Departamento de Matemática, Universidade de Aveiro, Portugal, 2006. |

[16] | G. Gentili and D. C. Struppa, “On the Multiplicity of Zeroes of Polynomials with Quaternionic Coefficients,” Milan Journal of Mathematics, 76(1), 2008 pp. 15–25. doi:10.1007/s00032-008-0093-0. |

[17] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Quaternionic Polynomials with Multiple Zeros: A Numerical Point of View,” in 11th International Conference on Mathematical Problems in Engineering, Aerospace and Sciences (ICNPAA 2016), La Rochelle, France, AIP Conference Proceedings, 1798(1), 2017 p. 020099. doi:10.1063/1.4972691. |

[18] | H. R. Malonek, “Quaternions in Applied Sciences. A Historical Perspective of a Mathematical Concept,” in 17th International Conference on the Applications of Computer Science and Mathematics in Architecture and Civil Engineering (IKM 2003) (K. Gürlebeck and C. Könke, eds.), Weimar, Germany, 2003. |

M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Computational Aspects of Quaternionic Polynomials,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-4. |

Available at: w3.math.uminho.pt/QuaternionAnalysis

Available at: content.wolfram.com/uploads/sites/19/2018/05/QPolynomial.m

M. Irene Falcão is an associate professor in the Department of Mathematics and Applications of the University of Minho. Her research interests are numerical analysis, hypercomplex analysis and scientific software.

Fernando Miranda is an assistant professor in the Department of Mathematics and Applications of the University of Minho. His research interests are differential equations, quaternions and related algebras and scientific software.

Ricardo Severino is an assistant professor in the Department of Mathematics and Applications of the University of Minho. His research interests are dynamical systems, quaternions and related algebras and scientific software.

M. Joana Soares is an associate professor in the Department of Mathematics and Applications of the University of Minho. Her research interests are numerical analysis, wavelets mainly in applications to economics, and quaternions and related algebras.

**M. Irene Falcão**

CMAT – Centre of Mathematics

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*mif@math.uminho.pt*

**Fernando Miranda**

CMAT – Centre of Mathematics

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*fmiranda@math.uminho.pt*

**Ricardo Severino**

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*ricardo@math.uminho.pt*

**M. Joana Soares**

NIPE – Economics Politics Research Unit

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*jsoares@math.uminho.pt*

dx.doi.org/doi:10.3888/tmj.20-3

The action of Möbius transformations with real coefficients preserves the hyperbolic metric in the upper half-plane model of the hyperbolic plane. The modular group is an interesting group of hyperbolic isometries generated by two Möbius transformations, namely, an order-two element and an element of infinite order . Viewing the action of the group elements on a model of the hyperbolic plane provides insight into the structure of hyperbolic 2-space. Animations provide dynamic illustrations of this action.

This article updates an earlier article [1].

Transformations of spaces have long been objects of study. Many of the early examples of formal group theory were groups of such transformations. Among the most important transformations are the *isometries*, those transformations that preserve lengths. Euclidean isometries are translations, rotations and reflections. The groups and subgroups of Euclidean isometries of the plane are so familiar to us that we may not think of them as revealing much about the space they transform. In hyperbolic space, however, light, or even a person, traveling on a hyperbolic shortest-distance path tends to veer away from the boundary. Thus, the geometry is unusual enough that viewing the actions of isometries of hyperbolic 2-space reveals some of the shape of that space. Two-dimensional hyperbolic space is referred to as the *hyperbolic plane*.

Here are graphic building blocks used for all of the animations.

Figure 1 shows four cyan and white regions, each bounded by some combination of three arcs or rays. Any two adjacent regions make up a *fundamental region*. The two fundamental regions shown on either side of the axis (each with one white and one cyan half) are related by the function , which is an inversion over the unit circle composed with a reflection in the axis.

**Figure 1. **Two fundamental domains on either side of the axis.

This article examines how elements of the modular group rearrange the triangular-shaped regions shown in Figure 2. The curved paths are arcs of circles orthogonal to the axis. Arcs on these circles are hyperbolic geodesics, that is, shortest-distance paths in hyperbolic 2-space. In Euclidean space, the shortest-distance paths lie on straight lines. In hyperbolic space, shortest paths lie on circles that intersect the boundary of the space at right angles. Hyperbolic distances are computed as if there is a penalty to pay for traveling near to the plane’s boundary. Thus, the shortest-distance paths between two points must bend away from the boundary.

**Figure 2. **The upper half-plane model of the hyperbolic plane.

In the animations that follow, it is instructive to focus on the action that a transformation takes on the family of circles that meet the axis at right angles. The transformations that we consider, namely members of the modular group, preserve this family of circles. The circles in the family are shuffled onto different members in the family, but no new circles are created and none are taken away. One could say that in the context of hyperbolic geometry, the transformations preserve the family of all shortest-distance paths. Indeed this is an excellent thing for an isometry to do!

The context of this article is described in Chapter 2 of [2], a small text containing illustrations that inspired our animations. The formulas, which made coding the animations much simpler than one might expect, are given and justified there in detail.

First consider a class of functions known as Möbius transformations. These transformations are named after the same mathematician with whom we associate the one-sided, half-twisted Möbius band. Möbius transformations are defined by

Here stands for the complex numbers. Over the reals, a Möbius transformation with real coefficients falls into one of two categories: either , and the graph is a straight line, or , and the graph is a hyperbola. A representation of this latter type of function is shown in Figure 3.

**Figure 3. **Graph of shown with dashed asymptotes.

Our purpose is to investigate how Möbius transformations stretch and twist regions in the *extended complex plane*. The *complex plane* is the usual Euclidean plane with each point identified as a complex number—namely, . The *extended complex plane* is formed from the complex plane by adding the point at infinity. A Möbius transformation is one-to-one (injective) on the extended complex plane .

When a Möbius transformation acts on a complex number , we may view the action as moving the point to the point . Importantly, a Möbius transformation maps the set of circles and lines in back to circles and lines in . A comprehensive proof of this fact may be found in most elementary texts on complex variables, for example, in [3], p. 158.
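As a small illustration in plain Python (working with finite complex numbers only; the point at infinity is left out), here is a Möbius transformation together with its inverse, whose coefficient matrix is the adjugate of the original:

```python
def mobius(a, b, c, d):
    """The map z -> (a z + b) / (c z + d) on the complex plane;
    the pole z = -d/c and the point at infinity are not handled."""
    return lambda z: (a * z + b) / (c * z + d)

# A sample transformation and its inverse, whose matrix (d, -b; -c, a)
# is the adjugate of (a, b; c, d).
f = mobius(2, 1, 1, 1)
f_inv = mobius(1, -1, -1, 2)
```

Checking that `f_inv(f(z))` returns z at sample points confirms injectivity on the finite plane; the circle-and-line preservation can likewise be tested numerically on images of sample points.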

The figures of our animations live in the extended complex plane. Each point of a figure, taken as a complex number, is acted on by the Möbius transformations. These transformations spin hyperbolic 2-space about a fixed point or shift the space in one direction or another.

The *modular group* is a special class of Möbius transformations:

That is, if , the coefficients of are integers and the coefficient matrix has determinant equal to one.

What is a group? Recall that a *group* is a set together with a binary operation satisfying certain properties: (1) the set must be closed under the operation; (2) the operation must be associative; and (3) there must be an identity element for the operation, and all inverses of elements in must themselves be elements of . The proof that the modular group is, in fact, a group under the operation of function composition is a standard exercise in a course on complex analysis. (See, for example, [3], p. 277–278.)
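Closure under composition mirrors multiplication of the coefficient matrices, which is one reason the matrix viewpoint discussed later is so useful. A small Python check (illustrative only; the matrices chosen are simply integer matrices with determinant one):

```python
def mobius(M):
    """Mobius map z -> (a z + b)/(c z + d) from a 2x2 matrix."""
    (a, b), (c, d) = M
    return lambda z: (a * z + b) / (c * z + d)

def mat_mul(M, N):
    """2x2 matrix product; composing Mobius maps multiplies matrices."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def det(M):
    (a, b), (c, d) = M
    return a * d - b * c

A = ((0, -1), (1, 0))    # integer coefficients, determinant 1
B = ((1, 1), (0, 1))     # likewise
```

Since the determinant is multiplicative, the product of two determinant-one integer matrices again has determinant one, so two modular transformations compose to a modular transformation.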

We take as established that the elements of the modular group do indeed form a group and investigate some of the interesting subgroups.

One of our main goals is to investigate how the elements of the modular group act on fundamental regions. That is to say, how the regions are stretched and bent when we view them as Euclidean objects. As hyperbolic objects, the regions are all carbon copies of each other, in much the same way that the squares on a checkerboard are all identical in ordinary Euclidean geometry.

In general, a group of one-to-one transformations acting on a topological space partitions that space into *fundamental regions*. For a collection of sets to be a collection of fundamental regions, certain properties must hold. First and foremost, the must be pairwise disjoint. Second, given any transformation in the group other than the identity, and are disjoint. Finally, given any two regions and , there exists some transformation such that .

Generally, in order to cover the entire space without overlapping, each fundamental region must contain some but not all of its boundary points. This technicality is set aside for the purposes of this article.

In fact, in this article we relax the definition to include *all* of the boundary points for a particular fundamental region. Thus, adjacent fundamental regions can only overlap on their boundaries. The essential feature remains that there is no *area* in the intersection of adjacent regions.

A group of transformations does not necessarily yield a unique partition of the space into fundamental regions. Thus, the fundamental regions we view are merely representative fundamental regions.

Figure 4 shows a fundamental region of the modular group with some parts highlighted.

**Figure 4. **A fundamental region with vertices marked and a pair of tangents.

Each fundamental region contains four vertices that can be fixed by elements of the modular group. (A point is fixed by if .) Tangents are drawn at one vertex; the angle is 60 degrees. The vertex at the top has a straight, 180 degree angle. The vertex at the bottom has a zero degree angle because the tangents to the intersecting arcs coincide there. Any hyperbolic polygon with a vertex on the boundary of the space (the axis in this case of the upper half-plane) has a zero degree angle at that vertex. The corresponding four angles in each fundamental region have the same measures as those indicated here. Each vertex can be fixed by some element in the modular group. Further, each fundamental region can be mapped onto any other fundamental region by an element of the modular group.

A classic view of the matter is to see the upper half-plane as tessellated (or tiled) by triangular-shaped regions, as in Figure 2. A checkerboard tessellation of the Euclidean plane can be constructed by sliding copies of a square to the left, right, up and down. Eventually, the plane is covered with square tiles. The modular group tessellates the hyperbolic plane in an analogous way. The elements of the group move copies of a fundamental region until triangular-shaped tiles cover the upper half-plane model of the hyperbolic plane. Of course, these tiles do not appear to be identical to our eyes, trained to match shapes and lengths in Euclidean geometry. However, the triangular-shaped tiles are all identical if measured using the hyperbolic metric. In the tiling process, all areas in the upper half-plane are covered by tiles and no two tiles have any overlapping area. In fact, this procedure is precisely how the hyperbolic plane illustration was constructed. The boundary points for a single fundamental region were acted on by function elements of the modular group, and the resulting points were drawn as a boundary line in the illustration.

It helps to note that each transformation in has at least one *fixed point*. Some transformations in have two fixed points. Only the identity map has more than two. In the illustrations that follow, we observe the placement of fixed points and the way transformations map fundamental regions near them.

The hyperbolic metric is a rather curious metric that challenges our notion of distance. Under the hyperbolic metric in the upper half-plane, the shortest distance between two points is along a vertical line or an arc of a circle perpendicular to the boundary (the real axis). For example, the shortest hyperbolic path between the points and is the top arc of the circle , which passes through both points and is perpendicular to the real axis (Figure 5).

**Figure 5. **The shortest hyperbolic path between the points and .

Without discussing precisely how hyperbolic lengths and areas are measured, we state that every image under a transformation in the modular group is *congruent* to every other image under the hyperbolic metric. Thus, all of our fundamental regions shown in the animations are actually the same size in the hyperbolic metric. For a discussion on hyperbolic metrics, [4] is a good place to start.

We structure our investigation of the modular group by considering four cyclic subgroups. Recall that a *cyclic subgroup* can be generated by computing all powers of a single group element. The four cyclic subgroups we present are representative of the four possible types of subgroups found in the modular group.

For the first subgroup, consider the function ; it is a Möbius transformation with coefficients and coefficient matrix . The subscript indicates that is of order two in ; that is, , and so is its own inverse. In this case, generates a subgroup with only two elements, namely .
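Assuming the standard order-two generator S(z) = -1/z with matrix (0, -1; 1, 0) (the article's own symbols are not reproduced here, so this choice is our reading of the standard presentation), the involution property and the fixed point i can be checked numerically:

```python
def S(z):
    """Order-two Mobius transformation -1/z, coefficient matrix
    (0, -1; 1, 0): integer entries, determinant 1."""
    return -1 / z

# S is its own inverse, and it fixes z = i, the center of the
# hyperbolic rotation seen in the animation.
```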

In this article, we adopt the standard notation that angular parentheses indicate the set of elements generated by taking products from the elements enclosed by the parentheses. Curly braces , on the other hand, enclose the delineated list of elements in a set.

The Möbius transformation , its inverse and the function are used for the motion in Figure 6.

**Figure 6. **Action of the order-two element .

The animation depicts the way in which maps the two fundamental regions shown in Figure 2 onto one another. In fact, the action of on the fundamental regions is to hyperbolically rotate them 180° onto each other about the central fixed point . The actual mapping is performed instantaneously without rotation. In particular, only the first, middle and final frames contain illustrations of fundamental regions. However, the sequence of intermediate mappings illustrates through animation the mapping properties of . In a later section, we discuss how the functions illustrated were broken into a composition of functions so that the hyperbolic nature of their motion was made continuous.

This example highlights the fact that vertical lines are paths of least distance in the upper half-plane model of the hyperbolic plane. Indeed, it is usual to view straight lines as circles that have radii with infinite length and that pass through the point at infinity. With this bending of the definition of a circle, a vertical line has all the characteristics required of a geodesic in the hyperbolic plane. Like the circles, a vertical line is perpendicular to the axis, which is the boundary of the upper half-plane model of the hyperbolic plane. A vertical line is the limit of a sequence of geodesic circles.

The second example (see Figure 7) is a subgroup of infinite order generated by the linear shift (or translation) ; it is a Möbius transformation with and matrix . The function has infinite order because , and no point of ever returns to its original position no matter how many times is applied, though in the extended plane. The subgroup produced by taking all powers of and its inverse is denoted as .
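Taking the infinite-order generator to be the unit translation T(z) = z + 1 with matrix (1, 1; 0, 1) (again our assumption of the standard choice), its n-fold composition is z + n, so no finite point ever returns to its starting position:

```python
def T(z):
    """Unit translation z + 1, coefficient matrix (1, 1; 0, 1)."""
    return z + 1

def power(f, n, z):
    """Apply the map f to the point z a total of n times."""
    for _ in range(n):
        z = f(z)
    return z
```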

Every point in the plane shifts one unit to the right under the action of . The infinite half-strips in the following animation are images of each other under powers of . For contrast, we also provide images of these infinite half-strip regions under the map . These images are bunched in a flower-like arrangement attached to the real axis at the origin. As the blue infinite regions are pushed from left to right, their magenta images echo their motion in a counterclockwise direction. These two actions are not produced by a single transformation. The two transformations that cause these actions are closely related to each other as algebraic conjugates, but more on that in a later section.

**Figure 7. **This animation shows copies of fundamental regions moving back and forth, with corresponding regions anchored at the origin.

The hyperbolic isometry is notable among the elements of the modular group because it is also a Euclidean isometry. Under the hyperbolic metric, the magenta regions are each congruent to the half-strip regions in blue.

The third cyclic subgroup of that we consider is generated by the composition of the first two functions and : define . The subgroup generated by this element is denoted , a subgroup of order 3.
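With the standard generators assumed as before (S(z) = -1/z and the unit translation), their composition is V(z) = -1/(z + 1), and a direct numerical check confirms that it has order three:

```python
def V(z):
    """Composition of -1/z with z + 1: the order-three element
    -1/(z + 1), coefficient matrix (0, -1; 1, 1)."""
    return -1 / (z + 1)

# Its fixed point in the upper half-plane is (-1 + sqrt(3) i)/2,
# a root of z^2 + z + 1.
```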

Here is the fixed point of .

Define and its inverse .

In Figure 8, the function moves the fixed point to the origin and moves it back.

The function is an order-three hyperbolic rotation made continuous; it is used in Figures 8 and 11.

**Figure 8. **Action of the order-three element .

A red fundamental region and a green fundamental region are shown associated with the blue fundamental region attached to the origin in the animation’s first frame. We include these in order to provide a better orientation for the scene. Of special interest is how the point of the blue region on the axis moves as the rotation takes place. The point begins at the origin and slides toward the right along the positive axis. The blue lines of the cluster become vertical precisely when that point arrives at the point at infinity! The point continues by sliding along the negative axis to arrive back at the origin. It is fair to say that the motion of a point as it passes through the origin is a “mirror image” of the motion of the point as it passes through the point at infinity. The function used here is a composition of Möbius transformations that is described and demonstrated in Figure 12.

The rotations we saw in the action of and are of orders two and three, respectively. That is to say, after the rotation is repeated a number of times, all points are back to their original positions. In contrast, the function generates an infinite subgroup. When we iterate , the right shifts accumulate at the point at infinity. Points in the left half-plane get repelled by infinity, while points in the right half-plane get attracted to infinity. Of course, since all points in the left half-plane eventually map to points in the right half-plane, all points are, in some sense, simultaneously attracted to and repelled by infinity under the action of . Indeed, the point at infinity is the single fixed point for the action of .

The transformation in the modular group generates an infinite subgroup that differs from in the sense that has two distinct fixed points, an attractor and a repeller.

Define the well-known golden ratio φ = (1 + √5)/2 ≈ 1.618; its reciprocal is 1/φ = φ − 1 ≈ 0.618.
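As a quick numerical check (a Python sketch, not the article's code), the golden ratio satisfies φ² = φ + 1, which is why its reciprocal is simply φ − 1:

```python
import math

phi = (1 + math.sqrt(5)) / 2              # the golden ratio, ~1.618
assert abs(phi**2 - (phi + 1)) < 1e-12    # phi solves z^2 = z + 1
assert abs(1/phi - (phi - 1)) < 1e-12     # hence 1/phi = phi - 1, ~0.618
print(phi, 1/phi)
```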

This defines with fixed points and .

Define to move the fixed points and to infinity and zero, respectively.

Define to be the inverse of ; sends infinity back to and zero back to .

For Figure 9, makes the hyperbolic translation continuous. The ratio is the length of the hyperbolic translation.

**Figure 9.** Action of the hyperbolic element .

The animation depicts the action of on fundamental regions in the plane. All points exterior to the red circle on the left are mapped to the interior of the green circle on the right. The animation begins with regions that lie exterior to the red and green circles. These regions are all mapped to the area between the green circle and the smaller cyan circle. If the action of were repeated, the regions would be mapped into the interiors of ever-smaller circles inside the smallest (cyan) circle shown. The attracting fixed point of lies within these shrinking, nested circles.

The rotations and translations we have seen as examples are intimately related to Euclidean rotations and translations, as discussed in Section 6. The transformation is related in a similar way to a Euclidean dilation, which turns a figure into a similar but not congruent image figure. A curious characteristic of hyperbolic space is that the distinction between similarity and congruence disappears: in the hyperbolic plane, it is enough for two figures to have the same angles to guarantee congruence, because, in marked contrast to Euclidean space, equal angles force corresponding side lengths to be equal in the hyperbolic metric.

The entire group can be generated by the two functions and . In symbols, . Establishing this fact requires tools from linear algebra, about which we make only a few brief comments. The group of matrices with real number entries and nonzero determinants is denoted GL(2, ℝ). This group has been studied extensively, and much is known about it. Thus, there are great advantages for any group that can be represented as a subgroup of GL(2, ℝ). While the modular group cannot be represented in exactly this way, it almost can be. For instance, a coefficient matrix and its negative are considered distinct in GL(2, ℝ); on the other hand, the actions of the two associated Möbius transformations are identical, since (−az − b)/(−cz − d) = (az + b)/(cz + d). In general, for every element of the modular group, there are two associated matrices, differing only in sign.

A remarkable feature of Möbius transformations is that the group operation of composition produces coefficients identical to the results of matrix multiplication. To see this, consider the two Möbius transformations and . First, multiply the associated coefficient matrices.

Second, carry out the composition of the two functions.

The coefficients and the four matrix entries are the same!
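A small computational sketch makes the agreement concrete (Python here, not the article's Mathematica code; we use the standard generators T(z) = z + 1 and S(z) = −1/z):

```python
from fractions import Fraction

def mobius(m, z):
    """Apply the Möbius transformation with coefficients m = (a, b, c, d)."""
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

def matmul(m1, m2):
    """Product of 2x2 matrices stored as flat tuples (a, b, c, d)."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a1*a2 + b1*c2, a1*b2 + b1*d2, c1*a2 + d1*c2, c1*b2 + d1*d2)

T = (1, 1, 0, 1)    # T(z) = z + 1
S = (0, -1, 1, 0)   # S(z) = -1/z

# The coefficients of the composition S∘T are exactly the entries of
# the matrix product S·T, so the two functions agree pointwise.
ST = matmul(S, T)
for z in (Fraction(2), Fraction(1, 3), Fraction(-5, 7)):
    assert mobius(S, mobius(T, z)) == mobius(ST, z)
print(ST)   # prints (0, -1, 1, 1), i.e. (S∘T)(z) = -1/(z + 1)
```

Exact rational arithmetic via `Fraction` avoids any floating-point ambiguity in the comparison.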

In this way, the group operation of composition of functions in the modular group can be replaced with the group operation of matrix multiplication in . It is down this path we would travel if we were to present a complete proof of the claim that the modular group is generated by the two elements and .

A major part of this claim is that any element of can be written as a composition of and . Consider the following examples of compositions.

Each of these functions has a coefficient matrix with determinant equal to one. A worthy exercise for undergraduate mathematics students is to verify by direct computations that each equality holds for the indicated compositions.
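The determinant claim is easy to spot-check. In the sketch below (Python, with the standard generators T(z) = z + 1 and S(z) = −1/z, which may differ from the article's named elements), both generators have determinant one, and since determinants multiply, so does every composition of them.

```python
def matmul(m1, m2):
    """Product of 2x2 matrices stored as flat tuples (a, b, c, d)."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a1*a2 + b1*c2, a1*b2 + b1*d2, c1*a2 + d1*c2, c1*b2 + d1*d2)

def det(m):
    a, b, c, d = m
    return a * d - b * c

T = (1, 1, 0, 1)    # T(z) = z + 1, determinant 1
S = (0, -1, 1, 0)   # S(z) = -1/z, determinant 1

# Build a few words in S and T and confirm each has determinant one.
words = [
    matmul(S, T),                        # S∘T
    matmul(T, matmul(S, T)),             # T∘S∘T
    matmul(matmul(S, T), matmul(S, T)),  # (S∘T)^2
]
for w in words:
    assert det(w) == 1
print([det(w) for w in words])   # prints [1, 1, 1]
```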

The four examples of cyclic subgroups outlined in Section 4 give a complete description of the four types of cyclic subgroups possible in the modular group. Any cyclic subgroup of the modular group is *conjugate* to a subgroup generated by , , an iterate of or an iterate of an element of the same type as .

Recall that in a group , a subgroup is *conjugate* to a subgroup if there exists an element such that the entire subgroup can be generated by computing for every . More compactly, we write , and even more compactly, .
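As a concrete check (a Python sketch using the standard generators T(z) = z + 1 and S(z) = −1/z, which may differ from the article's named elements), conjugating an order-three element by any group element yields another order-three element, since (g·k·g⁻¹)³ = g·k³·g⁻¹.

```python
def matmul(m1, m2):
    """Product of 2x2 matrices stored as flat tuples (a, b, c, d)."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a1*a2 + b1*c2, a1*b2 + b1*d2, c1*a2 + d1*c2, c1*b2 + d1*d2)

T     = (1, 1, 0, 1)    # T(z) = z + 1
T_inv = (1, -1, 0, 1)   # T^{-1}(z) = z - 1
S     = (0, -1, 1, 0)   # S(z) = -1/z

K = matmul(S, T)                 # the order-three element S∘T
K3 = matmul(K, matmul(K, K))
assert K3 == (-1, 0, 0, -1)      # (S∘T)^3 = -I, the identity in the modular group

# Conjugate: N = T · K · T^{-1}. Its cube is T · K^3 · T^{-1} = -I,
# so the conjugate element also has order three.
N = matmul(T, matmul(K, T_inv))  # N = (1, -1, 1, 0)
N3 = matmul(N, matmul(N, N))
assert N3 == (-1, 0, 0, -1)
```

Recall that a matrix and its negative give the same Möbius transformation, so −I acts as the identity.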

Here we define the hyperbolic translation that relates two order-three elements: and .

Consider a function in the modular group that generates an order-three subgroup (Figure 10).

**Figure 10.** The action of the order-three function on a selection of fundamental regions.

We view side by side the actions of on the right and on the left. For the figure on the left, first the function moves the fixed point of onto the fixed point of . Then the function rotates the attached fundamental regions, as we have seen it do before, while at the same time the function acts on the right-hand figure. Finally, the inverse of returns the fixed point and associated regions to the original position, except that the fundamental regions have been rotated in the same way as those on the right. Thus, the final results are the same in both cases.

The function is used for the continuous motion in Figure 11.

**Figure 11.** The action of and .

This animation demonstrates what it means for two functions to be conjugate equivalent.

All functions in the modular group are abruptly discontinuous; that is, their actions move triangular regions onto other regions all in one jump. The facility to produce transformations that seem continuous is due to the following.

Every element of the modular group is conjugate equivalent to one of three Euclidean transformations, namely a rotation about the origin, a scaling from the origin or a rigid translation of the entire plane ([2], pp. 12–20).

These Euclidean transformations have very simple continuous forms:

Rotation: .

Scaling: .

Translation: .
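These three Euclidean motions admit obvious one-parameter continuous versions. In the hedged sketch below (Python, with parameter names of our choosing; the article's Mathematica definitions are not reproduced), each map is the identity at t = 0 and the full transformation at t = 1.

```python
import cmath

def rotation(z, theta, t):
    """Partial rotation about the origin: full angle theta at t = 1."""
    return cmath.exp(1j * theta * t) * z

def scaling(z, k, t):
    """Partial scaling from the origin by the factor k (k > 0)."""
    return (k ** t) * z

def translation(z, b, t):
    """Partial rigid translation by the complex number b."""
    return z + t * b

# At t = 0 each map is the identity; at intermediate t it interpolates.
z = 1 + 2j
assert rotation(z, cmath.pi / 3, 0) == z
assert abs(scaling(z, 4.0, 0.5) - 2 * z) < 1e-12   # sqrt(4) = 2 halfway
assert translation(z, 3 - 1j, 1) == z + (3 - 1j)
```

Composing a conjugating map, one of these continuous motions, and the inverse conjugating map is what produces the apparently continuous hyperbolic animations in the figures.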

The left-hand side of Figure 12 shows the fixed point of translated to the origin. Following this transformation, all circles that passed through the original fixed point become straight lines passing through the origin. A Euclidean rotation about the origin accomplishes the desired rearrangement of the regions. Finally, translating the fixed points back to their original positions maps the fundamental regions to their proper, final positions. We see that the final results are the same for the right and left animations.

Indeed, each frame in the right-hand animation was computed by composing the functions that are explicitly portrayed in the left-hand animation.

**Figure 12.** Conjugation with rotation of 120°.

In this way, the action of the hyperbolic motions can be animated as continuous because the Euclidean rotations, translations and dilations can all be coded as continuous functions.

[1] P. McCreary, T. J. Murphy and C. Carter, “The Modular Group,” The Mathematica Journal, 9(3), 2005. www.mathematica-journal.com/issue/v9i3.

[2] L. Ford, Automorphic Functions, New York: McGraw-Hill, 1929.

[3] N. Levinson and R. M. Redheffer, Complex Variables, San Francisco: Holden-Day, 1970.

[4] E. W. Weisstein. “Hyperbolic Metric” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/HyperbolicMetric.html.

P. R. McCreary, T. J. Murphy and C. Carter, “The Modular Group,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-3.

**Paul R. McCreary**

*The Evergreen State College-Tacoma
Tacoma, WA*

**Teri Jo Murphy**

*Department of Mathematics & Statistics
Northern Kentucky University
Highland Heights, KY*

**Christan Carter**

*Department of Mathematics
Xavier University of Louisiana
New Orleans, LA*