This article presents a numerical pseudo-dynamic approach to solving a nonlinear stationary partial differential equation (PDE) with bifurcations by passing from the stationary PDE to a pseudo-time-dependent one. The pseudo-time-dependent equation is constructed so that the desired nontrivial solution of the stationary PDE represents a fixed point of the dynamic equation. The numerical solution of the stationary PDE is then obtained as the solution of the pseudo-time-dependent equation at a high enough value of the pseudo-time.

The method described here can be applied to solve PDEs coming from different domains. However, it was initially developed to get the numerical solution of a stationary nonlinear PDE with a *bifurcation*. The method’s application to a broader class of equations is briefly discussed at the end of the article.

The term “bifurcation” describes a phenomenon that occurs in some nonlinear equations that depend on one or several parameters. These equations can be algebraic, differential, integral or integro-differential. At some values of a parameter, such an equation may exhibit a fixed number of solutions. However, as soon as the parameter exceeds a critical value (referred to as the *bifurcation point*), the number of solutions changes: either new solutions emerge or some old ones disappear. To be specific, we discuss the case of dependence on a single parameter λ.

The new solutions can emerge continuously at the bifurcation point. The norm of the solution exhibits a continuous though nonsmooth dependence on the parameter at the bifurcation point (left, Figure 1). An explicit example is in Section 4.5. A bifurcation at which the solution is continuous at the bifurcation point is referred to as *supercritical* or *soft*.

The behavior of the solution in the case of a *subcritical* or *hard* bifurcation is different: the norm of the solution is finite at the bifurcation point but has a jump discontinuity there (right, Figure 1).

**Figure 1.** Soft versus hard bifurcation. In the case of the soft bifurcation, the solution norm depends continuously on the control parameter λ, with a kink at the bifurcation point λ = λc. In contrast, in the case of a hard bifurcation, the solution norm is discontinuous at the bifurcation point.

In this article, we focus only on the case of a nonlinear PDE with soft bifurcations; some peculiarities of hard bifurcations are briefly discussed in Section 5.3.

In the most general form, a nonlinear PDE can be written as:

$$F[u_s(x), \lambda] = 0. \tag{1}$$

Here n ≥ 1, so that (1) indicates a system of n nonlinear PDEs; u_s is an n-dimensional vector representing the dependent variable. The subscript s indicates that u_s is the solution of a stationary equation. Further, x is a D-dimensional vector of spatial coordinates. Finally, λ is a real numerical parameter. The system of equations (1) is analyzed in a domain Ω subject to zero Dirichlet boundary conditions:

$$u_s(x)\big|_{x \in \partial\Omega} = 0. \tag{2}$$

Also assume that

$$F[0, \lambda] \equiv 0, \tag{3}$$

and thus u_s = 0 represents a trivial solution of (1, 2).

It is convenient to separate out the linear part of the operator F in (1), which is often (though not always) representable in the form L̂ − λ, and to write (1) in the following form:

$$(\hat L - \lambda)\,u_s + N[u_s, \lambda] = 0. \tag{4}$$

Here L̂ is a linear differential operator (such as, for example, the Laplace operator). Further, N is the nonlinear part of the operator F. The assumption that u_s = 0 solves equation (1) implies that N[0, λ] = 0.

In its explicit form, we use the representation (4) only in Section 2.2, where we derive the critical slowing-down phenomenon. In all other cases, a general form of the dependence of equation (4) on λ is valid: L̂ = L̂(λ) and N = N[u, λ]. Nevertheless, we stick to the form (4) for simplicity; the generalization is straightforward.

Let us also consider an auxiliary equation

$$\hat L\,\psi_m = \lambda_m\,\psi_m \tag{5}$$

that yields the linear part of the nonlinear equation (4). Equation (5) represents an eigenvalue problem, where the ψ_m are its eigenfunctions and the λ_m are its eigenvalues, indexed by the discrete variable m, provided the discrete spectrum of (5) exists. Let us assume that at least a part of the spectrum of (5) is discrete, and that the index m starts from zero: m = 0, 1, 2, …. The state with m = 0 is referred to as the *ground state*.

Without proofs, we recall a few facts from bifurcation theory [1] valid for soft bifurcations of such equations.

Assume that the trivial solution u_s = 0 is stable for some values of λ. As soon as the parameter becomes equal to the discrete eigenvalue λ_0 of the ground state of the auxiliary equation (5), this solution becomes unstable. As a result, a nontrivial solution branches off from the trivial one. In the close vicinity of the bifurcation point λ_c = λ_0, this solution has the asymptotics

$$u_s(x) = \langle A, \psi(x)\rangle + O\!\left(|A|^{\beta}\right), \tag{6}$$

where ψ = {ψ_α} is the set of eigenfunctions of the equation (5) belonging to the eigenvalue λ_0 = λ_c. The vector A = {A_α} is the set of amplitudes. The scalar product ⟨A, ψ⟩ stands for the expression Σ_α A_α ψ_α. Here the index α (where α = 1, …, K) enumerates the eigenfunctions in the K-dimensional subspace of the functional space where (5) has a nonzero solution. The exponent β exceeds unity: β > 1.

There are a few methods available to determine the amplitudes A. Listing them is out of the scope of this article. However, the simplest of these methods can be applied if there exists a generating functional E[u, λ] enabling one to obtain the system of equations (1) as its minimum condition:

$$\frac{\delta E[u, \lambda]}{\delta u} = 0, \tag{7}$$

where δ/δu is the variational derivative. This functional we refer to as *energy*, in analogy with physics. Substituting the representation (6) into the energy functional and integrating out the spatial coordinates, one finds the energy E = E(A, λ) as a function of the amplitudes and the parameter λ. Minimizing the energy with respect to the amplitudes yields the system of equations for the amplitudes, referred to as the *ramification equation*:

$$\frac{\partial E(A, \lambda)}{\partial A_\alpha} = 0, \qquad \alpha = 1, \dots, K. \tag{8}$$

Their solution is only accurate close to the bifurcation point λ_c. Assuming that the bifurcation takes place with decreasing λ (as is the case in the following example), one finds the typical solution for the amplitudes,

$$A_\alpha = \begin{cases} 0, & \lambda \ge \lambda_c, \\ a_\alpha\,(\lambda_c - \lambda)^{\gamma}, & \lambda < \lambda_c, \end{cases} \tag{9}$$

where a_α and γ are real numbers to be determined using the original equation. One of the methods to find these parameters analytically is discussed in Section 3. Further analytical methods may be found in [1]. This article focuses on finding these parameters numerically (Section 4.5).

All theorems and proofs for the preceding statements, along with more general methods of the derivation of the ramification equation, can be found in [1].

The bifurcation theory formulated so far is quite general: equation (1) can be differential, integral or integro-differential [1]. In what follows, we focus only on a more specific class of nonlinear partial differential equations.

The solution of the spectral system of equations (5) yields the bifurcation point λ_c; the solutions (6) and (9) are only valid very close to this point. With increasing distance |λ − λ_c| from the bifurcation point, the solution soon deviates from the correct behavior quantitatively, and it often fails to resemble (6) even qualitatively. For this reason, to get the solution at some finite distance from λ_c that would be correct both qualitatively and quantitatively, one needs to solve (1) numerically.

In the case of a hard bifurcation, none of the machinery of the theory of soft bifurcations described so far works. Studying the bifurcation numerically often becomes the only possibility.

However, the direct numerical solution of nonlinear equations like (1) and (4) with some nonlinear solvers only returns the trivial solution u = 0, even at the values of the parameter λ at which the trivial solution is unstable and a stable nontrivial solution already exists.

A plausible reason may be as follows: the solver starts to construct the PDE solution from the boundary. Here, however, the boundary condition u = 0 is already part of the trivial solution. Thus the solver appears to be placed at a true solution of the equation and is then unable to climb down from it.

To find a nontrivial solution, one needs to use a method that would start from some initial approximation that, even if rough, should be quite different from the trivial solution. Furthermore, this method should converge to the nontrivial solution by a chain of successive steps.

One can do this with the pseudo-dynamic approach formulated in the present article.

Let us introduce the pseudo-time t. The word “pseudo” indicates that t is not real time; it just represents a technical trick that helps with the simulation. Assume now that the dependent variable is a function of both the set of spatial coordinates and the pseudo-time: u = u(x, t). Instead of the stationary equation (1), let us study the behavior of the pseudo-time-dependent equation:

$$\frac{\partial u(x, t)}{\partial t} = F[u(x, t), \lambda]. \tag{10}$$

One solves equation (10) with a suitable nonzero initial condition u(x, 0) = u_0(x). Let us stress that the solution u(x, t) of the time-dependent equation (10) is not the same as the solution u_s(x) of the stationary equation (1).

One could also construct the pseudo-time-dependent equation as follows: ∂u/∂t = −F[u, λ], that is, with a minus sign in front of F. The idea of such an extension is that the solution of one of the two equations exhibits a fixed point, so that u(x, t) → u_s(x) as t → ∞, while the solution of the other diverges as t → ∞. By trial and error, one chooses the equation whose solution converges to the fixed point u_s.

The operator F has not yet been specified; for definiteness let us assume that the fixed point u(x, t) → u_s(x) takes place for equation (10), that is, with the plus sign in front of F.

The convergence of the solution of the dynamic equation to the fixed point u_s enables one to apply the following strategy. Instead of the static equation (1), which is difficult to solve numerically, one simulates the pseudo-dynamic equation (10) using a suitable time-stepping algorithm.

The advantage of this approach is the possibility of starting the simulation from an arbitrary distribution u_0(x) chosen as the initial condition, provided it agrees with the boundary conditions. From the very beginning, such a choice takes one away from the trivial solution. The time-stepping process takes the initial condition for each step from the previous solution. The solution starting from any function u_0(x) gradually converges to u_s(x) with time, provided u_0 belongs to the attraction basin of u_s.
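The convergence to a fixed point away from the trivial solution can be illustrated by a deliberately minimal toy model (not the article's PDE): the single-amplitude equation dA/dt = (λ_c − λ)A − A³, stepped by explicit Euler. Any nonzero initial amplitude in the attraction basin flows to the nontrivial fixed point A = √(λ_c − λ) below the bifurcation, while the initial condition A = 0 never leaves the trivial solution; the values λ = 0.5 and λ_c = 1 below are invented for illustration.

```python
import math

def flow(A0, lam=0.5, lam_c=1.0, dt=0.01, T=40.0):
    """Explicit-Euler pseudo-time stepping of dA/dt = (lam_c - lam)*A - A**3."""
    A = A0
    for _ in range(int(T / dt)):
        A += dt * ((lam_c - lam) * A - A**3)
    return A

nontrivial = flow(0.1)   # flows to the nontrivial fixed point sqrt(lam_c - lam)
trivial = flow(0.0)      # the trivial fixed point is never left
```

Starting from any nonzero amplitude of either sign gives the same limit up to sign, which is exactly the attraction-basin picture described above.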

After having obtained the solution of the pseudo-time-dependent equation, one approximates u_s(x) ≈ u(x, t_max) at a large enough value t_max of the pseudo-time. The meaning of the words “large enough” is clarified in Section 4.3.

The approach can be given a pictorial interpretation (Figure 2). In the infinite-dimensional functional space, let {φ_k(x)} be an infinite set of basis functions. Then the function u(x, t) can be represented as

$$u(x, t) = \sum_k c_k(t)\,\varphi_k(x). \tag{11}$$

**Figure 2.** Schematic view of the 3D projection of the infinite-dimensional functional space with a trajectory from the initial state (blue dot) to the fixed point (red dot).

The trajectory in this space goes from the initial state u_0(x) to the final state u_s(x), as shown by the two dots.

The time derivative ∂u/∂t represents the velocity of the motion of a point through this space, while F[u, λ] can be regarded as a force driving this point. Thus equation (10) can be interpreted as describing the driven motion of a massless point particle with viscous friction through the functional space. In these terms, the condition (1) means that the driving force is equal to zero at some point of the space, which is the location of the fixed point of the nonlinear equation (10).

If the energy functional for equation (1) exists, one can make one further step in the interpretation (Figure 3).

**Figure 3.** Schematic view of the energy functional as the function of the coordinate in the functional space (A) above and (B) below the bifurcation point. The cross section of the infinite-dimensional space along a single coordinate is shown. The points show initial positions of the particle, while the arrows indicate its motion to the nearest minimum of the potential well.

Indeed, according to the definition given, the solution of equation (1) delivers a minimum to the energy functional E. In this case, one can regard the dynamic equation (10) as describing the viscous motion of a massless point particle along the hypersurface E = E[u] in the functional space, the surface forming a potential well. The motion goes from some initial position to the minimum of the potential well, as shown schematically in Figure 3. Above the bifurcation, the only minimum corresponds to the trivial solution (A), situated at u = 0. Below the bifurcation, the energy hypersurface exhibits a new configuration with new minima, while the previous minimum vanishes. As a result, below the bifurcation, the point particle moves from the initial position (shown by dots in Figure 3) to one of the newly formed minima (as the red and green arrows show in B). The functional space has infinite dimension, and essential features of the numeric process may involve several dimensions. The 1D representation displayed in Figure 3 is therefore oversimplified and only partially represents the bifurcation phenomenon.

With the representation (4), equation (10) can be rewritten as:

$$\frac{\partial u}{\partial t} = (\hat L - \lambda)\,u + N[u, \lambda]. \tag{12}$$

Though lacking a stationary nonlinear solver at present, Mathematica offers NDSolve with the method of lines, efficiently applicable to dynamic equations like (12). This method is applied everywhere in the rest of this article.

The evident penalty of this approach is that the computation time can become large, especially in the vicinity of the bifurcation point; this peculiarity is discussed next.

Close to the critical point λ_c, the relaxation of the solution to the fixed point dramatically slows down. This is referred to as *critical slowing down*. Its effect is illustrated in Section 4. To simplify the argument, let us consider a single equation with a one-component dependent variable u that still depends on the D-dimensional coordinate x. The generalization to a system of equations is straightforward, though a bit cumbersome.

According to (6), close to the bifurcation point, one can look for the solution of equation (12) in the form:

$$u(x, t) = A(t)\,\psi_0(x). \tag{13}$$

Ignore the higher-order terms, assuming that the amplitude A is small. Substitute (13) into equation (12) and linearize it. Here one should distinguish between the case λ > λ_c, where the linearization should be done around A = 0, and that at λ < λ_c, where one linearizes with the center at A = a(λ_c − λ)^γ (the second line of equation (9)). In the former case, one finds

$$\psi_0(x)\,\frac{dA}{dt} = (\hat L - \lambda)\,A\,\psi_0(x).$$

Making use of (5), one finally obtains the dynamic equation for A at λ > λ_c:

$$\frac{dA}{dt} = (\lambda_c - \lambda)\,A, \tag{14}$$

implying that A(t) ∝ exp(−t/τ), where the relaxation time has the form τ = (λ − λ_c)^{−1}.

At λ < λ_c, analogous but somewhat more lengthy arguments give a characteristic time twice as small as that above the critical point. One comes to the relation:

$$\tau = \begin{cases} (\lambda - \lambda_c)^{-1}, & \lambda > \lambda_c, \\ \left[\,2\,(\lambda_c - \lambda)\,\right]^{-1}, & \lambda < \lambda_c. \end{cases} \tag{15}$$

One can see that the relaxation time diverges as λ approaches λ_c from either side. From the practical point of view, this suggests increasing the simulation time according to (15) near the critical point.
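The divergence of τ can be observed directly in a scalar caricature of (12) (a hypothetical model, not the article's equation): integrating dA/dt = (λ_c − λ)A − A³ above the critical point from a small initial amplitude, the time for the perturbation to decay by a factor of e is approximately τ = (λ − λ_c)^{−1}, so halving the distance to λ_c doubles the measured relaxation time.

```python
import math

def relax_time(lam, lam_c=1.0, A0=1e-3, dt=1e-3):
    """Pseudo-time for a small amplitude to decay by a factor of e (lam > lam_c)."""
    A, t = A0, 0.0
    while A > A0 / math.e:
        A += dt * ((lam_c - lam) * A - A**3)
        t += dt
    return t

t_far = relax_time(1.2)    # distance 0.2 from lam_c: tau close to 5
t_near = relax_time(1.1)   # distance 0.1 from lam_c: tau close to 10
```

The ratio t_near / t_far comes out close to 2, in agreement with the scaling (15) above the critical point.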

The result (15) is valid for equation (12), in which the linear part of the pseudo-dynamic equation has the form (L̂ − λ)u; that is, the parameter λ enters this equation only linearly, in the form of the product λu. In the general case L̂ = L̂(λ), one still finds a diverging relaxation time τ ∝ |λ − λ_c|^{−1}, though the numerical prefactors above and below the bifurcation point may be different.

The phenomenon of critical slowing down was first discussed in the framework of the kinetics of phase transitions [2].

As an example, let us study the 1D PDE:

$$\tfrac{1}{2}\,\frac{\partial^2 u}{\partial x^2} + \left[\,U(x) - \lambda\,\right]u - u^3 = 0, \tag{16}$$

where u = u(x) is the dependent variable of the single coordinate x. This equation exhibits a cubic nonlinearity u³. A classical Ginzburg–Landau equation only has constant coefficients for the terms u and u³. In contrast, equation (16) possesses the inhomogeneity U(x) with

$$U(x) = \frac{1}{\cosh^2 x}, \tag{17}$$

shown by the solid line in Figure 4. It thus represents a nonhomogeneous version of the Ginzburg–Landau equation. One can see that (16) has the trivial solution u = 0.

**Figure 4.** The potential U(x) from equation (17) (solid, red) and the ground-state solution ψ_0(x) of the auxiliary equation (18) (dashed, blue).

Equations (16) and (17) play an important role in the theory of the transformation of types of domain walls into one another [3].

The auxiliary equation (5) in this case takes the following form:

$$\tfrac{1}{2}\,\psi_m'' + U(x)\,\psi_m = \lambda_m\,\psi_m, \tag{18}$$

where m enumerates the eigenvalues and eigenfunctions belonging to the discrete spectrum. One can see that equation (18) represents the Schrödinger equation [4] with the potential well (17) and energy E = −λ.

The exact solution of the auxiliary equation (18) is known [3, 4]. It has two discrete eigenvalues, λ_0 = 1/2 and λ_1 = 0, and the ground-state (m = 0) solution has the form

$$\psi_0(x) = \frac{1}{\cosh x}, \tag{19}$$

which can be easily checked by direct substitution.
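The direct substitution can be delegated to a computer algebra system. The check below assumes the Pöschl–Teller well U(x) = 1/cosh²(x) and the auxiliary equation ½ψ″ + U(x)ψ = λψ, with candidate ground state ψ₀ = 1/cosh(x) and eigenvalue λ₀ = 1/2; these concrete constants are this sketch's assumption about the reconstructed equations.

```python
import sympy as sp

x = sp.symbols('x', real=True)
psi0 = 1 / sp.cosh(x)              # candidate ground state
U = 1 / sp.cosh(x)**2              # Poeschl-Teller potential well
lam0 = sp.Rational(1, 2)           # candidate ground-state eigenvalue

# residual of (1/2) psi'' + U psi - lam0 psi; it must vanish identically
residual = sp.simplify(sp.diff(psi0, x, 2) / 2 + U * psi0 - lam0 * psi0)
```

The residual simplifies to zero, confirming the eigenpair for this choice of constants.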

The energy functional generating the Ginzburg–Landau equation (16, 17) has the form:

$$E[u] = \int_{-\infty}^{\infty} \left[\, \frac{1}{4}\left(\frac{du}{dx}\right)^{\!2} + \frac{\lambda - U(x)}{2}\,u^2 + \frac{1}{4}\,u^4 \right] dx. \tag{20}$$

Equation (6) can be written as u_s(x) = A ψ_0(x). Substituting that into equation (20) for the energy, eliminating the term with the derivative using equation (18) and applying the Gauss theorem, one finds the energy as a function of the amplitude A:

$$E(A) = (\lambda - \lambda_c)\,A^2 + \frac{1}{3}\,A^4, \qquad \lambda_c = \lambda_0. \tag{21}$$

The *ramification* equation takes the form dE/dA = 0:

$$2\,(\lambda - \lambda_c)\,A + \frac{4}{3}\,A^3 = 0, \tag{22}$$

with the following solution for the amplitude:

$$A = \begin{cases} 0, & \lambda \ge \lambda_c, \\ \pm\sqrt{\tfrac{3}{2}\,(\lambda_c - \lambda)}, & \lambda < \lambda_c. \end{cases} \tag{23}$$

Let us now look for the numerical solution of equation (16). The problem to be solved is to find the point of bifurcation λ_c and the overcritical solution at λ < λ_c. The pseudo-time-dependent equation can be written as:

$$\frac{\partial u}{\partial t} = \tfrac{1}{2}\,\frac{\partial^2 u}{\partial x^2} + \left[\,U(x) - \lambda\,\right]u - u^3. \tag{24}$$

The choice of the initial condition is not critical, provided it is nonzero. The method of lines employed in the following is relatively insensitive to whether or not the initial condition precisely matches the boundary conditions. We demonstrate the solution with three initial conditions in the next section.

The method of lines is applied here since it can solve nonlinear PDEs, provided these equations are dynamic, which is exactly the case within the pseudo-time-dependent approach.

To address the problem numerically, let us place the boundary conditions at a finite distance, rather than at infinity. The distance must be greater than the characteristic dimension of the equation, which is the distance over which U(x) exhibits a considerable variation. For the Ginzburg–Landau equation (16), the characteristic dimension is defined by the width of the potential U(x) of (17), which is about 1. That is, let us start with the boundaries at x = ±L, with L several times the characteristic dimension. We check the quality of the result obtained with such a boundary later.

To obtain a precise enough solution, one needs a spatial discretization with a step small compared to the characteristic dimension of the equation, which we just saw is of the order of 1; a step a few times smaller than 1 appears to be enough.

The following code solves the equation, keeping the discretization step a few times smaller than the characteristic equation dimension.
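The original Mathematica code is not reproduced in this excerpt. As a stand-in, the following Python/NumPy sketch relaxes the pseudo-dynamic equation (24) by explicit Euler time stepping on a uniform grid; the concrete constants — the potential U(x) = 1/cosh²x, the boundary position L = 10, the grid of 401 points, the step sizes and the simulation time — are illustrative assumptions, not the article's values.

```python
import numpy as np

def relax(lam, T=100.0, L=10.0, n=401, dt=0.002):
    """Explicit-Euler relaxation of u_t = u_xx/2 + (U(x) - lam) u - u^3
    with zero Dirichlet boundaries and a nonzero initial condition."""
    x = np.linspace(-L, L, n)
    dx = x[1] - x[0]                      # 0.05; dt < dx^2 keeps Euler stable
    U = 1.0 / np.cosh(x)**2
    u = 0.5 / np.cosh(x)                  # any nonzero profile in the basin
    u[0] = u[-1] = 0.0
    for _ in range(int(T / dt)):
        lap = np.zeros_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        u += dt * (0.5 * lap + (U - lam) * u - u**3)
        u[0] = u[-1] = 0.0                # re-impose the boundary conditions
    return x, u

x, u_below = relax(lam=0.4)   # below the bifurcation: bell-shaped profile
x, u_above = relax(lam=0.7)   # above it: relaxes to the trivial solution
```

Below the bifurcation the profile settles to a bell-shaped curve of small amplitude; above it the amplitude decays to zero, which is the behavior described in the next section.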

To avoid conflicts with variables that may have been previously set, this notebook has the setting Evaluation ▶ Notebook’s Default Context ▶ Unique to This Notebook.

According to Section 2, the time-dependent solution obtained converges to the solution of the stationary problem as t → ∞. In practice, however, one can instead take some finite value t_max, provided that it is large enough.

We solve the pseudo-dynamic equation (24) with each of the three initial conditions stated before.

Further, in order to give a feeling for the method, we visualize and animate the solution, varying λ as well as the initial conditions. This requires a few comments. As discussed in Section 2.2, the maximum time of simulation strongly depends on λ. This is accounted for by introducing t_max according to (15), with a prefactor chosen by trial so that the simulation does not last too long, but so that the value of t_max always ensures the convergence for any combination of λ and initial condition.

In the simulations, you can observe two essential features of the present method.

First, near the fixed point, the solution converges more slowly and the curve gradually appears to stop changing.

Second, near the critical point λ = λ_c, the critical slowing down (see Section 2.2) takes place, which requires considerably longer to approach the fixed point. In the animation, the curve evolves much more slowly at values of λ close to the critical one, and the convergence therefore requires much more time.

In the Manipulate, choose one of the three initial conditions and a value of λ. Click the button with the arrow to start the animation. The value of the current time t is shown at the top-left corner. The distribution shown by the blue curve at t = 0 corresponds to the initial condition, while at t > 0 the animation shows its further evolution.

For each of the three initial conditions, the solution converges to the same bell-shaped curve. One can make sure that for low λ, the solution is nonzero. However, for λ greater than about 0.5, the solution is trivial.

To get an accurate solution, one needs to control the convergence as the pseudo-time increases. Here we control the convergence by analyzing the behavior of the integral

$$N = \int u^2(x, t_{\max})\,dx \tag{25}$$

(the norm of the solution in Hilbert space) at a fixed value of the parameter λ as a function of t_max. The norm is zero above the bifurcation but nonzero below it.
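On the discrete grid, the integral (25) reduces to a quadrature sum. A minimal NumPy sketch (the grid and the sample profile are illustrative, not the article's data):

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]
u = 0.5 / np.cosh(x)            # a sample converged profile u(x, t_max)
norm = np.sum(u**2) * dx        # rectangle-rule value of N = int u^2 dx
```

For this sample profile the exact integral is 0.25 ∫ sech²x dx = 0.5, and the quadrature reproduces it to high accuracy because the profile decays well before the boundaries.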

We show how N depends on the time limit t_max at three fixed values of the control parameter λ, all of which are below the bifurcation point λ_c.

The following code makes a nested list containing three sublists corresponding to the three values of λ. Each sublist consists of pairs {t_max, N} at different values of the simulation time t_max, which increases from 10 to approximately 3000. The exponential rate of increase is chosen so as to make the plot on a semilogarithmic scale look equally spaced (Figure 5).

**Figure 5.** Semilogarithmic plots of the Hilbert norm N of the solution at the three values of λ (disks, squares and diamonds), depending on the simulation time t_max.

There is convergence for all three values of λ. However, the value of t_max for which the convergence is satisfactory depends on λ. For the value of λ farthest from the bifurcation point, the solution at t_max slightly exceeding 100 is already near convergence, so one can be sure that the solution there is satisfactory. We use this in Section 4.4 to determine the expression for t_max accounting for the critical slowing down.

In contrast, the solution for the value of λ closest to the bifurcation point shows some evolution even at the largest t_max.

As we showed in Section 2.2, the value t_max that gives satisfactory convergence depends on λ. To get an accurate solution, t_max must considerably exceed the relaxation time τ. For example, for the λ value closest to the bifurcation point in Figure 5, substituting λ and λ_c into (15) gives a relaxation time τ, while the convergence only becomes good enough at a t_max about eight times greater than τ. This implies that to find an accurate solution in the close vicinity of the bifurcation point, one has to define t_max depending on λ by

$$t_{\max} = \frac{C}{|\lambda - \lambda_c| + \delta}, \tag{26}$$

where C is a constant ensuring t_max ≫ τ and δ is the regularization parameter keeping t_max finite at λ = λ_c.

The bifurcation point can be found by analyzing the same integral, calculated at the t_max given by (26). Let us denote N_f = N(t_max). This time we study the integral as a function of the parameter λ.

The transition from the nontrivial solution to the trivial one occurs at the bifurcation point. Accordingly, the integral at this point changes from N_f > 0 to N_f = 0.

To find the critical point, bifurcation theory (23) predicts the norm to be expressed in the form:

$$N_f(\lambda) = \begin{cases} a\,(\lambda_c - \lambda), & \lambda < \lambda_c, \\ 0, & \lambda \ge \lambda_c. \end{cases} \tag{27}$$

We find the constant parameters a and λ_c by fitting.

We now find the numerical solution of equation (16) as a function of the control parameter λ; the norm N_f obtained from this solution depends on λ. We vary λ from 0.45 up through the bifurcation region to create a list consisting of pairs {λ, N_f}. The most critical region is close to the critical point, so the points there are taken to be about 10 times more dense. This list is fitted to the function (27) and plotted together with the analytic function obtained by fitting (Figure 6).
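The fitting step can be sketched with synthetic data. Assuming a linear law N_f = a(λ_c − λ) below the threshold (consistent with the quadratic amplitude scaling of soft bifurcations), with invented values a = 3 and λ_c = 0.5, a straight-line fit recovers λ_c as the root of the fitted line:

```python
import numpy as np

a_true, lam_c_true = 3.0, 0.5
lam = np.linspace(0.30, 0.48, 10)        # sample points below lam_c
Nf = a_true * (lam_c_true - lam)         # synthetic norms, N_f = a (lam_c - lam)

slope, intercept = np.polyfit(lam, Nf, 1)
lam_c_fit = -intercept / slope           # the lambda at which the line vanishes
```

With noisy simulated norms the same fit recovers λ_c with an error controlled by the noise level and by how close the sample points lie to the critical point.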

**Figure 6.** Behavior of the Hilbert norm of the solution in the vicinity of the bifurcation point. Dots show the integrals (25), while the solid line indicates the result of fitting with the relation (27), yielding the bifurcation point λ_c.

The values of the integrals at various λ are shown by the red dots in Figure 6, while the fitting curve is shown by the solid blue curve. The fit yields the bifurcation point λ_c and the coefficient a.

We used equation (26) for the t_max used in the solution. However, this equation depends on the spectral value λ_c. In the present case, the value λ_c was known, which considerably simplifies the task. In general, the value of λ_c is only established in the course of the fitting procedure, requiring an iterative approach. For the first simulation, we fix some large enough value of t_max independent of λ and obtain a fit. This fit gives the first guess for λ_c, which can then be used for the simulation with equation (26). This procedure can be repeated until a satisfactory λ_c is achieved.

To check how the choice of the boundary position L affects the results, we solve the problem by gradually increasing L (Figure 7). (This takes some time.)

**Figure 7.** A double-logarithmic plot showing the convergence of the bifurcation point with increasing L.

Figure 7 displays the error in the spectral value λ_c obtained by the numerical process. As one could have expected, the error decreases with the increase of L.

The preceding example has shown the application of the pseudo-dynamic approach for solving a 1D nonlinear PDE with zero boundary conditions that exhibits a supercritical (soft) bifurcation. That simple problem was chosen to keep the processing time as short as possible. Now possible extensions are discussed.

Recall that zero boundary conditions often (if not always) represent a problem for a nonlinear solver. Starting from u = 0 along the boundary, such a solver often only returns the trivial solution, since zero is, indeed, a solution of the equation considered here. For this reason, a problem like the one discussed in this article necessarily requires some specific approach that can converge to a nontrivial solution. It is for this type of equation that the approach presented here has been developed.

One should, however, make two comments.

First, there are numerous problems where the bifurcation takes place from a solution u_b that is nonzero. The boundary condition in this case has the form u|∂Ω = u_b|∂Ω. A trivial observation shows that one comes back to the original problem by the shift u → u − u_b.

Second, the approach formulated here can be applied to nonlinear equations with no bifurcation. These equations can have boundary conditions that are either zero or nonzero. Indeed, such equations can often be solved by a nonlinear solver if one is available. Among other approaches, the present one can be applied; the nonzero boundary conditions are not an obstacle for the transition to the pseudo-time-dependent equation.

Though the present approach takes longer, in certain cases it is preferable; for example, when the nonlinear solvers fail due to a strong nonlinearity. The solver moves along the pseudo-time parameter in small steps from t = 0 to t = t_max, gradually passing from the initial condition to the final solution. Such slow ramping can be stable.

The space dimensionality does not limit the application of our approach (for 2D examples, see [5, 6]).

In the case of a soft bifurcation, the energy can have only one type of minimum, as shown in Figure 3, describing the convergence either to the trivial or the nontrivial solution. The trajectory always flows into the minimum along the steepest slope of E. The minimum is a fixed point.

An essentially different situation occurs for a hard bifurcation, when the hypersurface E = E[u] may have multiple minima. Figure 8 (A) shows a schematic cross section of the infinite-dimensional functional space along a plane, leaving out all other dimensions. This cross section shows a situation with minima of different types, one of which is more pronounced than the others. The arrows schematically indicate the trajectories in the functional space. These start from the initial conditions displayed by the dots in Figure 8 (A, B) and converge to the minima (Figure 8 A). The green arrow shows the convergence of the process to the principal minimum, while the red one converges to a secondary minimum.

**Figure 8.** Schematic view of the energy functional along a direction of the functional space, where it exhibits a metastable minimum (A). The green point schematically indicates the initial condition starting from which the solution converges to the one corresponding to the principal energy minimum (green arrow), while the red dot shows the initial condition leading to the convergence to the secondary minimum. (B) The trajectory ends at an inflection point.

As a result, depending on the choice of initial condition, some solution trajectories may end up at a fixed point that is a secondary minimum rather than in the main one.

Also, keep in mind that the dimension of the functional space is infinite and can have many unobvious secondary minima.

There can also be inflection and saddle points of the energy hypersurface (Figure 8 B). The trajectory completely stops at such a point.

It is a fundamental question whether or not such secondary fixed points as well as the inflection points belong to the problem under study. The answer is not straightforward. One should look for such an answer based on the origin of the equation.

Let us also mention possible gently sloping valleys in the energy relief. In this case, the motion along such a shallow slope may appear practically indistinguishable from an asymptotic falling into a fixed point during the numerical process.

This article offers an approach to solve nonlinear stationary partial differential equations numerically. It is especially useful in the case of equations with zero boundary conditions that have both a trivial solution and nontrivial solutions. The approach is based on solving a pseudo-time-dependent equation instead of the stationary one, the initial condition being different from zero. Then the solver can avoid sticking to the trivial solution and is able to converge to a nontrivial solution. However, the penalty is increased simulation time.

[1] | M. M. Vainberg and V. A. Trenogin, Theory of Branching of Solutions of Non-linear Equations, Leyden, Netherlands: Noordhoff International Publishing, 1974. |

[2] | E. M. Lifshitz and L. P. Pitaevskii, Physical Kinetics: Course of Theoretical Physics, Vol. 10, Oxford, UK: Pergamon, 1981, Chapter 101. |

[3] | A. A. Bullbich and Yu. M. Gufan, “Phase Transitions in Domain Walls,” Ferroelectrics, 98(1), 1989, pp. 277–290. doi:10.1080/00150198908217589. |

[4] | L. D. Landau and E. M. Lifshitz, Quantum Mechanics: Course of Theoretical Physics, Vol. 3, 3rd ed., Oxford, UK: Butterworth-Heinemann, 2003. |

[5] | A. Boulbitch and A. L. Korzhenevskii, “Field-Theoretical Description of the Formation of a Crack Tip Process Zone,” European Physical Journal B, 89(261), 2016, pp. 1–18. doi:10.1140/epjb/e2016-70426-6. |

[6] | A. Boulbitch, Yu. M. Gufan and A. L. Korzhenevskii, “Crack-Tip Process Zone as a Bifurcation Problem,” Physical Review E, 96(013005), 2017, pp. 1–19. doi:10.1103/PhysRevE.96.013005. |

A. Boulbitch, “Pseudo-Dynamic Approach to the Numerical Solution of Nonlinear Stationary Partial Differential Equations,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-8.

Alexei Boulbitch graduated from Rostov University (USSR) in 1980 and obtained his Ph.D. in theoretical solid-state physics in 1988 from this university. In 1990 he moved to the University of Picardie (France) and later to the Technical University of Munich (Germany). The Technical University of Munich granted him his habilitation degree in theoretical biophysics in 2001. His areas of interest are bacteria, biomembranes, cells, defects in crystals, phase transitions, physics of fracture (currently active), polymers and sensors (currently active). He presently works in industrial physics with a focus on sensors and gives lectures at the University of Luxembourg.

**Alexei Boulbitch**

*Zum Waldeskühl 12
54298 Igel
Germany*

This article is a summary of my book *A Numerical Approach to Real Algebraic Curves with the Wolfram Language* [1].

The nineteenth century saw great progress in geometric (real) and analytic (complex) algebraic plane curves. In the absence of an ability to do the large number of computations for a concrete theory, the twentieth century saw the abstraction to algebraic geometry of this material. Ideas of ideals, rings, fields, varieties, divisors, characters, sheaves, schemes and many types of homology and cohomology arose. The added benefit of this approach is that it became possible to apply geometric techniques to other fields. Probably the most striking accomplishment of this abstract approach was the solution of Fermat’s problem by Wiles and Taylor at the end of the century.

The plane geometric curve theory of the nineteenth century was collateral damage. All modern books on the subject want to follow the abstract approach, which raises the bar for those who want to know this theory. In addition, little attention was given to the concrete geometric theory. One goal of my book is to rectify this problem; substituting software for the abstract theory, we can give the theory in terms the non-mathematician can follow.

Since most algebraic curves have only finitely many rational points, I work numerically. The methods are constructive, heuristic and visual rather than the traditional theorem-proof of contemporary mathematics. In fact, there is a fundamental oxymoron at the heart of my approach: a numerical algebraic curve is the solution set of an equation f(x, y) = 0, where f is a polynomial with integer or machine-number coefficients. Evaluating this polynomial at a point with machine-number coordinates gives a machine number on the left-hand side, while the right-hand side is a symbolic number, so actual equality is impossible. So my book is not an algebraic geometry book. Having worked during my career as a mathematician in both the abstract and numerical realms, I believe that while these approaches are incompatible, they can and should coexist within mathematics.

We will generally describe an algebraic plane curve by giving a polynomial in two variables with integer or real machine-number coefficients.

From an operational point of view, with an exception noted later, for a given curve we accept the output of `NSolve[{f, g}, {x, y}]` and `FindRoot[{f, g}, {x, x0}, {y, y0}]`.

For example, suppose some calculation claims a certain point is on the curve. (If you have set values in your session for x, y and so on, now is the time to store them if needed and clear them.)

We find a random line containing the point and use the intersection of the line with the curve to check it.

We see the residue is not zero.

But the point can be reconstructed from this intersection.

It checks.

The simplest example of an algebraic plane curve is a line. The first problem for lines is to find the equation of a line through two given points. We give our solution, found at the beginning of Chapter 1 of [1], as it will give the flavor of our approach to this subject.

Let (x1, y1), (x2, y2) be the given points. The desired equation is of the form a x + b y + c = 0.

We thus consider the coefficients a, b, c as unknowns, but x1, y1, x2, y2 as coordinates of the given points. So we have two equations in the three variables a, b, c.

But this system is underdetermined. It is also not symmetric in the variables, so we use a dummy variable for the right-hand side and add a third equation to get the system

a x1 + b y1 + c = 0,   a x2 + b y2 + c = 0,   r1 a + r2 b + r3 c = 1,   (1)

where r1, r2, r3 are random real numbers.
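To make the construction concrete, here is a sketch of system (1) in Python with NumPy (an illustration only; the helper name `line_through` and the normally distributed random third row are my assumptions, not the Wolfram Language code of [1]):

```python
import numpy as np

def line_through(p1, p2, rng=np.random.default_rng(0)):
    """Coefficients (a, b, c) of a*x + b*y + c = 0 through p1 and p2.

    The third, random equation r1*a + r2*b + r3*c = 1 picks one
    representative of the one-parameter family of solutions.
    """
    r = rng.standard_normal(3)
    M = np.array([[p1[0], p1[1], 1.0],
                  [p2[0], p2[1], 1.0],
                  r])
    return np.linalg.solve(M, np.array([0.0, 0.0, 1.0]))

a, b, c = line_through((1.0, 1.0), (2.0, 3.0))
# Both given points satisfy the line equation up to roundoff.
print(abs(a * 1 + b * 1 + c) < 1e-10, abs(a * 2 + b * 3 + c) < 1e-10)
```

Because the third equation is random, different runs produce different scalar multiples of the same line.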

Suppose the points are and . Here are the random reals.

Then the line is constructed as follows.

Perhaps this is not what you expected. But we are working with machine numbers so, particularly if this is not our final answer, we should not mind. If this does still bother us, we can always look for integers.

But system (1) gives more options. Suppose instead we were given one point and slope 2. We can change the second equation to a·1 + b·2 + c·0 = 0.

Since our original already had slope 2, we are now not surprised to get the same result. Now consider the possibility that our line was given parametrically.

This time we replace the second equation in (1) with a·d1 + b·d2 + c·0 = 0, where (d1, d2) is the direction vector of the parameterization, and again keep the random third equation. We solve the new system.

This is the same answer, because again we have the line through the same point with the same direction.

We can put all of this into one program if we simply make the convention that a slope or direction vector is denoted by a triple with third coordinate 0. So here is our universal code for creating a line.

Our results will differ from the previous ones because we are now choosing the random numbers each run but normalizing the output. The advantage is that each run gives the same answer up to a factor of ±1.

Computing a point far away from the given points by taking a large parameter value in our parametric equation, we get approximately a multiple of the direction vector.

So we can consider the triple with third coordinate 0 to be the *infinite point* on the line. But putting this triple into our function gave us the same thing, so these infinite points are *homogeneous*; that is, they can be multiplied by a nonzero scalar, giving the same infinite point. Note also that appending a coordinate 1 to a coordinate pair *homogenizes* a Cartesian point of the plane.

In Chapter 5 of [1], we find that we have invented the *projective plane*. So that we do not get confused, we will henceforth call points (pairs) of our standard Cartesian plane *affine points* and the triples *projective points*.

The method for finding equations of lines can be generalized to find curves of degree n through n(n + 3)/2 sufficiently general points. See [2] for the code (the a in the function name stands for affine).

We do define two families of curves that are used extensively as examples in [1]. The first are *Gaussian curves*. We start with a single-variable polynomial p(z), typically with integer coefficients but possibly complex integer (Gaussian integer) coefficients. Replace z by x + i y; after expanding, the formal real part forms a curve. Gauss used this construction in his first and fourth proofs of the fundamental theorem of algebra, published 50 years apart.
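The construction of a Gaussian curve is easy to sketch in Python with SymPy (an illustration, not the book's Wolfram Language code; the helper name `gaussian_curve` is invented here):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

def gaussian_curve(p, z):
    """Formal real part of p(x + i*y): the Gaussian curve of p."""
    return sp.expand(sp.re(sp.expand(p.subs(z, x + sp.I * y))))

z = sp.symbols('z')
# Example: p(z) = z**2 + 1 gives the hyperbola x**2 - y**2 + 1 = 0.
f = gaussian_curve(z**2 + 1, z)
print(f)
```

Declaring x and y real lets SymPy split the expanded expression into its formal real and imaginary parts.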

For example, the following is said to be Gauss’s original example for the fourth proof. Note that it has a singular point!

A second family of curves I call *Newton’s hyperbolas*; these are indexed by an integer parameter that ranges from 1 up to a fixed bound.

The *total degree* of a plane curve is an important invariant, but not quite as simple in the numerical case as it may seem. Small coefficients of the highest-degree terms matter little near the origin but strongly affect the asymptotic and infinite behavior of the curve. Therefore, we compute the total degree symbolically.
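A minimal sketch of this degree computation, assuming we simply drop coefficients below a tolerance (Python with SymPy; the helper name `total_degree` is mine):

```python
import sympy as sp

x, y = sp.symbols('x y')

def total_degree(f, tol=1e-8):
    """Total degree of a bivariate polynomial, ignoring coefficients
    below tol (a stand-in for a judicious chop of roundoff terms)."""
    poly = sp.Poly(sp.expand(f), x, y)
    degs = [sum(mon) for mon, coeff in poly.terms()
            if abs(complex(coeff)) > tol]
    return max(degs)

# A tiny roundoff coefficient should not raise the degree.
print(total_degree(x**2 + y - 3 + 1e-14 * x**5))  # 2
```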

Sometimes a little care is necessary to make sure that coefficients that are the result of roundoff error only are not allowed to increase the degree; a judicious use of `Chop` may be required.

Because we are often working numerically, we use a slightly stronger criterion for a plane curve f to be called regular at a point p on f. The determinant of the Jacobian of f and the line a x + b y + c at p, namely b f_x(p) − a f_y(p), is known as the *Jacobian determinant* of the intersection of the curve and line at p. We say p *is a regular point of f* if the Jacobian is not numerically zero for almost all pairs a, b of machine numbers. In practice, this can be checked by letting a, b be random real numbers. For a regular point p, a *tangent line* is defined as follows.
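A sketch of the regularity test and tangent line in Python with SymPy (illustrative only; the function names are invented here, and the random-pair test follows the definition above):

```python
import random
import sympy as sp

x, y = sp.symbols('x y', real=True)

def is_regular(f, p, trials=3, tol=1e-9):
    """Check p is a regular point of f = 0: the Jacobian with a random
    line, b*f_x(p) - a*f_y(p), is nonzero for random a, b."""
    fx = sp.diff(f, x).subs({x: p[0], y: p[1]})
    fy = sp.diff(f, y).subs({x: p[0], y: p[1]})
    return any(abs(random.uniform(-1, 1) * fx -
                   random.uniform(-1, 1) * fy) > tol
               for _ in range(trials))

def tangent_line(f, p):
    """Tangent at a regular point: f_x(p)*(x - p0) + f_y(p)*(y - p1)."""
    fx = sp.diff(f, x).subs({x: p[0], y: p[1]})
    fy = sp.diff(f, y).subs({x: p[0], y: p[1]})
    return sp.expand(fx * (x - p[0]) + fy * (y - p[1]))

f = x**2 + y**2 - 1            # unit circle
print(is_regular(f, (1, 0)))    # True: the gradient (2, 0) is nonzero
print(tangent_line(f, (1, 0)))  # 2*x - 2, i.e. the vertical line x = 1
```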

On the other hand, a point p of f is called *singular* if the Jacobian is zero at p for all numbers a, b; equivalently, both partial derivatives vanish at p. Again, in practice it is enough to check a random pair a, b.

An alert reader may notice that since we are working constructively, *regular* and *singular* are not logical negations of each other, but a practical test does distinguish regular from singular points.

An important kind of point for [1] is a *critical point*. A point on curve f is critical if it is also on the curve defined by x f_y − y f_x = 0. All real critical points of a curve can be found easily in practice by solving this pair of equations numerically.

Unlike the conditions regular and singular, which are invariant under transformations such as translation, being a critical point is a positional property. Among the critical points are local extrema of the distance from the origin to a point on the curve and, by our definition, singular points. The most important thing about critical points is that *every affine topological component of a plane curve contains at least one critical point*. This means that from our simple function for finding critical points, we will be able to locate all components on the curve, no matter how small—even one-point components.
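The critical-point computation can be sketched in Python with SymPy, assuming the defining pair of equations f = 0 and x f_y − y f_x = 0 stated above (the helper name is mine):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

def critical_points(f):
    """Real critical points of f = 0: points on the curve that also lie
    on x*f_y - y*f_x = 0 (extrema of the distance to the origin, plus
    any singular points)."""
    g = x * sp.diff(f, y) - y * sp.diff(f, x)
    sols = sp.solve([f, g], [x, y], dict=True)
    return [(s[x], s[y]) for s in sols
            if s[x].is_real and s[y].is_real]

# For the ellipse x**2/4 + y**2 - 1 the extrema of the distance to the
# origin are the four axis points.
pts = sorted(critical_points(x**2 / 4 + y**2 - 1))
print(pts)  # [(-2, 0), (0, -1), (0, 1), (2, 0)]
```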

Consider the following contrived example of a numerical cubic curve, which has an isolated point.

The point with these coordinates is a one-point component of the curve `h1`.

The same idea allows us to find the closest point on a curve to a given point in the plane.

In this case, the closest point may be one invisible on a plot.

We may also find the infinite points of a curve. Here is code that is slightly different from [1] but avoids subroutines. This uses a random variable so that different runs give the infinite points in possibly a different order.

Here is an example using Newton hyperbola 376.
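The idea behind the infinite-point computation (homogenize, set the new variable to zero, and solve the resulting leading form) can be sketched in Python with SymPy; the helper name and the choice of representatives are my assumptions:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def infinite_points(f):
    """Infinite points of the affine curve f(x, y) = 0: homogenize,
    set z = 0, and find the zeros of the resulting binary form."""
    n = sp.Poly(f, x, y).total_degree()
    F = sp.expand(z**n * f.subs({x: x / z, y: y / z}))
    top = F.subs(z, 0)               # leading form of degree n
    pts = []
    # Representatives with y = 1, plus (1, 0, 0) if it is on the curve.
    for r in sp.roots(sp.Poly(top.subs(y, 1), x)):
        pts.append((r, 1, 0))
    if sp.expand(top.subs({y: 0, x: 1})) == 0:
        pts.append((1, 0, 0))
    return pts

# The hyperbola x*y - 1 has leading form x*y, so its infinite points
# are the two axis directions.
print(infinite_points(x * y - 1))  # [(0, 1, 0), (1, 0, 0)]
```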

We start with an idea Gauss used in his 1849 proof of the fundamental theorem of algebra characterizing real plane curves. Given a bivariate polynomial f, Gauss considered the semialgebraic set P = {(x, y) : f(x, y) > 0}. *The algebraic curve f = 0 is the complete topological boundary in the plane of P*.

Among other things, this nicely solves our conundrum as to the precise meaning of the curve when f is a polynomial with machine-number coefficients, as the inequality f(x, y) > 0 does make sense numerically.

Another consequence of this definition is that for each regular point of the curve, a line different from the tangent line intersecting the curve at this point travels from the positive set to the negative set at that point. We will see later in this section that a curve defined by a square-free f has only finitely many singular points, so a contour plot gives a reasonable picture of the curve in a bounded region with appropriate scaling. Contour plots may miss large parts or all of the curve if the polynomial has a factor repeated an even number of times. Fortunately, if f is a polynomial with integer coefficients, then the built-in function `FactorSquareFree` finds the repeated factors, and one can produce a square-free polynomial with the same curve. For machine-coefficient polynomials, there is a function given in Appendix 1 of [1] and in [2] that can check whether f is square free and if not, produce a square-free polynomial giving the same curve.

This last paragraph also tells us that the complement of a square-free curve is two colored, with the curve separating the colors. In particular, an algebraic real plane curve cannot have bifurcations [3]. That is, the following cannot be a plot of an algebraic curve.

There are always an even number of *branches* going in and out of singular points, an essential idea we will use in the next section.

For now, the main use of the Gauss point of view is that a square-free curve is oriented; that is, we can specify a direction of travel along the curve. In his proof, Gauss proposed “walking along the curve” with the positive set on our right. Essentially, we are traveling around topological components clockwise. As an aside, the curve Gauss was using is our Gaussian curve of the particular complex univariate polynomial that he was proving has a zero. Thinking of points of the plane as complex numbers, Gauss showed the walker would always stumble over a zero of that polynomial.

We implement this by noting that for regular points, this right-hand direction is given by the vector (−f_y, f_x), so we can use the following code (`g` stands for Gauss, `T` for tangent, and `vec` for vector).

Example: Consider the following curve. (In the PDF and HTML versions, the graphic is not interactive.)

This leads to path tracing. In [1], we consider various methods, including one based on a built-in differential equation solver. Here, we use a very common method given by the following.

This function traces from point p to point q in the direction defined by the right-hand rule with steps of size h. By default, it stops after 40 steps, but that can be changed by an option. If q is in the wrong direction from p, this fails with a warning. The direction can be changed by replacing the curve f by −f. If there is a singular point in the path between p and q, then this will likely get hung up there. The key is that one can trace into a singularity, but not out. Normally, we use critical points for the endpoints p, q, but we may need to add points between singularities.
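A common predictor-corrector version of such a tracer can be sketched in Python with NumPy (an illustration under my own assumptions, not the book's function): an Euler step along the right-hand direction (−f_y, f_x), followed by Newton corrections back onto the curve.

```python
import numpy as np

def trace(f, grad, p, q, h=0.05, max_steps=400, tol=1e-10):
    """Trace f = 0 from p toward q with the positive set on the right:
    Euler predictor along (-f_y, f_x), then Newton corrections along
    the gradient back onto the curve."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    path = [p.copy()]
    for _ in range(max_steps):
        fx, fy = grad(*p)
        d = np.array([-fy, fx])
        p = p + h * d / np.linalg.norm(d)       # predictor
        for _ in range(5):                      # corrector
            fx, fy = grad(*p)
            g = np.array([fx, fy])
            p = p - f(*p) * g / (g @ g)
            if abs(f(*p)) < tol:
                break
        path.append(p.copy())
        if np.linalg.norm(p - q) < h:           # reached the target
            break
    return np.array(path)

circle = lambda x, y: x**2 + y**2 - 1
gradient = lambda x, y: (2 * x, 2 * y)
path = trace(circle, gradient, (1.0, 0.0), (0.0, 1.0))
# Every point of the path lies on the circle to high accuracy.
print(max(abs(circle(px, py)) for px, py in path) < 1e-8)
```

As in the text, tracing toward a singular point still works, because only the predictor direction degenerates there, while the corrector keeps the path on the curve.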

The bow curve is a good example of using path tracing.

We proceed as follows, with the positive direction clockwise around the positive region, but always tracing into the singularity.

In [1], we develop a number of utility functions to make tracing easier and do many examples, particularly of Gaussian curves. But the main point we are making is that a square-free curve can be reasonably approximated by a piecewise linear curve, and the instructions to do so can be given by a graph (network) consisting of the endpoints of each trace as vertices with the direction traveled, not traced, as directed edges. Here is the graph for the previous example.

In this section, we touch base with contemporary algebraic geometry. We operate in the *real and complex projective planes*.

Our construction follows our discussion on lines in Section 2. A point in the real (or complex) projective plane is a triple (x, y, z) of real (or complex) numbers such that not all of x, y, z are zero. Two such triples that differ by a nonzero real (or complex) multiple are considered the same. For example, if z ≠ 0, then loosely speaking, (x/z, y/z) is an affine point. We called points of the form (x, y, 0) infinite points; in the projective plane they are just points. Just as we added a variable for the third coefficient in the equation of a line, in the projective plane we again add a third variable z for equations. We call this *homogenization*. Now we want all of our monomials to have the same total degree. The next function homogenizes a bivariate polynomial.

That is, if we are working with a polynomial of degree n, a monomial x^i y^j is converted to x^i y^j z^(n−i−j). There is a 1-1 correspondence between two-variable monomials of total degree less than or equal to n and three-variable monomials of degree exactly n.
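A sketch of homogenization and specialization in Python with SymPy (the helper name is mine):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def homogenize(f):
    """Homogenize a bivariate polynomial: each monomial x**i*y**j of a
    degree-n polynomial becomes x**i * y**j * z**(n - i - j)."""
    n = sp.Poly(f, x, y).total_degree()
    return sp.expand(z**n * f.subs({x: x / z, y: y / z}))

F = homogenize(y - x**2)
print(F)             # -x**2 + y*z
print(F.subs(z, 1))  # specializing at z = 1 recovers -x**2 + y
```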

In particular, if F(x, y, z) = 0 and λ ≠ 0, then F(λx, λy, λz) = 0 also, so being a zero is a property of the projective point (x, y, z). Thus *projective curves* are the zero sets of homogeneous polynomials in three variables. Also, in the example for the bow curve, the triple found there is a point of the homogenization, which means it is an infinite point of the bow curve.

The opposite of homogenization is *specialization*. We can substitute the number 1 for any of the three variables in a homogeneous polynomial and get a two-variable polynomial that is in general nonhomogeneous. For example, if we homogenize with z and then specialize at z = 1, we get back the original. But specializing at x = 1 or y = 1 produces a new polynomial.

We say f is a *singular curve* if any complex projective singular point exists. So f may be a singular curve even though it has no affine singularities. We do this partly to be consistent with the algebraic geometers, but also because singular curves (even those whose only singularities are infinite or complex) do behave differently from regular curves.

Likewise, a curve is *reducible* if its homogenization is reducible over the complex numbers. Because homogenization preserves polynomial multiplication, the homogeneous polynomial is reducible if and only if all its specializations are reducible. It is fairly rare that a bivariate real polynomial has complex factors, but an important class of examples is the homogeneous functions in two variables. These always factor into linear factors, but some factors may be complex. Consider the next example.

This seems to be irreducible, but the plot appears to be a straight line rather than a cubic. Furthermore, it is singular at the origin. Think of this curve as a homogenization of a polynomial of one variable and specialize at y = 1.

So the roots of that univariate polynomial give a complex numerical factorization of the cubic; the two complex factors are invisible on the contour plot.

Related to singular points are intersection points. Here is an example.

We say the intersection of these curves at this point has *multiplicity* 8. To explain what this means, particularly in the case of numerical curves, we use the formulation given in [4], which has been implemented numerically by Z. Zeng and the author. The implementation in the plane curve case is given in Appendix 1 of [1]; the code and examples are in [2], and further information can be found in [5].

Intersections and singularities are connected, in that if f and g intersect at p, then the product curve f g has a singularity at p. However, there is an important difference. If we perturb a curve with a singularity by adding some terms with very small coefficients, the singularity often goes away. But if we perturb both of the curves intersecting at p, then locally we have the same multiplicity. Here is an example.

What this shows is that singularities are numerically unstable, but intersections are numerically stable. Thus in [1], which emphasizes the numerical point of view, we avoid getting deeply into singularities, but we can deal with intersections.

This leads to the most important theorem of complex projective plane algebraic geometry, Bézout’s theorem.

Given complex algebraic curves f and g of degrees m and n (respectively) with no common nonconstant factor, there are exactly m n complex projective points on both curves, counting intersection multiplicity.

There are many proofs in the literature, and we will not give a complete proof here or in [1]. The complicating issue is when there are infinite or multiple intersection points. The typical proof involves use of the *resultant*. In the case of possibly infinite but not multiple points, one approach is to apply a random projective transformation. The resulting curves then, with high probability, will have no infinite intersection points and moreover, each intersection point will have a unique x coordinate. We can find these by applying the resultant with respect to y, which will then give a polynomial of degree m n with distinct and hence non-multiple zeros. One can easily find the y coordinates of the transformed system by substituting each x value in either equation and solving for y. Finally, transforming back will give the solutions of the original system. We will study these transformations and find infinite intersection points by transforming, solving the affine system and transforming back in the next section.
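The resultant computation at the heart of this argument can be illustrated in Python with SymPy on a simple affine example (my own choice of curves, with no infinite or multiple intersections):

```python
import sympy as sp

x, y = sp.symbols('x y')

# Intersect the unit circle with the parabola y = x**2 using the
# resultant with respect to y: the result is a univariate polynomial
# in x whose roots are the x coordinates of the intersection points.
f = x**2 + y**2 - 1
g = y - x**2
r = sp.resultant(f, g, y)
print(sp.expand(r))           # x**4 + x**2 - 1

# Back-substitute each real root of r into g = 0 to get y.
pts = [(xr, xr**2) for xr in sp.real_roots(r)]
print(len(pts))               # two real intersection points
```

Of the four complex intersection points promised by Bézout's theorem (degrees 2 and 2), only two are real here; the other two show up among the complex roots of the resultant.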

As an example, consider the following Gaussian cubic and quadratic. There is one infinite solution. Applying the random projective linear transformation with matrix

gives a system of equations that leads to polynomials with rational coefficients, no infinite solutions and unique coordinates for the affine solutions.

Pictured are the original system and the transformed system. The indicated point in the second plot corresponds to the infinite solution of the first plot. Even in this simple example with equations and transformation using one-digit integers, the resultant polynomial was a rational polynomial with numerators of 17 digits and a denominator of 21 digits!

Later in [1], Bézout’s theorem is used in the discussion of Cayley’s theorem and Harnack’s theorem. In this section, we use Bézout’s theorem to argue the *singularity theorem*:

An irreducible curve of degree n has at most (n − 1)(n − 2)/2 complex projective singular points.

In [1] I take a constructive point of view and show instead that a curve of degree n with more than (n − 1)(n − 2)/2 singular points is reducible. In the argument, we produce a polynomial of smaller degree that meets the given curve in too many points, so it has a common factor with the given curve. In fact, in Appendix 1 of [1] we implement this argument with a function that factors the defining polynomial of any curve with more than (n − 1)(n − 2)/2 singularities.

Going back to the cubic, we homogenize and then specialize at a different variable.

The resulting plot shows the infinite points of the cubic in the specialization, where the dashed line is the original infinite line. The original infinite points become visible points here; the first critical point becomes an infinite point of the new plane, and the other two go to finite points.

So in the projective plane, infinite points look just like affine points. We can trace projective paths just like affine paths. Thus, we can form graphs just like in the affine case; in particular, the projective graphs now have the property that every vertex is even. This gives my *fundamental theorem of real plane projective algebraic curves*, henceforth called just the *fundamental theorem*, which completely describes the topology of the projective curve.

Let a real plane projective algebraic curve be given by a homogeneous polynomial. Then there is a finite set of points on the curve, called vertices, and a set of edges between pairs of vertices satisfying:

- *Each edge corresponds to a continuous arc (or path) in the curve connecting the two vertices.*
- *Every singular point of the curve is a vertex.*
- *The interiors of any two arcs corresponding to edges are disjoint; that is, arcs only meet at vertices.*
- *Every point of the curve is either a vertex or an interior point of an arc.*
- *The graph is an Euler graph; that is, every vertex is even.*

In the previous example, the graph can be rendered as follows, where the vertex names refer to the original affine specialization.

Several comments are in order. First, *critical points* are not a concept in the projective plane; they come from some affine specialization. They make good vertices, but in this context are somewhat arbitrary. The same is true of the *direction* of the curve, but these graphs can be given a directed Euler graph structure. The fact that these are Euler graphs implies they can be decomposed into (not necessarily disjoint) directed circuits.

Already in his 1799 proof of the fundamental theorem of algebra, Gauss essentially calculates the infinite points of Gauss curves coming from a monic polynomial of degree n as the n equally spaced directions with angles (2k + 1)π/(2n), k = 0, …, n − 1 (antipodal directions identified).

Since the Gaussian curve already approximately intersects large circles about the origin in the affine points (and their antipodal points) given by the first two coordinates, one can infer that the graph will have edges pointing directly out from boundary points on a large circle to the appropriate infinite point. Thus by treating any two antipodal points of the curve on a large circle about the origin as the same infinite vertex, we convert the bounded graph to the projective graph.

A more interesting example of a Gaussian curve is Gauss’s example, which has two components and a singular point.

We find the critical and boundary points on a circle of radius 4 and put them in an association for labeling.

We show a contour plot and the bounded graph. Then, by treating boundary points as infinite points and identifying pairs of antipodal points, here is the projective graph.

We mention the Riemann–Roch theorems, whose main subject is the concept of *genus*. These theorems are the backbone of complex curve theory and even real space curve theory. However, for real plane curves the important invariant of a curve is the degree, not genus, so we do not dwell on these theorems.

An important tool in [1] is utilizing the projective linear transformations. We follow Abhyankar [6] by keeping the discussion mainly in the affine realm, where it is easier to compute, viewing these as *fractional linear transformations*.

A *fractional linear transformation* is a function T defined by

T(x, y) = ((a11 x + a12 y + a13)/(a31 x + a32 y + a33), (a21 x + a22 y + a23)/(a31 x + a32 y + a33)),

where the aij are real (or sometimes complex) numbers in the form of integers or machine numbers. Setting the common denominator to zero defines a line, so the domain of T is the affine plane minus this line.

The notation suggests describing the fractional linear transformation compactly by the 3×3 matrix A = (aij). This is more compact as well as useful, as the fractional linear transformation is actually given by a two-step procedure using matrix multiplication: homogenize the point, (x, y) → (x, y, 1); multiply, (u, v, w) = A.(x, y, 1); then dehomogenize, (u, v, w) → (u/w, v/w).

In the Wolfram Language, this becomes a short function.
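The two-step matrix procedure can be sketched in Python with NumPy (illustrative; the function name `flt` is mine):

```python
import numpy as np

def flt(A, p):
    """Fractional linear transformation given by a 3x3 matrix A:
    homogenize the point, multiply by A, then dehomogenize."""
    u = A @ np.array([p[0], p[1], 1.0])
    if abs(u[2]) < 1e-14:
        raise ZeroDivisionError("image is an infinite point")
    return (u[0] / u[2], u[1] / u[2])

# A translation by (1, 2) written as a fractional linear transformation.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
print(flt(A, (3.0, 4.0)))     # (4.0, 6.0)
```

The zero-denominator guard corresponds to the excluded line in the definition above: those points map to infinite points.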

To the extent that we want to work completely in the affine domain, we note that the Wolfram Language also includes fractional linear transformations under the name *linear fractional transformation*. So one can also use the Wolfram Language to evaluate a fractional linear transformation.

Here is an example.

In [1], to keep things simple we assume the matrix is invertible. Matrix multiplication corresponds to composition of transformations; in particular, since our matrices are invertible, so are our fractional linear transformations.

Somewhat unique to [1], we have our transformations work on curves as well as points.

The fractional linear transformation takes the curve f (that is, the bivariate polynomial f) to a curve g such that g(T(p)) = 0 whenever f(p) = 0.

Here is an example using as defined before.

The relationship between the point transformation and the curve transformation is shown by the following example; one maps points to points and the other maps curves to curves. The image of a point of the curve under the first is a point of the image of the curve under the second.

In this case, the transformation takes the circle to a conic, a parabola. One can use the various transformations given by the Wolfram Language. We provide some additional ones in [2] (such as a transformation taking a given line to another given line, and the reflection about a given line) as Euclidean transformations, along with an affine transformation. As an example, we give one of these.

More importantly, we have two fractional linear transformations that act on the projective plane. The first takes a chosen infinite point to the origin and the original infinite line to an axis. The second specializes the projective plane by removing a chosen line from the affine plane and making it the infinite line; the new axis is the original infinite line.

As an example, we are interested in the behavior of the infinite point of the preceding curve.

The transformation puts the infinite point at the origin of this plot, which shows the infinite line as an axis. It appears that the parabola is actually tangent to the infinite line at the infinite point. To check, we can calculate the tangent line to the transformed curve at the origin to see that it is that axis.

There are various alternate versions of these functions to handle working projectively. For example, one version accepts infinite points as input or returns them as output.

The main application is that we can now find all complex projective singular points or intersection points in one step by picking a random line that, with high probability, will not go through any of the finite number of singular or intersection points. For details, see [1].

In [1], the theory so far is applied to recover known results about low-degree curves. First we consider nonsingular conics in the form

a x^2 + b x y + c y^2 + d x + e y + g = 0,

where the coefficients are integers or machine numbers. We identify them as to type (hyperbola, parabola, ellipse) and write them in standard form. We parameterize them by rational or trigonometric functions, find their foci and directrix or conversely, construct them from arbitrary foci and another appropriate value such as the semilatus rectum.
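The type identification can be sketched in Python with SymPy via the discriminant of the quadratic part, a standard criterion (the helper name is mine):

```python
import sympy as sp

x, y = sp.symbols('x y')

def conic_type(f):
    """Classify a nonsingular conic a*x**2 + b*x*y + c*y**2 + ... by
    the sign of the discriminant b**2 - 4*a*c of its quadratic part."""
    poly = sp.Poly(f, x, y)
    a = poly.coeff_monomial(x**2)
    b = poly.coeff_monomial(x * y)
    c = poly.coeff_monomial(y**2)
    d = b**2 - 4 * a * c
    if d > 0:
        return "hyperbola"
    if d == 0:
        return "parabola"
    return "ellipse"

print(conic_type(x * y - 1))            # hyperbola
print(conic_type(y - x**2))             # parabola
print(conic_type(x**2 + 2 * y**2 - 1))  # ellipse
```

In the projective language of the preceding sections, the sign of the discriminant counts the real infinite points of the conic: two, one or none.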

We then discuss the numerical theory for nonsingular cubics. Unlike the number theory case, which is one of the most difficult subjects in mathematics, the numerical case is very simple. We give a function to find the numerical inflection points; then, with a choice of inflection point, we have a deterministic black-box function to calculate the Weierstrass normal form and *j*-invariant. The *j*-invariant almost completely classifies numerical cubics relative to fractional linear transformations, that is, relative to the real projective linear group: there are two conjugacy classes for each value of *j*. Under the complex projective linear group, the classification is complete.

We end with Cayley’s theorem: an irreducible curve of degree n with (n − 1)(n − 2)/2 double points has a rational parameterization. This means that with parameter t, the parameterization components have the form of a polynomial in t divided by another polynomial in t. The coefficients are not, however, expected to be rational numbers; Cayley only promises algebraic numbers. Thus in practice, this parameterization works best with machine-number coefficients. We illustrate by parameterizing the hypocycloid.

The gap occurs because we only plotted the parameter on a closed interval; in theory it should run over all real numbers. Details of how the parametric functions were calculated are in [1].

Topologists often think of the real projective plane as a Möbius band where the entire outer boundary is squashed to the affine origin. Alternatively, the Möbius band can be viewed as the real projective plane with a tiny disk about the affine origin removed, the boundary of that disk being the boundary of the Möbius band. In either case, the center line of the band is the *infinite line*.

It is common to construct a Möbius band out of a strip of paper. Here is a slightly different but useful way, shown in Figure 1 by a physical deconstruction: cut from a boundary point to the center (infinite) line, then cut around the center line.

**Figure 1. **Constructing a Möbius band.

This gives a long skinny strip that we can identify with the real projective plane shown in Figure 2. The vertical yellow lines are the negative and positive axes, and the standard quadrants of the affine plane are numbered in Roman numerals.

**Figure 2. **The real projective plane.

We implement the mappings from the projective plane to this strip, called the *rectangular hemisphere* in [1] for reasons given there, and from the strip to the Möbius band by the following functions.

A simple example is a hyperbola; we give the construction. The infinite points are the two asymptotic directions. Unfortunately, even this simple example takes up a great deal of space, so we will just get started. We consider the part of this hyperbola in the second quadrant of the affine plane and plot it on the Möbius band. We start at one infinite point and trace to the other infinite point, which is an ambiguous point.

The affine part is well known, so we inspect the obvious infinite points.

A technicality is that the construction uses the line function, which could randomly differ by a scalar multiple. We need a specific choice, so we set the value explicitly.

Again, one axis represents the infinite line, and the origin the infinite point.

To connect this plot to the affine curve, find the points where the curve intersects the circle of radius 1.

Now map these back to the affine plane to see that it is the first point in the list that is related to a point in the second quadrant.

So the part of the hyperbola in the second quadrant can be traced using two parts: the part from the intercept to the first point, and the image of the part from there to the infinite point.

We see no error messages, so we assume the tracing went correctly. Now apply the mapping to the rectangular hemisphere.

Here we get some warning messages. We have to set the ambiguous points correctly and can then draw Figure 3.

So we have drawn this section of the hyperbola on the rectangular hemisphere.

**Figure 3.** The part of the hyperbola in the second quadrant on the rectangular hemisphere.

The reader may wish to attempt the other sections of the hyperbola; the one in the third quadrant is similar, and the parts in the first and fourth quadrants can be done together, since the intercept does not give an ambiguity (Figure 4).

**Figure 4.** The full plot of the hyperbola on the hemisphere looks like this.

Finally, we lift to the actual Möbius band using and (Figure 5).

**Figure 5.** The hyperbola on a Möbius band.

This last example was simple! From now on we just show the final output (Figure 6).

**Figure 6.** Here are two Möbius plots of lines, the first a line through the origin, and then a typical line.

Next, Figure 7 shows two affinely parallel lines meeting at an infinite point and three circles; the black one contains the origin in its interior.

**Figure 7.** Affinely parallel lines and three circles.

In Figure 8, we plot a rational function. This has two infinite points: a singular one and a regular one.

**Figure 8.** Plot of a rational function.

Experimenting with these plots we see, as the fundamental theorem tells us, that these curves consist of loops, that is, simple closed curves. Draw these yourself using the pattern in our hyperbola example that stops at the rectangle, print it out (preferably in landscape orientation with smaller aspect ratio), then cut it out, twist and tape together the two copies of the infinite line to make the Möbius band. Now cut out a loop. Two things can happen: either you get two pieces, one topologically a disk and the other not, or only one piece, as in the classic example of cutting a Möbius band along the center line.

In the first case, we call the loop an *oval*, and the complementary piece shaped like a disk is called the *interior*. In the other case, we call this a *pseudo-line*. Notice both kinds of lines have this property. One easy way to tell, without going through the trouble of constructing a physical Möbius band, is that an oval meets the infinite line (or any other line for that matter, since up to fractional linear transformations all lines are the same) in an even number of points (possibly zero). A pseudo-line meets the infinite line in an odd number of points. Again, since in the projective plane all lines are equivalent, two pseudo-lines always meet in an odd number of points, in particular, at least one.

From Bézout’s theorem, a curve of even degree meets any line in an even number of points. A consequence is that *a nonsingular curve can contain at most one pseudo-line; further, if the degree is even, each loop of the curve must be an oval. On the other hand, each nonsingular curve of odd degree must contain exactly one pseudo-line and possibly some ovals*.

This last paragraph is in italics because it essentially tells us the topological structure of nonsingular plane curves.

We can now find the specific topological (and even some geometrical) structure of any particular real plane curve, at least up to degree six. We concentrate on the Newton hyperbola family of curves introduced in Section 2. These are not well conditioned, so they present interesting problems. It may be necessary to go to arbitrary-precision numbers to get further with these, although for well-conditioned curves I have used the methods of [1] for curves up to degree nine.

We first consider Harnack’s theorem [7] and related problems from Hilbert’s sixteenth problem, Part 1 [8]. Harnack’s first theorem states that a nonsingular curve of degree m can have at most (m − 1)(m − 2)/2 + 1 topological components in the real projective plane. A rigorous proof requires advanced concepts in topology, but a heuristic proof is easy from Bézout’s theorem, especially given the ideas of ovals and pseudo-lines in the last section.

As mentioned in the last section, an oval is a loop that cuts the Möbius band in two parts, one topologically a disk. That part is known as the *interior* of the oval. It is possible that the interior of an oval contains another oval. Consider the following example with fractional linear transformation given by matrix that cuts the axis out of the affine plane.

We say the smaller oval has *depth* 2. If there were another oval inside that oval, it would have depth 3, and so on. It is easy to prove that the maximal depth of an oval in an irreducible curve of degree is ; simply consider a line through a point in the interior of the deepest oval and apply Bézout’s theorem. The next example generalizes this.

Continuing this way, we can in principle construct an oval of depth using a curve of degree for even and a curve of degree for odd.

An -curve is a nonsingular curve with the maximum number of components. To best show the possible arrangements of the components of an -curve, we use *diamond diagrams*. We have two main types, first the *Descartes–Viro diagrams* (or more simply the *Viro diagrams*), which depend on the signs of coefficients of the equation of the curve [9]. These diagrams turn out to be in 1-1 correspondence with the Newton hyperbolas. We also use *Gauss diagrams*, which show the complementary positive and negative value sets of .

The code for drawing diamond diagrams is very long and explained in [1] and [2]; in this article we do not give code, only graphics.

For the Viro diagram in the first quadrant including the positive axes, the color of the dot at the point is green if the coefficient of is positive and red if it is negative. We do not allow equations with any term 0 for a Viro diagram. The curve then separates the red and green lattice points.
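The coloring rule just described is easy to sketch. The article's own diagrams come from the long Mathematica code of [1]; the short Python stand-in below, with its dictionary-of-exponents representation, is only an illustrative assumption, not that code.

```python
# Sketch: color the first-quadrant lattice points of a Viro diagram from
# the signs of the polynomial's coefficients. A polynomial is represented
# (hypothetically) as a dict mapping exponent pairs (i, j) to coefficients.

def viro_colors(coeffs):
    """Map each lattice point (i, j) to 'green' (positive coefficient)
    or 'red' (negative coefficient); zero coefficients are not allowed."""
    colors = {}
    for (i, j), c in coeffs.items():
        if c == 0:
            raise ValueError("Viro diagrams require every term to be nonzero")
        colors[(i, j)] = "green" if c > 0 else "red"
    return colors

# Example: the polynomial x^2 + x*y - y^2
print(viro_colors({(2, 0): 1, (1, 1): 1, (0, 2): -1}))
```

The curve then has to separate the green points from the red ones, which is exactly the constraint the Viro diagrams exploit.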

As an example, consider Newton hyperbola 413 (Figure 9).

**Figure 9.** The Viro diagram, region plot, graph and diamond diagram for the function `nh413`.

In this case, the Viro diagram and Gauss diagram (not shown) are the same, other than the color of the lattice points; orange indicates where and brown where . A graph is given using only the infinite points, which are labeled , , , . The outer boundary of the diamond represents the infinite line. The diamond diagram indicates that: (1) on the positive axis the curve crosses three times; (2) it does not cross the negative axis; and (3) it crosses the positive axis once and the negative axis twice. The Viro diagram gives the maximal number of crossings according to Descartes’s theorem on each positive and negative , and axis, viewing as a single-variable polynomial restricted to these lines. In the projective plane, the axis is the line of infinite points where infinite points in the first/third quadrant are positive and those in the second/fourth quadrant are considered negative in this context.

In this example, the crossing points are given as follows.

Let , , be the infinite points.

The Newton hyperbola 613 is more complicated (Figure 10).

**Figure 10.** The Viro diagram for the function `nh613`.

We see there are three ambiguous cells, that is, four lattice points with , one color and , the other. There are two different possible ways to connect regions, given by dashed curves in the colors aqua and magenta. Without further investigation, there is no a priori way to determine the correct choice; a slight perturbation of the curve can affect this. A region plot suggests an answer.

Checking infinite points and critical points confirms that there is nothing unexpected going on outside the region plot, so we get the Gauss diagram and graph (Figure 11).

**Figure 11.** Gauss diagram and graph.

In this case, a tiny perturbation changes the geometry and the Gauss diagram (Figure 12).

**Figure 12.** The region plot and Gauss diagram for the perturbed `nh613` are different from `nh613`.

Originally, the negative complement was connected and the positive complement had three components; after the perturbation, it is the positive complement that is connected. Luckily, we did not need to change any values of the lattice points when changing from the Viro diagram to the Gauss diagram. In general, the user will need to do that. See [1] or some later examples.

Now that we have explained our diagrams, we can show some -curves. Hilbert gave a series of -curves for each degree that are given by Viro diagrams and hence exist by the work of Viro. We simply give the diagrams here (Figure 13). For more information see [1].

**Figure 13.** Viro diagrams of -curves of degree .

Hilbert suggested other possibilities with more nesting in degree six.

Many more details on these diagrams and Hilbert’s problem [8] are given in [1].

We now have all of our tools. In [1] we illustrate more complicated examples, two of them the curve and the Newton hyperbola 336941. Both of these curves have interesting behavior at or near the infinite line, so a contour plot, even with large scale, cannot show everything.

At present, we have shown how to analyze and plot curves of degree up to six in various ways. For well-conditioned curves, these machine-number methods often work with higher degree; the author has had success with curves of degree eight and nine. To adequately deal with Newton hyperbolas of degree greater than six, one would perhaps like to rewrite some of the code to use arbitrary precision.

Our forthcoming book is a first attempt to apply numerical methods to a formerly abstract subject. There is a lot more that can be done in this area. We hope the book will be a starting point.

I want to thank the people at Wolfram Research for their help on the book project, especially Jeremy Sykes, Daniel Lichtblau and, for this article, George Beck.

[1] B. H. Dayton, A Numerical Approach to Real Algebraic Curves with the Wolfram Language, Champaign, IL: Wolfram Media, 2018. www.wolfram-media.com/products/dayton-algebraic-curves.html.

[2] Global Functions. (Jul 18, 2018) barryhdayton.space/curvebook/GlobalFunctionsTMJ.nb.

[3] E. W. Weisstein. “Bifurcation” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/Bifurcation.html.

[4] F. S. Macaulay, The Algebraic Theory of Modular Systems, Cambridge: Cambridge University Press, 1916.

[5] B. H. Dayton, T. Y. Li and Z. Zeng, “Multiple Zeros of Nonlinear Systems,” Mathematics of Computation, 80(276), 2011 pp. 2143–2168. www.ams.org/journals/mcom/2011-80-276/S0025-5718-2011-02462-2/S0025-5718-2011-02462-2.pdf.

[6] S. S. Abhyankar, Algebraic Geometry for Scientists and Engineers, Providence, RI: AMS, 1990.

[7] E. W. Weisstein. “Harnack’s Theorems” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/HarnacksTheorems.html.

[8] E. W. Weisstein. “Hilbert’s Problems” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/HilbertsProblems.html.

[9] E. W. Weisstein. “Descartes’ Sign Rule” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/DescartesSignRule.html.

B. H. Dayton, “A Wolfram Language Approach to Real Numerical Algebraic Plane Curves,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-7.

Barry H. Dayton is Professor Emeritus at Northeastern Illinois University, where he taught for 33 years. His Ph.D. was in the field of algebraic topology, but he has done research in a variety of fields, including algebraic geometry and numerical algebraic geometry.

**Barry H. Dayton**

*Department of Mathematics
Northeastern Illinois University
Chicago, IL 60625-4699*

dx.doi.org/doi:10.3888/tmj.20-6

An important problem in graph theory is to find the number of complete subgraphs of a given size in a graph. If the graph is very large, it is usually only possible to obtain upper bounds for these numbers based on the numbers of complete subgraphs of smaller sizes. The Kruskal–Katona bounds are often used for these calculations. We investigate these bounds in specific cases and study how they might be improved.

Graph theory has many interesting problems that lend themselves to computer investigation. Mathematica has many graph theory functions that can enable these investigations. We shall introduce the reader to Mathematica’s graph theory capability while investigating a problem in extremal graph theory. Extremal graph theory tries to find graphs satisfying certain extreme properties—for example, having the most triangles for a fixed number of edges.

We first introduce some basic graph theory concepts and show how to represent them in Mathematica. Formerly, most graph theory functions were contained in the Combinatorica package; however, this graph theory functionality has, for the most part, been absorbed by the main program, making it unnecessary to load this package.

A finite graph consists of two finite sets, and . The elements of the set are called *vertices* and the elements of the set are (unordered) pairs of vertices called *edges*. We often write . A graph is called a *subgraph* of graph if and ; that is, if each vertex of the subgraph is also a vertex of the graph and each edge of the subgraph is also an edge of the graph .

Graphs are often depicted as points (the vertices) and line segments (the edges) that join pairs of vertices in . Thus, to draw the graph consisting of the five labeled vertices and with edge set being all pairs of vertices, we enter the following command.

This graph is known as the *complete graph on five vertices* and denoted by .

In general, denotes the complete graph on vertices, that is, the graph with vertex set and edge set consisting of all pairs of elements of .

The *complement* of the graph is the graph having the same set of vertices and whose edges are exactly those pairs of vertices of that *do not* belong to . Thus, the graph complement of the complete graph has no edges at all.

Mathematica can also add vertices to an already existing graph. This adds two vertices labeled and to the graph .

This is how to add edges to graph . (The symbol “” can be entered from the keyboard using the Esc key; press Esc, type u and e, then press Esc again.)

Vertices and edges can be deleted with the commands and .

Another useful operation, , contracts a set of vertices into one vertex. For example, this contracts the graph by contracting the vertices labeled and into a single vertex.
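For readers following along outside Mathematica, these operations are easy to imitate with a plain edge-set representation; the `contract` function below is a hypothetical stand-in for the built-in contraction command, not the article's code.

```python
# Sketch: a graph as a vertex set plus a set of frozenset edges,
# with vertex contraction merging several vertices into one.

def contract(vertices, edges, merge, new):
    """Contract the vertices in `merge` into the single vertex `new`."""
    vs = (vertices - merge) | {new}
    es = set()
    for edge in edges:
        a, b = tuple(edge)
        a = new if a in merge else a
        b = new if b in merge else b
        if a != b:                       # drop loops created by the contraction
            es.add(frozenset({a, b}))
    return vs, es

# Contract vertices 1 and 2 of a triangle on {1, 2, 3}:
v, e = contract({1, 2, 3},
                {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})},
                {1, 2}, "m")
print(v, e)   # a single edge joining "m" and 3
```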

An important problem in graph theory is to find the number of complete subgraphs (or cliques) of a graph. For example, find the number of cliques of a certain size in a large social network graph. Here, the people are the vertices, an edge joins two people who know each other and a clique consists of people who all know each other; that is, they form a complete subgraph.

There is a dual version of this problem, equally important, that asks for the number of independent sets of a graph. A set of vertices is *independent* in the graph if there are *no* edges of connecting the vertices in the set. (Clearly, a set of vertices of is independent if and only if they form a complete graph in the complement of .)

We consider next how to compute the number of complete subgraphs exactly for small graphs and how one might obtain useful upper bounds for larger graphs.

Given , it is easy to determine how many complete subgraphs there are having, say, four vertices; the answer is the number of ways to choose four of the seven vertices, since for each such choice, all edges between the chosen vertices are also present in the original graph. The number of ways to choose four out of seven objects is just the binomial coefficient .

However, if the graph is not a complete graph, it is not so easy to determine how many complete subgraphs of a certain size it contains. We turn to this problem next.

Consider the graph previously defined.

Suppose we want to know how many subgraphs contains. We start with the list of its vertices.

Next, we form the set of all subsets of size four of this set.

Here is an example of a subset of vertices that generates a complete graph in .

We now wish to repeat this calculation for all the subsets of size four.

Finally, we count the number of times occurs in .

Let be the number of complete subgraphs with vertices contained in the graph . The following program implements .
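The same count is easy to write as a brute-force Python sketch (the article's implementation is in Mathematica; the edge-set representation and function name below are assumptions for illustration).

```python
from itertools import combinations
from math import comb

def count_complete_subgraphs(vertices, edges, k):
    """Number of complete subgraphs on k vertices: try every k-subset and
    check that all of its pairs are edges. `edges` is a set of frozensets."""
    return sum(
        1
        for sub in combinations(vertices, k)
        if all(frozenset(p) in edges for p in combinations(sub, 2))
    )

# Sanity check: the complete graph K7 contains C(7,4) = 35 copies of K4.
v7 = range(1, 8)
e7 = {frozenset(p) for p in combinations(v7, 2)}
print(count_complete_subgraphs(v7, e7, 4), comb(7, 4))   # 35 35
```

This exhaustive search is fine for small graphs but grows combinatorially, which is exactly why the upper bounds discussed below matter for large graphs.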

Let us count the number of times , and occur in the graph ; that is, we calculate , and .

Suppose we know that a certain graph satisfies . Can we determine the maximum of ? Surely, can have no subgraphs; that is, , as in this example.

However, the example has been shown to have 19 subgraphs and 10 subgraphs.

In fact, we have shown in a joint paper [1] that for all graphs with , the maximum value of is 10.

The following definition, due to Bollobás [2], is useful in what follows.

**Definition**

If , then is the maximum number of subgraphs that a graph can have if the number of its subgraphs is less than or equal to .

Thus, using this notation, we have shown in [1] that .

Suppose now that the number of triangles a graph can have is fixed and we want to determine the graph with the fewest edges that can have that many triangles.

For example, suppose we want a graph having 23 triangles with the fewest possible edges. Our intuition is to look for graphs that are “tightly packed,” that is, as close to complete graphs as possible. has 20 triangles and has 35. So let us start by adding a vertex to and then add three edges from to three of the vertices of . This adds new triangles.

That shows that a graph with 23 triangles can be obtained with just 18 edges. Is there a graph with fewer edges that has 23 subgraphs?

The next theorem is due to Erdös and Hanani (see [3]).

**Theorem**

If the number of edges is , then . Since , the theorem says that ; that is, the maximum number of subgraphs in a graph with 18 edges is 23.

Is there a graph with fewer edges and 23 triangles? To answer this question, we can use the Erdös–Hanani theorem to compute the various maximum numbers of triangles with fewer edges. The program gives the maximum number of triangles for a given number of edges, as determined by the Erdös–Hanani theorem; the table computes these values for edge numbers between three and 18.
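The bound, in the form usually stated for the Erdös–Hanani theorem, writes the number of edges as e = C(b,2) + c with 0 ≤ c < b and bounds the number of triangles by C(b,3) + C(c,2). A Python sketch (the function name is an assumption, not the article's program):

```python
from math import comb

def max_triangles(e):
    """Erdös-Hanani bound: write e = C(b,2) + c with 0 <= c < b;
    the greatest number of triangles in a graph with e edges is
    C(b,3) + C(c,2)."""
    b = 2
    while comb(b + 1, 2) <= e:     # largest b with C(b,2) <= e
        b += 1
    c = e - comb(b, 2)
    return comb(b, 3) + comb(c, 2)

for e in range(15, 19):
    print(e, max_triangles(e))
# 15 20
# 16 20
# 17 21
# 18 23
```

In particular `max_triangles(18)` is 23, so no graph with fewer than 18 edges can have 23 triangles, matching the conclusion below.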

We see from the table that there is no graph with fewer than 18 edges having 23 triangles. Hence the fewest edges needed to produce a graph with exactly 23 triangles is indeed 18.

If we specify the number of triangles rather than the number of edges a graph can have, computing the maximum numbers of larger complete graphs is not as simple as in the previous section. Exact maximum numbers are not known in most cases, only upper bounds.

A well-known theorem of extremal graph theory (proved independently by Kruskal [4] and Katona [5]) can provide an upper bound for , given , . In fact, the Kruskal–Katona result is for more general objects than graphs, but we will only be using it for graphs and only when ; that is, we specify how many triangles a graph can have and want to bound for some .

**Theorem**

Suppose a graph has triangles, where , where each of , , is chosen in order and to be as large as possible at the time of choosing. Then for , .

For example, if the number of triangular subgraphs of is , then the Kruskal–Katona upper bounds for and are and .

To use this theorem in Mathematica, we first need to express as the binomial sum, . Given , the function finds the numbers , , .

Next, we define the functions and , the Kruskal–Katona upper bounds for and , given that .
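The greedy binomial representation and the resulting bounds can be sketched as follows; the names `binomial_rep` and `kk_bound` are hypothetical stand-ins for the article's Mathematica functions.

```python
from math import comb

def binomial_rep(t):
    """Greedy representation t = C(a3,3) + C(a2,2) + C(a1,1), choosing
    each top index as large as possible in turn; trailing terms may be
    absent when the remainder reaches zero."""
    rep = []
    for k in (3, 2, 1):
        if t == 0:
            break
        a = k
        while comb(a + 1, k) <= t:
            a += 1
        rep.append((a, k))
        t -= comb(a, k)
    return rep

def kk_bound(t, r):
    """Kruskal-Katona upper bound on the number of K_r subgraphs of a
    graph with t triangles: raise each lower index by r - 3."""
    return sum(comb(a, k + r - 3) for a, k in binomial_rep(t))

print(binomial_rep(19))                   # [(5, 3), (4, 2), (3, 1)]
print(kk_bound(19, 4), kk_bound(19, 5))   # 12 3
```

For 19 triangles this gives 19 = C(5,3) + C(4,2) + C(3,1), with bounds C(5,4) + C(4,3) + C(3,2) = 12 for the number of K4s and C(5,5) + C(4,4) + C(3,3) = 3 for the number of K5s.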

Here are the results for 19 triangles.

Thus, and .

Here again is the graph with 19 triangles. It has 10 subgraphs and two subgraphs.

As mentioned, we proved in [1] that and .

Finding exactly is often very difficult and results are not known in most cases. However, if there is a graph with triangles, with (the Kruskal–Katona bound), then . Complete graphs are obvious examples. Also, as remarked in [2], if the number of triangles , we define the graph by adding to a single vertex and edges joining to vertices of ; then (the Kruskal–Katona bound). For example, suppose . We construct such a graph, .

Thus, and .

Next, we find those numbers of triangles for which this construction works; that is, the third entry in their binomial representation is 0.

Also, it is not difficult to see that if , the Kruskal–Katona bound is the same as for (since for ). So the list of the numbers of triangles for which is known can be expanded. Here are the known values for the first 100 positive integers.

This leaves the following numbers of triangles up to 100 still unknown.

We have in fact settled (see [1]) the cases (where , respectively). We add these cases to the list of the known values.

And these are the unknown values of for .

We had started listing the integer sequence (see [6]), and the preceding results can be used to add to this sequence; for example, the first term of the sequence, (when there are four triangles allowed, the graph has at most one ); the sequence continued up to . Since we now know the consecutive values up to , we can add four more consecutive values: , , , . Our conjecture in the next section implies that . Other selected values of the sequence can obviously also be obtained, given that we know so many of the first 100 cases.

We have used complete graphs in building the maximal examples. However, even if we remove an edge from a complete graph, the number of subgraphs in this new graph `gr1` is the same as the Kruskal–Katona bound based on the number of subgraphs (calculated by ), as the following computation shows. (This can also be established with a simple argument that we omit.)

In fact, even if we remove several edges from a complete graph, as long as they share a common vertex, the number of subgraphs in the resulting graph and the Kruskal–Katona bound (based on the number of subgraphs) remain the same, as the reader can easily check with Mathematica. However, if we remove two edges that do not share a common vertex, this is no longer the case, as the following computations show.

This time, the number of subgraphs is one less than the upper bound! These graphs are also Turan graphs. The Turan graph is the graph formed by partitioning a set of vertices into subsets with sizes as equal as possible (differing by at most 1) and connecting two vertices by an edge if and only if they belong to different sets of the partition. The built-in Mathematica function draws this graph. For example, partitions the vertices into the subsets , , ; the edge is omitted since it would connect vertices in the same subset, ; is omitted since it would connect vertices in the same subset .
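Since the article's plots rely on the built-in Mathematica function, here is a hedged Python sketch of the Turan construction; the helper names and the nearly-equal partition by residues are assumptions for illustration.

```python
from itertools import combinations

def turan_graph(n, k):
    """Turan graph T(n, k): split n vertices into k parts of nearly equal
    size (here by residue mod k) and join two vertices if and only if
    they lie in different parts."""
    part = [i % k for i in range(n)]
    edges = {frozenset((a, b)) for a, b in combinations(range(n), 2)
             if part[a] != part[b]}
    return set(range(n)), edges

def triangles(vertices, edges):
    return sum(1 for t in combinations(sorted(vertices), 3)
               if all(frozenset(p) in edges for p in combinations(t, 2)))

# T(6, 3) is the complete tripartite graph K_{2,2,2}: one triangle per
# choice of a vertex from each of the three parts, so 2*2*2 = 8 triangles.
v, e = turan_graph(6, 3)
print(len(e), triangles(v, e))   # 12 8
```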

The graphs are especially interesting to us as they are the same as the graphs , defined before. It may not be immediately apparent that is the same graph as ; however, if we choose the right for the Turan graph, it becomes rather obvious.

In addition, can always be used to check.

Therefore, for the Turan graph , the actual number of subgraphs is just one less than the Kruskal–Katona bounds! Do these graphs provide the true maximum values of subgraphs based upon the number of their triangles? We conjecture below that they do for . In addition, it is not hard to show that the number of triangles in is . (This follows because the vertex sets of are , , , , …, ; then consider all possible ways of choosing three elements from these sets without choosing two elements from the same set.) This expression can be simplified.

The sequence of these triangle numbers for the graphs is the same (with an offset) as the sequence A000297 in [7]. (We recently added a comment to this entry that mentions this.)

**Conjecture**

If , the Turan graph has the maximum number of subgraphs for any graph with triangles.

Mathematica has a large database of graphs accessible with the function .

We wish to find maximal examples, that is, graphs that have the greatest number of subgraphs for their number of triangles. We make this precise with the following definition. A graph is a -maximal graph with respect to subgraphs if it has the greatest number of subgraphs among all graphs with the same number of subgraphs (triangles) as . We believe that these -maximal graphs are “tightly packed” and thus have a relatively small number of vertices given their number of triangles. Suppose first we wish to find all the -maximal graphs with exactly nine triangles. The Kruskal–Katona bound is three.

However, the maximal number of subgraphs is really two, that is, (see [1]). We first look for examples in the set of nonisomorphic graphs with six vertices, that is, subgraphs of .

Here is the first such graph.

We apply to each entry to get the graphs themselves but suppress the large output.

Within those, we next search for graphs with nine triangular subgraphs. There is only one.

We attach labels to the vertices of this graph for later use.

This graph has two subgraphs.

Hence is a -maximal example (see [1]).

This graph embedding is also extremal in another way; see [8].

We search for examples in the set of graphs with seven vertices, using the command , which lists all nonisomorphic graphs with seven vertices. (If , only lists some of the graphs with vertices.) has 1044 entries, so the result is not immediate.

There are 35 examples of graphs with seven vertices and nine triangular subgraphs.

To see the individual graphs, we use Mathematica’s built-in function , where, in addition to the graph, the number of subgraphs of the graph is listed.

We suspect that one of the -maximal graphs in is really a simple modification of the single -maximal graph with six vertices found in ; that is, the graph plus an edge. (We obviously do not want to add a triangle!) Stepping through the , graph number 22 is easily seen to be that graph.

We then add the edge to the graph found in and finally, use to verify that they are indeed isomorphic.

Next, suppose we want to find the -maximal graphs with 19 triangles. Although has triangles, if we remove even one edge from , the resulting graph has only 16 triangles! Hence, there are no subgraphs of with exactly 19 triangles. However has triangles; thus it is reasonable to search for examples.

Sometimes there are hidden edges in graph drawings in Mathematica; the graph in the middle is a case in point (the edge from the center top to center bottom vertices cannot be seen). We therefore redraw the graphs, setting the option .

We now ask for the number of subgraphs and the number of subgraphs in these graphs.

Also, all three graphs have the same number of edges, 17.

Thus, we have found three examples of graphs that illustrate our result of [1], that .

We now ask what is the maximal number of subgraphs in a graph with 25 vertices. We again search for graphs with seven vertices and 25 triangles.

The one example we have found has 16 subgraphs, and the Kruskal–Katona bound for subgraphs is 17.

We investigate this graph further.

This is the Turan graph ; edges and are missing from . This can also be seen with the function.

Since this is one of the Turan graphs and we have conjectured that they are -maximal, we believe we have found a -maximal example with 25 triangles.

Another example: Suppose we have a graph with 29 triangles. The Kruskal–Katona bound is 22.

A search in yields no graphs with 29 triangles. If we search in , we find one graph with 29 triangles. It has 16 subgraphs.

So it looks like the most subgraphs we can find for a graph with 29 triangles is 16. We can do better! For 26 triangles, since , the Kruskal–Katona bound is . Thus, if we add vertices and to and connect them to four and three vertices of , respectively, we get a graph with 29 triangles and 20 subgraphs.

The graph has eight vertices, but it did not show up in , which has only 289 out of 12346 possible graphs. contains all 1044 graphs with seven vertices.

We have looked at examples with relatively few vertices, and these examples might give the impression that the Kruskal–Katona bounds do not differ significantly from the real values—so why expend so much effort trying to improve them? We construct a larger example to show that the difference can be quite large.

Here is another example.

The Kruskal–Katona bound, however, usually yields more than twice as many subgraphs! More extreme examples can easily be constructed.

There have been some efforts to improve the Kruskal–Katona bounds in the case of graphs (see, for example, [1] and [9]); however, these have had very limited success. We feel that not enough insight into this problem has been gained and that perhaps by using computer experiments, conjectures can be formulated and then proved to advance our knowledge in this area. For example, if we knew that a maximal example with 19 triangles must occur in a graph with seven vertices, our search of would be sufficient to prove that the maximum number of subgraphs a graph with 19 subgraphs can have is 10. We succeeded in proving this result in [1] without using computers, but only with a great deal of effort.

To read more on using Mathematica’s graph theory capability to investigate other maximal problems in graph theory, see [10] and [11].

I wish to thank the editor, whose sage advice substantially improved this paper’s programming and content.

[1] R. Cowen and W. Emerson, “On Finding ,” Graph Theory Notes, New York Academy of Sciences, 34, 1998 pp. 26–30. www.researchgate.net/publication/287991696_On_Finding_k4k3_x.

[2] B. Bollobás, “Relations between Sets of Complete Subgraphs,” Proceedings of the Fifth British Combinatorial Conference, Aberdeen, 1975 pp. 79–84. www.researchgate.net/publication/268543809_Relations_between_sets_of_complete_subgraphs.

[3] P. Erdös, “On the Number of Complete Subgraphs Contained in Certain Graphs,” Publications of the Mathematics Institute of the Hungarian Academy of Sciences, 7, 1962 pp. 459–464.

[4] J. Kruskal, “The Number of Simplices in a Complex,” Mathematical Optimization Techniques (R. Bellman, ed.), Berkeley: University of California Press, 1963 pp. 251–278.

[5] G. Katona, “A Theorem of Finite Sets,” The Theory of Graphs (P. Erdös, ed.), Budapest: Akadémia Kiadó, 1968 pp. 187–207.

[6] N. J. A. Sloane. The Online Encyclopedia of Integer Sequences. oeis.org/A020917.

[7] N. J. A. Sloane. The Online Encyclopedia of Integer Sequences. oeis.org/A000297.

[8] E. W. Weisstein. “Graham’s Biggest Little Hexagon” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/GrahamsBiggestLittleHexagon.html.

[9] A. Frohmader, “A Kruskal–Katona Type Theorem for Graphs.” arxiv.org/abs/0710.3960.

[10] E. W. Weisstein. “Cage Graph” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/CageGraph.html.

[11] E. W. Weisstein. “Degree-Diameter Problem” from MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/Degree-DiameterProblem.html.

R. Cowen, “Improving the Kruskal–Katona Bounds for Complete Subgraphs of a Graph,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-6.

Robert Cowen is Professor Emeritus in Mathematics, Queens College, CUNY. He does research in logic, combinatorics and set theory. He taught a course in Mathematica programming for many years, emphasizing discovery of mathematics, and is currently working on a text on learning Mathematica through discovery with John Kennedy. His website is sites.google.com/site/robertcowen.

**Robert Cowen**

*16422 75th Avenue
Fresh Meadows, NY 11366*

dx.doi.org/doi:10.3888/tmj.20-5

This article explores the numerical mathematics and visualization capabilities of Mathematica in the framework of quaternion algebra. In this context, we discuss computational aspects of the recently introduced Newton and Weierstrass methods for finding the roots of a quaternionic polynomial.

Since Niven proved in his pioneering work [1] that every nonconstant polynomial of the form

(1) |

has at least one zero in , thereby extending the fundamental theorem of algebra to quaternionic polynomials, the use of such polynomials has been considered by different authors and in different contexts. Quaternionic polynomials [2] have found a wealth of applications in a number of different areas and have motivated the design of efficient methods for numerically approximating their zeros (see, e.g., [3–8]).

This article discusses two numerical methods to approximate the zeros (or roots) of polynomials of the form (1). They can be seen as the quaternionic versions of the well-known Newton and Weierstrass iterative root-finding methods and they both rely on quaternion arithmetic. Here we explain in detail how we have used Mathematica to produce the numerical results recently presented in [9–11].

All the computations in this article require the package QuaternionAnalysis, available for download at w3.math.uminho.pt/QuaternionAnalysis (see [12] and [13]).

We introduce the basic definitions and results needed; we refer to Part 1 of this article [2] for a review of the main aspects of quaternion algebra and to [14] for details on quaternionic calculus.

The real vector space can be identified with by means of

where , and are Hamilton’s imaginary units. Thus, throughout the article, we do not distinguish an element in from the corresponding quaternion in , unless we need to stress the context.

Using the simplified notation for the vector part of , any arbitrary nonreal quaternion can be written as

(2) |

where is the norm of and is the quaternion

(3) |

also referred to as the sign of . In addition, since and , one can say that behaves like the complex imaginary unit, and for this reason we call (2) the complex-like form of the quaternion .
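The key property, that the sign of a quaternion squares to −1 under the Hamilton product, is easy to verify numerically. The tuple representation and helper names below are assumptions for illustration, not the QuaternionAnalysis package.

```python
import math

def qmul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def sign(q):
    """The sign omega(q) = vec(q)/|vec(q)| of a nonreal quaternion q."""
    _, x, y, z = q
    n = math.sqrt(x*x + y*y + z*z)
    return (0.0, x/n, y/n, z/n)

q = (1.0, 2.0, -1.0, 3.0)
omega = sign(q)
print(qmul(omega, omega))   # approximately (-1, 0, 0, 0)
```

So omega behaves like the complex imaginary unit, which is exactly what makes the complex-like form (2) useful.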

In what follows, we consider domains and functions that can be written in the form

(4) |

where , and and are real-valued functions. Continuity and differentiability are defined coordinate-wise.

We define on the set the so-called radial operators

where and .

We introduce the following concept.

**Definition 1**

Let be a function of the form (4), and , with . Such a function is called radially holomorphic (or radially regular) in if

**Theorem 1**

A function of the form (4) is radially holomorphic iff . In that case, we have .

It follows at once that any quaternionic polynomial of the form (1) but with is radially holomorphic and its radial derivative is

(5) |

For holomorphic complex functions f of one complex variable, the well-known Newton method for finding a zero ζ of f consists of approximating ζ by means of the iterative process

(6) z_{n+1} = z_n − f(z_n)/f′(z_n), n = 0, 1, …,

with z₀ sufficiently close to ζ and f′(z_n) ≠ 0. Identifying a real quaternion with a vector in ℝ⁴, the problem of solving any quaternionic equation f(x) = 0 can always be transformed into the problem of solving a system F(x) = 0 of four nonlinear equations, whose solutions, in turn, can be obtained by using the multivariate version of (6):

(7) x_{n+1} = x_n − J_F(x_n)⁻¹ F(x_n), n = 0, 1, …,

with x₀ sufficiently close to a solution and a nonsingular Jacobian matrix J_F(x_n). Not surprisingly, recent experiments performed by some of the authors of this article ([9], [10]) have shown that a substantial gain in computational effort can be achieved by using a direct quaternionic approach to this problem.

Newton methods in the quaternion context were formally adapted for the first time by Janovská and Opfer in [7], where the authors solved equations of the form . Later, Kalantari in [15], using algebraic-combinatorial arguments, proposed a Newton method for finding roots of special quaternionic polynomials. In [9], the equivalence between the classical multivariate Newton method (7) and quaternionic versions of Newton methods for a class of functions was established.

Due to the noncommutativity of quaternion multiplication, the quotient of two quaternions a and b may be interpreted in two different ways: either as ab⁻¹ (the right quotient) or as b⁻¹a (the left quotient). This leads naturally to considering two versions of the Newton iteration in the quaternionic setting:

(8) x_{n+1} = x_n − f(x_n)[f′(x_n)]⁻¹,

(9) x_{n+1} = x_n − [f′(x_n)]⁻¹ f(x_n).

The derivative f′ in equations (8) and (9) has been considered in [9] and [10] as the radial derivative of a radially holomorphic function. In fact, in Corollary 2 of [9] it was proved that for such functions, equations (7), (8) and (9) produce, for each initial guess, the same sequence, provided that the Jacobian matrix is nonsingular. Here is a more general result.

**Theorem 2** ([9], Theorem 4)

Let be a function defined on the set such that the , , are radially holomorphic functions in and the are quaternions not all zero. If is a root of such that is nonsingular and is Lipschitz continuous on a neighborhood of , then for all sufficiently close to such that commutes with all , the Newton processes

(10) x_{n+1} = x_n − f(x_n)[f′(x_n)]⁻¹,

(11) x_{n+1} = x_n − [f′(x_n)]⁻¹ f(x_n)

both produce the same sequence as (7), which converges quadratically to .

Each step of the iterative schemes (10) and (11) is implemented in the function , which has as arguments the quaternion and the indication of the version: for (10) or for (11). At each step, a test of the value of is also performed. We recall again that all the functions presented here require the QuaternionAnalysis package.

The -Newton methods consist of the successive application of the iterative scheme (10) or (11) through the function , using a stopping criterion based on the incremental size and on the maximum number of iterations .

Example 1

Consider the radially holomorphic polynomial , whose only roots in are the real isolated roots , and . For the concepts of isolated and spherical roots, we refer the reader to [2], Definition 4.

The use of the initial guess requires nine iterations to get an approximation to the root 0 with the required precision. The fact that both methods produce the same sequence is also confirmed.

The use of the initial guesses and requires 14 iterations to get an approximation to the roots and , respectively.
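For readers without the Mathematica package at hand, the two iterations can be sketched in a few lines of Python (an illustrative stand-in, not the authors' implementation). Applied to the radially holomorphic polynomial f(x) = x² − 1, with radial derivative f′(x) = 2x, the right and left versions produce the same sequence whenever the initial guess commutes with the data, e.g. a guess of the form a + bi:

```python
def qmul(p, q):
    # Hamilton product of 4-tuples (q0, q1, q2, q3)
    a, b, c, d = p; e, f, g, h = q
    return (a*e - b*f - c*g - d*h, a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f, a*h + b*g - c*f + d*e)

def qinv(q):
    a, b, c, d = q
    n2 = a*a + b*b + c*c + d*d
    return (a/n2, -b/n2, -c/n2, -d/n2)

def qsub(p, q):
    return tuple(s - t for s, t in zip(p, q))

def newton(q, right=True, steps=30):
    # Newton iteration for f(x) = x^2 - 1 with radial derivative f'(x) = 2x;
    # 'right' selects f(x) f'(x)^-1, otherwise f'(x)^-1 f(x) is used
    one = (1.0, 0.0, 0.0, 0.0)
    for _ in range(steps):
        f = qsub(qmul(q, q), one)
        df = tuple(2.0 * t for t in q)
        q = qsub(q, qmul(f, qinv(df)) if right else qmul(qinv(df), f))
    return q

x0 = (2.0, 1.0, 0.0, 0.0)                      # initial guess 2 + i
r, l = newton(x0, True), newton(x0, False)
print(max(abs(s - t) for s, t in zip(r, l)))   # both versions agree
```

Both runs converge to the real root 1, mirroring the behavior reported above.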

Example 2

The polynomial has a real root 0 and the sphere of zeros . Since the polynomial is radially holomorphic, both methods produce the same sequence. Here we would like to call attention to the convergence to the spherical root.

As pointed out in Example 3 of [10], the behavior of the Newton methods in case of convergence to values generating a spherical root is clear: if q₀ is the initial guess, then the Newton sequence converges to the root ζ in the sphere such that ω(ζ) = ω(q₀). This phenomenon can be easily seen from the preceding results or by computing the sign (3) of the vector part of the iterates.
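This sign preservation can again be checked with a small Python stand-in (hypothetical helper names, not the package's functions): for f(x) = x² + 1, whose zeros form the sphere of unit vector quaternions, a right Newton iteration started at 1 + 2j stays in the plane spanned by 1 and j and lands on j, while the guess 1 + 2k lands on k.

```python
def qmul(p, q):
    # Hamilton product of 4-tuples (q0, q1, q2, q3)
    a, b, c, d = p; e, f, g, h = q
    return (a*e - b*f - c*g - d*h, a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f, a*h + b*g - c*f + d*e)

def qinv(q):
    a, b, c, d = q
    n2 = a*a + b*b + c*c + d*d
    return (a/n2, -b/n2, -c/n2, -d/n2)

def newton_sphere(q, steps=30):
    # right Newton iteration for f(x) = x^2 + 1, f'(x) = 2x; every unit
    # vector quaternion is a zero, and the limit keeps the sign of the guess
    one = (1.0, 0.0, 0.0, 0.0)
    for _ in range(steps):
        f = tuple(s + t for s, t in zip(qmul(q, q), one))
        df = tuple(2.0 * t for t in q)
        q = tuple(s - t for s, t in zip(q, qmul(f, qinv(df))))
    return q

rj = newton_sphere((1.0, 0.0, 2.0, 0.0))   # guess 1 + 2j -> limit j
rk = newton_sphere((1.0, 0.0, 0.0, 2.0))   # guess 1 + 2k -> limit k
print([round(t, 8) for t in rj], [round(t, 8) for t in rk])
```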

Example 3

Now consider the polynomial with the three isolated roots , and (cf. [9], Example 3). This polynomial is not radially holomorphic, which means that we cannot anticipate the behavior of Newton methods unless we choose initial guesses such that Theorem 2 applies, that is, such that commutes with . In other words, must be of the form .

What happens if the assumptions of Theorem 2 are not valid? In fact, as we next illustrate, although the left and right Newton methods do not give the same sequence, we can observe convergence in both cases.

With the choice , the right version of the Newton method converges to the root , while the left version converges to .

It is interesting that the 4D Newton method (7) gives convergence to the other root , as observed in [9].

Following [9] and [10], consider a function that gives the number of iterations required for each process to converge, within a certain precision, to one of the solutions of the problem under consideration, using as the initial guess.

We now consider different initial guesses by choosing points in special regions and we show density plots of . The white regions that may appear correspond to a choice of for which the method under consideration does not reach the level of precision with iterations. The default choices of and usually lead to realistic plots that require some minutes to be produced. A smoother density can be obtained by increasing the option .

Example 4

We consider again the polynomial of Example 3, whose roots are the isolated roots , and . The following code produces the plots corresponding to the choice of in one of the following regions:

As was already pointed out, Theorem 2 can be applied only in ; this is why both methods produce the same plots in this case.

Here is the behavior of the -Newton methods in .

Here is the behavior of the -Newton methods in .

The plots produced by give information on the number of iterations required by each of the quaternionic Newton methods to converge within a certain precision to any of the roots of the polynomial under consideration. However, those plots do not give any information about the root and how the convergence occurs. This issue can be easily overcome by plotting the basins of attraction of the roots with respect to the iterative function. More precisely, we introduce a new input parameter in the function with the information of the root for which we want to compute the basin of attraction. A new function takes into account the existence of spheres of zeros. The functions and give the number of iterations needed to observe convergence to an isolated root or a spherical one, respectively. These functions return when the corresponding convergence test fails.

The functions that plot the basin of attraction of an isolated root or a spherical root have an input parameter associated with that root. The color coding is the following: if the initial guess , chosen in a domain , causes the process to converge to a certain isolated root to which the color was associated, then the point is plotted with the color . For a sphere of zeros , all the points that converge to a point in have the color assigned to . Dark shades of a color indicate fast convergence, while lighter shades indicate slower convergence. As before, white regions mean that the method does not converge.

Example 5

We consider once more the polynomial of Example 4, now from the perspective of the basins of attraction of each of the roots , and . We associate with these roots the colors red, blue and green, respectively, and consider the domains , and , described in Example 4. The corresponding plots can be obtained as follows (it can take some time to produce the figures).

Here are the basins of attraction in (left).

Here are the basins of attraction in (left and right).

Here are the basins of attraction in (left and right).

Example 6

This example concerns the polynomial studied in Example 2, which has an isolated root 0 (red) and a sphere of zeros (blue). The corresponding plots can be obtained as follows.

Here are the basins of attraction in (left).

Here are the basins of attraction in (left); as expected, the behavior is similar to that in , since .

Here are the basins of attraction in (left).

The Weierstrass method is one of the most popular iterative methods for obtaining simultaneously approximations to all the roots of a polynomial with complex coefficients. The method was first proposed by Weierstrass [16] in 1891 and later rediscovered and derived in different ways by Durand [17] in 1960, Dočev [18] in 1962 and Kerner [19] and Prešić [20] in 1966.

Let p be a complex monic polynomial of degree n with roots ζ₁, …, ζₙ and let z₁⁽⁰⁾, …, zₙ⁽⁰⁾ be n distinct numbers. The classical Weierstrass method for approximating the roots is defined by the iterative scheme:

(12) z_k^{(m+1)} = z_k^{(m)} − p(z_k^{(m)}) / ∏_{j≠k} (z_k^{(m)} − z_j^{(m)}), k = 1, …, n, m = 0, 1, ….

If the roots are distinct and the initial values z₁⁽⁰⁾, …, zₙ⁽⁰⁾ are sufficiently good approximations to these roots, then the method converges at a quadratic rate, as was first proved by Dočev [18]. The iteration procedure (12) computes the new approximations based only on those of the previous step. For this reason, it is usually referred to as the *total-step* or *parallel* mode. The convergence of the method can be accelerated by using a variant—the so-called *single-step*, *serial* or *sequential* mode—that makes use of the most recently updated approximations to the roots as soon as they are available:

(13) z_k^{(m+1)} = z_k^{(m)} − p(z_k^{(m)}) / [∏_{j<k} (z_k^{(m)} − z_j^{(m+1)}) ∏_{j>k} (z_k^{(m)} − z_j^{(m)})], k = 1, …, n, m = 0, 1, ….
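The sequential scheme (13) is straightforward to implement for complex polynomials. In the Python sketch below (plain Python, not the article's Mathematica code), the approximations are updated in place, so each correction automatically uses the freshest values:

```python
def weierstrass(p, guesses, steps=40):
    # single-step (sequential) Weierstrass iteration, scheme (13)
    z = list(guesses)
    n = len(z)
    for _ in range(steps):
        for k in range(n):
            denom = 1.0 + 0.0j
            for j in range(n):
                if j != k:
                    denom *= z[k] - z[j]   # uses already-updated entries
            z[k] -= p(z[k]) / denom
    return z

p = lambda x: x**3 - x                     # monic, with roots -1, 0, 1
approx = weierstrass(p, [0.2 + 0.1j, 1.3 - 0.2j, -0.8 + 0.3j])
print(sorted(round(w.real) for w in approx))   # -> [-1, 0, 1]
```

With distinct guesses close enough to the roots, all three approximations converge simultaneously, one per root.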

In a recent article [11], we adapted the Weierstrass method to the quaternion algebra setting. We refer to [2] and references therein to recall the main concepts and properties of the ring of unilateral quaternionic polynomials. In particular, we recall the factorization of polynomials in into linear terms and the relation between zeros and factors of .

**Theorem 3—Factorization into linear terms**

Any monic polynomial p of degree n ≥ 1 in ℍ[x] admits a factorization into linear factors; that is, there exist x₁, …, xₙ ∈ ℍ such that

(14) p(x) = (x − x₁)(x − x₂) ⋯ (x − xₙ).

**Theorem 4—Zeros from factors**

Consider a polynomial p whose factor terms are x₁, …, xₙ; that is, p admits a factorization of the form (14). If the similarity classes [x₁], …, [xₙ] are distinct, then p has exactly n zeros ζ₁, …, ζₙ, which are given by:

(15) |

(16) |

Following the idea of the Weierstrass method in its sequential version (13), the next results show how to obtain sequences converging, at a quadratic rate, to the factor terms in (14) of a given polynomial . Moreover, by making use of Theorem 4, it is possible to construct sequences converging quadratically to the roots of .

**Theorem 5** ([11])

(17) |

(18) |

(19) |

(20) |

with Ψ_q denoting the characteristic polynomial of q, that is, Ψ_q(x) = x² − 2Re(q)x + |q|². If the initial approximations are sufficiently close to the factor terms x₁, …, xₙ in a factorization of p of the form (14), then the sequences converge quadratically to x₁, …, xₙ. Moreover, the sequences defined by

(21) |

converge quadratically to the zeros of p.

The functions , and are implemented as the functions , and , respectively. The support file associated with [2] needs to be loaded.

The iterative functions associated with (17) and (21) are built into the function.

The quaternionic Weierstrass iterative method is implemented in the function .

The usual convergence test has been replaced in by

in order to let the function recognize a sphere of zeros. Since we also include a test on the value of , there is no risk of misidentifying an isolated root.

Example 7

We consider now the application of the Weierstrass method to the computation of the roots of the polynomial of Example 3, which we recall are , and . All of the initial approximations , and have to lie in distinct congruence classes.

Some explanation of the output is needed. The first entry indicates the convergence or divergence of the method. The second entry is the error in the approximations to the zeros. The last two entries contain approximations to the roots and factor terms. Since there are two real roots and just one nonreal root, the roots and factor terms coincide.

Example 8

Our next test example is a polynomial that also fulfills the assumptions of Theorem 5 and has simple zeros (see [11], Example 1). First, we check that the polynomial

(22) |

(23) |

The convergence to the roots occurs in an order different from the one given in (22) because the convergence to the factor terms also occurs in an order different from the one given in (23).

Example 9

The polynomial has an isolated root and a sphere of zeros . The assumptions of Theorem 5 do not apply to this polynomial, but we can observe convergence to the roots as we increase the precision of the computations. When a polynomial has a spherical root, two of its factor terms are in the same congruence class. Therefore, as the iteration proceeds, the values in (17) become close to zero and some care is required.

Using the usual precision, it was not possible to reach the required tolerance. However, performing the calculations with more decimal places yields fast convergence under the same assumptions.

The spherical root can be identified at once by observing that, up to the required precision, we have .

This is the second of two articles on computational aspects of polynomials in the ring ℍ[x]. One can find in the literature methods for numerically approximating the zeros of quaternionic polynomials based on the use of complex techniques, but numerical methods relying on quaternion arithmetic remain scarce, the Newton and Weierstrass methods discussed in this article being notable exceptions. We developed several functions to implement those methods and we also added some visualization tools.

Research at the Centre of Mathematics (CMAT) was financed by Portuguese Funds through FCT – Fundação para a Ciência e a Tecnologia, within the Project UID/MAT/00013/2013. Research at the Economics Politics Research Unit (NIPE) was carried out within the funding with COMPETE reference number POCI-01-0145-FEDER-006683 (UID/ECO/03182/2013), with the FCT/MEC’s (Fundação para a Ciência e a Tecnologia, I.P.) financial support through national funding and by the European Regional Development Fund (ERDF) through the Operational Programme on “Competitiveness and Internationalization – COMPETE 2020” under the PT2020 Partnership Agreement.

[1] | I. Niven, “Equations in Quaternions,” The American Mathematical Monthly, 48(10), 1941 pp. 654–661. www.jstor.org/stable/2303304. |

[2] | M. I. Falcão, F. Miranda, R. Severino, and M. J. Soares, “Computational Aspects of Quaternionic Polynomials: Part 1,” The Mathematica Journal, 20(4), 2018. doi.org/10.3888/tmj.20-4. |

[3] | R. Farouki, G. Gentili, C. Giannelli, A. Sestini and C. Stoppato, “A Comprehensive Characterization of the Set of Polynomial Curves with Rational Rotation-Minimizing Frames,” Advances in Computational Mathematics, 43(1), 2017 pp. 1–24. doi.org/10.1007/s10444-016-9473-0. |

[4] | R. Pereira, P. Rocha and P. Vettori, “Algebraic Tools for the Study of Quaternionic Behavioral Systems,” Linear Algebra and Its Applications, 400, 2005 pp. 121–140. doi.org/10.1016/j.laa.2005.01.008. |

[5] | R. Serôdio, E. Pereira and J. Vitória, “Computing the Zeros of Quaternion Polynomials,” Computers and Mathematics with Applications, 42(8–9), 2001 pp. 1229–1237. doi.org/10.1016/S0898-1221(01)00235-8. |

[6] | S. De Leo, G. Ducati and V. Leonardi, “Zeros of Unilateral Quaternionic Polynomials,” The Electronic Journal of Linear Algebra, 15(1), 2006 pp. 297–313. doi.org/10.13001/1081-3810.1240. |

[7] | D. Janovská and G. Opfer, “Computing Quaternionic Roots by Newton’s Method,” Electronic Transactions on Numerical Analysis, 26, 2007 pp. 82–102. |

[8] | D. Janovská and G. Opfer, “A Note on the Computation of All Zeros of Simple Quaternionic Polynomials,” SIAM Journal on Numerical Analysis, 48(1), 2010 pp. 244–256. doi.org/10.1137/090748871. |

[9] | M. I. Falcão, “Newton Method in the Context of Quaternion Analysis,” Applied Mathematics and Computation, 236, 2014 pp. 458–470. doi.org/10.1016/j.amc.2014.03.050. |

[10] | F. Miranda and M. I. Falcão, “Modified Quaternion Newton Methods,” in Computational Science and Its Applications (ICCSA 2014), Guimarães, Portugal, Lecture Notes in Computer Science, 8579 (B. Murgante et al., eds.), Berlin, Heidelberg: Springer, 2014 pp. 146–161. doi.org/10.1007/978-3-319-09144-0_11. |

[11] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Weierstrass Method for Quaternionic Polynomial Root-Finding,” Mathematical Methods in the Applied Sciences, 2017 pp. 1–15. doi.org/10.1002/mma.4623. |

[12] | M. I. Falcão and F. Miranda, “Quaternions: A Mathematica Package for Quaternionic Analysis,” in Computational Science and Its Applications (ICCSA 2011), Lecture Notes in Computer Science, 6784 (B. Murgante, O. Gervasi, A. Iglesias, D. Taniar and B. O. Apduhan, eds.), Berlin, Heidelberg: Springer, 2011 pp. 200–214. doi.org/10.1007/978-3-642-21931-3_17. |

[13] | F. Miranda and M. I. Falcão. “QuaternionAnalysis Mathematica Package.” w3.math.uminho.pt/QuaternionAnalysis. |

[14] | K. Gürlebeck, K. Habetha and W. Sprössig, Holomorphic Functions in the Plane and n-Dimensional Space, Basel: Birkhäuser, 2008. doi.org/10.1007/978-3-7643-8272-8. |

[15] | B. Kalantari, “Algorithms for Quaternion Polynomial Root-Finding,” Journal of Complexity, 29(3–4), 2013 pp. 302–322. doi.org/10.1016/j.jco.2013.03.001. |

[16] | K. Weierstrass, “Neuer Beweis des Satzes, dass jede ganze rationale Function einer Veränderlichen dargestellt werden kann als ein Product aus linearen Functionen derselben Veränderlichen,” in Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 1891. |

[17] | E. Durand, Solutions numériques des équations algébriques. Tome I: Equations du type F(x); racines d’un polynôme, Paris: Masson, 1960. |

[18] | K. Dočev, “A Variant of Newton’s Method for the Simultaneous Approximation of All Roots of an Algebraic Equation,” Fiziko-Matematichesko Spisanie. Bulgarska Akademiya na Naukite, 5(38), 1962 pp. 136–139. |

[19] | I. O. Kerner, “Ein Gesamtschrittverfahren zur Berechnung der Nullstellen von Polynomen,” Numerische Mathematik, 8(3), 1966 pp. 290–294. doi.org/10.1007/BF02162564. |

[20] | S. B. Prešić, “Un procédé itératif pour la factorisation des polynômes,” Comptes Rendus de l’Académie des Sciences Paris Série A, 262, 1966 pp. 862–863. |

M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Computational Aspects of Quaternionic Polynomials: Part 2,” The Mathematica Journal, 2018. doi.org/10.3888/tmj.20-5. |

Available at: w3.math.uminho.pt/QuaternionAnalysis

Available at: www.mathematica-journal.com/data/uploads/2018/05/QPolynomial.m

M. Irene Falcão is an associate professor in the Department of Mathematics and Applications of the University of Minho. Her research interests are numerical analysis, hypercomplex analysis and scientific software.

Fernando Miranda is an assistant professor in the Department of Mathematics and Applications of the University of Minho. His research interests are differential equations, quaternions and related algebras and scientific software.

Ricardo Severino is an assistant professor in the Department of Mathematics and Applications of the University of Minho. His research interests are dynamical systems, quaternions and related algebras and scientific software.

M. Joana Soares is an associate professor in the Department of Mathematics and Applications of the University of Minho. Her research interests are numerical analysis, wavelets mainly in applications to economics, and quaternions and related algebras.

**M. Irene Falcão**

CMAT – Centre of Mathematics

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*mif@math.uminho.pt*

**Fernando Miranda**

CMAT – Centre of Mathematics

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*fmiranda@math.uminho.pt*

**Ricardo Severino**

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*ricardo@math.uminho.pt*

**M. Joana Soares**

NIPE – Economics Politics Research Unit

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*jsoares@math.uminho.pt*

doi.org/10.3888/tmj.20-4

This article discusses a recently developed Mathematica tool—a collection of functions for manipulating, evaluating and factoring quaternionic polynomials. The tool relies on the QuaternionAnalysis package, which is available for download at w3.math.uminho.pt/QuaternionAnalysis.

Some years ago, the first two authors of this article extended the standard Mathematica package implementing Hamilton’s quaternion algebra—the Quaternions package—endowing it with the ability, among other things, to perform numerical and symbolic operations on quaternion-valued functions [1]. Later on, the same authors, in response to the need to include new functions providing basic mathematical tools for dealing with quaternion-valued functions, wrote a full new package, QuaternionAnalysis. Since 2014, the package and complete support files have been available for download at the Wolfram Library Archive (see also [2] for updated versions).

Over time, this package has become an important tool, especially in the authors’ work in the area of quaternionic polynomials ([3–5]). As this work progressed, new Mathematica functions were written to deal appropriately with problems in the ring of quaternionic polynomials. The main purpose of the present article is to describe these Mathematica functions. There are two parts.

In this first part, we discuss the tool containing several functions for treating the usual problems in the ring of quaternionic polynomials: evaluation, Euclidean division, greatest common divisor and so on. A first version was already introduced in [4], written from the user’s point of view. Here, we take another perspective, giving some implementation details and describing some of the experiments performed.

The second part of the article (forthcoming) is entirely dedicated to root-finding methods.

In 1843, the Irish mathematician William Rowan Hamilton introduced the quaternions, which are numbers of the form

q = q₀ + q₁i + q₂j + q₃k, with q₀, q₁, q₂, q₃ ∈ ℝ,

where the imaginary units i, j and k satisfy the multiplication rules

i² = j² = k² = −1, ij = −ji = k, jk = −kj = i, ki = −ik = j.

This noncommutative product generates the well-known algebra of real quaternions, usually denoted by ℍ.

**Definition 1**

The standard package adds rules to , , , and the fundamental . Among others, the following quaternion functions are included: , , , , , , and . In , a quaternion is an object of the form and must have real numeric valued entries; that is, applying the function to an argument gives .

The extended version allows the use of symbolic entries, assuming that all symbols represent real numbers. The package adds functionality to the following functions: , , , , , , , , , and . We briefly illustrate some of the quaternion functions needed in the sequel. In what follows, we assume that the package has been installed.

These are the imaginary units.

These are the multiplication rules.

Here are two quaternions with symbolic entries and their product.

The product is noncommutative.

Here are some basic functions.

The function , which was extended in through the use of de Moivre’s formula for quaternions, works quite well for quaternions with numeric entries.

contains a different implementation of the power function, , which we recommend whenever a quaternion has symbolic entries.

We refer the reader to the package documentation for more details on the new functions included in the package.

We now focus on polynomials in one formal variable x of the form

(1) p(x) = aₙxⁿ + ⋯ + a₁x + a₀, with aₖ ∈ ℍ,

where the coefficients are to the left of the powers. Denote by ℍ[x] the set of polynomials of the form (1), defining addition and multiplication as in the commutative case and assuming the variable x commutes with the coefficients. This is a ring, referred to as the ring of left one-sided (or unilateral) polynomials.
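Concretely, multiplication in this ring is an ordinary convolution of coefficient lists in which the coefficient products are Hamilton products, so the order of the factors matters. A minimal Python sketch (illustrative only; the package's own data structures are different) multiplies (x − i)(x − j) in both orders:

```python
def qmul(p, q):
    # Hamilton product of 4-tuples (q0, q1, q2, q3)
    a, b, c, d = p; e, f, g, h = q
    return (a*e - b*f - c*g - d*h, a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f, a*h + b*g - c*f + d*e)

def polymul(A, B):
    # coefficient lists indexed by the power of x; x commutes with the
    # coefficients, so only the coefficient products are noncommutative
    C = [(0, 0, 0, 0)] * (len(A) + len(B) - 1)
    for m, a in enumerate(A):
        for n, b in enumerate(B):
            C[m + n] = tuple(s + t for s, t in zip(C[m + n], qmul(a, b)))
    return C

one, i, j = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)
neg = lambda q: tuple(-t for t in q)
print(polymul([neg(i), one], [neg(j), one]))   # (x-i)(x-j): constant term  k
print(polymul([neg(j), one], [neg(i), one]))   # (x-j)(x-i): constant term -k
```

The two products differ exactly in the constant term, since ij = k while ji = −k.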

When working with the functions contained in , a polynomial in is an object defined through the use of the function , which returns the simplest form of , taking into account the following rules.

The function tests if an argument is a scalar, in the sense that it is not a complex number, a quaternion or a polynomial.

For polynomials in , the rules , , and have to be defined.

■ *Addition*

■ *Product by a scalar*

■ *Multiplication*

■ *Power*

Example 1

The polynomials and can be defined using their coefficients in ℍ, given in descending order.

We now define three particularly important polynomials, the first two associated with a given polynomial and the last one associated with a given quaternion .

**Definition 2**

With a polynomial as in equation (1) and a quaternion, define:

The first two polynomials are constructed with the functions and .

The built-in function now accepts a quaternion argument.

Observe that is a polynomial with real coefficients. For simplicity, in this context and in what follows, we assume that a quaternion with vector part zero is real.

Example 2

Consider the polynomial of Example 1 and the quaternion .

The evaluation map at a given quaternion α, defined for the polynomial p given by (1), is

(2) p(α) = aₙαⁿ + ⋯ + a₁α + a₀.

It is not an algebra homomorphism: the value of a product pq at α does not lead, in general, to the product p(α)q(α), as the next theorem remarks.

As usual, we say that α is a zero (or root) of p if p(α) = 0. An immediate consequence of Theorem 1 is that if , then is a zero of if and only if is a zero of .

A straightforward implementation of equation (2) can be obtained through .

As in the classical (real or complex) case, the evaluation of a polynomial can also be obtained by the use of Horner’s rule [3]. The nested form of equation (2) is

p(α) = (⋯((aₙα + aₙ₋₁)α + aₙ₋₂)α + ⋯ + a₁)α + a₀,

and the quaternionic version of Horner’s rule can be implemented as .
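The nested evaluation can be sketched in Python (again an illustrative stand-in for the package's function): because the coefficients stay on the left and only powers of the same quaternion appear on the right, the accumulator is multiplied by q on the right at each step. The sketch compares Horner with the direct sum of terms.

```python
def qmul(p, q):
    # Hamilton product of 4-tuples (q0, q1, q2, q3)
    a, b, c, d = p; e, f, g, h = q
    return (a*e - b*f - c*g - d*h, a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f, a*h + b*g - c*f + d*e)

def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))

def horner(coeffs, q):
    # coeffs = [a_0, a_1, ..., a_n], coefficients acting on the LEFT:
    # p(q) = ( ... (a_n q + a_{n-1}) q + ... + a_1 ) q + a_0
    acc = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        acc = qadd(qmul(acc, q), a)
    return acc

def direct(coeffs, q):
    # p(q) = a_0 + a_1 q + a_2 q^2 + ...  (each power multiplied on the left)
    total, power = (0, 0, 0, 0), (1, 0, 0, 0)
    for a in coeffs:
        total = qadd(total, qmul(a, power))
        power = qmul(power, q)
    return total

p = [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)]      # p(x) = x^2 + i x + j
q = (1, 2, 0, 1)                                    # q = 1 + 2i + k
print(horner(p, q), horner(p, q) == direct(p, q))   # -> (-6, 5, 0, 2) True
```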

Example 3

Consider again the polynomial . The problem of evaluating at can be solved through one of the following (formally) equivalent expressions.

Example 4

We now illustrate some of the conclusions of Theorem 1 by considering the polynomials , and and the quaternion .

For the theoretical background of this section, we refer the reader to [6] (see also [7], where basic division algorithms in ℍ[x] are presented). Since ℍ[x] is a principal ideal domain, left and right division algorithms can be defined. The following theorem gives more details.

If and are polynomials in (with ), then there exist unique , , and such that

(3) |

(4) |

If the remainder in equation (3) is zero, then the divisor is called a right divisor of p, and if the remainder in equation (4) is zero, it is called a left divisor of p. This article only presents the right versions of the division functions; both the left and right versions are implemented in the package. The function performs the right division of two quaternionic polynomials, returning a list with the quotient and remainder of the division.

Example 5

Consider the polynomials and .

Since , is a right divisor of and . On the other hand, does not right-divide (but it is a left divisor).

The greatest common (right or left) divisor polynomial of two polynomials can now be computed using the Euclidean algorithm by a basic procedure similar to the one used in the complex setting. The function implements this procedure for the case of the greatest common right divisor.

Example 5 (continued)

and .

Before describing the zero set of a quaternionic polynomial , we need to introduce more concepts.

**Definition 3**

We say that a quaternion q is congruent (or similar) to a quaternion q′ (and write q ∼ q′) if there exists a nonzero quaternion h such that hqh⁻¹ = q′.

This is an equivalence relation in ℍ that partitions ℍ into congruence classes. The congruence class containing a given quaternion q is denoted by [q]. It can be shown (see, e.g., [8]) that

q ∼ q′ if and only if Re q = Re q′ and |q| = |q′|.

This result gives a simple way to test if two or more quaternions are similar, implemented with the function .

For zero or equality testing, we use the test function.

It follows that [q] = {q} if and only if q is real. The congruence class of a nonreal quaternion q can be identified with the three-dimensional sphere in the hyperplane of quaternions with real part Re q, with center Re q and radius |Vec q|.
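This characterization is easy to exercise numerically. In the Python sketch below (hypothetical helper names, not the package's functions), i and j are recognized as similar, and h = 1 + k is an explicit witness conjugating i into j.

```python
import math

def qmul(p, q):
    # Hamilton product of 4-tuples (q0, q1, q2, q3)
    a, b, c, d = p; e, f, g, h = q
    return (a*e - b*f - c*g - d*h, a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f, a*h + b*g - c*f + d*e)

def qinv(q):
    a, b, c, d = q
    n2 = a*a + b*b + c*c + d*d
    return (a/n2, -b/n2, -c/n2, -d/n2)

def similar(p, q, tol=1e-12):
    # q ~ q' iff same real part and same vector-part norm
    vp = math.sqrt(p[1]**2 + p[2]**2 + p[3]**2)
    vq = math.sqrt(q[1]**2 + q[2]**2 + q[3]**2)
    return abs(p[0] - q[0]) < tol and abs(vp - vq) < tol

i, j = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)
print(similar(i, j))                     # -> True
h = (1.0, 0.0, 0.0, 1.0)                 # witness h = 1 + k
print(qmul(qmul(h, i), qinv(h)))         # -> (0.0, 0.0, 1.0, 0.0), i.e. j
```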

**Definition 4**

A zero α of p is called an isolated zero of p if [α] contains no other zeros of p. Otherwise, α is called a spherical zero of p, and [α] is referred to as a sphere of zeros.

It can be proved that if α is a zero that is not isolated, then all quaternions in [α] are in fact zeros of p (see Theorem 4); therefore the choice of the term spherical to designate this type of zero is natural. According to the definition, real zeros are always isolated zeros. The nature of a zero can be determined by taking into account the following results.

A nonreal zero α is a spherical zero of p if and only if any of the following equivalent conditions holds:

3. The characteristic polynomial of α is a right divisor of p; that is, there exists a polynomial q such that p = qΨ_α, where Ψ_α(x) = x² − 2Re(α)x + |α|².

Example 6

We are going to show that the polynomial

has a spherical zero: and an isolated one: .

We first observe that both and are zeros of .

Now we use Theorem 4-1 to conclude that the zero is spherical, while the zero is isolated.

We can reach the same conclusion from Theorem 4-3.

Taking all this into account, the verification of the nature of a zero can be done using the function .

Consider the same polynomial and quaternions again.
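The polynomial of Example 6 is not rendered above; as a stand-in with the same structure, the Python sketch below (illustrative only) takes p(x) = x³ + x = (x² + 1)x, which has the isolated real zero 0 and the congruence class of i as a sphere of zeros, and evaluates p at several points of that sphere.

```python
import math

def qmul(p, q):
    # Hamilton product of 4-tuples (q0, q1, q2, q3)
    a, b, c, d = p; e, f, g, h = q
    return (a*e - b*f - c*g - d*h, a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f, a*h + b*g - c*f + d*e)

def qeval(coeffs, q):
    # Horner evaluation with left coefficients, coeffs = [a_0, ..., a_n]
    acc = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        acc = tuple(s + t for s, t in zip(qmul(acc, q), a))
    return acc

zero, one = (0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)
p = [zero, one, zero, one]          # p(x) = x + x^3 = (x^2 + 1) x
s = 1 / math.sqrt(2)
for q in [(0, 1, 0, 0), (0, 0, 1, 0), (0, s, s, 0)]:    # points of the sphere [i]
    print([round(t, 12) for t in qeval(p, q)])          # each -> [0.0, 0.0, 0.0, 0.0]
```

In contrast, the real zero 0 is isolated, since its congruence class is the singleton {0}.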

We now list other results needed in the next section.

Let p ∈ ℍ[x] and α ∈ ℍ. Then α is a zero of p if and only if there exists q ∈ ℍ[x] such that p(x) = q(x)(x − α).

Any nonconstant polynomial in ℍ[x] always has a zero in ℍ.

In this section, we address the problem of factoring a polynomial . We mostly follow [4]. As in the classical case, it is always possible to write a quaternionic polynomial as a product of linear factors; however the link between these factors and the corresponding zeros is not straightforward. As an immediate consequence of Theorems 5 and 6, one has the following theorem.

Any monic polynomial p of degree n ≥ 1 in ℍ[x] factors into linear factors; that is, there exist x₁, …, xₙ ∈ ℍ such that

(5) p(x) = (x − x₁)(x − x₂) ⋯ (x − xₙ).

**Definition 5**

In a factorization of p of the form (5), the quaternions x₁, …, xₙ are called factor terms of p, and the n-tuple (x₁, …, xₙ) is called a factor terms chain associated with p, or simply a chain of p.

If (x₁, …, xₙ) and (y₁, …, yₙ) are chains associated with the same polynomial p, then we say that the chains are similar and write (x₁, …, xₙ) ∼ (y₁, …, yₙ).

The function constructs a polynomial with a given chain, and the function checks if two given chains are similar.

The repeated use of the next result allows the construction of similar chains, if any exist.

Theorem 8 can be implemented using the function .

Example 7

This constructs chains similar to the chain .

Observe that , and are similar chains.

We emphasize that there are polynomials with just one chain. This issue is addressed in Theorem 12. For the moment, we just give an example of such a polynomial.

These computations lead us to the conclusion that the polynomial factors uniquely as .

The next fundamental results shed light on the relation between factor terms and zeros of a quaternionic polynomial.

Let (x₁, …, xₙ) be a chain of the polynomial p. Then every zero of p is similar to some factor term in the chain, and conversely, every factor term is similar to some zero of p.

Consider a chain (x₁, …, xₙ) of the polynomial p. If the similarity classes [x₁], …, [xₙ] are distinct, then p has exactly n zeros ζ₁, …, ζₙ, which are given by:

(6) |

The function determines the zeros of a polynomial with a prescribed chain in the case where no two factors in the chain are similar quaternions, giving a warning if this condition does not hold.

Example 8

Consider the polynomial . One of its chains is , and it follows at once that the similarity classes of the factor terms are all distinct. Therefore, we conclude from Theorem 10 that has four distinct isolated roots, which can be obtained with the following code.

On the other hand, the polynomial has as one of its chains. Since , one cannot apply Theorem 10 to find the roots of .

Observe that this does not mean that the roots of are spherical.

This issue will be revisited later in connection with the notion of the multiplicity of a zero. The following theorem indicates how, under certain conditions, one can construct a polynomial having prescribed zeros.

If ζ₁, …, ζₙ are quaternions such that the similarity classes [ζ₁], …, [ζₙ] are distinct, then there is a unique polynomial of degree n with zeros ζ₁, …, ζₙ; it can be constructed from the chain (x₁, …, xₙ), where

where is the polynomial (6).

The function implements the procedure described in Theorem 11.

Example 9

Consider the problem of constructing a polynomial having the isolated roots . We first determine one chain associated with these zeros.

Now we determine the polynomial associated with this chain.

Check the solution.

Let be a quaternionic polynomial of degree . Then is the unique zero of if and only if admits a unique chain with the property

(7)

Moreover, if a chain associated with a polynomial has property (7), is a polynomial of degree such that is its unique zero and , then the polynomial (of degree ) has only two zeros, namely and .

We can now introduce the concept of the multiplicity of a zero and a new *kind* of zero. In this context, we have to note that several notions of multiplicity are available in the literature (see [9], [15–17]).

**Definition 6**

The multiplicity of a zero of is defined as the maximum degree of the right factors of with as their unique zero and is denoted by . The multiplicity of a sphere of zeros of , denoted by , is the largest for which divides .

**Example 10**

The polynomial has an isolated root with multiplicity and an isolated root with multiplicity .

The polynomial has an isolated root with multiplicity and a sphere of zeros with multiplicity .

The polynomial has a mixed root with multiplicity and .

Finally, one can construct a polynomial with assigned zeros by the repeated use of the following result.

A polynomial with and as its isolated zeros with multiplicities and , respectively, and a sphere of zeros with multiplicity can be constructed through the chain

An alternative syntax for the function addresses the problem of constructing a polynomial (in fact it constructs a chain) once one knows the nature and multiplicity of its roots.

**Example 11**

We reconsider here Example 6 of [4]. An example of a polynomial that has as a zero of multiplicity three, as a zero of multiplicity two and as a sphere of zeros with multiplicity two is

Of course this solution is not unique. For example, the polynomial

We confirm this using the function with the new syntax.

Here are two spherical roots corresponding to the same sphere.

Observe that the result is, of course, the same as this one.

Recall that a real root is always an isolated root, and two roots in the same congruence class cannot be isolated.

This article has discussed implementation issues related to the manipulation, evaluation and factorization of quaternionic polynomials. We recommend that interested readers download the support file to get complete access to all the implemented functions. The increasing interest in the use of quaternions in areas such as number theory, robotics, virtual reality and image processing [18] makes us believe that developing a computational tool for operating in the quaternions framework will be useful for other researchers, especially taking into account the power of Mathematica as a symbolic language.

In the ring of quaternionic polynomials, new problems arise mainly because the structure of zero sets, as we have described, is very different from the complex case. In this article, we did not discuss the problem of computing the roots or the factor terms of a polynomial; all the results we have presented assumed that either the zeros or the factor terms of a given polynomial are known. Methods for computing the roots or factor terms of a quaternionic polynomial are considered in Part II.

Research at the Centre of Mathematics at the University of Minho was financed by Portuguese Funds through FCT – Fundação para a Ciência e a Tecnologia, within the Project UID/MAT/00013/2013. Research at the Economics Politics Research Unit was carried out within the funding with COMPETE reference number POCI-01-0145-FEDER-006683 (UID/ECO/03182/2013), with the FCT/MEC’s (Fundação para a Ciência e a Tecnologia, I.P.) financial support through national funding and by the European Regional Development Fund through the Operational Programme on “Competitiveness and Internationalization – COMPETE 2020” under the PT2020 Partnership Agreement.

[1] | M. I. Falcão and F. Miranda, “Quaternions: A Mathematica Package for Quaternionic Analysis,” in Computational Science and Its Applications (ICCSA 2011), Lecture Notes in Computer Science, 6784 (B. Murgante, O. Gervasi, A. Iglesias, D. Taniar and B. O. Apduhan, eds.), Berlin, Heidelberg: Springer, 2011 pp. 200–214. doi:10.1007/978-3-642-21931-3_17. |

[2] | F. Miranda and M. I. Falcão. “QuaternionAnalysis Mathematica Package.” w3.math.uminho.pt/QuaternionAnalysis. |

[3] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Evaluation Schemes in the Ring of Quaternionic Polynomials,” BIT Numerical Mathematics, 58(1), pp. 51–72. doi:10.1007/s10543-017-0667-8. |

[4] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Mathematica Tools for Quaternionic Polynomials,” in Computational Science and Its Applications (ICCSA 2017), Lecture Notes in Computer Science, 10405, (O. Gervasi, B. Murgante, S. Misra, G. Borruso, C. M. Torre, A. M. A. C. Rocha, D. Taniar, B. O. Apduhan, E. Stankova and A. Cuzzocrea, eds.), Berlin, Heidelberg: Springer, 2017 pp. 394–408. doi:10.1007/978-3-319-62395-5_27. |

[5] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Weierstrass Method for Quaternionic Polynomial Root-Finding,” Mathematical Methods in the Applied Sciences, 2017 pp. 1–15. doi:10.1002/mma.4623. |

[6] | N. Jacobson, The Theory of Rings (Mathematical Surveys and Monographs), New York: American Mathematical Society, 1943. |

[7] | A. Damiano, G. Gentili and D. Struppa, “Computations in the Ring of Quaternionic Polynomials,” Journal of Symbolic Computation, 45(1), 2010 pp. 38–45. doi:10.1016/j.jsc.2009.06.003. |

[8] | F. Zhang, “Quaternions and Matrices of Quaternions,” Linear Algebra and Its Applications, 251, 1997 pp. 21–57. doi:10.1016/0024-3795(95)00543-9. |

[9] | B. Beck, “Sur les équations polynomiales dans les quaternions,” L’ Enseignement Mathématique, 25, 1979 pp. 193–201. |

[10] | A. Pogorui and M. Shapiro, “On the Structure of the Set of Zeros of Quaternionic Polynomials,” Complex Variables. Theory and Application, 49(6), 2004 pp. 379–389. doi:10.1080/0278107042000220276. |

[11] | B. Gordon and T. S. Motzkin, “On the Zeros of Polynomials over Division Rings,” Transactions of the American Mathematical Society, 116, 1965 pp. 218–226. doi:10.1090/S0002-9947-1965-0195853-2. |

[12] | T.-Y. Lam, A First Course in Noncommutative Rings, New York: Springer-Verlag, 1991. |

[13] | I. Niven, “Equations in Quaternions,” The American Mathematical Monthly, 48(10), 1941 pp. 654–661. www.jstor.org/stable/2303304. |

[14] | R. Serôdio and L.-S. Siu, “Zeros of Quaternion Polynomials,” Applied Mathematics Letters, 14(2), 2001 pp. 237–239. doi:10.1016/S0893-9659(00)00142-7. |

[15] | R. Pereira, Quaternionic Polynomials and Behavioral Systems, Ph.D. thesis, Departamento de Matemática, Universidade de Aveiro, Portugal, 2006. |

[16] | G. Gentili and D. C. Struppa, “On the Multiplicity of Zeroes of Polynomials with Quaternionic Coefficients,” Milan Journal of Mathematics, 76(1), 2008 pp. 15–25. doi:10.1007/s00032-008-0093-0. |

[17] | M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Quaternionic Polynomials with Multiple Zeros: A Numerical Point of View,” in 11th International Conference on Mathematical Problems in Engineering, Aerospace and Sciences (ICNPAA 2016), La Rochelle, France, AIP Conference Proceedings, 1798(1), 2017 p. 020099. doi:10.1063/1.4972691. |

[18] | H. R. Malonek, “Quaternions in Applied Sciences. A Historical Perspective of a Mathematical Concept,” in 17th International Conference on the Applications of Computer Science and Mathematics in Architecture and Civil Engineering (IKM 2003) (K. Gürlebeck and C. Könke, eds.), Weimar, Germany, 2003. |

M. I. Falcão, F. Miranda, R. Severino and M. J. Soares, “Computational Aspects of Quaternionic Polynomials,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-4. |

Available at: w3.math.uminho.pt/QuaternionAnalysis

Available at: www.mathematica-journal.com/data/uploads/2018/05/QPolynomial.m

M. Irene Falcão is an associate professor in the Department of Mathematics and Applications of the University of Minho. Her research interests are numerical analysis, hypercomplex analysis and scientific software.

Fernando Miranda is an assistant professor in the Department of Mathematics and Applications of the University of Minho. His research interests are differential equations, quaternions and related algebras and scientific software.

Ricardo Severino is an assistant professor in the Department of Mathematics and Applications of the University of Minho. His research interests are dynamical systems, quaternions and related algebras and scientific software.

M. Joana Soares is an associate professor in the Department of Mathematics and Applications of the University of Minho. Her research interests are numerical analysis, wavelets mainly in applications to economics, and quaternions and related algebras.

**M. Irene Falcão**

CMAT – Centre of Mathematics

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*mif@math.uminho.pt*

**Fernando Miranda**

CMAT – Centre of Mathematics

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*fmiranda@math.uminho.pt*

**Ricardo Severino**

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*ricardo@math.uminho.pt*

**M. Joana Soares**

NIPE – Economics Politics Research Unit

DMA – Department of Mathematics and Applications

University of Minho

Campus de Gualtar, 4710-057 Braga

Portugal

*jsoares@math.uminho.pt*

dx.doi.org/doi:10.3888/tmj.20-3

The action of Möbius transformations with real coefficients preserves the hyperbolic metric in the upper half-plane model of the hyperbolic plane. The modular group is an interesting group of hyperbolic isometries generated by two Möbius transformations, namely, an order-two element and an element of infinite order . Viewing the action of the group elements on a model of the hyperbolic plane provides insight into the structure of hyperbolic 2-space. Animations provide dynamic illustrations of this action.

This article updates an earlier article [1].

Transformations of spaces have long been objects of study. Many of the early examples of formal group theory were the transformations of spaces. Among the most important transformations are the *isometries*, those transformations that preserve lengths. Euclidean isometries are translations, rotations and reflections. The groups and subgroups of Euclidean isometries of the plane are so familiar to us that we may not think of them as revealing much about the space they transform. In hyperbolic space, however, light, or even a person, traveling along a hyperbolic shortest-distance path tends to veer away from the boundary. Thus, the geometry is unusual enough that viewing the actions of isometries of hyperbolic 2-space reveals some of the shape of that space. Two-dimensional hyperbolic space is referred to as the *hyperbolic plane*.

Here are graphic building blocks used for all of the animations.

Figure 1 shows four cyan and white regions, each bounded by some combination of three arcs or rays. Any two adjacent regions make up a *fundamental region*. The two fundamental regions shown on either side of the axis (each with one white and one cyan half) are related by the function , which is an inversion over the unit circle composed with a reflection in the axis.

**Figure 1. **Two fundamental domains on either side of the axis.

This article examines how elements of the modular group rearrange the triangular-shaped regions shown in Figure 2. The curved paths are arcs of circles orthogonal to the axis. Arcs on these circles are hyperbolic geodesics, that is, shortest-distance paths in hyperbolic 2-space. In Euclidean space, the shortest-distance paths lie on straight lines. In hyperbolic space, shortest paths lie on circles that intersect the boundary of the space at right angles. Hyperbolic distances are computed as if there is a penalty to pay for traveling near to the plane’s boundary. Thus, the shortest-distance paths between two points must bend away from the boundary.

**Figure 2. **The upper half-plane model of the hyperbolic plane.

In the animations that follow, it is instructive to focus on the action that a transformation takes on the family of circles that meet the axis at right angles. The transformations that we consider, namely members of the modular group, preserve this family of circles. The circles in the family are shuffled onto different members in the family, but no new circles are created and none are taken away. One could say that in the context of hyperbolic geometry, the transformations preserve the family of all shortest-distance paths. Indeed this is an excellent thing for an isometry to do!

The context of this article is described in Chapter 2 of [2]. In this small text, one can find illustrations that inspired our animations. The formulas, which made coding the animations much simpler than one might expect, are given and justified there in detail.

First consider a class of functions known as Möbius transformations. These transformations are named after the same mathematician with whom we associate the one-sided, half-twisted Möbius band. Möbius transformations are defined by

Here stands for the complex numbers. Over the reals, a Möbius transformation with real coefficients falls into one of two categories: either , and the graph is a straight line, or , and the graph is a hyperbola. A representation of this latter type of function is shown in Figure 3.

**Figure 3. **Graph of shown with dashed asymptotes.
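A minimal Python sketch of a Möbius transformation may be helpful here; it assumes the standard form f(z) = (a z + b) / (c z + d) with nonzero determinant, since the displayed formula is not reproduced in this excerpt.

```python
# Sketch: a Möbius transformation as a closure over its four coefficients.
# Restricted to real arguments with real coefficients, the graph is a line
# when c = 0 and a hyperbola when c != 0.

def mobius(a, b, c, d):
    if a * d - b * c == 0:
        raise ValueError("degenerate: ad - bc must be nonzero")
    return lambda z: (a * z + b) / (c * z + d)

f = mobius(1, 1, 0, 1)   # c = 0: an affine map, a straight line over the reals
g = mobius(2, 1, 1, 1)   # c != 0: a hyperbola over the reals
assert f(3) == 4
assert abs(g(1) - 1.5) < 1e-12
```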

Our purpose is to investigate how Möbius transformations stretch and twist regions in the *extended complex plane*. The *complex plane* is the usual Euclidean plane with each point identified as a complex number—namely, . The *extended complex plane* is formed from the complex plane by adding the point at infinity. A Möbius transformation is one-to-one (injective) on the extended complex plane .

When a Möbius transformation acts on a complex number, we may view the action as moving the point to its image point. Importantly, a Möbius transformation maps the set of circles and lines back to itself. A comprehensive proof of this fact may be found in most elementary texts on complex variables, for example, in [3], p. 158.

The figures of our animations live in the extended complex plane. Each point of a figure, taken as a complex number, is acted on by the Möbius transformations. These transformations spin hyperbolic 2-space about a fixed point or shift the space in one direction or another.

The *modular group* is a special class of Möbius transformations:

That is, if , the coefficients of are integers and the coefficient matrix has determinant equal to one.
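The membership condition just stated is easy to test in code. The sketch below (an illustrative helper, not part of the article) checks integer coefficients and unit determinant.

```python
# Sketch: a Möbius transformation with coefficients a, b, c, d belongs to the
# modular group when all four are integers and the determinant ad - bc is 1.

def in_modular_group(a, b, c, d):
    return all(isinstance(x, int) for x in (a, b, c, d)) and a*d - b*c == 1

assert in_modular_group(2, 1, 1, 1)        # determinant 2*1 - 1*1 = 1
assert not in_modular_group(2, 0, 0, 1)    # determinant 2
```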

What is a group? Recall that a *group* is a set together with a binary operation satisfying certain properties: (1) the set must be closed under the operation; (2) the operation must be associative; and (3) there must be an identity element for the operation, and all inverses of elements in must themselves be elements of . The proof that the modular group is, in fact, a group under the operation of function composition is a standard exercise in a course on complex analysis. (See, for example, [3], p. 277–278.)

We take as established that the elements of the modular group do indeed form a group and investigate some of the interesting subgroups.

One of our main goals is to investigate how the elements of the modular group act on fundamental regions. That is to say, how the regions are stretched and bent when we view them as Euclidean objects. As hyperbolic objects, the regions are all carbon copies of each other, in much the same way that the squares on a checkerboard are all identical in ordinary Euclidean geometry.

In general, a group of one-to-one transformations acting on a topological space partitions that space into *fundamental regions*. For a collection of sets to be a collection of fundamental regions, certain properties must hold. First and foremost, the must be pairwise disjoint. Second, given any transformation in the group other than the identity, and are disjoint. Finally, given any two regions and , there exists some transformation such that .

Generally, in order to cover the entire space without overlapping, each fundamental region must contain some but not all of its boundary points. This technicality is set aside for the purposes of this article.

In fact, in this article we relax the definition to include *all* of the boundary points for a particular fundamental region. Thus, adjacent fundamental regions can only overlap on their boundaries. The essential feature remains that there is no *area* in the intersection of adjacent regions.

A group of transformations does not necessarily yield a unique partition of the space into fundamental regions. Thus, the fundamental regions we view are merely representative fundamental regions.

Figure 4 shows a fundamental region of the modular group with some parts highlighted.

**Figure 4. **A fundamental region with vertices marked and a pair of tangents.

Each fundamental region contains four vertices that can be fixed by elements of the modular group. (A point is fixed by if .) Tangents are drawn at one vertex; the angle is 60 degrees. The vertex at the top has a straight, 180 degree angle. The vertex at the bottom has a zero degree angle because the tangents to the intersecting arcs coincide there. Any hyperbolic polygon with a vertex on the boundary of the space (the axis in this case of the upper half-plane) has a zero degree angle at that vertex. The corresponding four angles in each fundamental region have the same measures as those indicated here. Each vertex can be fixed by some element in the modular group. Further, each fundamental region can be mapped onto any other fundamental region by an element of the modular group.

A classic view of the matter is to see the upper half-plane as tessellated (or tiled) by triangular-shaped regions, as in Figure 2. A checkerboard tessellation of the Euclidean plane can be constructed by sliding copies of a square to the left, right, up and down. Eventually, the plane is covered with square tiles. The modular group tessellates the hyperbolic plane in an analogous way. The elements of the group move copies of a fundamental region until triangular-shaped tiles cover the upper half-plane model of the hyperbolic plane. Of course, these tiles do not appear to be identical to our eyes, trained to match shapes and lengths in Euclidean geometry. However, the triangular-shaped tiles are all identical if measured using the hyperbolic metric. In the tiling process, all areas in the upper half-plane are covered by tiles and no two tiles have any overlapping area. In fact, this procedure is precisely how the hyperbolic plane illustration was constructed. The boundary points for a single fundamental region were acted on by function elements of the modular group, and the resulting points were drawn as a boundary line in the illustration.

It helps to note that each transformation in has at least one *fixed point*. Some transformations in have two fixed points. Only the identity map has more than two. In the illustrations that follow, we observe the placement of fixed points and the way transformations map fundamental regions near them.

The hyperbolic metric is a rather curious metric that challenges our notion of distance. Under the hyperbolic metric in the upper half-plane, the shortest distance between two points is along a vertical line or an arc of a circle perpendicular to the boundary (the real axis). For example, the shortest hyperbolic path between the points and is the top arc of the circle , which passes through both points and is perpendicular to the real axis (Figure 5).

**Figure 5. **The shortest hyperbolic path between the points and .
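The geodesic circle through two given points can be computed directly: its center lies on the real axis at the point equidistant from both, obtained by solving |p - x|² = |q - x|² for real x. The following Python sketch (with an example of my own choosing, since the points in the figure are not reproduced here) illustrates this.

```python
# Sketch: center and radius of the circle, orthogonal to the real axis, that
# carries the hyperbolic geodesic through p and q (assumed to have distinct
# real parts; otherwise the geodesic is the vertical line through them).

def geodesic_circle(p, q):
    """Return (center, radius); the center is a real number on the real axis."""
    if p.real == q.real:
        raise ValueError("points on a vertical line: the geodesic is that line")
    x = (abs(q)**2 - abs(p)**2) / (2 * (q.real - p.real))
    return x, abs(p - x)

c, r = geodesic_circle(-1 + 1j, 1 + 1j)
assert abs(c) < 1e-12 and abs(r - 2**0.5) < 1e-12   # symmetric points: center 0
```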

Without discussing precisely how hyperbolic lengths and areas are measured, we state that every image under a transformation in the modular group is *congruent* to every other image under the hyperbolic metric. Thus, all of our fundamental regions shown in the animations are actually the same size in the hyperbolic metric. For a discussion on hyperbolic metrics, [4] is a good place to start.

We structure our investigation of the modular group by considering four cyclic subgroups. Recall that a *cyclic subgroup* can be generated by computing all powers of a single group element. The four cyclic subgroups we present are representative of the four possible types of subgroups found in the modular group.

For the first subgroup, consider the function ; it is a Möbius transformation with coefficients and coefficient matrix . The subscript indicates that is of order two in ; that is, , and so is its own inverse. In this case, generates a subgroup with only two elements, namely .
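Assuming the order-two element here is the standard modular-group generator S(z) = -1/z (the formula is not reproduced in this excerpt), a quick numerical check confirms both the involution property and the fixed point i.

```python
# Sketch: S(z) = -1/z, coefficient matrix [[0, -1], [1, 0]], is its own
# inverse, so it generates a subgroup of order two.

def S(z):
    return -1 / z

z0 = 0.3 + 1.7j
assert abs(S(S(z0)) - z0) < 1e-12   # S composed with itself is the identity
assert abs(S(1j) - 1j) < 1e-12      # i is a fixed point of S
```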

In this article, we adopt the standard notation that angular parentheses indicate the set of elements generated by taking products from the elements enclosed by the parentheses. Curly braces , on the other hand, enclose the delineated list of elements in a set.

The Möbius transformation , its inverse and the function are used for the motion in Figure 6.

**Figure 6. **Action of the order-two element .

The animation depicts the way in which the two fundamental regions shown in Figure 2 are mapped onto one another. In fact, the action on the fundamental regions is to hyperbolically rotate them 180° onto each other about the central fixed point. The actual mapping is performed instantaneously without rotation. In particular, only the first, middle and final frames contain illustrations of fundamental regions. However, the sequence of intermediate mappings illustrates through animation the mapping properties of the transformation. In a later section, we discuss how the functions illustrated were broken into a composition of functions so that the hyperbolic nature of their motion was made continuous.

This example highlights the fact that vertical lines are paths of least distance in the upper half-plane model of the hyperbolic plane. Indeed, it is usual to view straight lines as circles that have radii with infinite length and that pass through the point at infinity. With this bending of the definition of a circle, a vertical line has all the characteristics required of a geodesic in the hyperbolic plane. Like the circles, a vertical line is perpendicular to the axis, which is the boundary of the upper half-plane model of the hyperbolic plane. A vertical line is the limit of a sequence of geodesic circles.

The second example (see Figure 7) is a subgroup of infinite order generated by the linear shift (or translation) ; it is a Möbius transformation with and matrix . The function has infinite order because no point of the plane ever returns to its original position no matter how many times the shift is applied, though the point at infinity remains fixed in the extended plane. The subgroup produced by taking all powers of the shift and its inverse is denoted as .
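Assuming the shift is the standard generator T(z) = z + 1 with matrix [[1, 1], [0, 1]], its infinite order is visible at the matrix level: the n-th power is [[1, n], [0, 1]], which is never the identity for n > 0. A small sketch:

```python
# Sketch: powers of the translation matrix [[1, 1], [0, 1]] never return to
# the identity, so the cyclic subgroup generated by T(z) = z + 1 is infinite.

def mat_mul(A, B):
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

T = [[1, 1], [0, 1]]
P = [[1, 0], [0, 1]]
for n in range(1, 6):
    P = mat_mul(P, T)
    assert P == [[1, n], [0, 1]]   # T^n shifts by n units
```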

Every point in the plane shifts one unit to the right under the action of the translation. The infinite half-strips in the following figure are images of each other under powers of the translation. For contrast, we also provide images of these infinite half-strip regions under the map . These images are bunched in a flower-like arrangement attached to the real axis at the origin. As the blue infinite regions are pushed from left to right, their magenta images echo their motion in a counterclockwise direction. These two actions are not produced by a single transformation. The two transformations that cause these actions are closely related to each other as algebraic conjugates, but more on that in a later section.

**Figure 7. **This animation shows copies of fundamental regions moving back and forth, with corresponding regions anchored at the origin.

The hyperbolic isometry is notable among the elements of the modular group because it is also a Euclidean isometry. Under the hyperbolic metric, the magenta regions are each congruent to the half-strip regions in blue.

The third cyclic subgroup of that we consider is generated by the composition of the first two functions and : define . The subgroup generated by this element is denoted , a subgroup of order 3.
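Assuming the two generators are S(z) = -1/z and T(z) = z + 1, their composition is U(z) = -1/(z + 1), and its order three can be verified numerically: applying U three times returns every point to its starting position (its matrix cubes to -I, which acts as the identity Möbius map).

```python
# Sketch: U = S∘T with S(z) = -1/z and T(z) = z + 1; U has order three.

def U(z):
    return -1 / (z + 1)

z0 = 0.25 + 2j
assert abs(U(U(U(z0))) - z0) < 1e-12   # three applications give the identity
```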

Here is the fixed point of .

Define and its inverse .

In Figure 8, the function moves the fixed point to the origin and moves it back.

The function is an order-three hyperbolic rotation made continuous; it is used in Figures 8 and 11.

**Figure 8. **Action of the order-three element .

A red fundamental region and a green fundamental region are shown associated with the blue fundamental region attached to the origin in the animation’s first frame. We include these in order to provide a better orientation for the scene. Of special interest is how the point of the blue region on the axis moves as the rotation takes place. The point begins at the origin and slides toward the right along the positive axis. The blue lines of the cluster become vertical precisely when that point arrives at the point at infinity! The point continues by sliding along the negative axis to arrive back at the origin. It is fair to say that the motion of a point as it passes through the origin is a “mirror image” of the motion of the point as it passes through the point at infinity. The function used here is a composition of Möbius transformations that is described and demonstrated in Figure 12.

The rotations we saw in the action of and are of orders two and three, respectively. That is to say, after the rotation is repeated a number of times, all points are back to their original positions. In contrast, the function generates an infinite subgroup. When we iterate , the right shifts accumulate at the point at infinity. Points in the left half-plane get repelled by infinity, while points in the right half-plane get attracted to infinity. Of course, since all points in the left half-plane eventually map to points in the right half-plane, all points are, in some sense, simultaneously attracted to and repelled by infinity under the action of . Indeed, the point at infinity is the single fixed point for the action of .

The transformation in the modular group generates an infinite subgroup that differs from in the sense that has two distinct fixed points, an attractor and a repeller.

Define the well-known golden ratio ; its reciprocal is .

This defines with fixed points and .

Define to move the fixed points and to infinity and zero, respectively.

Define to be the inverse of ; sends infinity back to and zero back to .

For Figure 9, makes the hyperbolic translation continuous. The ratio is the length of the hyperbolic translation.

**Figure 9. **Action of the hyperbolic element .

The animation depicts the action on fundamental regions in the plane. All points exterior to the red circle on the left are mapped to the interior of the green circle on the right. The animation begins with regions that lie exterior to the red and green circles. These regions are all mapped to the area between the green circle and smaller cyan circle. If the action were to be repeated, the regions would be mapped into the interior of increasingly small circles inside the smallest (cyan) circle shown. The attracting fixed point lies within these shrinking, nested circles.

The rotations and translations we have seen as examples are intimately related to Euclidean rotations and translations, as discussed in Section 6. The transformation is related in a similar way to a Euclidean dilation, which turns a figure into a similar but not congruent image figure. A curious characteristic of hyperbolic space is that the distinction between similarity and congruence disappears. In the hyperbolic plane, it is enough for two figures to have the same angles to guarantee congruence. In marked contrast to Euclidean space, equal angles guarantee that corresponding side lengths are equal in the hyperbolic metric.

The entire group can be generated by the two functions and . In symbols, . Establishing this fact requires tools from linear algebra about which we make only a few brief comments. The group of matrices with real number entries and with nonzero determinants is denoted by . This group has been studied extensively and much is known about it. Thus, there are great advantages for any group that can be represented as a subgroup of . While the modular group cannot be represented in exactly this way, it almost can be. For instance, the elements and are considered distinct in . On the other hand, the actions of the two associated Möbius transformations are identical, since . In general, for every element in the modular group, there are two associated elements in .

A remarkable feature of Möbius transformations is that the group operation of compositions produces coefficients that are identical to the results of matrix multiplication. To see this, consider the two Möbius transformations and . First, multiply the associated matrices.

Second, carry out the composition of the two functions.

The coefficients and the four matrix entries are the same!
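This correspondence is easy to confirm numerically: composing two Möbius maps gives the same function as the Möbius map built from the product of their coefficient matrices. A sketch (the matrices are sample values of my own choosing):

```python
# Sketch: composition of Möbius transformations corresponds to matrix
# multiplication of their coefficient matrices.

def mobius(M):
    (a, b), (c, d) = M
    return lambda z: (a * z + b) / (c * z + d)

def mat_mul(F, G):
    return [[F[0][0]*G[0][0] + F[0][1]*G[1][0], F[0][0]*G[0][1] + F[0][1]*G[1][1]],
            [F[1][0]*G[0][0] + F[1][1]*G[1][0], F[1][0]*G[0][1] + F[1][1]*G[1][1]]]

F = [[2, 1], [1, 1]]
G = [[1, 1], [0, 1]]
f, g, fg = mobius(F), mobius(G), mobius(mat_mul(F, G))
z0 = 0.5 + 0.5j
assert abs(f(g(z0)) - fg(z0)) < 1e-12   # f∘g agrees with the matrix product
```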

In this way, the group operation of composition of functions in the modular group can be replaced with the group operation of matrix multiplication in . It is down this path we would travel if we were to present a complete proof of the claim that the modular group is generated by the two elements and .

A major part of this claim is that any element of the modular group can be written as a composition of the two generators. Consider the following examples of compositions.

Each of these functions has a coefficient matrix with determinant equal to one. A worthy exercise for undergraduate mathematics students is to verify by direct computations that each equality holds for the indicated compositions.

The four examples of cyclic subgroups outlined in Section 4 give a complete description of the four types of subgroups possible in the modular group. Any subgroup of the modular group is *conjugate* to a subgroup generated by one of the four types of elements illustrated there.

Recall that in a group , a subgroup is *conjugate* to a subgroup if there exists an element such that the entire subgroup can be generated by computing the conjugate of every element. More compactly, we write , and even more compactly, .
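Conjugation is easiest to see at the matrix level. In the sketch below (sample matrices of my own choosing, using the order-three matrix U = [[0, -1], [1, 1]] from the earlier generators), conjugating by an invertible matrix H preserves the order, since (H U H⁻¹)³ = H U³ H⁻¹ = -I.

```python
# Sketch: the conjugate H U H^-1 of an order-three matrix is again order three.

def mat_mul(A, B):
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

U = [[0, -1], [1, 1]]          # cubes to -I, i.e. the identity Möbius map
H = [[1, 1], [0, 1]]
H_inv = [[1, -1], [0, 1]]      # inverse of H (determinant 1)
V = mat_mul(mat_mul(H, U), H_inv)   # the conjugate H U H^-1
V3 = mat_mul(mat_mul(V, V), V)
assert V3 == [[-1, 0], [0, -1]]     # -I: V also has order three
```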

Here we define the hyperbolic translation that relates two order-three elements: and .

Consider a function in the modular group that generates an order-three subgroup (Figure 10).

**Figure 10. **The action of the order-three function on a selection of fundamental regions.

We view side by side the actions of on the right and on the left. For the figure on the left, first the function moves the fixed point of onto the fixed point of . Then the function rotates the attached fundamental regions, as we have seen it do before, while at the same time the function acts on the right-hand figure. Finally, the inverse of returns the fixed point and associated regions to the original position, except that the fundamental regions have been rotated in the same way as those on the right. Thus, the final results are the same in both cases.

The function is used for the continuous motion in Figure 11.

**Figure 11. **The action of and .

This animation demonstrates what it means for two functions to be conjugate equivalent.

All functions in the modular group act abruptly; that is, their actions move triangular regions onto other regions all in one jump rather than through any continuous motion. The facility to produce transformations that seem continuous is due to the following.

Every element of the modular group is conjugate equivalent to one of three Euclidean transformations, namely a rotation about the origin, a scaling from the origin or a rigid translation of the entire plane ([2], pp. 12–20).

These Euclidean transformations have very simple continuous forms:

Rotation: .

Scaling: .

Translation: .
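These one-parameter families can be sketched directly with complex numbers standing for points of the plane; at t = 0 each family is the identity, and at t = 1 it is the full rotation, scaling or translation. The particular angle, factor and offset below are illustrative choices, not values from the article.

```python
# A sketch of the continuous one-parameter Euclidean families, with
# points of the plane represented as complex numbers.
import cmath

def rotation(z, t, theta):   # rotate by angle theta*t about the origin
    return cmath.exp(1j * theta * t) * z

def scaling(z, t, k):        # scale by factor k**t from the origin
    return (k ** t) * z

def translation(z, t, b):    # translate by t*b
    return z + t * b

z = 1 + 0j
half_turn = rotation(z, 1.0, cmath.pi)       # full motion: 1 -> -1
assert abs(half_turn - (-1 + 0j)) < 1e-12
assert abs(scaling(z, 0.5, 4) - 2) < 1e-12   # halfway through scaling by 4
assert translation(z, 1.0, 2j) == 1 + 2j
```

Sampling t over [0, 1] produces the frames of a smooth animation, which is precisely the facility exploited in Figures 11 and 12.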

The left-hand side of Figure 12 shows the fixed point of translated to the origin. Following this transformation, all circles that passed through the original fixed point become straight lines passing through the origin. A Euclidean rotation about the origin accomplishes the desired rearrangement of the regions. Finally, translating the fixed points back to their original positions maps the fundamental regions to their proper, final positions. We see that the final results are the same for the right and left animations.

Indeed, each frame in the right-hand animation was computed by composing the functions that are explicitly portrayed in the left-hand animation.

**Figure 12.** Conjugation with rotation of 120°.

In this way, the action of the hyperbolic motions can be animated as continuous because the Euclidean rotations, translations and dilations can all be coded as continuous functions.

[1] | P. McCreary, T. J. Murphy and C. Carter, “The Modular Group,” The Mathematica Journal, 9(3), 2005. www.mathematica-journal.com/issue/v9i3. |

[2] | L. Ford, Automorphic Functions, New York: McGraw-Hill, 1929. |

[3] | N. Levinson and R. M. Redheffer, Complex Variables, San Francisco: Holden-Day, 1970. |

[4] | E. W. Weisstein. “Hyperbolic Metric” from Wolfram MathWorld—A Wolfram Web Resource. mathworld.wolfram.com/HyperbolicMetric.html. |

P. R. McCreary, T. J. Murphy and C. Carter, “The Modular Group,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-3.

**Paul R. McCreary**

*The Evergreen State College-Tacoma
Tacoma, WA*

**Teri Jo Murphy**

*Department of Mathematics & Statistics
Northern Kentucky University
Highland Heights, KY*

**Christan Carter**

*Department of Mathematics
Xavier University of Louisiana
New Orleans, LA*

dx.doi.org/doi:10.3888/tmj.20-2

We propose and implement an algorithm for solving an overdetermined system of partial differential equations in one unknown. Our approach relies on the Bour–Mayer method to determine compatibility conditions via Jacobi–Mayer brackets. We solve compatible systems recursively by imitating what one would do with pen and paper: Solve one equation, substitute its solution into the remaining equations, and iterate the process until the equations of the system are exhausted. The method we employ for assessing the consistency of the underlying system differs from the traditional use of differential Gröbner bases, yet seems more efficient and straightforward to implement.

The search for solutions of many problems leads to overdetermined systems of partial differential equations (PDEs). These problems include the computation of discrete symmetries of differential equations [1], the calculation of differential invariants [2] and the determination of generalized Casimir operators of a finite-dimensional Lie algebra [3]. In this article, we focus solely on the integration of simultaneous systems of scalar first-order PDEs; that is, our systems have at least two equations, one dependent variable (the unknown function) and several independent variables. Our ultimate goal is to automate the search for general symbolic solutions of these systems. The approach we adopt uses the Bour–Mayer method [4] to find compatibility conditions (i.e. obstructions to integrability) of the underlying system of PDEs and to iteratively prepend these compatibility conditions to the system until a consistent or an inconsistent system is found. This differs from the traditional approach, which uses differential Gröbner bases [5] to discover compatibility conditions. When applicable, our method has the advantage of being easy to implement and efficient. Recently, using machinery from differential geometry, Kruglikov and Lychagin [6] have extended the Bour–Mayer method to systems of PDEs in several dependent and independent variables of mixed orders (i.e. the orders of the individual equations in the system can be different). In our approach, for the situation where the completion process leads to a consistent system, we solve the latter by imitating what one would do with pen and paper: Solve one equation, substitute its solution into the next equation, and continue the process until the equations of the system are exhausted.

To fix ideas, consider a system of PDEs

(1) |

where to are the independent variables, is the partial derivative of the unknown function with respect to , and the rank of the Jacobian matrix is . In the sequel, we will say that a property holds locally if it is true on an open ball of its domain of validity. The system of equations (1) is integrable (i.e. admits a locally smooth solution) provided the expressions to derived from it locally satisfy the conditions

(2) |

To see this, consider a solution of the system of equations (1). Then, locally, . Thus, the latter differential form is locally exact. So, in particular, it is locally closed. Therefore, its exterior differential vanishes; that is, , or equivalently, after some calculations, , which implies (2). Conversely, if the system of equations (2) is locally satisfied, then the differential form is locally closed and by Poincaré’s lemma, it is also locally exact. Hence, for some locally smooth function . Therefore is locally defined by , where is an arbitrary constant.
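The closedness condition is straightforward to check numerically. The sketch below (Python, not the article's Mathematica; the sample pair comes from the illustrative potential u = sin(xy), which is not one of the article's examples) verifies that the mixed partials of the right-hand sides agree.

```python
# Numerical illustration of the closedness condition: for the system
# u_x = f1, u_y = f2 to admit a solution, the mixed partials must
# match, d(f1)/dy = d(f2)/dx.
import math

def f1(x, y):  # u_x for the potential u = sin(x*y)
    return y * math.cos(x * y)

def f2(x, y):  # u_y for the potential u = sin(x*y)
    return x * math.cos(x * y)

def partial(f, i, x, y, h=1e-6):
    """Central-difference partial derivative of f in argument i."""
    if i == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 0.7, 1.3
# d(f1)/dy and d(f2)/dx both equal cos(x*y) - x*y*sin(x*y).
assert abs(partial(f1, 1, x0, y0) - partial(f2, 0, x0, y0)) < 1e-5
```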

Bour and Mayer (see e.g. [4]) showed that (1), subject to the condition on the Jacobian matrix of the with respect to the , is integrable if and only if the Jacobi–Mayer brackets

(3) |

vanish whenever (1) is satisfied. From now on, we abbreviate the phrase “ whenever (1) is satisfied” to .
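Since equation (3) is not reproduced here, the following finite-difference sketch assumes the standard form of the bracket for two equations F(x, y, p, q) = 0 and G(x, y, p, q) = 0 that do not involve u explicitly, namely the Poisson-type sum F_p G_x − G_p F_x + F_q G_y − G_q F_y; the sample systems are illustrative, not the article's.

```python
# Finite-difference sketch of the (assumed) Jacobi-Mayer bracket for
# first-order PDEs F(x, y, p, q) = 0 with p = u_x, q = u_y.

def partial(f, i, args, h=1e-6):
    a_plus, a_minus = list(args), list(args)
    a_plus[i] += h
    a_minus[i] -= h
    return (f(*a_plus) - f(*a_minus)) / (2 * h)

def bracket(F, G, args):
    # args = (x, y, p, q); indices 0, 1 are x, y and 2, 3 are p, q.
    return sum(partial(F, 2 + i, args) * partial(G, i, args)
               - partial(G, 2 + i, args) * partial(F, i, args)
               for i in range(2))

# Compatible pair: u_x = y, u_y = x (from the potential u = x*y).
F = lambda x, y, p, q: p - y
G = lambda x, y, p, q: q - x
assert abs(bracket(F, G, (0.3, 0.8, 0.8, 0.3))) < 1e-6

# Incompatible pair: u_x = y, u_y = -x forces u_xy != u_yx, and the
# bracket does not vanish.
H = lambda x, y, p, q: q + x
assert abs(bracket(F, H, (0.3, 0.8, 0.8, 0.3)) - 2.0) < 1e-5
```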

For a given system of equations (1) satisfying the nondegeneracy condition mentioned, four cases arise.

The first case is when and all the Jacobi–Mayer brackets vanish whenever (1) is satisfied. In this case, we can solve (1) for to . The solution of the system is then obtained by integrating the exact differential form .

The second case is when there are distinct indices and such that . Then (1) is incompatible and there are no solutions.

In the third case, , and all the Jacobi–Mayer brackets vanish in (1). We must supplement (1) with additional equations until we get to the first or second case. These equations are obtained by solving the system of linear first-order PDEs , where and . For example, we get the additional equation , where is an arbitrary constant, by solving the system of linear first-order PDEs , where . The solution of the completed system depends on arbitrary constants. We obtain the general solution of the initial system of equations (1) by expressing one of the arbitrary constants as a function of the remaining ones, then eliminating the remaining constant between the resulting equations and their first-order partial derivatives with respect to the arbitrary constants.

In the fourth and final case, some brackets are zero in (1) and other brackets have the form , where the depend at least on some . In this case, we must prepend the equations to the equations in (1) and proceed as in the third case.

The procedure just described is the essence of the Bour–Mayer approach to the solution of (1). One has to solve overdetermined systems of linear scalar PDEs and ensure that the equations one adds to the initial system are compatible with them and that the equations of the resulting systems are linearly independent. In our implementation of the Bour–Mayer approach, we complete the initial system of equations (1) by prepending to it the appropriate compatibility constraints prescribed by Jacobi–Mayer brackets until we obtain either a compatible or an incompatible system. Starting from compatibility constraints, we iteratively solve the compatible system obtained by using the built-in function . The remainder of this article is devoted to the implementation and testing of this approach.
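A minimal pen-and-paper instance of this iteration, on an illustrative compatible system (not one of the article's examples), runs as follows:

```latex
% An illustrative compatible system:
%   u_x = y,  u_y = x.
% Step 1: solve the first equation by integrating in x:
u(x,y) = x y + \varphi(y), \qquad \varphi \text{ arbitrary.}
% Step 2: substitute into the second equation:
u_y = x + \varphi'(y) = x
  \;\Longrightarrow\; \varphi'(y) = 0
  \;\Longrightarrow\; \varphi(y) = c.
% Hence the general solution of the pair is
u(x,y) = x y + c.
```

The "constant" produced at each stage is a function of the remaining independent variables, which is why the intermediate dependent variables in the implementation must be renamed and curried as described below.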

Here we focus on the coding of the algorithm described in the introduction. Specifically, we start by iteratively solving a system of consistent first-order PDEs in one dependent variable. Then we implement the test of consistency of a system of first-order PDEs in one unknown. Finally, we couple the last two programs in such a way that a single function is used to compute the general solution of the input system when it exists or to indicate that it is inconsistent.

Our program for iteratively solving a compatible system of scalar first-order PDEs is made of the main function and three helper functions, , and .

is a recursive function that takes as input the system to be solved, the dependent variable , the list of independent variables , a container for the list of successive solutions, a list of equations that could not be solved, a string that is used as a root to form the names of intermediate dependent variables, and a variable that is used to count and name intermediate dependent variables. The output of is a list of rules and a list of unsolved equations.

The function mimics what one would do by hand when solving a system of first-order PDEs in one unknown: Solve an equation, substitute its solution into the remaining equations, and continue as long as possible. At each stage, the number of independent variables is reduced by one and it is necessary to rename the variables before proceeding. Also, the dependent variables are curried functions that must be undone to ensure that the chain rule is applied properly during substitution into the remaining PDEs. This is perhaps the trickiest part of our implementation.

The function takes the output of and converts it into the solution of the system to be solved. The helper function converts an expression depending on several variables into a pure function of these variables. Finally, the function composes and to solve a compatible system of scalar PDEs. Its inputs are like those of and its output is formatted like that of .

This subsection implements the compatibility test provided by the Bour–Mayer method as described in the introduction using . The input to is the underlying system of PDEs , the dependent variable and the list of independent variables ; outputs a pair: the first element indicates whether the system is compatible and the second element gives the completed system.

The function computes the pairwise Jacobi–Mayer brackets of a system of PDEs according to equation (3) and in these brackets replaces some first-order partial derivatives of the unknown function obtained from the underlying system of PDEs. The function checks whether an expression contains a derivative of the unknown function.

Here we use the functions defined so far to solve an overdetermined system of first-order PDEs in one unknown. The function takes as arguments the system to be solved, , and its dependent and independent variables, and . The function verifies whether a given rule gives a solution of a system of first-order PDEs in one unknown.

This subsection is chiefly concerned with examples taken from various published sources. For convenience, warnings are suppressed with the built-in function . Undefined global variables (, , , etc.) are used, so make sure there are no conflicting definitions in your own session.

The examples presented here arise in the search for differential invariants of hyperbolic PDEs [2].

Except for example 9, gives for all systems, so it is only shown once here.

Examples 5 and 6 come from Saltykow [7].

The two systems of PDEs treated here are in Mansion [4].

The second entry of shows that there are two PDEs that were not solved. It is straightforward to separate these PDEs with respect to and to obtain new PDEs that are easily solved using the built-in function . The separation can be done automatically through the following one-liner.

The last example is due to Boole [8].

This article has introduced and implemented an algorithm based on the Bour–Mayer method for solving an overdetermined system of PDEs in one unknown. We have demonstrated the efficiency of our approach through the consideration of 13 examples.

I gratefully acknowledge partial financial support from the DST-NRF Centre of Excellence in Mathematical and Statistical Sciences, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa. I thank Prof. F. M. Mahomed for securing the necessary funds and his team for the hospitality during my visit last summer. This article is dedicated to my daughter Katlego on her sixteenth birthday.

[1] | P. E. Hydon, “How to Construct the Discrete Symmetries of Partial Differential Equations,” European Journal of Applied Mathematics, 11(5), 2000, pp. 515–527. |

[2] | I. K. Johnpillai, F. M. Mahomed and C. Wafo Soh, “Basis of Joint Invariants for () Linear Hyperbolic Equations,” Journal of Nonlinear Mathematical Physics, 9(Supplement 2), 2002, pp. 49–59. doi:10.2991/jnmp.2002.9.s2.5. |

[3] | J. C. Ndogmo and P. Winternitz, “Generalized Casimir Operators of Solvable Lie Algebras with Abelian Nilradicals,” Journal of Physics A: Mathematical and General, 27(8), 1994, pp. 2787–2800. iopscience.iop.org/article/10.1088/0305-4470/27/8/016/meta. |

[4] | P. Mansion, Théorie des équations aux dérivées partielles du premier ordre, Paris: Gauthier-Villars, 1875. |

[5] | B. Buchberger and F. Winkler, Gröbner Bases and Applications, Cambridge: Cambridge University Press, 1998. |

[6] | B. Kruglikov and V. Lychagin, “Compatibility, Multi-brackets and Integrability of Systems of PDEs,” Acta Applicandæ Mathematicæ, 109(1), 2010, pp. 151–196. doi:10.1007/s10440-009-9446-0. |

[7] | N. Saltykow, “Méthodes classiques d’intégration des équations aux dérivées partielles du premier ordre à une fonction inconnue,” Mémorial des sciences mathématiques, 50, 1931, pp. 1–72. www.numdam.org/item?id=MSM_1931__50__1_0. |

[8] | G. Boole, “On Simultaneous Differential Equations of the First Order in Which the Number of the Variables Exceeds by More Than One the Number of the Equations,” Philosophical Transactions of the Royal Society of London, 152(5), 1862, pp. 437–454. doi:10.1098/rstl.1862.0023. |

C. W. Soh, “Symbolic Solutions of Simultaneous First-Order PDEs in One Unknown,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-2.

Dr. C. Wafo Soh is currently an associate professor of mathematics at Jackson State University and a visiting associate professor of applied mathematics at the University of the Witwatersrand. He is the cofounder of the South African startup Recursive Thinking Consulting, which specializes in process mining.

**Célestin Wafo Soh**

*Department of Mathematics and Statistical Science
JSU Box 1760, Jackson State University
1400 JR Lynch Street
Jackson, MS 39217
*

*DST-NRF Centre of Excellence in Mathematical and Statistical Sciences
School of Computer Science and Applied Mathematics, University of the Witwatersrand
Johannesburg, Wits 2050, South Africa*

dx.doi.org/doi:10.3888/tmj.20-1

We simultaneously introduce effective techniques for solving Sudoku puzzles and explain how to implement them in Mathematica. The hardest puzzles require some guessing, and we include a simple backtracking technique that solves even the hardest puzzles. The programming skills required are kept at a minimum.

Sudoku, for those unfamiliar with this puzzle, consists of a square grid with nine subgrids. The 81 entries are to be filled with the integers 1 to 9 in such a way that each row, column and subgrid contains all the nine integers. Some of the entries are already chosen, and the final puzzle solution must contain these initial choices. Here is a sample puzzle.

The input for this puzzle is a list of nine lists consisting of blanks (shown as □) or integers between 1 and 9. A list of lists of the same length is regarded as a matrix in Mathematica, so we input for the puzzle and then show it in matrix form.

We can also display this in Sudoku format by drawing column and row lines and a frame.

In attempting a solution, a blank gets replaced with a list of candidate entries shown compactly without braces or commas.

Each element of a Sudoku matrix is obtainable as , where and are the row and column of the element in the matrix.

To obtain the entire row in `X` that contains the entry , we evaluate ; so, for example, this gives row 3, which contains element .

To obtain the column in `X` that contains is a little trickier. One could first “transpose” the matrix , that is, interchange rows and columns and then find row of the transposed matrix; the command for transposing a matrix is . However, Mathematica has a faster way using the option . We just enter to get column of `X`. For example, the column that contains , that is, column 5 of , can be obtained by entering .
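For readers following along in another language, the same row and column access looks like this in Python (an illustrative analogue, not the article's Mathematica code): a row indexes directly, and a column can be read off either with a comprehension or by transposing with `zip`.

```python
# Row and column access for a matrix stored as a list of lists
# (0-indexed here, unlike Mathematica's 1-indexed Part).
X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

row3 = X[2]                       # third row
col2_a = [row[1] for row in X]    # second column, directly
col2_b = list(zip(*X))[1]         # second column via transpose

assert row3 == [7, 8, 9]
assert col2_a == [2, 5, 8]
assert list(col2_b) == [2, 5, 8]
```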

The function displays that list vertically.

It is more difficult to obtain the block to which belongs. To do this we define a function that gives a list of the entries that comprise the block of in `X`.

For example, this is the block containing , the sixth entry in row 1, □.

To get a single list of these entries by removing the inner parentheses, we use .
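The block lookup can be sketched as follows (an illustrative Python analogue of the article's Mathematica function, with 0-indexed rows and columns): round the row and column down to the nearest multiple of three to find the block's corner, then collect its nine entries as one flat list, mirroring the use of Flatten above.

```python
# Return the flattened 3x3 block containing entry (i, j) of a
# 9x9 Sudoku matrix X (0-indexed).
def block(X, i, j):
    r, c = 3 * (i // 3), 3 * (j // 3)   # top-left corner of the block
    return [X[r + di][c + dj] for di in range(3) for dj in range(3)]

grid = [[r * 9 + c for c in range(9)] for r in range(9)]
# Entry (0, 5) lies in the top-middle block: columns 3..5 of rows 0..2.
assert block(grid, 0, 5) == [3, 4, 5, 12, 13, 14, 21, 22, 23]
```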

Our first step is to replace each □ in (using ) by the list of the nine numbers, , which are possible candidates to occupy that position in .

Our next task is to start eliminating candidate values in the entries that are lists of numbers in `X`, proceeding one entry at a time. We start with in order to be able to redefine entries.

Since no entry can appear more than once in any row, column or block, we let be the set of integers in the row, column and block containing .

If the entry is a list rather than an integer, we redefine by removing the entries that also belong to .

Finally, if is a list of one element, we redefine it to be that element.

To apply again and again to until the result no longer changes, we use . The puzzle simplifies, but we see that we are still not done!
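The elimination step and its fixed-point iteration can be sketched together in Python (an illustrative analogue of the article's Mathematica functions and of FixedPoint; cells are either a placed integer or a list of candidates, and the starting grid below is a toy example, not the article's puzzle).

```python
# Candidate elimination with a fixed-point loop, for a 9x9 grid whose
# cells are integers (placed) or lists of candidate integers.
import copy

def used(X, i, j):
    """Integers already placed in the row, column and block of (i, j)."""
    r, c = 3 * (i // 3), 3 * (j // 3)
    cells = X[i] + [row[j] for row in X] + \
            [X[r + di][c + dj] for di in range(3) for dj in range(3)]
    return {v for v in cells if isinstance(v, int)}

def reduce_once(X):
    """One sweep: prune candidates; collapse singletons to integers."""
    X = copy.deepcopy(X)
    for i in range(9):
        for j in range(9):
            if isinstance(X[i][j], list):
                cand = [v for v in X[i][j] if v not in used(X, i, j)]
                X[i][j] = cand[0] if len(cand) == 1 else cand
    return X

def fixed_point(f, X):
    """Apply f repeatedly until the result no longer changes."""
    while True:
        Y = f(X)
        if Y == X:
            return X
        X = Y

L = list(range(1, 10))
start = [[1, 2, 3, 4, 5, 6, 7, 8, list(L)]] + \
        [[list(L) for _ in range(9)] for _ in range(8)]
solved = fixed_point(reduce_once, start)
assert solved[0][8] == 9   # the last cell of the first row is forced
```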

However, the first block has three entries (colored red) that are all sublists of .

While we do not know the exact value of any of the red entries, we know that the three numbers 5, 6 and 8 will be used up filling them; thus we can remove 5, 6 and 8 from the *other* entries in this block (colored green).

Similarly, in the first row, there are three entries that are sublists of , so we remove 5, 6 and 8 from at the end of row 1; this defines .

Then we use again and display the result. We are done! We explore this technique further in the next section.

If any row, column or block contains the pair twice, both and must be used up in the two entries containing , even though we do not know which pair contains and which contains . Hence, no other entry in that row, column or block can contain either or . This obvious fact is surprisingly useful in solving Sudoku puzzles.

To use it, we define the function .

1. We select the set of pairs (the lists of length two).

2. The twins are the identical pairs.

3. The numbers in the twins are the numbers to prune.

4. The lists are pruned.

5. Any singleton list is changed to its element.
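The five steps above can be sketched for a single row (or column, or flattened block) as follows; this is an illustrative Python analogue of the article's Mathematica function, and the sample row is invented for the demonstration.

```python
# Twin pruning for one unit: find a candidate pair occurring exactly
# twice, remove its two numbers from every other list, and collapse
# any resulting singleton to its element.
def prune_twins(cells):
    pairs = [tuple(c) for c in cells if isinstance(c, list) and len(c) == 2]
    twins = {p for p in pairs if pairs.count(p) == 2}
    out = []
    for c in cells:
        if isinstance(c, list) and tuple(c) not in twins:
            c = [v for v in c if not any(v in t for t in twins)]
            c = c[0] if len(c) == 1 else c
        out.append(c)
    return out

row = [[5, 8], 2, [5, 8], [3, 5, 8], [4, 5], 1, 6, 7, 9]
# The twin {5, 8} uses up 5 and 8, so they are removed elsewhere:
# [3, 5, 8] collapses to 3 and [4, 5] collapses to 4.
assert prune_twins(row) == [[5, 8], 2, [5, 8], 3, 4, 1, 6, 7, 9]
```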

Here are some examples using , which was defined before. The twins in this row are and .

Hence 5 and 8 are removed from the other lists in the row.

The twins in row 3 are and .

We can map over all the rows of a matrix.

We now use starting with until the result does not change.

We are done. It was only necessary to use on the rows.

It is easy to apply on the columns: transpose, apply to the rows, and transpose back.

The blocks are more complicated. We make use of a general theorem: transform an matrix by taking the elements of each block in order as the rows of a new matrix ; then (i.e. is an involution). Here stands for block transpose.

This verifies the theorem in the case.

To construct the new kind of transposed matrix, we define the function .

Here is the transformed matrix .

Finally, we look at the matrix .

It is the same as the original matrix ; here is a direct check that they are equal.
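The theorem is also easy to check in Python (an illustrative analogue of the article's Mathematica function): build the new matrix whose rows are the flattened 3×3 blocks of the original, taken in order, and confirm that applying the operation twice restores the original.

```python
# Block transpose of a 9x9 matrix: row b of the result is the
# flattened b-th 3x3 block of X, blocks taken in reading order.
def block_transpose(X):
    return [[X[3 * (b // 3) + di][3 * (b % 3) + dj]
             for di in range(3) for dj in range(3)]
            for b in range(9)]

M = [[r * 9 + c for c in range(9)] for r in range(9)]
assert block_transpose(block_transpose(M)) == M   # an involution
assert block_transpose(M) != M   # not the ordinary transpose
```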

It is a common technique in problem solving to first transform the problem, solve the transformed problem and then transform back. As an example, we apply , followed by , followed by , to the matrix defined earlier; blocks , and change.

This makes a function out of that line of code.

The function puts together the discussion so far.

We apply it to the matrix and get a solution as before.

We generalize the function to to deal with triples and quadruplets as well as twins.

Just as with , we want to use on rows, columns and blocks of a matrix and then combine them in .

We had already solved with , but let us apply for triples as a check.

combines the three solvers into one.

Consider the puzzle .

Unfortunately, does not solve the puzzle.

However, there are entries that are pairs.

We propose to replace the pair by 1 and to try to solve the modified puzzle; if that leads to a contradiction, then .

We introduce the functions and via the helper function . If there are any pairs in `X`, replaces the first such pair with its left entry and applies ; the function replaces with its right entry .

The two blank entries indicate a contradiction.

Therefore, the alternative must solve the puzzle, and it does.

We have just seen that guessing between two alternatives quickly led us to a solution. However, if a solution was not obtained with the first alternative, it might be necessary to again guess between two alternatives, and so on. If there are always just two alternatives, this leads to a binary tree with the root at the top.

It is not clear how many levels or guesses are needed before reaching a solution. Also, it may not be necessary to generate the entire tree before a solution is reached. There is a systematic and efficient way to search such a tree, usually referred to as backtracking.

Here is the method: start at the root and go left as long as there is no contradiction. If there is a contradiction, go back one level and go right. Then resume going left as far as possible. If there is a contradiction after going right, go back through all the right branches traversed so far; then go back through an additional left branch and go right.

For example, assume that contradictions exist at all nodes on level 4 except for the last one, node 15. The labels in the following tree indicate how to backtrack to the solution at node 15.

The binary choice in the Sudoku situation is to go either left or right. A path through the tree corresponds to a sequence of such choices; for example, the path (1, 2, 6, 8) generates the sequence: , from which a composition of functions can be built.

Here is how the built-in function works with undefined functions.

This example shows a clear contradiction, since there are two blank entries.

The function , when given a nonempty sequence of and functions, drops the last in the sequence (if any) until there are none, drops the last , and finally appends a .
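The sequence-advancing rule can be sketched as below; this Python analogue uses the strings "L" and "R" for the two branch functions, an assumed naming, since the actual function names did not survive extraction.

```python
# Advance a backtracking sequence: drop trailing right-branches,
# drop the final left-branch, then go right instead.
def next_sequence(seq):
    seq = list(seq)
    while seq and seq[-1] == "R":
        seq.pop()            # drop trailing "R"s
    if seq:
        seq.pop()            # drop the last "L"
    return seq + ["R"]       # and append an "R"

assert next_sequence(["L", "L", "L"]) == ["L", "L", "R"]
assert next_sequence(["L", "L", "R"]) == ["L", "R"]
assert next_sequence(["L", "R", "R"]) == ["R"]
```

Repeatedly applying this rule visits the leaves of the binary tree in exactly the left-to-right backtracking order described above.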

Here is an example.

In the next two functions, this kind of code tests for a list of lists of integers, a necessary condition for a Sudoku solution.

The function goes left if applying the sequence (a global variable with entries and ) to the matrix with does not contain an empty list ; otherwise it backtracks using . If the new sequence applied to the matrix contains only numbers, throws the matrix to the nearest containing .

We now use inside the function , which initializes the global variable .

Consider the following Sudoku puzzle.

We have failed so far to solve the puzzle using ; so we try the backtracking technique.

To see what sequence solved this puzzle, we only have to enter .

We next try on defined in the previous section.

If we now enter , we can see what sequence solved this puzzle.

Next is Evil Puzzle 8,076,199,743 from Web Sudoku [1].

Again, the solving sequence is given by .

This final puzzle was created by Arto Inkala, a mathematician based in Finland; it is claimed to be the world’s hardest Sudoku puzzle [2].

Here is how this puzzle was solved.

There are many other techniques known to experienced Sudoku solvers that could be added to our programs; also backtracking could obviously be extended to triples, and so on.

Sudoku provides a superb opportunity to introduce useful programming techniques to students of Mathematica. Backtracking is one such technique that is largely absent from standard discussions of Mathematica programming but, as we have shown, is easily implemented in Mathematica when needed.

[1] | Web Sudoku. (Jan 18, 2018) www.websudoku.com. |

[2] | Efamol. “Introducing the World’s Hardest Sudoku.” (Jan 18, 2018) www.efamol.com/efamol-news/news-item.php?id=43. |

R. Cowen, “A Beginner’s Guide to Solving Sudoku Puzzles by Computer,” The Mathematica Journal, 2018. dx.doi.org/doi:10.3888/tmj.20-1. |

Robert Cowen is Professor Emeritus in Mathematics, Queens College, CUNY. He does research in logic, combinatorics and set theory. He taught a course in Mathematica programming for many years, emphasizing discovery of mathematics, and is currently working on a text on learning Mathematica through discovery with John Kennedy. His website is sites.google.com/site/robertcowen.

**Robert Cowen**

*16422 75th Avenue
Fresh Meadows, NY 11366*

Rubik’s cube has a natural extension to four-dimensional space. This article constructs the basic concepts of the puzzle and implements it in a program. The well-known three-dimensional Rubik’s cube consists of 27 unit subcubes. Each face of determines a set of nine subcubes that have a face in the same plane as . The set can be rotated around the normal through the center of . Rubik’s 4-cube (or 4D hypercube) consists of 81 unit 4-subcubes, each containing eight 3D subcubes. Each 3-face of determines a set of 27 4-subcubes that have a cube in the same hyperplane as . The set can be rotated around the normal (a plane) through the center of . Projecting the whole 4D configuration to 3D exhibits Rubik’s 4-cube as a four-dimensional extension of Rubik’s cube. Starting from a random coloring of the 4-cube, the goal of the puzzle is to return to the initial coloring of the 3-faces.

To understand the 4D hypercube, it helps to first see how its lower-dimensional analogs relate to each other. The zero-dimensional hypercube (or 0-cube) is a point, with one vertex. The 1D hypercube (or 1-cube) is a segment, with two vertices and one edge. The 2D hypercube (or 2-cube) is a square, with four vertices, four edges and one face (the square including its interior). The 3D hypercube is a cube (or 3-cube), with eight vertices, twelve edges, six square faces and one volume. Going up a dimension doubles the number of vertices. More generally, the number of -cubes (points, segments, squares, …) in an -cube, , is .
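The count of k-dimensional faces of an n-cube is given by the standard formula 2^(n−k) · C(n, k); a quick Python check against the numbers in the text:

```python
# Number of k-dimensional faces of an n-cube: 2**(n-k) * C(n, k).
from math import comb

def faces(n, k):
    return 2 ** (n - k) * comb(n, k)

assert [faces(3, k) for k in range(4)] == [8, 12, 6, 1]       # cube
assert [faces(4, k) for k in range(5)] == [16, 32, 24, 8, 1]  # 4-cube
```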

The 3D cube can be represented in a 2D plane using central projection, defined by taking the intersection of the plane with the line joining the two points and . This projection maps the point to . Choose to obtain the projection shown on the right in Figure 1. Five of the faces overlap with a sixth face, the price to pay for the loss of one dimension.

**Figure 1.** A cube and its image under a central projection.

Overall, the 4D Rubik puzzle is a 4-cube [1] (or 4D hypercube or tesseract), with vertices, edges, squares, eight cubes and one 4-cube. The eight cubes are called *cells*, which are like the six square faces of a 3D cube. The proper faces of the 4-cube are its vertices, edges, squares and cells.

Each point of a proper face is on the 3D hypersurface of the 4-cube. No point of a proper face is strictly in the interior of the 4-cube; that is, a hypersphere at such a point contains points inside and points outside the 4-cube. In particular, no interior point of a cell as a 3D object is in the interior of the hypercube; all the points of a cell are on the boundary of the 4-cube.

The 16 vertices of a 4-cube can be defined as lists of length four of all possible combinations of and .

The 24 squares of the 4-cube are described in terms of their vertex indices.

Besides the 4-cube, there are five other regular polytopes in four dimensions. The .csv and .m files containing information for these polytopes are provided at [2]: the positions of the vertices, vertex indices for the proper faces and which faces are neighbors.

To display the 4-cube in 3D, central projection from 4D to 3D is analogous to central projection from 3D to 2D; the function is the natural extension of ; see Figure 2.

**Figure 2.** Projected image of a 4-cube by means of center projection. The larger outer cube is one of the cells of the 4-cube.

An axis of rotation in 3D is a fixed line; in 4D, an axis of rotation is a fixed plane [3]. For example, the rotation matrix about the - plane is defined by:

(1) |

There are six planes of rotation spanned by pairs of coordinate axes, namely -, -, -, -, -, -.

Here is the first one, for example, which leaves points in the - plane fixed.
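A rotation of this kind is easy to sketch numerically (in Python rather than the article's Mathematica; which coordinate plane the article's matrix (1) fixes is an assumption here, since the equation was not reproduced). Rotating the x–y coordinates while leaving z and w untouched fixes every point of the z–w plane.

```python
# A 4D rotation whose axis is the z-w coordinate plane: a 2D rotation
# block in the x-y coordinates, identity in z and w.
import math

def rot_xy(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0, 0],
            [s,  c, 0, 0],
            [0,  0, 1, 0],
            [0,  0, 0, 1]]

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

R = rot_xy(math.pi / 2)
# A point of the z-w plane is fixed ...
assert apply(R, [0, 0, 2, 5]) == [0, 0, 2, 5]
# ... while a point off that plane rotates: (1,0,0,0) -> (0,1,0,0).
assert all(abs(a - b) < 1e-12
           for a, b in zip(apply(R, [1, 0, 0, 0]), [0, 1, 0, 0]))
```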

This animation shows two successive rotations of the 4-cube projected to 3D.

Consider a 4-cube with center at the origin , side length 3, and with all proper faces of positive dimension parallel to the coordinate axes. Then its 16 vertices are:

(2) |

The eight cells of the initial 4-cube are colored differently. The word “initial” means that no rotations have been applied. The coloring touches every point of a cell, including its 3D interior points.

Just as the faces of Rubik’s cube are divided into nine squares by dividing each edge into three, the edges of Rubik’s 4-cube are also divided into three. Then the initial 4-cube is divided into small 4-cubes, each with edge length 1. The boundary (a hypersurface) of a small 4-cube contains eight small cubes, its cells.

The Rubik 3-cube has 27 subcubes in ; no square of the center cube is colored and some squares of other cubes are colored. These 26 subcubes are classified into three types according to whether they are at a corner, at an edge or at the center of a face of the larger cube. Figure 3 shows one of each type.

**Figure 3.** Three types of small cubes: in the center of a square face, at an edge and at a vertex, with one, two or three colored squares.

Analogously, the 81 small 4-cubes of include the uncolored one at the center and 80 partially colored small 4-cubes. These are classified into four types according to the dimension of their intersection with . The type of a small 4-cube does not change after rotation. Table 1 summarizes the numbers for each type for and .

**Table 1.** The number of small pieces of each type of cube or cell.

The number of colored small squares for Rubik’s cube is calculated using the data in Table 1:

Another way is to count the number of faces times the number of squares per face: .

The small 4-cubes with nonzero coordinates form the hypersurface of . In particular, a small 4-cube with center given by four nonzero coordinates contains a vertex of . Again from Table 1, the number of colored small cells is:

This number can also be obtained as the number of cells of times the number of small cells per cell of : .
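Both counts can be verified with a line of arithmetic. The per-type piece counts used below are the standard face counts of the cube and the 4-cube (vertices, edges, squares, cells), assumed to match the entries of Table 1, which did not survive extraction.

```python
# Colored-piece counts for Rubik's cube and Rubik's 4-cube.

# 3D: 6 face centers x 1 + 12 edge pieces x 2 + 8 corner pieces x 3
squares_3d = 6 * 1 + 12 * 2 + 8 * 3
assert squares_3d == 54 == 6 * 9          # = faces x squares per face

# 4D: 8 cell centers x 1 + 24 square pieces x 2 + 32 edge pieces x 3
#     + 16 corner pieces x 4
cells_4d = 8 * 1 + 24 * 2 + 32 * 3 + 16 * 4
assert cells_4d == 216 == 8 * 27          # = cells x small cells per cell
```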

We define several global variables to be used here and later. Figure 4 shows the divided 4-cube with 216 colored small cells.

**Figure 4.** Center projection of a hypercube consisting of 216 colored small cells.

Each edge is divided into three parts, so that the edge length of the 4-subcubes is 1. Consider a 4-subcube with center at . Then the vertices of the 4-subcube are . Each coordinate of the center of a 4-subcube is one of , 0 or . When a coordinate value is nonzero ( or ), the 4-subcube faces outward in the corresponding direction. In other words, the nonzero coordinate values identify the outward-facing 4-subcubes.

For the initial state, the colors of the cells are set according to the coordinates of their centers:

For example, in the small 4-cube with center , the two small cells with vertices and are colored because both the and coordinate values are nonzero.

The geometry of the 216 small colored cells is used to manage the puzzle. Each element of the datasets consists of four elements: (1) the vertices of six squares; (2) the location of the center of the small 4-cube to which the small cell belongs; (3) color; and (4) the location of the center of the small cell. The vertices of the six squares are used for drawing the subcubes, and the locations of the centers of the subcubes are used to judge the completeness of the puzzle. The dataset of the initial state is obtained by the following procedures. First, the vertex numbers of the squares making up each small cell are defined.

Next, the 216 small cells are selected by checking all possible small cells.

This list contains 216 entries and each entry contains four components corresponding to a small cell.

For example, here is entry 123 of . The components for this small cell are its six square faces, center, color and current position.

sets up a 4-cube for drawing.

Here is an example.

sets up a cell (with 27 4-cubes) for drawing.

Figure 5 shows the initial state of Rubik’s 4-cube.

**Figure 5.** Center projection of initial state of the Rubik 4-cube with its eight cells.

In the 3D case of Rubik’s cube , a *block* is a set of nine small cubes whose centers have one coordinate that is constant: −1, 0 or 1. There are nine blocks, three per coordinate axis. A natural technique to rotate a middle block, for example, the one cut by the plane , is to rotate the block above by , the block below by , and then the whole cube by .

In the 4D case, a *block* is a set of 27 small 4-cubes whose centers have one coordinate that is constant. There are 12 blocks: three per coordinate axis, one for each constant value −1, 0 or 1, over the four axes. Under a rotation, the small 4-cubes in a block change position simultaneously. Each block is a four-dimensional *hyperprism* with height 1.
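A quick enumeration confirms the block count (a Python sketch, using the subcube centers at coordinates −1, 0, 1 from the setup above):

```python
from itertools import product

centers = list(product((-1, 0, 1), repeat=4))   # 81 small 4-cubes

# a block fixes one coordinate: 4 axes times 3 constant values = 12 blocks
blocks = {(axis, value): [c for c in centers if c[axis] == value]
          for axis in range(4) for value in (-1, 0, 1)}

print(len(blocks), all(len(b) == 27 for b in blocks.values()))  # 12 True
```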

Figure 6 shows an example of the block ; the cell opposite the orange cell is not colored.

**Figure 6.** Example of a block of 27 small 4-cubes (orange, ).

A block is rotated by , or around an axis, which in 4D is a fixed plane. Therefore, the information needed for an action on the Rubik 4-cube is (1) the block to be rotated; (2) the axis of rotation; and (3) the angle. For Rubik’s cube , the axis of rotation is automatically determined by selecting a block. But for , two coordinate axes must be chosen to determine the fixed plane. One is the constant coordinate axis used to select the block, and the other must be chosen from the remaining three coordinate axes. There are 108 possible actions on : 12 choices of block, three choices for the second coordinate axis and three choices of angle (12 × 3 × 3 = 108). Therefore, 108 buttons are required for the rotations in the Rubik 4-cube computer program. Table 2 lists the properties of Rubik’s cube and Rubik’s 4-cube.
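The rotation itself can be sketched as follows (a Python sketch, with a 90-degree turn standing in for the article's 4D rotation matrices). Choosing the block axis a and a second axis b fixes the plane spanned by a and b; the rotation permutes the remaining two coordinates, and the block maps onto itself.

```python
from itertools import product

def rotate90(point, c, d):
    """Rotate a 4D lattice point by 90 degrees in the (c, d) coordinate
    plane; the plane spanned by the other two axes stays fixed."""
    p = list(point)
    p[c], p[d] = -p[d], p[c]
    return tuple(p)

centers = list(product((-1, 0, 1), repeat=4))

# select the block with coordinate a equal to v, plus a second axis b
a, v, b = 0, 1, 1
c, d = [i for i in range(4) if i not in (a, b)]
blk = [p for p in centers if p[a] == v]

rotated = [rotate90(p, c, d) for p in blk]
print(len(blk), sorted(rotated) == sorted(blk))  # 27 True: block is invariant

# 12 blocks, 3 choices of second axis, 3 angles
print(12 * 3 * 3)  # 108 actions
```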

The program to realize Rubik’s 4-cube in 3D relies on central projection of a hypercube and rotation matrices in 4D. The program is shown in the next section.

Implementing the interface consists of three parts: constructing the buttons for the rotations, displaying the current state, and judging whether the puzzle is complete.

The buttons for the rotations are placed in grids. The player can rotate a block by clicking one of the buttons. The rows correspond to the selection of the coordinate axis of the block, and the columns correspond to the coordinate values for that axis, so the player selects a block by choosing one row and one column. For example, clicking the button where row crosses column chooses the block on . For each block, the other three axes are listed, and a second axis must be selected to determine the rotational plane. Finally, one of the three buttons (up, diagonal and down) must be chosen to set the rotation angle: (▲), (■) and (▼). (The 0 rows can be ignored; the player can instead perform an equivalent pair of actions on the two parallel blocks.)

When the colors of the 27 subcubes on a cell are all the same, that cell is complete. The puzzle is solved when all the cells are complete.
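The completeness test then amounts to checking that the 27 colors on each cell agree. A minimal Python sketch, assuming a hypothetical mapping from each cell to the colors of its 27 subcubes:

```python
def is_complete(cells):
    """A cell is complete when its 27 subcube colors all agree;
    the puzzle is solved when every cell is complete."""
    return all(len(set(colors)) == 1 for colors in cells.values())

# hypothetical data: two of the eight cells, keyed by outward direction
solved = {"+w": ["red"] * 27, "-w": ["blue"] * 27}
scrambled = {"+w": ["red"] * 26 + ["blue"], "-w": ["blue"] * 27}
print(is_complete(solved))     # True
print(is_complete(scrambled))  # False
```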

Although we succeeded in implementing Rubik’s 4-cube, some problems remain to be addressed. We aimed for ease of implementation rather than efficiency. Therefore, in the future, we should consider enhancing the application to get a more effective visualization method and an intuitive interface.

The program redraws 1,296 squares after each rotation, so efficient coding is important. There is a great deal of redundancy in calculating the vertices of the 4-subcubes for each rotation. The most effective way to handle the vertices, via subcubes or via 4-subcubes, remains to be clarified. Note that when the vertices of the 4-subcubes are handled as a dataset rather than as subcubes, they must be converted to the vertices of the subcubes.

Effective visualization is a common problem for four-dimensional geometry. In this article, we used central projection to represent 4-cubes. However, the proposed projection does not completely represent the features of the puzzle. Although there are other projections for representing a 4-cube, the most suitable method is not yet clear.

Another possible improvement would be to animate the rotations: animating the rotation of the colored small 4-cubes would help the player intuitively understand their rearrangement.

An intuitive interface is important for playing this puzzle. The interface and visualization issues are related, and their development may provide a new method for understanding four-dimensional space.


T. Yoshino, “Rubik’s 4-Cube,” The Mathematica Journal, 2017. dx.doi.org/doi:10.3888/tmj.19-8.

Profession: Science of Form. Fields of interest: skeletal structure of plankton, non-Euclidean geometry, hyperspace, pattern formation.

**Takashi Yoshino**

*Toyo University, Kujirai 2100, Kawagoe, 350-8585, JAPAN*

Given a finite vertex set, one can construct every connected spanning hypergraph by first choosing a spanning hypertree, then choosing a blob on each of its edges.

If is a finite vertex set and is a collection of finite subsets (called edges), none of which is a subset of another, we recursively define the *swell* of , , to be the collection of all sets that either:

- belong to
- are the union of some pair of *overlapping* sets, both already belonging to

For example, if

then

If we also have , then the set system is called a *clutter*. This condition means that (except in the case ) each edge contains at least two vertices, and the hypergraph spanned by the edge set is connected. Here the hypergraph spanned by a set of edges is defined to have vertex set and edge set . (There is no agreed-upon definition of “hypergraph.” For some authors it is any set system; for others it is a simplicial complex; for others it is an antichain of sets.)
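A minimal Python sketch of the swell and the clutter condition (the article's code is in the Wolfram Language; this version assumes the omitted condition is that the full vertex set belongs to the swell, which is what spanning connectivity amounts to):

```python
from itertools import combinations

def swell(edges):
    """Close a set system under unions of overlapping members."""
    s = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(s), 2):
            if a & b and (a | b) not in s:   # overlapping pair, new union
                s.add(a | b)
                changed = True
    return s

def is_clutter(edges, vertices):
    """Antichain of edges whose swell contains the full vertex set,
    i.e. a connected spanning hypergraph with no edge inside another."""
    es = [frozenset(e) for e in edges]
    antichain = all(not (a < b) for a in es for b in es)
    return antichain and frozenset(vertices) in swell(es)

print(is_clutter([{1, 2}, {2, 3}, {3, 4}], {1, 2, 3, 4}))  # True
print(is_clutter([{1, 2}, {3, 4}], {1, 2, 3, 4}))          # False: disconnected
```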

Here is a larger clutter.

The number of clutters spanning vertices is given by A048143 (oeis.org/A048143),

This sequence varies as , so the number of digits required roughly doubles with each consecutive term. Our main example is just one of some 56 sextillion members of .

This normalizing function is a universal invariant for the species of labeled clutters, meaning two clutters are isomorphic iff they have the same image.

Here is a list of nonisomorphic representatives for all clutters with up to four vertices, corresponding to “unlabeled” clutters. This brute-force enumeration may not work for .

A *kernel* of is a clutter (the restriction of to edges that are subsets of ) for some .

Define .

A *set partition* is a set of disjoint sets with .

Suppose is a set partition of such that each block is a kernel of (i.e. and ). Since would imply , it follows that the set of unions is itself a clutter, which we call a *cap* of .

Equivalently, a cap of is a clutter satisfying both:

- every edge of is a subset of exactly one edge of

To see that this does *not* establish a partial order on clutters with a given vertex set, observe that

is a nontransitive chain of caps. The following is the set of all set partitions of the edge set indices corresponding to each cap of the clutter.

In these plots of clutter partitions, the filled squares correspond to all pairs of a vertex and an edge such that the vertex belongs to the edge; these squares are then shaded according to which block of the partition the edge belongs to.

The *density* of a clutter is

where the sum is over all edges .

A clutter with two or more edges is a *tree* iff . This is equivalent to the usual definition of a spanning hypertree [1].

A clutter is a *blob* iff no cap of is a tree.

The trees and blobs among the caps and kernels (respectively) of our running example are as follows.
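For small examples, caps, trees and blobs can be checked by brute force. The Python sketch below assumes the density normalization "sum of (|e| − 1) over edges, minus (|V| − 1)", which vanishes exactly on hypertrees, and enumerates caps as partitions of the edge set into kernels:

```python
def set_partitions(items):
    """All set partitions of a list, as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield part + [[first]]

def is_connected(edges):
    """Overlap-connectivity of a set system."""
    edges = [set(e) for e in edges]
    comp, rest = set(edges[0]), edges[1:]
    while True:
        joined = [e for e in rest if comp & e]
        if not joined:
            return not rest
        for e in joined:
            comp |= e
            rest.remove(e)

def density(edges):
    # assumed normalization: zero exactly on hypertrees
    verts = set().union(*edges)
    return sum(len(e) - 1 for e in edges) - (len(verts) - 1)

def is_tree(edges):
    return len(edges) >= 2 and density(edges) == 0

def caps(clutter):
    """All caps: edge partitions into kernels, mapped to block unions."""
    edges = [frozenset(e) for e in clutter]
    for part in set_partitions(edges):
        unions = [frozenset().union(*block) for block in part]
        if all(is_connected(block) and
               {e for e in edges if e <= u} == set(block)
               for block, u in zip(part, unions)):
            yield unions

def is_blob(clutter):
    return not any(is_tree(cap) for cap in caps(clutter))

print(is_blob([{1, 2}, {2, 3}, {1, 3}]))  # True: the triangle is a blob
print(is_blob([{1, 2}, {2, 3}]))          # False: a path is already a tree
```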

Suppose a clutter decomposes into a cap and corresponding set of kernels . Then

where the sum is over all . In particular, , and iff every is a tree. Using this simple identity, one easily proves the following.

**Lemma**

The following is also straightforward.

**Proposition**

We now come to the main result.

For our running example, the theorem corresponds to the following decomposition into a tree of blobs.

**Theorem**

**Proof**

First we show that any blob (kernel) is contained within a single branch of any tree (cap). Suppose that is a kernel of and is a blob, and that is a cap of that is a tree. Let be the subtree of contributing to the set partition of *non-empty* intersections for each branch . The set of unions forms a clutter that is obtained from by deleting in turn all vertices not in , a process that weakly decreases density. Let be the set partition comprised of maximal kernels (i.e. connected components) contained in blocks of . Then is a cap of and . Since is a connected clutter, we have , and therefore . But since is a blob, cannot be a tree, hence it must be a maximal cap (viz. , ).

Next we show that . If any two blobs overlap, both blobs must be contained entirely in whatever branch (of any given tree) contains their intersection. This implies that there is another blob containing their union, and hence that the maximal blobs are disjoint. Since every singleton is also a blob, we conclude that is a cap of .

Finally, if any kernel of were a blob, so would be the restriction of to its union, contradicting maximality of . This proves that the set of unions of is a tree. ■

The following are the decompositions for each nonisomorphic clutter with four vertices.

Let be the set of all kernels of . If is itself a (connected) clutter with vertex set , then there exists a unique subset-minimal upper bound satisfying both

- for all

In general, we can only define uniquely for a *connected* set of kernels, so is not strictly a join operation for the poset of subsets . But if is not connected as a clutter, then letting be its (maximal) connected components, we say that is a *connected set of kernels* iff is connected as a set of kernels, in which case the join is given by

In practice, the verification of connectedness and the computation of may require several iterations constructing joins of connected components. For example, consider the following connected set of kernels.

It has the following sequence of joins of connected components.

**One Problem**

If is a connected set of kernels, we define its *compression* to be the number of iterations in the computation of by constructing consecutive joins of connected components. For the previous example, we have . Although it seems unlikely that is a bounded invariant, we do not know how to construct an example with compression greater than .

- For which positive numbers does there exist a connected set of kernels such that ?
- Does there exist an infinite chain of connected sets of kernels such that for all ?

Define an invariant by

for all , where the sum is over all clutter partitions . Here denotes the indicator function for a proposition , equal to 1 or 0 depending on whether is true or false, respectively.

**Theorem**

**Proof**

Let be the set of all clutter partitions . What we have essentially shown above is that , regarded as a subposet of the lattice of set partitions ordered by refinement, is a lattice. We have the simple enumerative identity

where the product is over all non-singleton kernels , here regarded as elements of whose only non-singleton block is . Expanding the right-hand side gives

where the sum is over all sets of non-singleton kernels , again regarded as lattice elements. Here is algorithmically the same operation as the connected-join operation on . Expanding and factoring accordingly, this becomes

where the outer sum is over all , the product is over all , and the inner sum is over all connected sets of kernels spanning . For any kernel , define

where the outer sum is over all clutter partitions , and where the product and inner sum are as before. Letting be the set partition of whose only non-singleton block is , we have shown that

Hence our theorized expansion does indeed satisfy the defining identity of . ■

Note that it is sufficient in the preceding theorem and proof to consider only connected sets of *subset-minimal* non-singleton kernels, and it is often practical to do so. The hypergraph whose edges are minimal non-singleton kernels is also of some interest. The well-known Möbius function of a hypergraph is defined on the lattice of connected set partitions, and in this context an element of may be called a *pseudo-kernel*. In comparison, however, our invariant , which is defined on essentially all clutters, seems to be more interesting; we do not know if it has been studied before.

A *semi-clutter* is any anti-chain of subsets . For each finite set , let be the set of semi-clutters spanning . A *species* [2] is an endofunctor on the category of finite sets and bijections, so here we have defined a species of semi-clutters. The compound semi-clutter of a decomposition , as defined by Billera [3], is obtained as a disjoint “sum” of Cartesian “products.” Interpreted in the language of species theory, this is a certain natural transformation

where denotes the composition operation on species, a generalization of composition of exponential formal power series. Let be the set of (connected) clutters spanning , let be the set containing only the maximal clutter on , and let be the set of clutters having no expression as a compound of a proper decomposition (i.e. is the species of “prime” clutters). Billera’s main theorem (attributed to Shapley) establishes a unique reduced compound representation, which is itself a species of decompositions

From this it is evident that can ultimately be reduced to a nested compound expression using only trivial and prime clutters. Hence the problem of enumerating semi-clutters on a vertex set reduces to the problem of constructing, for any connected clutter, its “maximal proper committees,” which is the nontrivial part of the solution of [2] for the enumeration of prime clutters. This is a particularly interesting application of formal species.

[1] R. Bacher, “On the Enumeration of Labelled Hypertrees and of Labelled Bipartite Trees.” arxiv.org/abs/1102.2708.

[2] A. Joyal, “Une théorie combinatoire des séries formelles,” Advances in Mathematics, 42(1), 1981, pp. 1–82. doi:10.1016/0001-8708(81)90052-9.

[3] L. J. Billera, “On the Composition and Decomposition of Clutters,” Journal of Combinatorial Theory, Series B, 11(3), 1971, pp. 234–245. doi:10.1016/0095-8956(71)90033-5.

G. Wiseman, “Every Clutter Is a Tree of Blobs,” The Mathematica Journal, 2017. dx.doi.org/doi:10.3888/tmj.19-7.

The author is a former graduate student of pure mathematics at the University of California, Davis. He is interested in categorical technology applied to discrete mathematics.

**Gus Wiseman**

*gus@nafindix.com*