The Mathematica® Journal

Michael Trott

In the middle of the last century, J.J. Sylvester tried to calculate the expectation value of the convex hull of randomly chosen points in a plane square. As he discovered, for the problem is trivial. For the question is relatively easy to answer using *Mathematica*:

But starting for the problem becomes extremely difficult due to the multiple integrals to be evaluated. In 1885, M. W. Crofton came up with an ingenious idea to solve special cases of this problem. His formulae are today called Crofton's theorem. At that time he remarked:

The intricacy and difficulty to be encountered in dealing with such multiple integrals and their limits is so great that little success could be expected in attacking such questions directly by this method [direct integration]; and most what has been done in the matter consists in turning the difficulty by various considerations, and arriving at the result by evading or simplifying the integration.

The general setting of the problem is to calculate the expectation value of the -dimensional volume of the convex hull of points in dimensions; for instance, the volume of a random tetrahedron formed by four randomly choosen points in (an unsolved problem).

In the following we will show that, using the stunning integration capabilities of *Mathematica*, it is now possible to tackle such problems directly.

In the following, let the plane polygon be a unit square, and let us calculate the expectation value of the area of a random triangle within the unit square.

Here is a sketch of the situation at hand:

Let , , and be the vertices of the triangle. Then the area of the triangle is given by the absolute value of the determinant:

This means the integral to be calculated is:

The determinant changes sign, depending on the orientation of the triangle, and there are two possible ways to deal with this issue. The first would be to use as the integrand, and the second would be to divide the six-dimensional integration region into subregions, such that inside each the expression does not change sign. Because the first approach soon leads to intractable integrals, we will follow this second approach here.

This is the expression for the area (still containing the sign ambiguity):

We must now recursively (with respect to each coordinate ; ; ) determine the boundaries where the expression ÷p and its "lifted" expressions (lifted in the sense that a few coordinate values are viewed as fixed) change sign. Furthermore, we must take into account that all coordinates are restricted to the interval ; this condition also restricts the ranges of coordinates other than the current one. A sign change for an expression containing ÷p or its lifted variants can occur if the numerator of the expression is zero or the denominator has a pole of odd order. The function calculates all potential new points of sign changes of the expression sols with respect to the variable x.

To divide the volume of integration--the six dimensional unit cube--uniquely into subregions of constant sign, we also must take care about the points where two expressions become equal. The function calculates all potential new points of sign changes of the expression sols with respect to the variable x.

For each of the coordinates ; ; , we now have to look for all points calculated by the routines and to get all points of a potential sign change.

Starting now with the expression ÷p and the coordinate and calculating all internal points of potential sign changes, we get the following.

Adding the interval boundaries 0 and 1 to the points of potential sign changes (as well as the iterator variable the intervals refer to) shows the following hyperplane divisions of our six-dimensional integration region:

Next we must sort the points of according to their position between 0 and 1. To do this, starting with the two intervals and , we calculate recursively one rational internal point of the intervals. The function sort sorts the points points taking into account the already calculated rational interval representatives. This sorting is necessary to divide each hyperplane into succesive stripes and to make sure that we take into account only those intervals which are inside the interval .

The function lift calculates the sorted intervals of the coordinate x with the potential sign-changing points points, using the already calculated intervals oldIntervals and the representatives oldRealizations of these intervals.

Now we repeatedly apply the function lift, starting with the intervals and we obtain a complete subdivision of the integration space together with sample points of the regions.

We have 496 regions all together.

Here the first and last one are explicitly shown. The regions are not bounded by hyperplanes but generically have curved bounding surfaces:

Inside each of these 496 regions ÷p does not change sign.

Now, inside each of the 496 six-dimensional polytopes we have to integrate ÷p. ÷p itself is only a (multi-)linear polynomial in each of the six (integration) variables ; ; which at first might give the wrong impression that the integration is easy. However, because the limits of the integrations are themselves rational functions of the outer integration variables, the integrals we have to deal with typically will be quite complicated and will contain rational functions, logarithms, and polylogarithms in the result. (See the example below.)

We will not use *Mathematica'*s definite integration capabilities because they are "too advanced" for our purposes. I mean "too advanced" in the following sense: if a definite integral contains parameters, *Mathematica* tries to determine if the integral exists for all parameters and if the integrand contains singular points. Because the integrals involved possess up to five parameters (the outer integration variables) this is quite a stunning task. On the other hand, by the construction of the integration regions regions and by the nature of ÷p, we do not have to worry about any of these problems--inside each of the integration regions the integrand is a smooth, not sign-changing, function. So we use the much faster indefinite integration capabilities of *Mathematica* and calculate the definite integrals from the indefinite ones. If we are lucky, we just have to substitute the integration limits into the indefinite integral to obtain the definite integral. But sometimes this process may yield indeterminate expressions. In such a situation we must be more careful and must calculate the needed limits by using a series expansion around the integration endpoints.

Now let us implement the aboves described integration: the function multiIntegrate is our own multidimensional definite integration function, and myIntegrate is our one-dimensional integrator.

The function limitsNeededQ tests to see if limits are needed for substituting the integration limits.

To speed up the indefinite integration and the calculation of the limits, we apply some transformation rules implemented in ÷‚ to the expressions of interest. Because we know that the integrals we are calculating are real quantities, we do not have to worry about branch cut problems associated with the logarithm function, and we drop all imaginary parts at the end. (Again, because this fact is not known to the built-in indefinite integration routines, this transformation speeds up the calculation considerably.)

The function myIntegrate is finally doing the univariate indefinite integration. myIntegrate first calculates the indefinite integral, then simplifies the result. After this the integration limits are substituted. If the above procedure was successful, our work is done. If our process failed (if Indeterminate was returned or some flavor of DirectedInfinity turned up in the result) we calculate explicit limit values.

Now everything is in place for us to actually carry out the integration. To get an idea of the complexity of the expressions involved, let us look at the individual integration results of the first region. The indefinite integrals are typically quite a bit larger than the definite ones shown in the following results, and for some of the 496 regions the intermediate indefinite integrals can be as large as 50 KB. (In the following we suppress some of the messages generated when direct substitution of the limits fails.)

So we have as the contribution from the first region

Using *Mathematica'*s numerical integration capabilities, we can calculate an approximate value of this integral, too.

This value agrees reasonably well with the symbolic result.

As a test of our integration routine, we can calculate the volume of all 496 individual regions. In total, they have the volume 1 of the unit cube.

Finally, we can now carry out all 496 six-dimensional integrations to obtain the expectation value (11/144) for the area of the triangle. Be aware that the following calculation will take about 5 hours on a 200Mhz Pentium Pro.

Again we can use NIntegrate to numerically check the result.

V. S. Alagar, *Journal of Applied Probability ***14 **(1977) p. 284.

C. Buchta in A. Dold, B. Eckmann (eds.), *Zahlentheoretische Analysis*, Springer, Berlin, 1983.

C. Buchta, *Journal for Reine und Angewandbe* **347 **(1984) p. 212.

N. Henze, *Journal of Applied Probability* **20 **(1983) p. 111.

V. Klee, *American Mathematical Monthly* **76 **(1969) p. 286.

R. Pfiffe, *Math. Magazine* **62 **(1989) p. 309.

Converted by