The Combinatorics of Distributions « The Mathematica Journal

Bernd Günther

The Combinatorics of Distributions
NB CDF PDF

Distributions, which are the various ways of distributing a certain number of objects of different classes among a collection of targets, have been the subject of combinatorial investigations since MacMahon’s 1917 monograph. In this paper we apply them to a simulation of superimposed random coding. Furthermore, asymptotic estimates are provided using logarithmic polynomials (related to the well-known Bell polynomials) for symbolic and numeric calculation.

Introduction

We denote by the number of ways classes of indistinguishable objects can be distributed among different boxes, leaving no box empty and such that no box contains more than one object from the same class. In MacMahon’s terms [1] (cf. also [2]) we speak of the distribution of objects of specification into boxes of specification .

Relaxing the requirement that no box should remain empty gives distributions. Since each subcollection consisting of boxes has permissible distributions, we obtain

(1)

Equation (1) can be readily solved:

(2)

which holds for with for .

It is convenient to extend the definition to the case by setting , where, of course, we tacitly assume .

For example, with , , , there are 12 admissible distributions.

Applications of distributions are abundant. Markov [3] considered a lottery performing regular drawings of out of lots, and asked for the probability that the number of lots that have shown up at least once after drawings equals . Here the lots take over the role of the boxes, which are occupied by an object of class if it is drawn in the round. Hence

(3)

Superimposed Random Coding [4, 5] is an application to coding theory. Consider a collection of randomly generated binary code words of length and weight (i.e., the number of 1-bits equals ). A set of code words can be combined by superposition , so that contains a 1-bit in all those places where at least one of the has one. Notice that has weight with probability as in (3). For “decoding,” we tentatively assume that the code word is present if its 1-bits are among those of . This way we will certainly not miss any correct matches but may encounter a few false ones; the error probability must be investigated.

In practice we are testing for the presence of a set of code words simultaneously by checking if its superposition is covered by . is of weight with probability

(4)

Since there are pairs of subsets of size and (the first one contained in the second), the probability of a purely accidental match evaluates to

(5)

More details and a simulation are given in the supplemental notebook, which is available from content.wolfram.com/sites/19/2011/07/CombDistrib3Suppl.nb.

Asymptotics

Even for moderate values of , , and , the number of distributions increases very quickly and soon exceeds the possibilities of ordinary 64-bit arithmetic. Fortunately, an asymptotic formula is available.

Theorem 1

such that the ratio

remains constant with

and if

either remains constant or also

, but

for suitable

, then

(6)

where the new parameter

is defined by

(7)

Sketch of Proof

The number of distributions can be computed by induction on as

(8)

which readily leads to

(9)

We resolve the multinomial coefficient as a product of factorials, apply Stirling’s formula, and approximate the sum by an integral over the variables :

(10)

(11)

(12)

The function has a global maximum in the interior of the integration domain at with chosen as in the theorem. Under such circumstances, behaves like a Dirac delta distribution as :

(13)

(14)

where denotes the absolute value of the Hessian of at the location of the maximum. The validation of our estimates requires the interchange of limit operations at several instances, which can be justified if . More details of the proof cannot be given within the present scope. We recommend [6] as general reference for asymptotics.

Notice that the maximal possible value corresponding to , where every object is assigned to a different box, is not covered by the theorem.

We are interested in the limit case , where the number of classes used to typify our objects is very large; imagining that the different classes are distributed one after another ( can be interpreted as the time parameter of a Markov chain), this means that we want to anticipate the far future. However, the case , though included in the theorem, is not handled very conveniently, most notably because of the necessity of solving the -degree polynomial equation (7); some other deficiencies are described in the supplemental notebook.

Setting and in equation (7) is equivalent to

(15)

which is an analytic equation, since the singularity at is inessential. By the implicit function theorem can be expressed as a power series in , and we are going to compute the coefficients. The logarithmic polynomials will be required as tools, for whose introduction we make a small digression.

Logarithmic Polynomials

We recall that the logarithmic polynomials are defined by

(16)

where denotes the Bell polynomials familiar from Faà di Bruno’s formula [7]. Observe that is a polynomial in indeterminates with integer coefficients. Their sequence satisfies the condition

(17)

The following implementation is basically an adaption of [8], exercise 16b (we are indebted to the referee).

Here are the first five log polynomials.

Substituting for in (17) one easily derives

(18)

thus the logarithmic polynomials are not homogeneous but of uniform weight, where the variable counts with weight . Furthermore, taking partial derivatives with respect to the highest argument,

(19)

which means that depends additively on the last variable:

(20)

Asymptotics, Resumed

Thus equipped, we set out to solve equation (15) for . The constant coefficient can be obtained from (15) by taking limits:

(21)

It will be convenient to introduce , , as a new independent parameter, so that

(22)

This puts us in a position to put (15) into a form where (17) can be applied, namely where the constant coefficients of the involved power series equal 1:

(23)

By comparing coefficients we obtain

(24)

It will be seen presently that this determines as a rational expression in and ; we observe that these two quantities, transcendentally related by (22), are algebraically independent. To factor out the denominator, we set with . Observing (18), we obtain

(25)

Now from (20):

(26)

By uniformity of weight (18), the denominators cancel, thus revealing for as a polynomial.

Here are the first three cases of .

We can now check the correctness of our coefficients by inserting them into equation (15).

This is 0 because of (22). Let us summarize the results achieved so far.

Lemma 1

The unique solution

of the equation

is given by the power series

(27)

with

determined from

and with the polynomials

determined inductively by (26).

The rest now is fairly standard. We observe that in (6),

(28)

and we can easily compute a power series expansion of the expression from the one of .

The first two coefficients are as follows.

This provides us with complete control over the numerator of (6). To handle the denominator we observe and therefore, from (27), . We also observe , because depends implicitly on . In consequence,

(29)

In using this limit approximation we accept a slight penalty of accuracy in favor of a simpler formula. Assembling all parts, we have proved the following theorem.

Theorem 2

such that the ratio

remains constant with

and

for suitable

, then

(30)

with

determined from

and with the power series coefficients

as above.

Since is not supposed to grow faster than , at least the first five terms of the power series in (30) will be significant, but higher terms may be negligible depending on the actual growth rate.

The derivation of error bounds for the asymptotic relation (30) is beyond our present scope. A numerical example showing that the approximation is rather good is contained in the supplemental notebook.

References

[1]	P. A. MacMahon, Combinatory Analysis, 3rd ed. reprint, Providence, RI: AMS Chelsea, 2001.
[2]	J. Riordan, Introduction to Combinatorial Analysis, Mineola, NY: Dover Publications, 2002.
[3]	A. A. Markov, Wahrscheinlichkeitsrechnung, Leipzig, Berlin: Teubner, 1912. catalog.hathitrust.org/Record/012107751.
[4]	B. Günther, “On the Probability Distribution of Superimposed Random Codes,” IEEE Transactions on Information Theory, 54(7), 2008 pp. 3206-3210. doi:10.1109/TIT.2008.924658.
[5]	C. S. Roberts, “Partial-Match Retrieval via the Method of Superimposed Codes,” Proceedings of the IEEE, 67(12), 1979 pp. 1624-1642. ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1455812.
[6]	N. G. de Bruijn, Asymptotic Methods in Analysis, Amsterdam: North-Holland Publishing Co.; New York: Interscience Publishers, 1958.
[7]	L. Comtet, Advanced Combinatorics—The Art of Finite and Infinite Expansions, rev. and enl. ed. (J. W. Nienhuys, trans.), Dordrecht, Boston: D. Reidel Publishing Company, 1974 p. 140.
[8]	M. Trott, The Mathematica GuideBook for Numerics, Berlin: Springer, 2005.
B. Günther, “The Combinatorics of Distributions,” The Mathematica Journal, 2011. dx.doi.org/doi:10.3888/tmj.13-10.

About the Author

The author obtained his Ph.D. in mathematics at the University of Frankfurt, Germany in 1989 and has lectured and done research in Frankfurt and Seattle, Washington. Since then he has worked for Oracle, the Beilstein Institute, and currently for Deutsche Bahn AG. He has published several papers in pure and applied mathematics.

Bernd Günther
DB-Systel GmbH
Helpertseestrasse 21
63165 Muehlheim
Germany
dr.bernd.guenther@t-online.de

« Previous

The Combinatorics of Distributions NB CDF PDF