The Mathematica Journal
Departments
Feature Articles
Columns
New Products
New Publications
Classifieds
Calendar
News Bulletins
Mailbox
Letters
Write Us
About the Journal
Staff and Contributors
Submissions
Subscriptions
Advertising
Back Issues
Home
Download this Issue

Poisson Distribution

Tomás Garza  (tgarza01@prodigy.net.mx) writes: In TMJ 6(3):16--17 random numbers with a Poisson distribution are generated using

[Graphics:../Images/index_gr_50.gif]
[Graphics:../Images/index_gr_51.gif]
[Graphics:../Images/index_gr_52.gif]

However, this is rather slow. We can use RandomArray.

[Graphics:../Images/index_gr_53.gif]
[Graphics:../Images/index_gr_54.gif]

RandomArray is useful because, for certain distributions, it is more efficient to produce a whole array of random values. However, trying

[Graphics:../Images/index_gr_55.gif]
[Graphics:../Images/index_gr_56.gif]

we find no speed improvement. Using Random directly is much faster. Consider producing exponentially distributed random numbers with unit mean.

[Graphics:../Images/index_gr_57.gif]
[Graphics:../Images/index_gr_58.gif]

Here RandomArray is much faster.

[Graphics:../Images/index_gr_59.gif]
[Graphics:../Images/index_gr_60.gif]

Alternatively, for the exponential distribution, the inverse cumulative distribution function (CDF) is

[Graphics:../Images/index_gr_61.gif]
[Graphics:../Images/index_gr_62.gif]
[Graphics:../Images/index_gr_63.gif]

Hence

[Graphics:../Images/index_gr_64.gif]

can be used to generate exponentially distributed random numbers.

[Graphics:../Images/index_gr_65.gif]
[Graphics:../Images/index_gr_66.gif]

This is about the same speed as using RandomArray.

In a Poisson process, the time interval between two successive occurrences is exponentially distributed (see [1]). A simple relationship between the Poisson and exponential distributions is implied: add as many independent and identically distributed (i.i.d.) exponential random variables as needed to exceed [Graphics:../Images/index_gr_67.gif]. The number of terms minus 1 follows a Poisson distribution with the same parameter. The code

[Graphics:../Images/index_gr_68.gif]
[Graphics:../Images/index_gr_69.gif]
[Graphics:../Images/index_gr_70.gif]

is over five times faster than using RandomArray directly.

[Graphics:../Images/index_gr_71.gif]
[Graphics:../Images/index_gr_72.gif]

A general and powerful alternative uses the inverse CDF. Note that this method is not restricted to the Poisson distribution (see [2] for a discussion of the more general problem of random sampling from any discrete distribution). For the Poisson distribution, the CDF is the following.

[Graphics:../Images/index_gr_73.gif]
[Graphics:../Images/index_gr_74.gif]

For mean [Graphics:../Images/index_gr_75.gif], here is a plot of the CDF.

[Graphics:../Images/index_gr_76.gif]

[Graphics:../Images/index_gr_77.gif]

To compute the inverse CDF, we make use of the following procedure. Here are the step values of the CDF for [Graphics:../Images/index_gr_78.gif] (cf. previous plot).

[Graphics:../Images/index_gr_79.gif]
[Graphics:../Images/index_gr_80.gif]

We can extract the cumulative frequencies from this table.

[Graphics:../Images/index_gr_81.gif]
[Graphics:../Images/index_gr_82.gif]

An elegant way to define the "inverse" of a step function is to use Interpolation with InterpolationOrder -> 0, [suggested by Jens-Peer Kuska (kuska@informatik.uni-leipzig.de)].

[Graphics:../Images/index_gr_83.gif]
[Graphics:../Images/index_gr_84.gif]

This method is around 50 times faster than RandomArray!

[Graphics:../Images/index_gr_85.gif]
[Graphics:../Images/index_gr_86.gif]
[Graphics:../Images/index_gr_87.gif]

The standard statistical approach for testing equality of two distributions proceeds along the following lines. The cumulative relative frequencies (also known as the empirical distribution function, or EDF) of the direct (directvalues) and the approximate sampling procedures (inversevalues) are given by the following two expressions, respectively.

[Graphics:../Images/index_gr_88.gif]
[Graphics:../Images/index_gr_89.gif]
[Graphics:../Images/index_gr_90.gif]
[Graphics:../Images/index_gr_91.gif]

The following table, displayed using PrettyTable (see the section "Formatted Tables"), summarizes the results. The first column gives the values of the Poisson CDF, while the second and third columns give the EDFs of the values obtained through the direct and the approximate sampling procedures, respectively.

[Graphics:../Images/index_gr_92.gif]

[Graphics:../Images/index_gr_93.gif]

Visual inspection shows a good agreement between the experimental results for the two procedures and the Poisson CDF. The maximum distance between [Graphics:../Images/index_gr_94.gif] and the Poisson CDF is

[Graphics:../Images/index_gr_95.gif]
[Graphics:../Images/index_gr_96.gif]

The corresponding distance for [Graphics:../Images/index_gr_97.gif] is

[Graphics:../Images/index_gr_98.gif]
[Graphics:../Images/index_gr_99.gif]

In order to test whether these observed distances simply reflect the randomness inherent in the sampling process, an experiment is conducted to estimate their probabilistic behavior. A sample of 10,000 Poisson random numbers is obtained and their EDF constructed. The maximum distance, say [Graphics:../Images/index_gr_100.gif], between this EDF and the Poisson CDF is then recorded, and the procedure is repeated a large number of times, e.g., 1,000. The EDF of the 1,000 values of [Graphics:../Images/index_gr_101.gif] thus obtained provides an estimate of the probability distribution of the maximum distances which arise in the sampling process. One may then judge whether a given distance is too large to be considered as a natural product of randomness.

The experiment is conducted thus: we first compute the exact cumulative frequencies which (implicitly) set the (arbitrary) length of the EDF obtained in each repetition of the experiment.

[Graphics:../Images/index_gr_102.gif]
[Graphics:../Images/index_gr_103.gif]

The behavior of the maximum distances between sampling EDFs and the Poisson CDF,

[Graphics:../Images/index_gr_104.gif]

appears in the graph below.

[Graphics:../Images/index_gr_105.gif]

[Graphics:../Images/index_gr_106.gif]

Here we see that experimentally obtained distances in samples of size 10,000 from the Poisson distribution are seldom larger than 0.0092 (only in 10% of the cases) or 0.0105 (only in 5% of the cases). Both distances [Graphics:../Images/index_gr_107.gif] and [Graphics:../Images/index_gr_108.gif] fall comfortably within the interval [Graphics:../Images/index_gr_109.gif]. This indicates that both [Graphics:../Images/index_gr_110.gif] and [Graphics:../Images/index_gr_111.gif] are compatible with the hypothesis H: the samples they are constructed from were drawn from a Poisson distribution.

This testing procedure is essentially equivalent to the Kolmogorov--Smirnov test of statistical theory (see, for example, [3, chapt. 9] or [4]). However, here we have estimated, through a simulation technique, the EDF of maximum distances appropriate for our specific problem, instead of using a general-purpose table of values as is the case in conventional statistics. The Kolmogorov--Smirnov table yields the values 0.0125 and 0.014 as those which would be exceeded in 90% and 95% of the cases (cf. the corresponding 0.0092 and 0.0105 displayed above) for a sample size of 10,000, respectively. The test procedure outlined here is therefore more precise, in the sense that it requires a closer fit between the two distributions in order to accept the hypothesis.


Converted by Mathematica      May 8, 2000

[Tricks of the Trade Index] [Prev Page][Next Page]