Volume 9, Issue 4

Bootstrap Tutorial

# Distribution of Sample Mean

We begin by reviewing the most elementary problem in statistics, the distribution of the sample mean.

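Let $X_i$ for $i = 1, \ldots, n$ be a sample of IID random variables with $E[X_i] = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$. The sample mean is defined by

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$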

Since the observations are random variables, the sample mean is a random variable and we can, in principle, compute its distribution. Using the properties of expected value, it is a standard exercise to show that $E[\bar{X}] = \mu$ and $\operatorname{Var}(\bar{X}) = \sigma^2/n$, which gives us the first two moments of the sampling distribution. If the terms $X_i$ are distributed $N(\mu, \sigma^2)$, we know that a sum of these variables will be normally distributed, so $\bar{X}$ will be distributed $N(\mu, \sigma^2/n)$.
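The exercise mentioned above can be sketched as follows, writing $X_i$ for the observations, $\bar{X}$ for their mean, and $\mu$ and $\sigma^2$ for the common mean and variance (this notation is an assumption made here). Linearity of expectation gives the mean, and independence lets the variances add:

$$
\begin{aligned}
E[\bar{X}] &= \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \frac{1}{n} \cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}.
\end{aligned}
$$

The second line relies on independence: without it, covariance terms would appear in the variance of the sum.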

But what do we do if the $X_i$ are not normally distributed or if we are computing some function of the sample that is more complex than the mean?

Classical statistics was driven by analytic tractability, and the methods used in classical statistics only apply to certain well-behaved distributions and certain, mostly linear, computations. With modern computers, analytic complexity is no barrier to computing estimates of the sampling distribution of almost any statistic, as we demonstrate next using Monte Carlo simulation.

Here we draw a list of 25 uniformly distributed random numbers, compute the mean, and repeat this 100 times. This will give us 100 different estimates of the mean of the underlying distribution.
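A minimal sketch of this experiment in Python, using only the standard library. The choice of Python and the function name `simulate_sample_means` are assumptions made for illustration; the seed is fixed only so that the run is reproducible.

```python
import random
import statistics


def simulate_sample_means(n_obs=25, n_reps=100, seed=0):
    """Return n_reps sample means, each computed from n_obs Uniform(0, 1) draws."""
    rng = random.Random(seed)  # seeded generator, so repeated runs agree
    means = []
    for _ in range(n_reps):
        sample = [rng.random() for _ in range(n_obs)]  # one sample of size n_obs
        means.append(statistics.fmean(sample))         # its sample mean
    return means


means = simulate_sample_means()
print(len(means))                  # number of estimated means
print(statistics.fmean(means))     # average of the estimated means
print(statistics.variance(means))  # sample variance of the estimated means
```

Each entry of `means` is one realization of the sample mean, so the list as a whole is a draw of size 100 from its sampling distribution.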

Let us look at the distribution of these 100 calculated means; this frequency distribution can be viewed as an estimate of the true sampling distribution.
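One way to inspect this frequency distribution without a plotting library is a text histogram. The sketch below is an illustration under assumptions: the helper `frequency_table` and the bin width of 0.02 are hypothetical choices, and the means are regenerated inside the block so that it runs on its own.

```python
import math
import random
import statistics
from collections import Counter


def frequency_table(values, bin_width=0.02):
    """Map each bin's lower edge to the number of values falling in that bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}


# Regenerate 100 means of 25 Uniform(0, 1) draws (seed fixed for reproducibility).
rng = random.Random(1)
means = [statistics.fmean([rng.random() for _ in range(25)]) for _ in range(100)]

table = frequency_table(means)
for lower, count in table.items():
    # One '#' per simulated mean that landed in this bin.
    print(f"{lower:.2f} to {lower + 0.02:.2f}  {'#' * count}")
```

The bars should pile up around 0.5 and thin out on either side, which is the shape the sampling distribution of the mean is expected to have.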

Since the underlying random variable is uniformly distributed on $[0, 1]$, the estimated mean should be close to 0.5. The variance of the uniform distribution on $[0, 1]$ is

$$\sigma^2 = \frac{(1 - 0)^2}{12} = \frac{1}{12} \approx 0.0833.$$

So the variance of the sample mean of 25 observations should be

$$\frac{1/12}{25} = \frac{1}{300} \approx 0.0033.$$

The estimates we have computed should not be too far from these numbers.

We can do the same thing for 5000 repetitions, in which case the estimated results should be much closer to the theoretical predictions.
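The comparison can be sketched as follows, again in Python with only the standard library. The helper `sample_means` and the `results` dictionary are hypothetical names; the theoretical values 0.5 and 1/300 are the mean and variance of the sample mean for 25 draws from the uniform distribution on [0, 1].

```python
import random
import statistics


def sample_means(n_obs, n_reps, rng):
    """Return n_reps sample means of n_obs Uniform(0, 1) draws each."""
    return [
        statistics.fmean([rng.random() for _ in range(n_obs)])
        for _ in range(n_reps)
    ]


rng = random.Random(2)        # fixed seed for reproducibility
theory_mean = 0.5             # mean of Uniform(0, 1)
theory_var = 1 / (12 * 25)    # (1/12) / 25, variance of the mean of 25 draws

results = {}
for n_reps in (100, 5000):
    means = sample_means(25, n_reps, rng)
    results[n_reps] = (statistics.fmean(means), statistics.variance(means))
    est_mean, est_var = results[n_reps]
    print(f"{n_reps:5d} repetitions: mean {est_mean:.4f} (theory {theory_mean}), "
          f"variance {est_var:.5f} (theory {theory_var:.5f})")
```

With more repetitions the estimates are expected to sit closer to the theoretical values, although any single run with 100 repetitions can happen to land close as well.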

The Monte Carlo method can be used to compute an estimate of the sampling distribution for virtually any statistic, as long as we know the distribution from which the samples are drawn.
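To illustrate the generality, here is a sketch that estimates the sampling distribution of a statistic other than the mean. Everything specific in it is an assumption chosen for illustration: the function `monte_carlo_distribution`, the choice of the median as the statistic, and the exponential distribution with rate 1 as the known source of the samples.

```python
import random
import statistics


def monte_carlo_distribution(statistic, draw, n_obs, n_reps, seed=0):
    """Apply `statistic` to n_reps independent samples of size n_obs.

    `draw` is a function taking a random.Random instance and returning
    one observation from the known underlying distribution.
    """
    rng = random.Random(seed)
    return [
        statistic([draw(rng) for _ in range(n_obs)])
        for _ in range(n_reps)
    ]


# Sampling distribution of the median of 25 exponential (rate 1) observations.
medians = monte_carlo_distribution(
    statistic=statistics.median,
    draw=lambda rng: rng.expovariate(1.0),
    n_obs=25,
    n_reps=2000,
)
print(statistics.fmean(medians))  # center of the estimated sampling distribution
print(statistics.stdev(medians))  # its spread, an estimate of the standard error
```

Swapping in a different `statistic` or `draw` function changes the experiment without touching the simulation loop, but `draw` still requires that the underlying distribution be known.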