Central Limit Theorem

The central limit theorem (CLT) is a statistical theorem which states that when sufficiently large random samples are drawn from a population with finite variance, the sample means will be approximately normally distributed, and the mean of the sample means will be equal to the mean of the whole population.

In other words, the central limit theorem states that for any population with mean μ and finite standard deviation σ, the distribution of the sample mean for samples of size n has mean μ and standard deviation σ / √n, and this distribution approaches a normal distribution as n grows.

As the sample size gets larger, the sample mean gets closer to the actual population mean. If the sample size is small, the distribution of the sample mean may or may not be normal, depending on the shape of the population, but as the sample size grows it can be approximated by a normal distribution. This makes the theorem useful for simplifying the analysis of quantities such as stock indices, among many other applications.

The CLT can be applied to almost all types of probability distributions, but there are exceptions. For example, it does not apply to a population without a finite variance, such as the Cauchy distribution. The theorem also requires independent, identically distributed variables. It can also be used to answer the question of how large a sample is needed. Remember that as the sample size grows, the standard deviation of the sample mean falls, because it equals the population standard deviation divided by the square root of the sample size. This theorem is an important topic in statistics. In many real-world applications, a random variable of interest is the sum of a large number of independent random variables. In these situations, we can use the CLT to justify using the normal distribution.
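These claims can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it uses Python's standard `random` and `statistics` modules, draws from an exponential population (a deliberately skewed choice whose mean and standard deviation both equal 1/rate), and `simulate_sample_means` and `skewness` are hypothetical helpers written for this example. As n grows, the mean of the sample means should stay near μ, their standard deviation should track σ/√n, and their skewness (about 2 for a single exponential draw) should shrink toward 0, the skewness of a normal distribution.

```python
import math
import random
import statistics


def simulate_sample_means(n: int, trials: int, rate: float = 1.0, seed: int = 0) -> list[float]:
    """Draw `trials` samples of size n from an exponential(rate) population
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n))
            for _ in range(trials)]


def skewness(values: list[float]) -> float:
    """Sample skewness: about 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


rate = 1.0
sigma = 1 / rate  # for an exponential population, mean = standard deviation = 1/rate

for n in (1, 5, 30, 100):
    means = simulate_sample_means(n, trials=10_000, rate=rate)
    print(f"n={n:3d}  mean of means={statistics.fmean(means):.3f}  "
          f"sd of means={statistics.stdev(means):.3f}  "
          f"sigma/sqrt(n)={sigma / math.sqrt(n):.3f}  "
          f"skewness={skewness(means):.2f}")
```

The exponential population is only one convenient skewed example; any population with finite variance should show the same trend, while a population without finite variance (such as the Cauchy distribution) would not.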

In this article, students can learn the central limit theorem formula, definition, and examples.

TABLE OF CONTENTS

1. Statement
2. Formula
3. Proof
4. Solved Examples

Central Limit Theorem Statement

The central limit theorem states that whenever a random sample of size n is taken from any distribution with mean μ and finite variance σ², the sample mean will be approximately normally distributed with mean μ and variance σ²/n. The larger the sample size, the better the approximation to the normal distribution.

Assumptions of Central Limit Theorem

  • The sample should be drawn randomly, satisfying the randomization condition.
  • The samples drawn should be independent of each other, so that one sample does not influence another.
  • When the sampling is done without replacement, the sample size shouldn’t exceed 10% of the total population.
  • The sample size should be sufficiently large.
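The numeric conditions in this list can be screened mechanically. The sketch below is a hypothetical helper (`check_clt_conditions` is not from any library), assuming the n ≥ 30 rule of thumb used later in this article and the 10% rule for sampling without replacement. Randomness and independence depend on how the data were collected, so it cannot check them.

```python
from typing import List, Optional


def check_clt_conditions(n: int, population_size: Optional[int] = None,
                         with_replacement: bool = False, min_n: int = 30) -> List[str]:
    """Return warnings for the numeric rule-of-thumb CLT conditions.

    Randomness and independence cannot be verified from these numbers alone.
    """
    warnings = []
    # "Sufficiently large" is approximated by the common n >= 30 rule of thumb.
    if n < min_n:
        warnings.append(f"sample size {n} is below the rule-of-thumb minimum of {min_n}")
    # Without replacement, the sample should not exceed 10% of the population.
    if not with_replacement and population_size is not None and n > 0.10 * population_size:
        warnings.append(f"sample size {n} exceeds 10% of the population ({population_size})")
    return warnings


print(check_clt_conditions(50, population_size=10_000))  # []
print(check_clt_conditions(20, population_size=100))     # two warnings
```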

Formula

The formula for the central limit theorem is given below:

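Central Limit Theorem for Sample Means:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation and $n$ is the sample size.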
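As a minimal sketch, the formula can be written as a Python function. The name `clt_z_score` and the numbers in the usage line are illustrative, not taken from the article.

```python
import math


def clt_z_score(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """z-score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    if n <= 0 or sigma <= 0:
        raise ValueError("n and sigma must be positive")
    standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return (x_bar - mu) / standard_error


# Illustrative values: mu = 50, sigma = 10, n = 25, so the standard error is 2.
print(clt_z_score(52, 50, 10, 25))  # 1.0
```

Dividing by σ/√n rather than σ is what distinguishes this from the z-score of a single observation.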

Proof

Let $X_1, X_2, X_3, \dots, X_n$ be independent and identically distributed random variables with mean $\mu$ and finite variance $\sigma^2$, and define the random variable $Z_n$ as

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}, \qquad \text{where } \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

Then the distribution function of $Z_n$ converges to the standard normal distribution function as $n$ increases without bound.

To prove this, define the standardized random variables $U_i$ by

$$U_i = \frac{X_i - \mu}{\sigma},$$

so that $E(U_i) = 0$ and $V(U_i) = 1$.

Assuming the moment generating function of $U_i$ exists, it can be expanded as

$$m_U(t) = 1 + \frac{t^2}{2} + \frac{t^3}{3!}E(U_i^3) + \cdots$$

(the linear term vanishes because $E(U_i) = 0$, and the coefficient of $t^2$ is $E(U_i^2)/2 = 1/2$ because $V(U_i) = 1$).

Also,

$$Z_n = \sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} U_i.$$

Since the $X_i$ are independent random variables, the $U_i$ are also independent.

This implies that the moment generating function of $Z_n$ is

$$m_{Z_n}(t) = \left[m_U\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left(1 + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}E(U_i^3) + \cdots\right)^n$$

or

$$\ln m_{Z_n}(t) = n \ln\left(1 + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}E(U_i^3) + \cdots\right)$$

As per the Taylor series expansion:

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$$

If

$$x = \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}E(U_i^3) + \cdots$$

then

$$\ln m_{Z_n}(t) = n \ln(1 + x) = n\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots\right)$$

Multiplying each term by $n$, the first term becomes $nx = \frac{t^2}{2} + \frac{t^3}{3!\sqrt{n}}E(U_i^3) + \cdots$, while $nx^2$, $nx^3$ and the higher terms are of order $1/n$ or smaller. As $n \rightarrow \infty$, everything except $\frac{t^2}{2}$ goes to zero:

$$\lim_{n \to \infty} \ln m_{Z_n}(t) = \frac{t^2}{2} \quad \text{and} \quad \lim_{n \to \infty} m_{Z_n}(t) = \exp\left(\frac{t^2}{2}\right),$$

which is the moment generating function of a standard normal random variable. By the continuity theorem for moment generating functions, the distribution of $Z_n$ therefore converges to the standard normal distribution.
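The limit in the last step can be checked numerically for one concrete distribution. The sketch below assumes each $U_i$ is +1 or -1 with probability 1/2 (mean 0, variance 1), a choice made only because its moment generating function has the closed form $m_U(t) = \cosh t$; the function name `mgf_of_zn` is hypothetical. It evaluates $[m_U(t/\sqrt{n})]^n$ for increasing $n$ and compares it with $\exp(t^2/2)$.

```python
import math


def mgf_of_zn(t: float, n: int) -> float:
    """MGF of Z_n = (U_1 + ... + U_n) / sqrt(n) when each U_i is +1 or -1 with
    probability 1/2, so that m_U(t) = cosh(t).

    Evaluated as exp(n * ln m_U(t / sqrt(n))), mirroring the ln step in the proof.
    """
    return math.exp(n * math.log(math.cosh(t / math.sqrt(n))))


t = 1.0
for n in (1, 10, 100, 10_000):
    print(f"n={n:6d}  m_Zn(t)={mgf_of_zn(t, n):.6f}")
print(f"limit exp(t^2/2) = {math.exp(t * t / 2):.6f}")
```

For this distribution, $n \ln\cosh(t/\sqrt{n}) = t^2/2 - t^4/(12n) + \cdots$, so the gap to the limit shrinks roughly like $1/n$, matching the argument that the higher-order terms vanish.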

Steps

The steps used to solve central limit theorem problems involving ‘>’, ‘<’ or “between” are as follows:

1) Identify from the problem the mean, population size, standard deviation and sample size, together with the number associated with “greater than” or “less than”, or the two numbers that bound the range for “between”.

2) Draw a graph centred at the mean.

3) Use the formula $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ to find the z-score.

4) Look up the z-score from the previous step in the z-table to find the area between the mean and that z-score.

5) Case 1: Central limit theorem involving “>”.

Subtract the area found in step 4 from 0.5.

Case 2: Central limit theorem involving “<”.

Add 0.5 to the area found in step 4.

Case 3: Central limit theorem involving “between”.

Execute step 3 for both numbers and find the area between the two z-scores.

6) The z-value is found along with x bar.

The last step, common to all three cases, is to convert the decimal obtained into a percentage.
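The table-based steps above can be sketched in Python with the standard library's `statistics.NormalDist`. Instead of reading the area between the mean and z from a z-table and adding or subtracting 0.5, this sketch takes the difference of two normal CDF values, which covers the “>”, “<” and “between” cases in one function. The function name `sample_mean_probability` and the numbers in the usage lines are illustrative assumptions.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_mean_probability(mu: float, sigma: float, n: int,
                            lower: Optional[float] = None,
                            upper: Optional[float] = None) -> float:
    """P(lower < x_bar < upper) using the CLT normal approximation.

    Pass only `lower` for a "greater than" question, only `upper` for
    "less than", and both for "between". Returns a decimal in [0, 1].
    """
    # Sampling distribution of the mean: centre mu, standard deviation sigma / sqrt(n).
    sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
    cdf_lower = sampling_dist.cdf(lower) if lower is not None else 0.0
    cdf_upper = sampling_dist.cdf(upper) if upper is not None else 1.0
    return cdf_upper - cdf_lower


# Illustrative numbers: mu = 50, sigma = 10, n = 25, so the standard error is 2.
for label, p in [
    ("P(x_bar > 52)", sample_mean_probability(50, 10, 25, lower=52)),
    ("P(x_bar < 52)", sample_mean_probability(50, 10, 25, upper=52)),
    ("P(48 < x_bar < 52)", sample_mean_probability(50, 10, 25, lower=48, upper=52)),
]:
    print(f"{label} = {100 * p:.2f}%")  # last step: convert the decimal to a percentage
```

Working with CDF values avoids the separate handling of positive and negative z-scores that a half-table requires.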


Examples on Central Limit Theorem

Example 1:

20 students are selected at random from a clinical psychology class. If the average GPA scored by the entire batch is 4.91 and the standard deviation is 0.72, find the probability that their mean GPA is more than 5.

Solution:

Here,

Population mean = $\mu$ = 4.91

Population standard deviation = $\sigma$ = 0.72

Sample size = $n$ = 20 (which is less than 30)

Since the sample size is smaller than 30, use the t-score instead of the z-score, even though the population standard deviation is known.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Substituting the values, we have:

$$\sigma_{\bar{x}} = \frac{0.72}{\sqrt{20}} = 0.161$$

Now, find the t-score:

$$t = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

For our problem, the sample mean of interest is $\bar{x}$ = 5.

$$t = \frac{5 - 4.91}{0.161} = 0.559$$

Find the probability for this t-value using the t-table. The degrees of freedom here are:

df = 20 − 1 = 19

$$P(t \leq 0.559) = 0.7087$$

$$P(t > 0.559) = 1 - 0.7087 = 0.2913$$

Thus the probability that the mean GPA is more than 5 is 29.13%.
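The t-table lookup in this example can be reproduced without a table. The sketch below assumes only Python's `math` module: it writes out the Student's t density and integrates it numerically with Simpson's rule, a stand-in for a statistics library's t CDF rather than any particular library's method. The helper names `t_pdf` and `t_cdf` are hypothetical. The printed probability should be close to the 0.2913 read from the table.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): 0.5 for the left half (by symmetry) plus the area from 0 to x,
    integrated with Simpson's rule (`steps` must be even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


mu, sigma, n = 4.91, 0.72, 20   # population mean, population sd, sample size
x_bar = 5.0                     # sample mean of interest

standard_error = sigma / math.sqrt(n)       # about 0.161
t_score = (x_bar - mu) / standard_error     # about 0.559
p_greater = 1.0 - t_cdf(t_score, df=n - 1)  # P(t > 0.559) with 19 degrees of freedom

print(f"standard error = {standard_error:.3f}, t = {t_score:.3f}")
print(f"P(mean GPA > 5) = {p_greater:.4f} ({100 * p_greater:.2f}%)")
```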

Example 2:

The average weight of a water bottle is 30 kg with a standard deviation of 1.5 kg. If a sample of 45 water bottles is selected at random from a consignment and their weights are measured, find the probability that the mean weight of the sample is less than 28 kg.

Solution:

Population mean: $\mu$ = 30 kg

Population standard deviation: $\sigma$ = 1.5 kg

Sample size: $n$ = 45 (which is greater than 30)

Since the sample size is greater than 30, use the z-score.

The standard deviation of the sample mean (the standard error) is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

And,

$$\sigma_{\bar{x}} = \frac{1.5}{\sqrt{45}} = \frac{1.5}{6.7082} = 0.2236$$

Find the z-score for the sample mean $\bar{x}$ = 28 kg:

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

$$= \frac{28 - 30}{0.2236} \approx -8.94$$

Using a z-table or the normal CDF function on a statistical calculator,

$$P(z < -8.94) \approx 0$$

Thus the probability that the mean weight of the sample is less than 28 kg is practically 0%: a sample mean 2 kg below the population mean lies almost 9 standard errors away from it.
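The arithmetic in this example can be checked with a few lines of Python using only the `math` module. The standard normal CDF is written with `math.erfc`, a choice made so that the tiny left-tail probability is not rounded to zero; printed z-tables do not extend this far into the tail.

```python
import math

mu, sigma, n = 30.0, 1.5, 45   # population mean (kg), population sd (kg), sample size
x_bar = 28.0                   # sample mean of interest (kg)

standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n), not sqrt(n) itself
z = (x_bar - mu) / standard_error
# Standard normal CDF: Phi(z) = 0.5 * erfc(-z / sqrt(2)); erfc keeps precision
# in the far tail, where 0.5 * (1 + erf(z / sqrt(2))) would round to 0.
p_less = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"standard error = {standard_error:.4f}")  # 0.2236
print(f"z = {z:.3f}")                             # -8.944
print(f"P(x_bar < 28) = {p_less:.2e}")
```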

Example 3: The weights of a female population follow a normal distribution with mean 65 kg and standard deviation 14 kg. If a researcher considers the records of 50 females, what would be the standard deviation of the sampling distribution of the sample mean?

Solution:

Mean of the population $\mu$ = 65 kg

Standard deviation of the population $\sigma$ = 14 kg

Sample size $n$ = 50

The standard deviation of the sample mean is given by $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

$= \frac{14}{\sqrt{50}}$

$= \frac{14}{7.071}$

$= 1.98$ kg
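A short Python sketch can confirm this value and check it by simulation. The simulation assumes independent draws from the stated normal population (via `random.gauss`); the seed and the number of simulated samples are arbitrary choices, so the simulated figure will only be close to, not exactly equal to, σ/√n.

```python
import math
import random
import statistics

mu, sigma, n = 65.0, 14.0, 50   # population mean (kg), population sd (kg), sample size

standard_error = sigma / math.sqrt(n)
print(f"sigma / sqrt(n) = {standard_error:.2f} kg")  # 1.98 kg

# Simulation check: draw many samples of size n and measure the spread of their means.
rng = random.Random(42)  # arbitrary fixed seed for reproducibility
sample_means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
                for _ in range(10_000)]
print(f"simulated sd of sample means = {statistics.stdev(sample_means):.2f} kg")
```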

 

Frequently Asked Questions

How do you determine the standard error of the mean?

Recall the central limit theorem, which states that for any population with mean $\mu$ and standard deviation $\sigma$, the distribution of the sample mean for samples of size $N$ has mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{N}}$. To determine the standard error of the mean, take the standard deviation of the population and divide it by the square root of the sample size.
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
where $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$
$\bar{X}$ = sample mean
$\mu$ = mean of the sampling distribution (equal to the population mean)
$\sigma$ = population standard deviation
$N$ = sample size
$\sigma_{\bar{X}}$ = standard deviation of the sampling distribution, or standard error of the mean.

What are the properties of the Central Limit Theorem?

We can summarize the properties of the Central Limit Theorem for sample means with the following statements:
1. Sampling is from any distribution with mean $\mu$ and standard deviation $\sigma$.
2. Provided that $n$ is large ($n \geq 30$, as a rule of thumb), the sampling distribution of the sample mean will be approximately normally distributed with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.
3. If the population distribution is normal, the sampling distribution of the sample means will be exactly normal for any sample size.