Central Limit Theorem

The central limit theorem (CLT) is a result in probability theory which states that, when independent random samples of sufficiently large size are drawn from a population with finite variance, the distribution of the sample mean is approximately normal, and its mean equals the mean of the whole population.

In other words, for any population with mean μ and standard deviation σ, the distribution of the sample mean for sample size n has mean μ and standard deviation σ/√n, and it approaches a normal distribution as n grows.

As the sample size gets bigger, the sample mean gets closer to the actual population mean. If the sample size is small, the distribution of the sample mean may or may not be close to normal, but as the sample size gets bigger, it can be approximated by a normal distribution. This result simplifies analysis in many settings, for example when dealing with stock indexes.

The CLT can be applied to almost all types of probability distributions, but there are some exceptions: for example, it does not hold if the population does not have a finite variance. The theorem also requires the variables to be independent and identically distributed. It can also be used to answer the question of how big a sample you need. Remember that as the sample size grows, the standard deviation of the sample average falls, because it is the population standard deviation divided by the square root of the sample size. In many real-world applications, a random variable of interest is a sum of a large number of independent random variables. In these situations, we can use the CLT to justify using the normal distribution.

In this article, students can learn the central limit theorem formula, definition and examples. 

TABLE OF CONTENTS

1. Statement
2. Formula
3. Proof
4. Solved Examples

Central Limit Theorem Statement

The central limit theorem states that whenever a random sample of size n is taken from any distribution with mean μ and finite variance σ², the sample mean will be approximately normally distributed with mean μ and variance σ²/n. The larger the sample size, the better the approximation to the normal.
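The statement can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be Exponential(1) (mean 1, standard deviation 1), and the function name, number of trials and seed are arbitrary choices, not part of the theorem.

```python
import random
import statistics

def simulate_sample_means(n, trials=5000, seed=0):
    """Draw `trials` samples of size n from an Exponential(1) population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

for n in (5, 30, 100):
    means = simulate_sample_means(n)
    print(
        f"n={n:>3}  mean of sample means={statistics.fmean(means):.3f}  "
        f"sd of sample means={statistics.stdev(means):.3f}  "
        f"sigma/sqrt(n)={1 / n ** 0.5:.3f}"
    )
```

For each n, the mean of the simulated sample means should sit near 1 and their standard deviation near σ/√n, even though the population itself is strongly skewed.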

Assumptions of Central Limit Theorem

  • The sample should be drawn randomly following the condition of randomization.
  • The samples drawn should be independent of each other. They should not influence the other samples.
  • When the sampling is done without replacement, the sample size shouldn’t exceed 10% of the total population.
  • The sample size should be sufficiently large.

Formula

The formula for the central limit theorem is given below:

Central Limit Theorem for Sample Means,

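\(\begin{array}{l}Z = \frac{\bar x - \mu}{\frac{\sigma}{\sqrt{n}}}\end{array} \)

where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation and n is the sample size.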

Proof

Let x1, x2, x3, ……, xn be independent and identically distributed random variables with mean \(\mu\) and finite variance \(\sigma^2\), and define the random variable Zn as

\(\begin{array}{l}Z_n = \frac{\bar x_n - \mu}{\frac{\sigma}{\sqrt{n}}}, \quad \text{where } \bar x_n = \frac{1}{n} \sum_{i = 1}^n x_i\end{array} \)

Then the distribution function of Zn converges to the standard normal distribution function as n increases without any bound.

Again, define a random variable Ui by

\(\begin{array}{l}U_i = \frac{x_i - \mu}{\sigma}\end{array} \)

so that E(Ui) = 0 and V(Ui) = 1.

Thus, the moment generating function of each Ui can be written as

\(\begin{array}{l}m_U(t) = 1 + \frac{t^2}{2} + \frac{t^3}{3!} E(U_i^3) + \cdots\end{array} \)

Also,

\(\begin{array}{l}Z_n = \sqrt{n}\left(\frac{\bar x_n - \mu}{\sigma}\right) = \frac{1}{\sqrt{n}} \sum_{i=1}^n U_i\end{array} \)

Since the xi are independent random variables, the Ui are also independent. The moment generating function of Zn is therefore the product of n copies of the moment generating function of Ui, each evaluated at t/√n:

\(\begin{array}{l}m_{Z_n}(t) = \left[m_U\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left(1 + \frac{t^2}{2n} + \frac{t^3}{3!\, n^{\frac{3}{2}}} E(U_i^3) + \cdots\right)^n\end{array} \)

or

\(\begin{array}{l}\ln m_{Z_n}(t) = n \ln\left(1 + \frac{t^2}{2n} + \frac{t^3}{3!\, n^{\frac{3}{2}}} E(U_i^3) + \cdots\right)\end{array} \)

As per the Taylor series expansion,

\(\begin{array}{l}\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\end{array} \)

With

\(\begin{array}{l}x = \frac{t^2}{2n} + \frac{t^3}{3!\, n^{\frac{3}{2}}} E(U_i^3) + \cdots\end{array} \)

we get

\(\begin{array}{l}\ln m_{Z_n}(t) = n \ln(1 + x) = n\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots\right)\end{array} \)

After multiplying each term by n, the leading term is t²/2 and every other term carries a negative power of n, so as \(n \rightarrow \infty\) all terms but the first go to zero:

\(\begin{array}{l}\lim_{n \to \infty} \ln m_{Z_n}(t) = \frac{t^2}{2} \quad \text{and} \quad \lim_{n \to \infty} m_{Z_n}(t) = \exp\left(\frac{t^2}{2}\right)\end{array} \)

which is the moment generating function of a standard normal random variable.
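The limit in the proof can be checked numerically for one concrete case. The sketch below assumes, purely for illustration, that each Ui takes the values +1 and -1 with probability 1/2 (mean 0, variance 1), so its moment generating function is cosh(t) and the moment generating function of Zn is cosh(t/√n) raised to the power n. The function name is hypothetical.

```python
import math

def mgf_zn(t, n):
    """MGF of Z_n = (U_1 + ... + U_n) / sqrt(n) when each U_i is +1 or -1
    with probability 1/2, so that the MGF of U_i is cosh(t)."""
    return math.cosh(t / math.sqrt(n)) ** n

t = 1.0
limit = math.exp(t ** 2 / 2)  # MGF of the standard normal at t
for n in (1, 10, 100, 10000):
    print(f"n={n:>5}  MGF of Zn at t={mgf_zn(t, n):.6f}  exp(t^2/2)={limit:.6f}")
```

As n increases, the printed values should approach exp(t²/2), in line with the limit derived above.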

Steps

The steps used to solve central limit theorem problems that involve ">", "<" or "between" are as follows:

1) Identify from the problem the population mean, the population standard deviation, the sample size, and the number associated with "greater than" or "less than", or the two numbers that bound a "between" range.

2) Draw a graph of the normal curve centred on the mean.

3) Use the formula

\(\begin{array}{l}z = \frac{\bar x - \mu}{\frac{\sigma}{\sqrt{n}}}\end{array} \)

to find the z-score.

4) Look up the z-score obtained in the previous step in the z-table. The cases below assume a table that gives the area between the mean and z.

5) Case 1: Central limit theorem involving ">".

Subtract the table value from 0.5.

Case 2: Central limit theorem involving "<".

Add 0.5 to the table value.

These two rules apply when the z-score is positive; for a negative z-score they are swapped, by symmetry of the normal curve.

Case 3: Central limit theorem involving "between".

Step 3 is executed for each of the two bounds.

6) The table value is found for the z-score of each bound. The two areas are added if the bounds lie on opposite sides of the mean and subtracted if they lie on the same side.

The last step is common to all three cases: convert the decimal obtained into a percentage.
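The steps above can be sketched in code. This is a minimal illustration rather than a definitive implementation: the function name `clt_probability` and the numbers in the usage lines are hypothetical, and the normal CDF from Python's `statistics.NormalDist` stands in for the z-table lookup.

```python
from statistics import NormalDist

def clt_probability(mu, sigma, n, lower=None, upper=None):
    """Approximate P(lower < sample mean < upper) under the CLT.
    Pass only `lower` for a "greater than" question, only `upper` for a
    "less than" question, or both for a "between" question."""
    sampling_dist = NormalDist(mu, sigma / n ** 0.5)  # mean mu, sd sigma/sqrt(n)
    p_upper = 1.0 if upper is None else sampling_dist.cdf(upper)
    p_lower = 0.0 if lower is None else sampling_dist.cdf(lower)
    return p_upper - p_lower

# Hypothetical numbers for illustration: mu = 50, sigma = 10, n = 25
print(clt_probability(50, 10, 25, lower=52))            # "greater than 52"
print(clt_probability(50, 10, 25, upper=52))            # "less than 52"
print(clt_probability(50, 10, 25, lower=48, upper=52))  # "between 48 and 52"
```

Working with cumulative probabilities directly avoids the separate rules for positive and negative z-scores that a half-table lookup requires.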

Examples on Central Limit Theorem

Example 1:

Twenty students are selected at random from a clinical psychology class. The average GPA of the entire batch is 4.91, with a standard deviation of 0.72. Find the probability that the mean GPA of the selected students is more than 5.

Solution:

Here,

Population mean: \(\mu\) = 4.91

Population standard deviation: \(\sigma\) = 0.72

Sample size: n = 20 (which is less than 30)

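Since the sample size is smaller than 30, this solution uses the t-score instead of the z-score, even though the population standard deviation is known. (With a known population standard deviation and a normally distributed population, the z-score could also be used and gives a very similar answer.)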

The standard deviation of the sample mean is

\(\begin{array}{l}\sigma_{\bar x} = \frac{\sigma}{\sqrt n}\end{array} \)

Substituting the values, we have:

\(\begin{array}{l}\sigma_{\bar x}=\frac{0.72}{\sqrt{20}} = 0.161\end{array} \)

Now, find the t-score:

\(\begin{array}{l}t = \frac{\bar x - \mu}{\sigma_{\bar x}}\end{array} \)

For our problem, the sample mean of interest is \(\bar x\) = 5, so

\(\begin{array}{l}t = \frac{5 - 4.91}{0.161} = 0.559\end{array} \)

Find the probability for this t value using the t-score table. The degrees of freedom here are:

df = 20 - 1 = 19

P(t ≤ 0.559) = 0.7087

P(t > 0.559) = 1 - 0.7087 = 0.2913

Thus the probability that the mean GPA is more than 5 is 29.13%.
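The arithmetic in Example 1 can be reproduced without a t-table. The sketch below is an assumption-laden illustration: it integrates the Student's t density numerically with Simpson's rule using only the standard library, and the helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] and symmetry.
    `steps` must be even."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

mu, sigma, n = 4.91, 0.72, 20
standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
t_score = (5 - mu) / standard_error     # t-score for a sample mean of 5
print(round(standard_error, 3), round(t_score, 3),
      round(t_upper_tail(t_score, n - 1), 4))
```

The three printed values are the standard error, the t-score and the upper-tail probability with 19 degrees of freedom.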

Example 2:

The average weight of a water bottle is 30 kg with a standard deviation of 1.5 kg. If a sample of 45 water bottles is selected at random from a consignment and their weights are measured, find the probability that the mean weight of the sample is less than 28 kg.

Solution:

Population mean: \(\mu\) = 30 kg

Population standard deviation: \(\sigma\) = 1.5 kg

Sample size: n = 45 (which is greater than 30)

Using the z-score, we have

The standard deviation of the sample mean (the standard error):

\(\begin{array}{l}\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{1.5}{\sqrt{45}} = \frac{1.5}{6.7082} = 0.2236\end{array} \)

Find the z-score for a sample mean of \(\bar x\) = 28 kg:

\(\begin{array}{l}z = \frac{\bar x - \mu}{\sigma_{\bar x}} = \frac{28 - 30}{0.2236} = -8.944\end{array} \)

Using the z-score table or the normal CDF function on a statistical calculator,

P(z < -8.944) ≈ 0

Thus the probability that the mean weight of the sample is less than 28 kg is practically zero (of the order of \(10^{-19}\)).
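A quick numerical check of Example 2, as a sketch using only the standard library. The variable names are illustrative, and the lower-tail probability is computed with `math.erfc` because it remains accurate far out in the tail.

```python
import math

mu, sigma, n = 30.0, 1.5, 45
standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
z = (28.0 - mu) / standard_error        # z-score of a sample mean of 28 kg
# Lower-tail standard normal probability: Phi(z) = 0.5 * erfc(-z / sqrt(2))
p_less = 0.5 * math.erfc(-z / math.sqrt(2))
print(f"standard error = {standard_error:.4f}, z = {z:.3f}, P = {p_less:.2e}")
```

A sample mean nearly nine standard errors below the population mean is an extremely unlikely outcome, which is why the probability is vanishingly small.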

Example 3: The recorded weights of a female population follow a normal distribution with mean 65 kg and standard deviation 14 kg. If a researcher considers the records of 50 females, what is the standard deviation of the sample mean?

Solution:

Mean of the population μ = 65 kg

Standard deviation of the population σ = 14 kg

Sample size n = 50

The standard deviation of the sample mean is given by

\(\begin{array}{l}\sigma _{\bar{x}}= \frac{\sigma }{\sqrt{n}} = \frac{14}{\sqrt{50}} = \frac{14}{7.071} = 1.98\end{array} \)
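The same formula can be wrapped in two small helpers, one for the standard error and one for the reverse question of how large a sample is needed. This is a sketch only: the function names and the target standard error of 1.0 kg are hypothetical choices for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def required_sample_size(sigma, target_se):
    """Smallest n whose standard error does not exceed target_se."""
    return math.ceil((sigma / target_se) ** 2)

print(round(standard_error(14, 50), 2))   # Example 3: sigma = 14 kg, n = 50
print(required_sample_size(14, 1.0))      # hypothetical target of 1.0 kg
```

Because the standard error shrinks with the square root of n, halving it requires roughly four times as many observations.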

Applications of Central Limit Theorem

1] According to the Central Limit Theorem, the sampling distribution of the mean can be treated as approximately normal even when the population distribution is unknown or not normal. The population itself does not need to be normally distributed, which helps in data analysis.

2] The standard deviation of the sample mean decreases as the sample size increases, which helps in estimating the mean of the population more accurately.

3] The sample mean is used in creating a range of values which likely includes the population mean.

4] The concept of the Central Limit Theorem is used in election polls to estimate the percentage of people supporting a particular candidate as confidence intervals.

5] CLT is used in estimating the mean family income in a particular country.

6] When many identical, unbiased dice are rolled, the total (or average) of the faces is approximately normally distributed.

7] The probability distribution for total distance covered in a random walk will approach a normal distribution.

8] Flipping many coins will result in an approximately normal distribution for the total number of heads (or equivalently total number of tails).

9] By comparing a sample mean with the sampling distribution implied by the CLT, we can judge whether the sample plausibly belongs to a particular population.

10] It enables us to make conclusions about the sample and population parameters and assists in constructing good machine learning models.
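The dice application in the list above can be simulated directly. The sketch below is illustrative only: the number of dice, number of trials, seed and function name are arbitrary assumptions. It compares the simulated totals with the normal distribution suggested by the CLT.

```python
import random
from statistics import NormalDist, fmean, pstdev

def dice_totals(num_dice, trials=10000, seed=1):
    """Simulate `trials` totals of `num_dice` fair six-sided dice."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(num_dice)) for _ in range(trials)]

num_dice = 30
totals = dice_totals(num_dice)
# One die has mean 3.5 and variance 35/12, so the total has mean 3.5 * num_dice
# and standard deviation sqrt(num_dice * 35 / 12).
approx = NormalDist(3.5 * num_dice, (num_dice * 35 / 12) ** 0.5)
within_one_sd = sum(abs(x - approx.mean) <= approx.stdev for x in totals) / len(totals)
print(f"simulated mean={fmean(totals):.2f}, normal mean={approx.mean:.2f}")
print(f"simulated sd={pstdev(totals):.2f}, normal sd={approx.stdev:.2f}")
print(f"share within one sd: {within_one_sd:.3f} (normal: about 0.683)")
```

The share of totals within one standard deviation of the mean should be close to the normal value, with a small discrepancy because dice totals are whole numbers.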

Frequently Asked Questions

How to determine the standard error of the mean?

Recall the central limit theorem statement: for any population with mean μ and standard deviation σ, the distribution of the sample mean for sample size n has mean μ and standard deviation σ/√n. To determine the standard error of the mean, find the standard deviation of the population and divide it by the square root of the sample size.

What are the properties of the Central Limit Theorem?

We can summarize the properties of the Central Limit Theorem for sample means with the following statements:
1. Sampling is from any distribution with mean μ and standard deviation σ.
2. Provided that n is large (n ≥ 30, as a rule of thumb), the sampling distribution of the sample mean will be approximately normally distributed with mean μ and standard deviation σ/√n.
3. If the population distribution is normal, the sampling distribution of the sample mean will be exactly normal for any sample size.

Give an example where the central limit theorem is used in real life.

Biologists use the central limit theorem when they use data from a sample of organisms to make conclusions about the overall population of organisms.

Give the formula for the central limit theorem.

Central Limit Theorem for sample means, Z = (x̄-μ)/(σ/√n)
