Analysis of Frequency Distributions

Before going into the analysis of frequency distributions, let us first recall what a frequency distribution is. A frequency distribution is a common way to organize the raw data of a quantitative variable: it shows the different values the variable takes and their corresponding frequencies. Various measures can be calculated to analyze the data in a frequency distribution. In this article, you will learn how to compare two frequency distributions, or two series of data, in detail.

You may already have learned how to compare numbers, patterns, shapes and so on. You might also have come across measures of central tendency such as the mean, median and mode, and measures of dispersion such as the mean deviation and standard deviation. However, the mean deviation and the standard deviation are expressed in the same units as the data. So, whenever we want to compare the variability of two series measured in different units, or with different means, calculating these measures of dispersion alone is not enough; we need a measure that is independent of the units. Such a unit-free measure of variability is called the coefficient of variation (C.V.).
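To illustrate why a unit-free measure helps, the sketch below takes a made-up set of heights, expresses them first in centimetres and then in metres, and compares the standard deviation with the C.V. (computed as σ / x̄ × 100). It assumes σ is the population standard deviation (Python's `statistics.pstdev`); the helper `cv` and the data are hypothetical.

```python
from statistics import fmean, pstdev

# Hypothetical heights of five students, in centimetres and in metres.
heights_cm = [150.0, 155.0, 160.0, 165.0, 170.0]
heights_m = [h / 100 for h in heights_cm]

def cv(data):
    """Coefficient of variation in percent, using the population standard deviation."""
    return pstdev(data) / fmean(data) * 100

# The standard deviation changes with the unit (it is 100 times larger in cm).
print(pstdev(heights_cm), pstdev(heights_m))
# The C.V. is the same in both units, up to floating-point rounding.
print(cv(heights_cm), cv(heights_m))
```

Converting centimetres to metres divides the standard deviation by 100, but it divides the mean by 100 as well, so the ratio σ / x̄ does not change.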

To compare the variability, or dispersion, of two series, we calculate the coefficient of variation of each series. The coefficient of variation is the standard deviation expressed as a percentage of the mean. It can be calculated as:

$$\begin{array}{l}\large Coefficient\ of\ Variation = C.V. = \frac{\sigma }{\overline{x}}\times 100;\ \overline{x}\ne 0\end{array}$$

Here,

σ = standard deviation of the data

x̄ = mean of the data
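As a sketch of how the formula applies to a frequency distribution, the Python function below computes the mean, the standard deviation and the C.V. from a list of values and their frequencies. The name `cv_frequency_distribution` and the sample data are hypothetical, and the function assumes σ is the population standard deviation (divisor N = Σf); for grouped data, the class marks would be passed as the values.

```python
from math import sqrt

def cv_frequency_distribution(values, freqs):
    """Coefficient of variation (%) of a frequency distribution.

    values[i] occurs freqs[i] times. Uses the population variance
    sum(f * (x - mean)**2) / sum(f), i.e. sigma with divisor N.
    """
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    if mean == 0:
        raise ValueError("C.V. is undefined when the mean is 0")
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return sqrt(variance) / mean * 100

# Hypothetical distribution: values 10, 20, 30 with frequencies 1, 2, 1.
print(cv_frequency_distribution([10, 20, 30], [1, 2, 1]))
```

Here the mean is 20 and the variance is 50, so the C.V. is √50 / 20 × 100, roughly 35.36.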

The series of data, or frequency distribution, with the greater C.V. is more variable than the other, while the one with the smaller C.V. is more consistent.

Let’s now see how to compare two frequency distributions that have the same mean.

Comparison of Two Frequency Distributions With the Same Mean

Suppose x̄₁ and σ₁ are the mean and standard deviation of the first frequency distribution, and x̄₂ and σ₂ are the mean and standard deviation of the second frequency distribution, such that their means are equal.

The coefficient of variation for the first distribution is:

$$\begin{array}{l}C.V._{1} = \frac{\sigma_1 }{\overline{x_1}}\times 100\end{array}$$

The coefficient of variation for the second distribution is:

$$\begin{array}{l}C.V._{2} = \frac{\sigma_2 }{\overline{x_2}}\times 100\end{array}$$

Since the means are equal, we write

$$\begin{array}{l}\overline{x_1} = \overline{x_2} = \overline{x}\end{array}$$

So, the coefficients of variation can be written as:

$$\begin{array}{l}C.V._{1} = \frac{\sigma_1 }{\overline{x}}\times 100 \qquad \text{...(i)}\end{array}$$

$$\begin{array}{l}C.V._{2} = \frac{\sigma_2 }{\overline{x}}\times 100 \qquad \text{...(ii)}\end{array}$$
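A short derivation makes the step from (i) and (ii) to a comparison of σ₁ and σ₂ explicit. Subtracting (ii) from (i), the common factor 100/x̄ comes out:

$$\begin{array}{l}C.V._{1} - C.V._{2} = \frac{\sigma_1}{\overline{x}}\times 100 - \frac{\sigma_2}{\overline{x}}\times 100 = \frac{100}{\overline{x}}\left(\sigma_1 - \sigma_2\right)\end{array}$$

Assuming x̄ > 0, as is usual for data where the C.V. is used, the sign of C.V.₁ − C.V.₂ is the sign of σ₁ − σ₂, so C.V.₁ > C.V.₂ exactly when σ₁ > σ₂.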

From equations (i) and (ii), the two C.V.s differ only through σ₁ and σ₂, because both share the same mean x̄ (taken to be positive). In other words, when two frequency distributions have the same mean, they can be compared using their standard deviations alone. The distribution with the greater standard deviation, or equivalently the greater variance (since the standard deviation is the square root of the variance), is more variable, or dispersed, than the other. Likewise, the distribution with the smaller standard deviation or variance is more consistent.
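The equal-mean argument can be checked numerically. The sketch below uses two hypothetical frequency distributions (made-up values and frequencies, both with mean 50), expands each into its individual observations with a hypothetical helper `expand`, and computes the mean, the population standard deviation (`statistics.pstdev`) and the C.V. It assumes the σ of the text is the population standard deviation.

```python
from statistics import fmean, pstdev

def expand(values, freqs):
    """Turn a frequency distribution into a flat list of observations."""
    return [x for x, f in zip(values, freqs) for _ in range(f)]

# Two hypothetical frequency distributions with the same mean (50).
a = expand([40, 50, 60], [2, 6, 2])
b = expand([30, 50, 70], [3, 4, 3])

for name, data in (("A", a), ("B", b)):
    mean, sd = fmean(data), pstdev(data)
    # Print the mean, the standard deviation and the C.V. (%) of each distribution.
    print(name, mean, round(sd, 2), round(sd / mean * 100, 2))
```

Distribution B has both the larger σ and the larger C.V., so by either measure it is the more variable of the two.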

Solved Example

Question:

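The data below show the mean and variance of the heights and weights of the students of Class X:

$$\begin{array}{|l|c|c|}\hline & \text{Height} & \text{Weight}\\ \hline \text{Mean} & 162.6\ \text{cm} & 52.36\ \text{kg}\\ \text{Variance} & 127.69\ \text{cm}^2 & 23.1361\ \text{kg}^2\\ \hline\end{array}$$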

What can be said about the weights and the heights?

Solution:

We treat the heights of the students as one series of data and their weights as the other.

So,

Mean of height = 162.6 cm

Variance of height = 127.69 cm²

Therefore, the standard deviation of height = √(127.69 cm²) = 11.3 cm

Also,

Mean of weight = 52.36 kg

Variance of weight = 23.1361 kg²

Thus, the standard deviation of weight = √(23.1361 kg²) = 4.81 kg

Since heights and weights are measured in different units, we now calculate the coefficient of variation of each series to compare their variability.

Using the coefficient of variation formula,

$$\begin{array}{l}C.V. = \frac{\sigma }{\overline{x}}\times 100\end{array}$$

i.e., C.V. = (standard deviation / mean) × 100

For heights, C.V. = (11.3 / 162.6) × 100 ≈ 6.95

For weights, C.V. = (4.81 / 52.36) × 100 ≈ 9.19

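Here, the C.V. of heights is smaller than the C.V. of weights. Therefore, the weights are more variable than the heights; in other words, the heights are more consistent.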
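The arithmetic of this example can be reproduced in a few lines of Python. The helper `cv_from_summary` is hypothetical; it takes a mean and a variance from the Class X table, recovers σ as the square root of the variance, and returns the C.V. in percent.

```python
from math import sqrt

def cv_from_summary(mean, variance):
    """C.V. (%) from a series' mean and variance: sqrt(variance) / mean * 100."""
    return sqrt(variance) / mean * 100

# Summary statistics from the Class X example.
cv_height = cv_from_summary(162.6, 127.69)   # sigma = 11.3 cm
cv_weight = cv_from_summary(52.36, 23.1361)  # sigma = 4.81 kg

print(f"C.V. of heights: {cv_height:.2f}")
print(f"C.V. of weights: {cv_weight:.2f}")
print("more variable:", "weights" if cv_weight > cv_height else "heights")
```

It reports C.V. values of about 6.95 for heights and 9.19 for weights, so the weights come out as the more variable series.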