In statistics, data can be organised into grouped or ungrouped frequency distributions and represented pictorially using graphs such as bar graphs, frequency polygons and histograms. We also know that the three measures of central tendency are the mean, median and mode. In this article, we will discuss how to find the mean of grouped data using three methods (the direct method, the assumed mean method and the step deviation method), with solved examples.
What is Meant by Mean in Statistics?
The mean or the average of the given observations is defined as the sum of the values of all the observations divided by the total number of observations. The mean of the data is generally represented by the notation x̄. If x_{1}, x_{2}, x_{3}, …, x_{n} are the observations with respective frequencies f_{1}, f_{2}, f_{3}, …, f_{n}, then
The sum of observations = f_{1}x_{1}+ f_{2}x_{2} + f_{3}x_{3} + ….+ f_{n}x_{n}.
The total number of observations = f_{1}+f_{2}+… + f_{n}.
Therefore, the mean of the data, x̄ = (f_{1}x_{1}+ f_{2}x_{2} + f_{3}x_{3} + ….+ f_{n}x_{n})/ ( f_{1}+f_{2}+… + f_{n}).
In short, the above form can be written using the summation symbol (Σ) as:
x̄ = Σf_{i}x_{i} / Σf_{i},
where “i” varies from 1 to n.
Now, let us find the mean of a given data set using the above formula.
Example:
The marks scored by 30 students of class 10 of a certain school in a Maths paper carrying 100 marks are given below in tabular form. Find the mean of the marks obtained by the students.
Marks obtained (x_{i}) | 10 | 20 | 36 | 40 | 50 | 56 | 60 | 70 | 72 | 80 | 88 | 92 | 95 |
Number of students (f_{i}) | 1 | 1 | 3 | 4 | 3 | 2 | 4 | 4 | 1 | 1 | 2 | 3 | 1 |
Solution:
To find the mean of the marks obtained by the students in the Maths paper, we need to find the product of each x_{i} and its corresponding frequency f_{i}.
Marks Obtained (x_{i}) | Number of students (f_{i}) | f_{i}x_{i} |
10 | 1 | 10 |
20 | 1 | 20 |
36 | 3 | 108 |
40 | 4 | 160 |
50 | 3 | 150 |
56 | 2 | 112 |
60 | 4 | 240 |
70 | 4 | 280 |
72 | 1 | 72 |
80 | 1 | 80 |
88 | 2 | 176 |
92 | 3 | 276 |
95 | 1 | 95 |
Total | Σf_{i} = 30 | Σf_{i}x_{i} = 1779 |
Table 1
Thus, by using the formula,
x̄ = 1779/30
x̄ = 59.3
Hence, the mean of the marks obtained is 59.3.
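The calculation in Table 1 can also be checked with a few lines of Python. This is only a minimal sketch: the function name `frequency_mean` and the variable names are illustrative choices, not part of the original example; the data is copied from Table 1.

```python
# Marks (x_i) and number of students (f_i), copied from Table 1.
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
students = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def frequency_mean(values, freqs):
    """Return sum(f_i * x_i) / sum(f_i) for paired values and frequencies."""
    total_fx = sum(f * x for x, f in zip(values, freqs))  # sum of f_i * x_i
    total_f = sum(freqs)                                  # sum of f_i
    return total_fx / total_f


ungrouped_mean = frequency_mean(marks, students)
print(sum(students), round(ungrouped_mean, 1))
```

The two printed numbers should match the total frequency and the mean given in Table 1.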
Three Methods to Find the Mean of Grouped Data
In many scenarios, the data set is large, and to make a meaningful study it has to be condensed into grouped data. In those scenarios, we convert the ungrouped data into grouped data and then find the mean. The three methods to find the mean of grouped data are:
- Direct Method
- Assumed Mean Method
- Step Deviation Method
Now, let us discuss all these three methods one by one.
Direct Method
Consider the same example as given above.
Now, convert the ungrouped data into grouped data by forming class intervals of width 15.
Note that, while assigning frequencies to each class interval, students whose marks equal the upper class limit are counted in the next class interval. For example, the 4 students who scored 40 marks are counted in the class 40-55, not in 25-40.
Therefore, the grouped frequency distribution table for the above-given example is as follows:
Class Interval | 10-25 | 25-40 | 40-55 | 55-70 | 70-85 | 85-100 |
Number of Students | 2 | 3 | 7 | 6 | 6 | 6 |
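The grouping rule described above can be sketched in Python as follows. The variable names are illustrative, and clamping a mark of exactly 100 into the last class is an assumption made here only so the code cannot go out of range; the example data contains no such mark.

```python
# Marks (x_i) and number of students (f_i), copied from the ungrouped table.
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
students = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]

lower, width, n_classes = 10, 15, 6  # classes 10-25, 25-40, ..., 85-100

counts = [0] * n_classes
for x, f in zip(marks, students):
    # Integer division places a mark equal to an upper class limit
    # (for example 40 or 70) into the NEXT class, as the text requires.
    k = (x - lower) // width
    # Assumption (not stated in the article): a mark of exactly 100
    # would be kept in the last class, 85-100.
    k = min(k, n_classes - 1)
    counts[k] += f

for i, c in enumerate(counts):
    lo = lower + i * width
    print(f"{lo}-{lo + width}: {c}")
```

The printed frequencies should agree with the grouped frequency distribution table.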
Now, for each class interval, we need to find the midpoint (class mark), which serves as the representative of the whole class.
For example, for the first class interval, 10-25, the class mark is:
Class Mark = (Upper class limit + lower class limit)/2
Class Mark = (25+10)/2 = 17.5
Similarly, find the class mark for all the intervals.
Therefore, the mean of the marks obtained by the students is given as:
Class Interval | Number of students (f_{i}) | Class Mark (x_{i}) | f_{i}x_{i} |
10-25 | 2 | 17.5 | 35 |
25-40 | 3 | 32.5 | 97.5 |
40-55 | 7 | 47.5 | 332.5 |
55-70 | 6 | 62.5 | 375 |
70-85 | 6 | 77.5 | 465 |
85-100 | 6 | 92.5 | 555 |
Total | Σf_{i} = 30 | | Σf_{i}x_{i} = 1860 |
Table 2
Therefore, Mean, x̄ = 1860/30 = 62
The mean value obtained using the direct method is 62.
If you compare the means obtained from Table 1 and Table 2, 59.3 is the exact mean, whereas 62 is an approximate mean. The difference arises because, in Table 2, every observation in a class is assumed to be equal to the class mark (the midpoint assumption).
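As a quick check of the direct method, the class marks and the mean in Table 2 can be recomputed in Python. This is a minimal sketch with illustrative variable names; the class limits and frequencies are taken from Table 2.

```python
# Class intervals (lower limit, upper limit) and frequencies from Table 2.
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]

# Class mark = (upper class limit + lower class limit) / 2
class_marks = [(lo + hi) / 2 for lo, hi in classes]

# Direct method: mean = sum(f_i * x_i) / sum(f_i), with x_i the class marks.
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
direct_mean = sum_fx / sum(freqs)

print(class_marks)
print(sum_fx, direct_mean)
```

The second line of output should reproduce Σf_{i}x_{i} and the mean from Table 2.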
Assumed Mean Method
If the numerical values of x_{i} and f_{i} are large, finding the product of x_{i} and f_{i} becomes a time-consuming process. To reduce the calculations, we can use the assumed mean method.
In this method, first, we need to choose the assumed mean, say “a”, from among the x_{i}, preferably one that lies near the centre of the values. (In the same example, we can choose either a = 47.5 or a = 62.5.) Now, let us choose a = 47.5.
The second step is to find the difference (d_{i}) between each x_{i} and the assumed mean “a”.
The third step is to find the product of d_{i} with the corresponding f_{i}.
Class Interval | Number of students (f_{i}) | Class Mark (x_{i}) | d_{i} = x_{i} – 47.5 | f_{i}d_{i} |
10-25 | 2 | 17.5 | -30 | -60 |
25-40 | 3 | 32.5 | -15 | -45 |
40-55 | 7 | 47.5 | 0 | 0 |
55-70 | 6 | 62.5 | 15 | 90 |
70-85 | 6 | 77.5 | 30 | 180 |
85-100 | 6 | 92.5 | 45 | 270 |
Total | Σf_{i} = 30 | | | Σf_{i}d_{i} = 435 |
Table 3
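Hence, the mean of the deviations is
d̄ = Σf_{i}d_{i} / Σf_{i}.
As d_{i} = x_{i} – a, the relationship between d̄ and x̄ is
d̄ = Σf_{i}(x_{i} – a) / Σf_{i} = (Σf_{i}x_{i} / Σf_{i}) – a = x̄ – a.
We can write
x̄ = a + d̄ = a + (Σf_{i}d_{i} / Σf_{i}).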
Now, substitute the values of a, Σf_{i} and Σf_{i}d_{i} in the above formula to get the mean.
Therefore, x̄ = 47.5 + (435/30)
x̄ = 47.5 + 14.5
x̄ = 62.
Therefore, the mean of the marks obtained by the class 10 students is 62.
Hence, the results obtained from the direct method and the assumed mean method are the same.
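The assumed mean method can be sketched in Python as below. The function name `assumed_mean_method` is an illustrative choice; the data comes from Table 3. The sketch also tries the other central class mark, a = 62.5, to illustrate that the choice of “a” does not change the result.

```python
# Class marks (x_i) and frequencies (f_i) from Table 3.
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]


def assumed_mean_method(class_marks, freqs, a):
    """Return a + sum(f_i * d_i) / sum(f_i), where d_i = x_i - a."""
    deviations = [x - a for x in class_marks]               # d_i
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))  # sum of f_i * d_i
    return a + sum_fd / sum(freqs)


mean_a1 = assumed_mean_method(class_marks, freqs, a=47.5)
mean_a2 = assumed_mean_method(class_marks, freqs, a=62.5)
print(mean_a1, mean_a2)
```

Both calls should give the same mean as the direct method.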
Step Deviation Method
Consider the same example as given above. In the step deviation method, we add one more column to the table to find the mean, which is u_{i} = (x_{i} – a)/h,
where “a” is the assumed mean and “h” is the class size, i.e. the width of the class interval, which is equal to 15 here.
Class Interval | Number of students (f_{i}) | Class Mark (x_{i}) | d_{i} = x_{i} – a (a = 47.5) | u_{i} = (x_{i} – a)/h (h = 15) | f_{i}u_{i} |
10-25 | 2 | 17.5 | -30 | -2 | -4 |
25-40 | 3 | 32.5 | -15 | -1 | -3 |
40-55 | 7 | 47.5 | 0 | 0 | 0 |
55-70 | 6 | 62.5 | 15 | 1 | 6 |
70-85 | 6 | 77.5 | 30 | 2 | 12 |
85-100 | 6 | 92.5 | 45 | 3 | 18 |
Total | Σf_{i} = 30 | | | | Σf_{i}u_{i} = 29 |
Table 4
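Therefore, we obtain the mean of the u_{i} values as
ū = Σf_{i}u_{i} / Σf_{i}.
Since x_{i} = a + hu_{i}, the relation between ū and x̄ is
x̄ = a + hū = a + h(Σf_{i}u_{i} / Σf_{i}).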
Now, substitute the values of a, h, Σf_{i} and Σf_{i}u_{i} in the above formula to get the mean.
x̄ = 47.5 + 15(29/30)
x̄ = 47.5 + (435/30)
x̄ = 47.5 + 14.5
x̄ = 62
Hence, the mean of the marks scored by the students = 62.
Therefore, the mean obtained by all three methods is the same.
Thus, we can say that the assumed mean method and the step deviation method are the simplified forms of the direct method.
(Note: In this example, the mean of the grouped data slightly differs from the mean of the ungrouped data because of the midpoint assumption).
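The step deviation calculation in Table 4 can be sketched in Python as well. The function name `step_deviation_mean` is illustrative; the data and the values a = 47.5 and h = 15 are taken from the example. Multiplying h by Σf_{i}u_{i} before dividing by Σf_{i} avoids the rounding of 29/30.

```python
# Class marks (x_i) and frequencies (f_i) from Table 4.
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]


def step_deviation_mean(class_marks, freqs, a, h):
    """Return a + h * sum(f_i * u_i) / sum(f_i), where u_i = (x_i - a) / h."""
    u = [(x - a) / h for x in class_marks]           # u_i
    sum_fu = sum(f * ui for f, ui in zip(freqs, u))  # sum of f_i * u_i
    # Multiply by h before dividing so that 29/30 is never rounded.
    return a + h * sum_fu / sum(freqs)


step_mean = step_deviation_mean(class_marks, freqs, a=47.5, h=15)

# Direct method on the same table, for comparison.
direct = sum(f * x for f, x in zip(freqs, class_marks)) / sum(freqs)
print(step_mean, direct)
```

The two printed values should agree, which matches the observation that the step deviation method is a simplified form of the direct method.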
Practice Problems
- Consider the distribution of the daily wages of 50 employees of a factory as given below. Determine the mean of the daily wages of the workers of the factory using an appropriate method.
Daily wages (in Rs) | 500-520 | 520-540 | 540-560 | 560-580 | 580-600 |
Number of workers | 12 | 14 | 8 | 6 | 10 |
- The given table shows the daily expenditure on food of 25 households in a locality. Find the mean of the daily expenditure on food using a suitable method.
Daily expenditure (in Rs) | 100-150 | 150-200 | 200-250 | 250-300 | 300-350 |
Number of households | 4 | 5 | 12 | 2 | 2 |
Frequently Asked Questions on Mean of Grouped Data
What are the three methods used to find the mean of grouped data?
The three methods used to find the mean of the grouped data are:
- Direct method
- Assumed mean method
- Step deviation method
What is meant by class-mark?
The class mark, also called the midpoint of a class interval, is found by taking the average of its upper class limit and lower class limit.
Class Mark = (upper class limit + lower class limit)/2
Can we get the same mean value for both the grouped and ungrouped data?
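Not necessarily. The mean of grouped data usually differs slightly from the mean of the ungrouped data because of the midpoint assumption. In the example above, the ungrouped mean is 59.3, whereas the grouped mean is 62.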
What does “h” mean in the step-deviation method?
In the step-deviation method, “h” represents the class size, i.e. the width of each class interval.
How to choose the value of “a” in the assumed mean method?
In the assumed mean method, “a” is usually chosen as the x_{i} that lies in the centre of x_{1}, x_{2}, …, x_{n}.