Comparison of Population by Random Sample Method with Examples - BYJUS

# Comparison of Population by Random Sampling

Sampling is the process of taking a small group of people from a bigger group for collecting information. We can use the concept of sampling to compare two populations on the basis of any statistic. We will learn some terms related to sampling, and we will look at some solved examples of comparison of populations by random sampling.

## Comparison of Population by Random Sampling

Hospitals use statistics to calculate patient and health care delay times. During pandemics, health care centers and hospitals make public the data on how many patients were admitted, how many patients were discharged, and how many patients got infected. We can use the statistics to examine and analyze the data regarding the overall health centers and hospitals. We can make a report of all the data and get the results of how many patients got infected on average. Read on to learn how they create these reports, compare populations, and conduct analyses.

## How to Compare the Populations in Statistics?

Use the mean and the mean absolute deviation (MAD) methods to compare two populations when both distributions are symmetric. Use the median and the interquartile range (IQR) when either one or both distributions are skewed. Remember to think about sampling variability, the chance variability from sample to sample. Make sure you’re not making decisions simply on the basis of the fact that two sample means aren’t equal. When making a decision, consider the distribution of the variation in sample means.
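As a rough illustration of the two pairs of measures named above, the following Python sketch computes the mean with the MAD and the median with the IQR. The function names (`mean_absolute_deviation`, `interquartile_range`, `summarize`) and the data are made up for this example. The quartiles are taken as the medians of the lower and upper halves of the sorted data, which is one common classroom convention rather than the only one.

```python
from statistics import mean, median


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)


def interquartile_range(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves.

    Assumes at least two values; the middle value of an odd-sized data set
    is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])


def summarize(values, skewed):
    """Return (center, variation): median and IQR if skewed, else mean and MAD."""
    if skewed:
        return median(values), interquartile_range(values)
    return mean(values), mean_absolute_deviation(values)


data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical values
print(summarize(data, skewed=False))  # mean and MAD
print(summarize(data, skewed=True))   # median and IQR
```

Whether a distribution counts as skewed is still a judgment made from the plot; the `skewed` flag only records that decision.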

1. If there is no difference in the population means, a meaningful difference in sample means is unlikely to have occurred by chance. A significant difference in sample means is one that is far from zero (that is, unlikely to happen if the population means are equal). A difference in sample means that is close to 0 is consistent with equal population means and is considered non-significant.

2. Note that the size of the difference required to be declared meaningful is dependent on the context, sample size, and population variability.

Using the MAD, we can describe the difference between two sample means: we divide the difference in the sample means by the larger MAD to determine how many MADs separate the means.

3. Variability occurs naturally in data distributions. The sample means of two data distributions can be compared by describing how far apart they are. The degree of separation is the number of MADs that separate the means. (If the two sample MADs differ, the calculation is based on the larger of the two.)
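The calculation described above can be written compactly as follows. The symbols are introduced here only for illustration: $$\bar{x}_A$$ and $$\bar{x}_B$$ stand for the two sample means, and $$\text{MAD}_A$$ and $$\text{MAD}_B$$ for the two sample MADs.

$$\text{Number of MADs separating the means} = \frac{|\bar{x}_A - \bar{x}_B|}{\max(\text{MAD}_A, \text{MAD}_B)}$$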

## Example on Comparing Populations

There are exactly the same number of values in both data sets. The values in the data sets are represented by the double box-and-whisker plot.

[Figure: double box-and-whisker plot of data set A and data set B]

a. Compare the data sets using measures of center and variation.

Both distributions are skewed, so use the median and the interquartile range (IQR) to compare them.

| | Data set A | Data set B |
|---|---|---|
| Median (middle line of the box) | 60 | 90 |
| IQR (greater value - lower value, from the plot) | 80 - 30 = 50 | 100 - 80 = 20 |

So, data set B has a greater measure of center, and data set A has a greater measure of variation.

b. Which data set is more likely to contain a value of 95?

About 25% of the data values in data set A are between 80 and 130.

About 50% of the data values in data set B are between 80 and 100.

So, data set B is more likely to contain a value of 95.

c. Which data set is more likely to contain a value that differs from the center by at least 30?

The IQR of data set A is 50 and the IQR of data set B is 20. This means it is more common for a value to differ from the center by 30 in data set A than in data set B. So, data set A is more likely to contain a value that differs from the center by at least 30.

## Visual Overlap

1. When two populations have similar variabilities, the visual overlap of the data can be described by writing the difference in the measures of center as a multiple of the measure of variation. Greater values indicate less visual overlap.

2. Less visual overlap indicates a more significant difference in the measures of center.

3. When the difference in the measure of center is at least two times the measure of variation, the difference is significant.
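The three points above can be sketched in a few lines of Python. The function names `overlap_ratio` and `is_significant` and the sample numbers are invented for this illustration, and the threshold of 2 is the rule of thumb stated in point 3, not a formal statistical test.

```python
def overlap_ratio(center_a, center_b, variation):
    """Difference in the measures of center as a multiple of the variation."""
    return abs(center_a - center_b) / variation


def is_significant(center_a, center_b, variation, threshold=2):
    """Rule of thumb: significant if the ratio is at least `threshold`."""
    return overlap_ratio(center_a, center_b, variation) >= threshold


# Hypothetical centers and variation, chosen only to show both outcomes.
print(overlap_ratio(40, 70, 10), is_significant(40, 70, 10))  # 3.0 True
print(overlap_ratio(40, 45, 10), is_significant(40, 45, 10))  # 0.5 False
```

The sketch assumes the two populations have similar variabilities, so a single `variation` value is passed in.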

## Examples on Describing a Visual Overlap

1. Two data sets are shown in the double dot plot. Express the difference in the measures of center as a multiple of the measure of variation.

[Figure: double dot plot of data set A and data set B]

Both distributions are approximately symmetric. Use the mean and the MAD to describe the centers and variations.

$$\text{Mean of data set A} = \frac{\text{Sum of all the data values in set A}}{\text{Total number of values in set A}}$$

$$= \frac{20+30+30+40+40+50+50+50+60+60+60+70+80+80+90}{15}$$

$$= \frac{810}{15} = 54$$

$$\text{Mean of data set B} = \frac{\text{Sum of all the data values in set B}}{\text{Total number of values in set B}}$$

$$= \frac{0+0+0+10+20+20+30+30+30+30+40+40+50+60+60}{15}$$

$$= \frac{420}{15} = 28$$

$$\text{Mean absolute deviation of set A} = \frac{1}{n}\sum_{i=1}^{n}|x_i-\text{mean}|$$

$$= \frac{|20-54|+|30-54|+|30-54|+|40-54|+|40-54|+|50-54|+|50-54|+|50-54|+|60-54|+|60-54|+|60-54|+|70-54|+|80-54|+|80-54|+|90-54|}{15}$$

$$= \frac{34+24+24+14+14+4+4+4+6+6+6+16+26+26+36}{15}$$

$$= \frac{244}{15} \approx 16.27 \approx 16$$

$$\text{Mean absolute deviation of set B} = \frac{1}{n}\sum_{i=1}^{n}|x_i-\text{mean}|$$

$$= \frac{|0-28|+|0-28|+|0-28|+|10-28|+|20-28|+|20-28|+|30-28|+|30-28|+|30-28|+|30-28|+|40-28|+|40-28|+|50-28|+|60-28|+|60-28|}{15}$$

$$= \frac{28+28+28+18+8+8+2+2+2+2+12+12+22+32+32}{15}$$

$$= \frac{236}{15} \approx 15.73 \approx 16$$

| | Data set A | Data set B |
|---|---|---|
| Mean | $$\frac{810}{15} = 54$$ | $$\frac{420}{15} = 28$$ |
| MAD | $$\frac{244}{15} \approx 16.27 \approx 16$$ | $$\frac{236}{15} \approx 15.73 \approx 16$$ |

$$\frac{\text{Difference in means}}{\text{MAD}} = \frac{54-28}{16} = \frac{26}{16} \approx 1.6$$

So, the difference in the means is about 1.6 times the MAD.
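The arithmetic in this example can be checked with a short Python sketch. The data values are the ones listed in the mean calculations above; the helper name `mean_absolute_deviation` is chosen just for this illustration. Unlike the hand calculation, the sketch divides by the unrounded larger MAD, which still gives about 1.6.

```python
from statistics import mean

data_a = [20, 30, 30, 40, 40, 50, 50, 50, 60, 60, 60, 70, 80, 80, 90]
data_b = [0, 0, 0, 10, 20, 20, 30, 30, 30, 30, 40, 40, 50, 60, 60]


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)


mean_a, mean_b = mean(data_a), mean(data_b)  # the means are 54 and 28
mad_a = mean_absolute_deviation(data_a)      # 244 / 15
mad_b = mean_absolute_deviation(data_b)      # 236 / 15

# Divide the difference in means by the larger MAD.
ratio = abs(mean_a - mean_b) / max(mad_a, mad_b)

print(round(mad_a, 2), round(mad_b, 2))  # 16.27 15.73
print(round(ratio, 1))                   # 1.6
```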

2. The heights of roller coasters at two amusement parks are represented by the double box and whisker plot. Are the roller coasters at one park significantly taller than those at the other?

[Figure: double box-and-whisker plot of roller coaster heights (in meters) at Park A and Park B]

The distribution for park A is skewed, so use the median and the IQR to describe the centers and variations.

| | Park A | Park B |
|---|---|---|
| Median (middle line of the box) | 50 | 55 |
| IQR (greater value - lower value, from the plot) | 55 - 45 = 10 | 60 - 50 = 10 |

Because the variabilities are similar, you can describe the visual overlap by expressing the difference in medians as a multiple of the IQR.

$$\frac{\text{Difference in medians}}{\text{IQR}}=\frac{55-50}{10}=\frac{5}{10}=0.5$$

The difference in medians is not significant because the quotient is less than 2. The roller coasters in one park are not significantly taller than those in the other.

## How to Compare Populations using Random Samples?

You do not need to have all of the data from two populations to make comparisons. You can use random samples to make comparisons. You are more likely to make valid comparisons when the sample size is large and when there is little variability in the data.
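A small simulation can make the effect of sample size concrete. The sketch below is only an illustration under made-up assumptions: the population is 10,000 randomly generated whole numbers from 1 to 100, and `spread_of_sample_means` is a hypothetical helper that measures how much the sample means vary from sample to sample, using the MAD.

```python
import random
from statistics import mean


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)


def spread_of_sample_means(population, sample_size, repeats, rng):
    """MAD of the means of `repeats` random samples of the given size."""
    sample_means = [
        mean(rng.sample(population, sample_size)) for _ in range(repeats)
    ]
    return mean_absolute_deviation(sample_means)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 whole numbers from 1 to 100.
population = [rng.randint(1, 100) for _ in range(10_000)]

small = spread_of_sample_means(population, sample_size=5, repeats=500, rng=rng)
large = spread_of_sample_means(population, sample_size=100, repeats=500, rng=rng)

print(f"MAD of sample means, samples of 5:   {small:.2f}")
print(f"MAD of sample means, samples of 100: {large:.2f}")
```

The means of the larger samples should cluster more tightly, which is why comparisons based on larger samples are more likely to be valid.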

## Examples on Comparing Random Samples

1. Each of two bags contains 1000 numbered tiles. The double box-and-whisker plot represents a random sample of 12 numbers from each bag. Compare the samples using measures of center and variation. Can you determine which bag has tiles with higher numbers?

[Figure: double box-and-whisker plot of the samples from Bag A and Bag B]

Both distributions are skewed right, so use the median and the IQR.

| | Bag A | Bag B |
|---|---|---|
| Median (middle line of the box) | 4 | 3 |
| IQR (greater value - lower value, from the plot) | 6 - 3 = 3 | 5 - 2 = 3 |

The samples have similar variations, but the sample from bag A has a higher median.

However, the sample size is too small to conclude that the tiles in bag A generally have higher numbers than the tiles in bag B.

Using multiple random samples in the above example:

The double box-and-whisker plot now represents the medians of 50 random samples of 12 numbers from each bag in the example above. Compare the variability of the sample medians to the variability of the single samples. Can you determine which bag has tiles with higher numbers?

| | Bag A | Bag B |
|---|---|---|
| Median (middle line of the box) | 5.5 | 3.5 |
| IQR (greater value - lower value, from the plot) | 6 - 5 = 1 | 4 - 3 = 1 |

The IQR of each bag’s sample medians is 1, which is lower than the IQR of 3 for the samples in the preceding example. The sample medians for bag A are generally higher than those for bag B. As a result, the numbers on the tiles in bag A are generally higher than the numbers on the tiles in bag B.
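The idea that sample medians vary less than individual values can be tried out with a simulation. The bag below is hypothetical (1000 tiles numbered 1 to 10, with smaller numbers more common so that the distribution is skewed right) and does not reproduce the bags in the example. The quartiles are taken as medians of the lower and upper halves of the sorted data, which is an assumed convention.

```python
import random
from statistics import median


def interquartile_range(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical bag: 1000 tiles numbered 1 to 10, smaller numbers more common.
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
weights = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
bag = rng.choices(numbers, weights=weights, k=1000)

one_sample = rng.sample(bag, 12)
sample_medians = [median(rng.sample(bag, 12)) for _ in range(50)]

print("IQR of the whole bag:        ", interquartile_range(bag))
print("IQR of one sample of 12:     ", interquartile_range(one_sample))
print("IQR of 50 sample medians:    ", interquartile_range(sample_medians))
```

Because the sample medians cluster near the center of the bag, their IQR is typically much smaller than the IQR of the tiles themselves, which is what makes a difference between two sets of sample medians easier to see.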

2. The double box-and-whisker plot represents the medians of 50 random samples of 10 speeding tickets issued in each of two states. Compare the costs of speeding tickets in the two states.

There is sufficient information to draw conclusions about the costs of speeding tickets in both states. Find the center and variation measures for each state’s sample medians. Then compare the information.

| | State A | State B |
|---|---|---|
| Median (middle line of the box) | 54 | 70 |
| IQR (greater value - lower value, from the plot) | 67 - 48 = 19 | 77 - 67 = 10 |

State A has more variation than state B, and state B’s measure of center is greater than state A’s measure of center. As a result, you can deduce that the cost of speeding tickets varies more in state A, but that speeding tickets in state B are generally more expensive.

When both distributions are symmetric, use the mean and mean absolute deviation (MAD) methods to compare two populations. When one or both distributions are skewed, use the median and interquartile range (IQR).

We can describe the difference between two sample means using the MAD. To figure out how many MADs separate the means, divide the difference in the sample means by the larger MAD.

The following are some examples of median applications:

1. Choosing the correct film genre

2. Explaining the Poverty Line by Grouping Data