A box plot displays the distribution of a data set in a standardized way using the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It is also called a box and whisker plot when lines (whiskers) extend from the box to indicate variability outside the upper and lower quartiles. Outliers can be plotted as individual points.
In simple terms, a box plot can be defined using concepts from descriptive statistics: it is a method for graphically depicting groups of numerical data through their quartiles. Lines extending from the box, called whiskers, indicate the variability outside the lower and upper quartiles, hence the names box-and-whisker plot and box-and-whisker diagram. Outliers can be shown as individual points.
A box plot shows graphically how much the data values vary or spread out. Measures of central tendency alone do not give this information, which is where the box plot helps. It is a compact pictorial representation of data that takes up little space.
A box and whisker plot is a way to summarize a set of data measured on an interval scale. It is widely used in data analysis. We use this type of graph to understand the following about a data set:
- Distribution shape
- Central value
- Variability
A box plot is a chart that shows data from a five-number summary, which includes one measure of central tendency (the median). It does not show the distribution in as much detail as a stem and leaf plot or histogram does. It is primarily used to indicate whether a distribution is skewed and whether there are potential unusual observations (also called outliers) in the data set. Box plots are also very useful when a large number of data sets are involved or compared.
Because the centre, spread and overall range are immediately apparent, distributions can be compared easily using box plots.
A box and whisker plot is a method of summarizing a set of data measured on an interval scale. It is also used for descriptive data analysis. The graph shows the shape of the distribution, its central value, and its variability.
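As an illustration, the five-number summary behind a box plot can be computed with a short Python sketch. The function name `five_number_summary` and the sample `scores` are hypothetical. The sketch assumes Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data; other quartile conventions exist and can give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, leaving out the overall median when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # made-up sample data
print(five_number_summary(scores))  # (7, 36, 40.5, 43, 49)
```

These five values are exactly what the box, the line inside it and the ends of the plot encode.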
Parts of Box Plots
The image below shows the minimum, maximum, median, interquartile range and outliers.
Boxplot on Normal Distribution
- Median – the middle value of the data set (the vertical line inside the box)
- First quartile (Q1) – the middle value between the lowest number and the median (the lower quartile)
- Third quartile (Q3) – the middle value between the median and the largest value (the upper quartile)
- Interquartile range (IQR) – the range from the 25th percentile to the 75th percentile, that is, Q3 - Q1
- Whiskers – the two lines extending from the box to the highest and lowest observations that are not outliers
- Maximum – Third quartile + 1.5 * (Interquartile range); observations above this limit are plotted as outliers
- Minimum – First quartile - 1.5 * (Interquartile range); observations below this limit are plotted as outliers
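The "Maximum" and "Minimum" limits in the list above can be worked through in a small Python sketch. The function names `fences` and `split_outliers` and the sample data are hypothetical. The quartiles are passed in rather than computed, because quartile conventions vary; the values Q1 = 36 and Q3 = 43 are assumed for this sample.

```python
def fences(q1, q3, k=1.5):
    """Return (lower limit, upper limit) as Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)

def split_outliers(values, q1, q3):
    """Separate values inside the limits from those outside (outliers)."""
    low, high = fences(q1, q3)
    inside = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inside, outliers

scores = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # made-up sample data
q1, q3 = 36, 43                                   # assumed quartiles for this sample

print(fences(q1, q3))                  # (25.5, 53.5)
inside, outliers = split_outliers(scores, q1, q3)
print(outliers)                        # [7, 15]
print(min(inside), max(inside))        # 36 49
```

Here the interquartile range is 7, so the limits are 36 - 10.5 = 25.5 and 43 + 10.5 = 53.5. The values 7 and 15 fall below the lower limit and would be drawn as individual points, while the whiskers would end at 36 and 49.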
Box Plot Chart
In a box and whisker plot:
- the ends of the box are the upper and lower quartiles, so the box spans the interquartile range
- a vertical line inside the box marks the median
- the two lines outside the box are the whiskers extending to the highest and lowest observations.
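To make the layout described above concrete, here is a minimal text-only sketch in Python that draws a horizontal box plot on one line. The function `ascii_box_plot` and its symbols are hypothetical choices for illustration: `|` marks the whisker ends, `[` and `]` the quartiles, and `M` the median. It assumes the five values are already known and that the highest value is greater than the lowest.

```python
def ascii_box_plot(low, q1, med, q3, high, width=50):
    """Return a one-line text box plot scaled to `width` characters.

    Assumes low <= q1 <= med <= q3 <= high and high > low.
    """
    span = high - low

    def col(x):
        # Map a data value to a character column between 0 and width - 1.
        return round((x - low) / span * (width - 1))

    line = [" "] * width
    for i in range(col(low), col(high) + 1):
        line[i] = "-"                  # whiskers
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                  # box spanning the interquartile range
    line[col(low)] = "|"               # lowest observation
    line[col(high)] = "|"              # highest observation
    line[col(q1)] = "["                # lower quartile
    line[col(q3)] = "]"                # upper quartile
    line[col(med)] = "M"               # median inside the box
    return "".join(line)

print(ascii_box_plot(1, 2, 4, 6, 7, width=25))
# |---[=======M=======]---|
```

A plotting library would normally be used for real charts; this sketch only shows how the five values map to positions along an axis.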
A box plot is used to identify:
- outliers and their values
- the symmetry of the data
- how tightly the data is grouped
- the skewness of the data: whether it is skewed and, if so, in which direction and by how much
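One way to put a number on the skewness that a box plot shows visually is the quartile (Bowley) skewness coefficient, which compares the two parts of the box on either side of the median. This measure is not part of the box plot itself. The sketch below is one possible illustration, and the `tolerance` threshold for calling a distribution roughly symmetric is an arbitrary assumption.

```python
def quartile_skewness(q1, med, q3):
    """Bowley's coefficient: ((Q3 - median) - (median - Q1)) / (Q3 - Q1)."""
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("Q1 and Q3 are equal, so the coefficient is undefined")
    return ((q3 - med) - (med - q1)) / iqr

def describe_skew(q1, med, q3, tolerance=0.1):
    """Label the skew direction; `tolerance` is an assumed cut-off."""
    s = quartile_skewness(q1, med, q3)
    if s > tolerance:
        return "right-skewed"
    if s < -tolerance:
        return "left-skewed"
    return "roughly symmetric"

print(describe_skew(2, 4, 6))        # roughly symmetric
print(describe_skew(36, 40.5, 43))   # left-skewed
print(describe_skew(10, 12, 20))     # right-skewed
```

The coefficient lies between -1 and 1. A median closer to Q1 gives a positive value (longer upper part of the box), and a median closer to Q3 gives a negative value.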