‘Data’ is a term that we hear quite often. Data is a collection of individual facts, statistics, or information. We can represent data in different forms, such as scatter plots and lines of fit. We will learn the terms and properties related to each method and understand its advantages and disadvantages.
Data: Data is a very common term and you will hear it in a number of places. So what does data mean? Well, data is important information that helps us answer different types of questions of importance. Basically, data is a set of facts like numbers, words, measurements or observations. It is collected through observation, questioning, surveys or measurement.
For analysis, data is frequently organized in graphs or charts and may include facts, numbers, or measurements. A single piece of data is referred to as a datum.
For example, suppose the students of a school are going on a trip and their teachers collect the phone numbers of the students’ parents. Here, the set of phone numbers is the data.
Data set: A data set is a collection of information. A data set corresponds to one or more database tables in the case of tabular data, with each column representing a specific variable and each row representing a specific record of the data set in question. A collection of documents or files can also be included in a data set.
For example, in the table below, the heights and weights of students form a data set.
Student | Height | Weight |
---|---|---|
Joseph | 4 feet | 52 kgs |
John | 4 feet 5 inches | 63 kgs |
Mary | 5 feet | 50 kgs |
Pie charts, Pareto charts, pictograms, plots and graphs are useful when representing data. There are many different ways to display data graphically but in the sections below we’ll focus on one very useful type of graphical representation called a plot graph.
A plot is a graphical representation of a data set, usually in the form of a graph showing the relationship between two or more variables. The plot can be created by hand or by using a computer. Graphs make the relationship between variables visible at a glance, which is hard to see from a list of values alone.
For example, a record of the daily temperatures for ten days is data, but when it is plotted we can see whether the temperature is rising or falling from day to day, and thus forecast the temperature for upcoming days.
Scatter plots are used to assess the relationship between two numerical variables. Each point on the graph shows the values of both variables for one observation.
A scatter plot can indicate whether there is a linear relationship between the variables (that is, whether the points lie close to a straight line).
To make a scatter graph, write each pair of values as an ordered pair and plot it as a point in a coordinate plane.
For example, let us consider this data of the heights and weights of infants at a hospital under ventilation.
Height of infants | Weight of infants |
---|---|
0.7 | 2.7 |
1.8 | 3.2 |
2.6 | 0.8 |
4.5 | 1.6 |
6.4 | 2.1 |
7.2 | 0.9 |
To represent the data, use ordered pairs (x, y), where x represents the height of the infant and y represents the weight of the infant. Then plot the ordered pairs in a coordinate plane to make the scatter plot.
Scatter plots frequently reveal patterns or relationships. We say there is a positive correlation between the variables when the ‘y’ variable tends to increase as the ‘x’ variable increases. We say there is a negative correlation between the variables when the ‘y’ variable tends to decrease as the ‘x’ variable increases. We say there is no correlation between the two variables when there is no clear relationship between them.
A trendline is another name for the line of best fit. A line of best fit is a line that passes through a scatter plot of data points and best expresses the relationship between them.
On a chart, the data points appear as a scatter plot: a collection of points that may or may not be organized along a line. If a linear pattern emerges, a line of best fit that minimizes the overall distance between the line and those points can be drawn.
Using a graphing calculator to find the line of best fit: A graphing calculator is a handheld computer that can plot graphs, solve simultaneous equations, and perform other tasks that involve variables.
Graphing calculators use linear regression to find the line of best fit. Calculators often denote the correlation coefficient by the letter ‘r’. The value of ‘r’ ranges from -1 to 1: values near -1 indicate a strong negative correlation, values near 1 indicate a strong positive correlation, and values near 0 indicate no correlation. The correlation coefficient quantifies the strength of the linear relationship between two variables.
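The value of ‘r’ that a calculator reports can also be computed by hand. The sketch below assumes the standard Pearson formula (the article does not say which formula a given calculator uses) and applies it to the infant height and weight table from earlier. The function name `correlation_coefficient` is made up for this illustration.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Infant data from the table above (x = height, y = weight).
heights = [0.7, 1.8, 2.6, 4.5, 6.4, 7.2]
weights = [2.7, 3.2, 0.8, 1.6, 2.1, 0.9]

r = correlation_coefficient(heights, weights)
print(f"r = {r:.2f}")
```

For this table, r comes out negative and moderate in size, which matches the loose downward drift of the points rather than a tight line.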
A line of fit can be used to model data by determining the equation of the line of fit.
Step 1: First, plot the data in a scatter plot.
Step 2: Determine if the data can be represented by a line.
Step 3: Draw the line that appears to closely match the data. There should roughly be an equal number of points above and below the line.
Step 4: Find the slope of the line using the formula slope \((m)=\frac{\text{Rise}}{\text{Run}}\).
Step 5: Using two points on the line, write an equation in point-slope form. The points don’t have to be actual data pairs, but they do have to be on the line of fit.
\(y - y_1 = m ( x - x_1 )\)
A strong correlation exists when the data points are near to the line of best fit. A weak correlation exists when the data points are not near the line of best fit.
Example 1: Make the scatter plot from the data given below and find the line of fit. Identify any outlier or cluster.
x values | y values |
---|---|
0.7 | 2.7 |
1.9 | 3.1 |
3.6 | 5.6 |
5.5 | 5.9 |
8.4 | 6.1 |
9.2 | 7.3 |
Solution: The data is represented using ordered pairs ( x, y ), where ‘x’ represents the x-axis values and ‘y’ represents the y-axis values. Then make a scatter plot with the ordered pairs inside a coordinate plane.
The scatter plot shows a positive linear relationship.
Draw a line that is close to the data points.
If we observe the plot, we see that an outlier appears to exist at ( 3.6, 5.6 ).
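One way to check the outlier by calculation, rather than by eye, is to fit a line and see which point lies farthest from it. The sketch below swaps the hand-drawn line for a least-squares regression line, the kind a graphing calculator would produce, so its slope and intercept will not match a line drawn by hand exactly. The helper name `least_squares_line` is made up for this illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data from Example 1.
xs = [0.7, 1.9, 3.6, 5.5, 8.4, 9.2]
ys = [2.7, 3.1, 5.6, 5.9, 6.1, 7.3]

slope, intercept = least_squares_line(xs, ys)

# Vertical distance of each point from the fitted line (actual - predicted).
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# The point with the largest absolute residual is the outlier candidate.
farthest = max(zip(xs, ys, residuals), key=lambda t: abs(t[2]))

print(f"line of fit: y = {slope:.2f}x + {intercept:.2f}")
print(f"farthest point from the line: ({farthest[0]}, {farthest[1]})")
```

The slope is positive, in line with the positive linear relationship seen in the plot, and the farthest point from the fitted line is (3.6, 5.6).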
Example 2: The table depicts a plant’s growth during a period of 5 weeks. The data is modeled by the equation y = 4x, where y represents the growth in cm and x represents the week. Is the model a perfect fit? Explain.
week | growth (cm) |
---|---|
1 | 3.5 |
2 | 4.1 |
3 | 4.3 |
4 | 4.9 |
5 | 6.1 |
Solution: To determine if the model is a perfect fit or not we will have to represent the data using ordered pairs. Then make a scatter plot with the ordered pairs in a coordinate plane as follows.
We will determine the differences between the actual growth and the growth given by the model for each value of the week (or x coordinate). This difference is called the residual.
Week numbers | Growth (cm) | Y - value (or growth) from the model using y = 4x | Residual (Growth - Y value from the model) |
---|---|---|---|
1 | 3.5 | 4 | 3.5 - 4 = - 0.5 |
2 | 4.1 | 8 | 4.1 - 8 = - 3.9 |
3 | 4.3 | 12 | 4.3 - 12 = - 7.7 |
4 | 4.9 | 16 | 4.9 - 16 = - 11.1 |
5 | 6.1 | 20 | 6.1 - 20 = - 13.9 |
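The residual column in the table can be reproduced with a few lines of code. This is a minimal sketch, and the names `model` and `residuals` are chosen for this illustration only.

```python
# Data from Example 2: week number and measured growth in cm.
weeks = [1, 2, 3, 4, 5]
growth = [3.5, 4.1, 4.3, 4.9, 6.1]


def model(x):
    """Growth predicted by the model y = 4x."""
    return 4 * x


# Residual = actual growth - growth predicted by the model.
# Rounded to one decimal place to match the table.
residuals = [round(g - model(w), 1) for w, g in zip(weeks, growth)]

for w, g, res in zip(weeks, growth, residuals):
    print(f"week {w}: actual {g}, model {model(w)}, residual {res}")
```

Every residual is negative, which means the model predicts more growth than was measured in each week.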
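Plot the ordered pairs (week number, residual). For a good fit, the residual points should be scattered evenly above and below the horizontal axis and stay close to it.
The residuals here are all negative and grow in magnitude from -0.5 to -13.9, so the points are not evenly distributed about the horizontal axis. The equation y = 4x is therefore not a good fit for the data, and it is far from a perfect one.
Example 3: The table shows the hours worked and the total earnings. Make a scatter plot of the data and write an equation of the line of fit.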
Hours worked | total earnings |
---|---|
3 | 25 |
4 | 40 |
2 | 20 |
6 | 62 |
5 | 51 |
Solution: To represent the data, use ordered pairs ( x, y ), where x represents hours worked and y represents total earnings. Then make a scatter plot with the ordered pairs in a coordinate plane.
Use points (4, 40) and (5, 51).
The slope of the line is slope \((m) = \frac{\text{Rise}}{\text{Run}} = \frac{51 - 40}{5 - 4} = \frac{11}{1}\) or 11
\(y - y_1 = m ( x - x_1 )\)
\(y - 40 = 11 ( x - 4 )\)
\(y - 40 = 11x - 44\)
\(y = 11x - 4\)
An equation of the line of fit is \(y = 11x - 4\).
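The same arithmetic can be checked in code. The sketch below takes two points on the line, computes the slope as rise over run, and then solves the point-slope form for the y-intercept. The function name `line_through` is made up for this illustration.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no defined slope")
    slope = (y2 - y1) / (x2 - x1)   # rise / run
    intercept = y1 - slope * x1     # from y - y1 = m(x - x1) with x = 0
    return slope, intercept


# The two points used in Example 3.
m, b = line_through((4, 40), (5, 51))
print(f"y = {m:g}x + ({b:g})")  # prints: y = 11x + (-4)

# Compare the line's predictions with the table values.
hours = [3, 4, 2, 6, 5]
earnings = [25, 40, 20, 62, 51]
for h, e in zip(hours, earnings):
    print(f"{h} hours: actual {e}, line of fit {m * h + b:g}")
```

The line passes exactly through the two chosen points and only approximately through the others, which is expected for a line of fit.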
Scatter plots, like line graphs, begin with plotting quantitative data points. The difference is that in a scatter plot the individual points are not connected directly with a line; instead, the points taken together express a trend. This trend can be seen by looking at the distribution of the points or by adding a regression line.
The purpose of plotting scientific data is to visualize variation or show relationships between variables.