Plotting Data in Linear Form (Definition, Types and Examples) - BYJUS

# Plotting Data in Linear Form

‘Data’ is a term that we hear quite often. Data is a collection of individual facts, statistics, or information. We can represent data in different forms, such as scatter plots and lines of fit. We will learn the terms and properties related to each method and understand the advantages and disadvantages of each.

## Plotting Data in Linear Form

Data: Data is a very common term, and you will hear it in many places. So what does data mean? Data is information that helps us answer questions. Basically, data is a set of facts such as numbers, words, measurements, or observations. It is collected through observation, questioning, surveys, or measurement.

For analysis, data is often organized in tables, graphs, or charts. A single piece of data is called a datum.

For example, students at a school are going on a trip, and their teachers collect the phone numbers of the students’ parents. Here, the set of phone numbers is the data.

Data set: A data set is a collection of information. For tabular data, a data set corresponds to one or more database tables, in which each column represents a particular variable and each row represents a single record. A data set can also be a collection of documents or files.

For example, in the table below, the heights and weights of three students form a data set.

| Student | Height | Weight |
|---|---|---|
| Joseph | 4 feet | 52 kg |
| John | 4 feet 5 inches | 63 kg |
| Mary | 5 feet | 50 kg |
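To make the row/column idea concrete, here is a minimal sketch that stores the student table in Python as a list of records. The field names (`height_in`, `weight_kg`), the conversion of heights to inches, and the `column` helper are illustrative choices of ours, not part of any standard.

```python
# Each dict is one row (a record); each key is one column (a variable).
# Heights are converted to inches (1 foot = 12 inches) so they are plain numbers.
students = [
    {"student": "Joseph", "height_in": 4 * 12, "weight_kg": 52},
    {"student": "John", "height_in": 4 * 12 + 5, "weight_kg": 63},
    {"student": "Mary", "height_in": 5 * 12, "weight_kg": 50},
]


def column(records, name):
    """Return the values of one variable (column) across all records."""
    return [row[name] for row in records]


heights = column(students, "height_in")
weights = column(students, "weight_kg")
print(list(zip(heights, weights)))  # [(48, 52), (53, 63), (60, 50)]
```

Pairing two columns with `zip` gives exactly the ordered pairs (x, y) that a scatter plot needs.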

## How to Represent Data?

Pie charts, Pareto charts, pictograms, plots, and graphs are all useful for representing data. There are many ways to display data graphically, but in the sections below we’ll focus on one very useful type of graphical representation called a plot graph.

## Using Plot Graphs

A plot is a graphical representation of a data set, usually in the form of a graph showing the relationship between two or more variables. A plot can be created by hand or with a computer. Graphs show the relationships between variables visually, which makes it quick to see patterns that would be hard to spot in a list of values.

For example, a record of daily temperatures over ten days is data. When it is plotted, we can see whether the temperature is rising or falling from day to day and use that trend to estimate the temperature for the coming days.

## Scatter plots

Scatter plots are used to assess the relationship between two numerical variables. Each point shows the values of both variables for one observation, so the two variables can be compared at the same time.

A scatter plot can show whether there is a linear relationship between the variables, that is, whether the points lie roughly along a straight line.

## How to make a Scatter graph?

To make a scatter graph, follow the steps below:

1. Decide which variable is independent (plotted on the x-axis) and which is dependent (plotted on the y-axis).
2. Determine whether each variable is continuous before selecting the appropriate graph type. Choose the range of values that will appear on the x- and y-axes.
3. If the values are continuous, space the axis marks evenly so that equal distances represent equal differences in value.
4. Include units when labeling the x- and y-axes.
5. Plot your data: for each pair, put a dot or a symbol where the x-value and the y-value meet. (If two dots coincide, place them next to each other so you can see both.)
6. Examine the pattern of points to see if there is a clear relationship. The variables are correlated if the points clearly form a line or a curve. You can then try regression or correlation analysis.
7. Give your graph a descriptive caption. By common convention, a table's caption goes above the table, while a graph's caption goes below the graph.
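Steps 3 and 5 can be sketched without any plotting library as a rough text-only scatter plot. This is an illustration under our own assumptions: the function name `text_scatter`, the grid size, and the `*` marker are hypothetical, and it uses the infant heights and weights from the example below as sample data.

```python
def text_scatter(points, width=40, height=12, mark="*"):
    """Return a rough text-only scatter plot of (x, y) pairs.

    Both axes are scaled evenly (step 3), so equal distances on the grid
    stand for equal differences in value. Axis labels, units and a caption
    (steps 4 and 7) are left to the reader.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Use a span of 1 if every x (or y) value is the same, to avoid dividing by 0.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 is printed first, so flip it to put larger y-values at the top.
        # Points that land in the same cell simply overwrite each other here.
        grid[height - 1 - row][col] = mark
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # the x-axis
    return "\n".join(lines)


# Heights (x) and weights (y) of the infants in the example table.
infants = [(0.7, 2.7), (1.8, 3.2), (2.6, 0.8), (4.5, 1.6), (6.4, 2.1), (7.2, 0.9)]
plot = text_scatter(infants)
print(plot)
```

A real graph would normally be drawn on paper or with a plotting tool; the text grid only shows how each ordered pair is mapped to a position on evenly scaled axes.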

For example, consider the following data on the heights and weights of infants on ventilation at a hospital.

| Height of infants | Weight of infants |
|---|---|
| 0.7 | 2.7 |
| 1.8 | 3.2 |
| 2.6 | 0.8 |
| 4.5 | 1.6 |
| 6.4 | 2.1 |
| 7.2 | 0.9 |

To represent the data, use ordered pairs (x, y), where x is the height of the infant and y is the weight of the infant. Then plot the ordered pairs in a coordinate plane to make the scatter plot, which is given below.

## What are Outliers, Gaps and Clusters?

• Clusters: groups of data points that lie close together, separated from other groups.
• Outliers: data values that differ greatly from the rest of the data.
• Peaks: the values that occur most often, where the distribution is highest.
• Gaps: large spaces between data points where no data values fall.
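For a single list of numbers, these terms can be sketched in code. The sketch assumes a hypothetical threshold `min_gap`: what counts as a "large" space depends on the data, so this value is a judgment call, not a rule. The helper names `find_gaps` and `split_clusters` are ours; `statistics.multimode` is a standard-library function that returns the most frequent values.

```python
import statistics


def find_gaps(values, min_gap):
    """Return (low, high) pairs of neighbouring sorted values more than min_gap apart."""
    ordered = sorted(values)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]


def split_clusters(values, min_gap):
    """Split sorted values into groups wherever a gap larger than min_gap occurs."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > min_gap:
            clusters.append([cur])  # a gap starts a new cluster
        else:
            clusters[-1].append(cur)
    return clusters


# Hypothetical values, chosen only to illustrate the terms.
data = [1, 2, 2, 3, 9, 10, 11, 25]
print(find_gaps(data, min_gap=3))       # [(3, 9), (11, 25)]
print(split_clusters(data, min_gap=3))  # [[1, 2, 2, 3], [9, 10, 11], [25]]
print(statistics.multimode(data))       # [2]  (the peak: most frequent value)
```

Here the lone value 25, far from every other group, is the kind of point one would describe as an outlier.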

## How to identify relationships between Data in a Scatter plot?

Scatter plots frequently reveal patterns or relationships. We say there is a positive correlation between the variables when the ‘y’ variable tends to increase as the ‘x’ variable increases. We say there is a negative correlation between the variables when the ‘y’ variable tends to decrease as the ‘x’ variable increases. We say there is no correlation between the two variables when there is no clear relationship between them.

## Line of Best Fit

The line of best fit, also called a trend line, is a line drawn through a scatter plot of data points that best expresses the relationship between them.

## How to make a Line of best Fit?

On a chart, the data points appear as a scatter plot: a collection of points that may or may not fall along a line. If a linear pattern emerges, a line of best fit can be drawn that keeps the points as close to the line as possible. The usual method, least squares, chooses the line that minimizes the sum of the squared vertical distances from the points to the line.

Using a graphing calculator to find the line of best fit: A graphing calculator is a handheld calculator that can plot graphs, solve simultaneous equations, and perform other tasks involving variables.

Graphing calculators use linear regression to find the line of best fit, and they often also report the correlation coefficient, denoted by the letter ‘r’. The value of ‘r’ ranges from -1 to 1: values near -1 indicate a strong negative correlation, values near 1 indicate a strong positive correlation, and values near 0 indicate little or no linear correlation. The correlation coefficient measures the strength and direction of the linear relationship between two variables.
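A minimal sketch of what a calculator's linear regression reports: the least-squares slope and intercept, plus Pearson's correlation coefficient r. The function name `linear_regression` is our own, and the sample data are the photographer's hours and earnings from Example 3; an actual calculator may format or round its output differently.

```python
import math


def linear_regression(xs, ys):
    """Return (m, b, r) for the least-squares line y = m*x + b.

    r is the Pearson correlation coefficient. Raises ZeroDivisionError
    if all x-values (or all y-values) are equal.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope minimizing the sum of squared vertical distances
    b = y_mean - m * x_mean  # the least-squares line passes through (x_mean, y_mean)
    r = sxy / math.sqrt(sxx * syy)
    return m, b, r


hours = [3, 4, 2, 6, 5]
earnings = [25, 40, 20, 62, 51]
m, b, r = linear_regression(hours, earnings)
print(f"y = {m:.2f}x + ({b:.2f}), r = {r:.3f}")
```

For this data the least-squares slope is 11 and the intercept is -4.4, and r is close to 1, which indicates a strong positive correlation.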

## Using a Line of Fit to Model Data

A line of fit can be used to model data by determining the equation of the line of fit.

Step 1: First, plot the data in a scatter plot.

Step 2: Determine if the data can be represented by a line.

Step 3: Draw the line that appears to fit the data most closely. There should be roughly the same number of points above and below the line.

Step 4: Choose two points on the line, $$(x_1, y_1)$$ and $$(x_2, y_2)$$. The points don’t have to be actual data pairs, but they do have to lie on the line of fit. Find the slope of the line:

$$m = \frac{\text{Rise}}{\text{Run}} = \frac{y_2 - y_1}{x_2 - x_1}$$

Step 5: Use the slope and one of the points to write an equation of the line in point-slope form:

$$y - y_1 = m(x - x_1)$$
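The two steps above can be sketched in a few lines. The helper names `line_through` and `slope_intercept` are hypothetical; the sample points (4, 40) and (5, 51) are the ones used in Example 3.

```python
def line_through(p1, p2):
    """Slope m and y-intercept b of the line through points p1 and p2.

    m = rise / run = (y2 - y1) / (x2 - x1); expanding the point-slope form
    y - y1 = m(x - x1) gives y = m*x + (y1 - m*x1).
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the run is 0: a vertical line has no slope")
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b


def slope_intercept(m, b):
    """Format the line as 'y = mx + b' or 'y = mx - b'."""
    sign = "-" if b < 0 else "+"
    return f"y = {m:g}x {sign} {abs(b):g}"


m, b = line_through((4, 40), (5, 51))
print(slope_intercept(m, b))  # y = 11x - 4
```

Rearranging point-slope form into y = mx + b makes the slope and y-intercept easy to read off.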

## Relationship between Data using Line of Fit

The correlation is strong when the data points lie close to the line of best fit and weak when they are scattered far from it.

## Solved Examples

Example 1: Make a scatter plot of the data given below and find a line of fit. Identify any outliers or clusters.

| x values | y values |
|---|---|
| 0.7 | 2.7 |
| 1.9 | 3.1 |
| 3.6 | 5.6 |
| 5.5 | 5.9 |
| 8.4 | 6.1 |
| 9.2 | 7.3 |

Solution: Represent the data as ordered pairs (x, y), where x is the x-value and y is the y-value from the table. Then plot the ordered pairs in a coordinate plane to make a scatter plot.

The scatter plot shows a positive linear relationship.

Draw a line that is close to the data points.

The point (3.6, 5.6) lies farther from the line than the other points, so it appears to be an outlier.

Example 2: The table depicts a plant’s growth during a period of 5 weeks. The data is modeled by the equation y = 4x, where y represents the growth in cm and x represents the week. Is the model a perfect fit? Explain.

| Week | Growth (cm) |
|---|---|
| 1 | 3.5 |
| 2 | 4.1 |
| 3 | 4.3 |
| 4 | 4.9 |
| 5 | 6.1 |

Solution: To determine whether the model is a perfect fit, represent the data as ordered pairs (week, growth). Then make a scatter plot of the ordered pairs in a coordinate plane.

We will determine the differences between the actual growth and the growth given by the model for each value of the week (or x coordinate). This difference is called the residual.

| Week | Growth (cm) | Growth from the model y = 4x (cm) | Residual (growth - model value) |
|---|---|---|---|
| 1 | 3.5 | 4 | 3.5 - 4 = -0.5 |
| 2 | 4.1 | 8 | 4.1 - 8 = -3.9 |
| 3 | 4.3 | 12 | 4.3 - 12 = -7.7 |
| 4 | 4.9 | 16 | 4.9 - 16 = -11.1 |
| 5 | 6.1 | 20 | 6.1 - 20 = -13.9 |
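The residual column can be computed directly. In this sketch the function name `model` is our own; it simply encodes the equation y = 4x from the example. Values are rounded to one decimal place to match the table and hide floating-point noise.

```python
weeks = [1, 2, 3, 4, 5]
growth_cm = [3.5, 4.1, 4.3, 4.9, 6.1]


def model(week):
    """The model from the example: y = 4x."""
    return 4 * week


# Residual = actual growth - growth predicted by the model.
residuals = [round(g - model(w), 1) for w, g in zip(weeks, growth_cm)]
print(residuals)  # [-0.5, -3.9, -7.7, -11.1, -13.9]
```

With the residuals in a list, their signs and sizes can be checked at a glance before plotting them.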

Plot the ordered pairs (week, residual) and check how the points are spread about the horizontal axis, where the residual is 0.

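The residuals are all negative, and their size grows every week, so the points are not distributed evenly about the horizontal axis. The model y = 4x is therefore not a perfect fit; it is not even a good fit, because it overestimates the growth by more each week.

Example 3: The table shows a photographer’s total earnings y (in dollars) over x hours of work.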

| Hours worked (x) | Total earnings in dollars (y) |
|---|---|
| 3 | 25 |
| 4 | 40 |
| 2 | 20 |
| 6 | 62 |
| 5 | 51 |

• Model the photographer’s earnings as a function of the number of hours he or she works.
• Find the slope and equation of the line of fit. Find out how much the photographer earns per hour.

Solution:  To represent the data, use ordered pairs ( x, y ), where x represents hours worked and y represents total earnings. Then make a scatter plot with the ordered pairs in a coordinate plane.

Draw a line of fit and choose two points on it, such as (4, 40) and (5, 51).

The slope of the line is $$m = \frac{\text{Rise}}{\text{Run}} = \frac{51 - 40}{5 - 4} = \frac{11}{1} = 11$$

Use point-slope form with the point (4, 40):

$$y - y_1 = m(x - x_1)$$

$$y - 40 = 11(x - 4)$$

$$y - 40 = 11x - 44$$

$$y = 11x - 4$$

An equation of the line of fit is $$y = 11x - 4$$.

• The slope of the line is 11. This means that the photographer earns about \$11 per hour.
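To check how well y = 11x - 4 models the table, we can compute residuals the same way as in Example 2. The function name `predicted_earnings` is our own label for the line of fit found above.

```python
hours = [3, 4, 2, 6, 5]
earnings = [25, 40, 20, 62, 51]


def predicted_earnings(h):
    """Line of fit from the example: y = 11x - 4."""
    return 11 * h - 4


# Residual = actual earnings - earnings predicted by the line of fit.
residuals = [y - predicted_earnings(h) for h, y in zip(hours, earnings)]
print(residuals)  # [-4, 0, 2, 0, 0]
```

The residuals are small and of mixed sign, unlike those in Example 2, which supports using y = 11x - 4 as a model of the photographer's earnings.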