Correlation

“Correlation” refers to a process for establishing whether or not a relationship exists between two variables. You have learned that one way to get a general idea of whether two variables are related is to plot them on a “scatter plot”.

While there are many measures of association for variables measured at the ordinal or higher level of measurement, correlation is the most commonly used approach. This section shows how to calculate and interpret correlation coefficients for ordinal and interval level scales. Methods of correlation summarize the relationship between two variables in a single number called the correlation coefficient. The correlation coefficient is usually given the symbol r and it ranges from -1 to +1.

A correlation coefficient close to 0, whether positive or negative, implies little or no relationship between the two variables. A correlation coefficient close to +1 means a positive relationship between the two variables, with increases in one of the variables being associated with increases in the other variable.

A correlation coefficient close to -1 indicates a negative relationship between two variables, with an increase in one of the variables being associated with a decrease in the other variable. A correlation coefficient can be produced for ordinal, interval or ratio level variables, but has little meaning for variables measured on a scale that is no more than nominal.

For ordinal scales, the correlation coefficient which is usually calculated is Spearman’s rho. For interval or ratio level scales, the most commonly used correlation coefficient is Pearson’s r, ordinarily referred to as simply the correlation coefficient.
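The two coefficients named above can be sketched in a few lines of Python. This is a minimal illustration, not the method this text presents later: the function names (pearson_r, average_ranks, spearman_rho) and the small data set are hypothetical, and Spearman's rho is computed here as Pearson's r applied to the ranks of the data, with tied values sharing their average rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as Pearson's r on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

print(round(pearson_r(hours, score), 3))
print(round(spearman_rho(hours, score), 3))
```

Because rho depends only on the ordering of the values, it suits ordinal scales, where the distances between categories carry no meaning.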

The correlation coefficient

The correlation coefficient, r, is a summary measure that describes the extent of the statistical relationship between two interval or ratio level variables. The correlation coefficient is scaled so that it is always between -1 and +1. When r is close to 0, there is little relationship between the variables; the farther r is from 0, in either the positive or negative direction, the stronger the relationship between the two variables.

The two variables are often given the symbols X and Y. To illustrate how the two variables are related, the values of X and Y are pictured in a scatter diagram, which graphs combinations of the two variables. The scatter diagram is given first, and then the method of determining Pearson’s r is presented. The following examples use relatively small samples; data from larger samples are given later.

Scatter Diagram

A scatter diagram is a diagram that shows the values of two variables X and Y, along with the way in which these two variables relate to each other. The values of variable X are given along the horizontal axis, and the values of variable Y along the vertical axis. For purposes of drawing a scatter diagram, and determining the correlation coefficient, it does not matter which of the two variables is the X variable, and which is Y.

Later, when the regression model is used, one of the variables is defined as an independent variable, and the other is defined as a dependent variable. In regression, the independent variable X is considered to have some effect or influence on the dependent variable Y. Correlation methods are symmetric with respect to the two variables, with no indication of causation or direction of influence being part of the statistical consideration. A scatter diagram is given in the following example. The same example is later used to determine the correlation coefficient.
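The symmetry of correlation can be read off the usual definition of Pearson's r, stated here in its standard textbook form ahead of the computational method that is presented later. The index i and the sample size n are notation assumed for this sketch:

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```

Exchanging X and Y leaves both the numerator and the denominator unchanged, so the same value of r results whichever variable is placed on the horizontal axis.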

Example 1. Years of Education and Age of Entry to Labour Force

Table 1 gives the number of years of formal education (X) and the age of entry into the labour force (Y), for 12 males from the Regina Labour Force Survey. Both variables are measured in years, a ratio level of measurement and the highest level of measurement. All of the males are aged 30 or over, so that most of these males are likely to have completed their formal education.

Respondent    Years of         Age of Entry into
Number        Education, X     Labour Force, Y
     1             10                 16
     2             12                 17
     3             15                 18
     4              8                 15
     5             20                 18
     6             17                 22
     7             12                 19
     8             15                 22
     9             12                 18
    10             10                 15
    11              8                 18
    12             10                 16

Table 1. Years of Education and Age of Entry into Labour Force for 12 Regina Males

Since most males enter the labour force soon after they leave formal schooling, a close relationship between these two variables is expected. By looking through the table, it can be seen that those respondents who obtained more years of schooling generally entered the labour force at an older age. The mean years of schooling is X̄ = 12.4 years and the mean age of entry into the labour force is Ȳ = 17.8 years, a difference of 5.4 years.

This difference of about five and a half years between mean age of entry and mean years of schooling roughly reflects the age of entry into formal schooling, that is, age five or six. The relationship between years of schooling and age of entry into the labour force is not perfect, however. Respondent 11, for example, has only 8 years of schooling but did not enter the labour force until age 18. In contrast, respondent 5 has 20 years of schooling, but entered the labour force at age 18. The scatter diagram provides a quick way of examining the relationship between X and Y.
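The summary figures quoted for this example can be checked with a short Python sketch. The data are the twelve rows of Table 1. The variable names are chosen here for illustration, and the value of r is computed with the standard deviation-from-the-mean formula, which may differ in layout from the computational method developed later in the text.

```python
from math import sqrt

# Table 1: years of education (X) and age of entry into the labour force (Y),
# listed in respondent order 1 to 12.
education = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
entry_age = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]

n = len(education)
mean_x = sum(education) / n
mean_y = sum(entry_age) / n
mean_gap = mean_y - mean_x  # difference between the two means

# Pearson's r from deviations about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(education, entry_age))
sxx = sum((x - mean_x) ** 2 for x in education)
syy = sum((y - mean_y) ** 2 for y in entry_age)
r = sxy / sqrt(sxx * syy)

# Age of entry minus years of schooling, per respondent.
gaps = [y - x for x, y in zip(education, entry_age)]
largest_gap_respondent = gaps.index(max(gaps)) + 1   # respondents are numbered from 1
smallest_gap_respondent = gaps.index(min(gaps)) + 1

print(f"mean X = {mean_x:.1f}, mean Y = {mean_y:.1f}, difference = {mean_gap:.1f}")
print(f"r = {r:.3f}")
print(f"largest gap: respondent {largest_gap_respondent}; "
      f"smallest gap: respondent {smallest_gap_respondent}")
```

The means reproduce the 12.4 and 17.8 given above, and r comes out at roughly 0.62, a moderately strong positive relationship. The respondents with the largest and smallest gaps are 11 and 5, the two cases singled out in the text as departing from the general pattern.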

