Correlation

Correlation is a statistical measure that expresses the extent to which two variables change together. It describes the relationship between two variables, indicating whether an increase in one variable tends to be accompanied by an increase or a decrease in the other (without implying that one causes the other).

Types of Correlation

  1. Based on Direction:
    • Positive Correlation: Both variables move in the same direction. Example: Height and weight.
    • Negative Correlation: One variable increases while the other decreases. Example: Price and demand.
    • No Correlation: No apparent relationship between the two variables.
  2. Based on Strength:
    • Perfect Correlation (+1 or -1): All points lie exactly on a straight line, so one variable changes in exact proportion to the other.
    • High Correlation (Close to +1 or -1): A strong relationship but not perfect.
    • Low Correlation (Close to 0): A weak relationship.
  3. Based on Nature:
    • Linear Correlation: The relationship between variables forms a straight-line pattern.
    • Non-Linear (Curvilinear) Correlation: The relationship does not follow a straight-line pattern.

Methods for Measuring Correlation

  • Measuring Simple Correlation
  • Measuring Partial Correlation
  • Measuring Multiple Correlation

Measuring Simple Correlation

It measures the relationship between two variables only.

1. Scatter Diagram

2. Karl Pearson’s Coefficient of Correlation (r)

Let X and Y be two correlated variables. Karl Pearson’s correlation coefficient between them is denoted by \(r_{xy}\), r(X,Y), or simply r. It is also called the product moment correlation coefficient, the simple correlation coefficient, or simply the correlation coefficient. It is defined as follows:

\(r = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)} \cdot \sqrt{\mathrm{Var}(Y)}}\)

Or, \(r = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2} \cdot \sqrt{\sum(y-\bar{y})^2}}\)   (i)

Or, \(r = \frac{n \sum xy - \sum x \cdot \sum y}{\sqrt{n \sum x^2 - (\sum x)^2} \cdot \sqrt{n \sum y^2 - (\sum y)^2}}\)   (ii)

Where n = the number of paired observations.
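Formulas (i) and (ii) are algebraically equivalent, which is easy to check numerically. Below is a minimal sketch in plain Python (only the `math` module); the function names `pearson_deviation` and `pearson_raw_sums` are hypothetical, and the data are the five (x, y) pairs from the Spearman example in this section.

```python
import math

def pearson_deviation(xs, ys):
    """Formula (i): sums of products of deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def pearson_raw_sums(xs, ys):
    """Formula (ii): needs only n, sum x, sum y, sum xy, sum x^2 and sum y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den

x = [25, 21, 27, 26, 31]
y = [40, 50, 54, 43, 42]
r1 = pearson_deviation(x, y)
r2 = pearson_raw_sums(x, y)
print(round(r1, 4), round(r2, 4))
```

Both functions return the same value (about -0.304 for these data). Formula (ii) avoids computing the means first, which is convenient by hand; with large raw values, however, \(n\sum x^2 - (\sum x)^2\) can lose precision in floating point, so the deviation form (i) is usually the safer choice in code.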

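You can also calculate the correlation coefficient using assumed means.

Let \(u = x - A\) and \(v = y - B\), where A and B are the assumed means of X and Y respectively. Because r does not change when the same constant is subtracted from every value of a variable (a change of origin), the formula keeps the same form:

\(r = \frac{n \sum uv - \sum u \cdot \sum v}{\sqrt{n \sum u^2 - (\sum u)^2} \cdot \sqrt{n \sum v^2 - (\sum v)^2}}\)   (iii)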
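A quick way to see that the assumed means do not matter is to apply the same raw-sum formula to the original values and to the shifted values \(u = x - A\), \(v = y - B\). In this sketch the helper name `r_from_sums` is hypothetical, A = 25 and B = 45 are arbitrary illustrative choices, and the data are the five pairs from the Spearman example in this section.

```python
import math

def r_from_sums(us, vs):
    """Product-moment formula written in terms of raw sums of u and v."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    svv = sum(v * v for v in vs)
    return (n * suv - su * sv) / (
        math.sqrt(n * suu - su ** 2) * math.sqrt(n * svv - sv ** 2)
    )

x = [25, 21, 27, 26, 31]
y = [40, 50, 54, 43, 42]

r_direct = r_from_sums(x, y)  # no shift, i.e. A = B = 0

# Hypothetical assumed means; any constants give the same r
A, B = 25, 45
u = [xi - A for xi in x]
v = [yi - B for yi in y]
r_shifted = r_from_sums(u, v)

print(round(r_direct, 4), round(r_shifted, 4))
```

Here the numerator and the two terms under the square roots come out identical for both versions (-130, 260 and 704), which is why the shortcut is exact rather than approximate; the smaller numbers simply make hand arithmetic easier.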

Interpretation of r:

  • If 0 < |r| < 0.5, there is a low degree of correlation.
  • If 0.5 ≤ |r| < 0.7, there is a moderate degree of correlation.
  • If 0.7 ≤ |r| < 1, there is a high degree of correlation.
  • If r = 1, there is perfect positive correlation.
  • If r = -1, there is perfect negative correlation.
  • If r = 0, there is no linear correlation. This does not by itself mean the variables are independent, since a non-linear relationship may still exist.
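The verbal scale above can be written as a small helper. This is a sketch under stated assumptions: `interpret` and `pearson_r` are hypothetical names, the cut-offs are read as ranges of |r| with the small gaps between the listed ranges closed, and the tolerance `tol` used to treat r as exactly 0 or ±1 is an arbitrary choice. The example data (y = x²) also show that r = 0 only rules out a linear relationship: y is completely determined by x, yet r is exactly 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def interpret(r, tol=1e-9):
    """Map r to the verbal scale, treating the cut-offs as ranges of |r|."""
    if math.isclose(r, 1.0, abs_tol=tol):
        return "perfect positive correlation"
    if math.isclose(r, -1.0, abs_tol=tol):
        return "perfect negative correlation"
    if math.isclose(r, 0.0, abs_tol=tol):
        return "no linear correlation"
    size = abs(r)
    if size < 0.5:
        level = "low"
    elif size < 0.7:
        level = "moderate"
    else:
        level = "high"
    sign = "positive" if r > 0 else "negative"
    return f"{level} degree of {sign} correlation"

# y depends entirely on x, but the pattern is a parabola, not a line.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]
r = pearson_r(x, y)
print(r, interpret(r))
```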

3. Spearman’s Rank Correlation

A rank is the position assigned to an item according to its order, status, or importance. Karl Pearson’s correlation coefficient is especially useful when the data are measured quantitatively. Some variables, such as beauty, knowledge, intelligence, and honesty, cannot be measured directly on a numerical scale. Such variables can instead be assessed by assigning ranks or ratings. The degree of association between the two sets of ranks is then known as rank correlation.

a. When ranks are not repeated

\(r_s = 1- \frac{6 \sum d^2}{n(n^2-1)}\)

Where d = the difference between the two ranks of each pair and n = the number of paired observations.
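The formula translates directly into code. A minimal sketch, assuming the ranks are already assigned and none is repeated; `spearman_no_ties` and `pearson_r` are hypothetical helper names. Without ties, \(r_s\) equals Pearson's r computed on the ranks, which the last call uses as a cross-check.

```python
import math

def spearman_no_ties(rank_x, rank_y):
    """r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)), valid when no rank is repeated."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rankings of the same length, n >= 2")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def pearson_r(xs, ys):
    """Pearson's r, used here only as a cross-check on the ranks."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

same = spearman_no_ties([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])       # identical order
reversed_ = spearman_no_ties([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])  # opposite order
mixed = spearman_no_ties([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(same, reversed_, mixed, pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

Identical orderings give +1, completely reversed orderings give -1, and the mixed ranking gives 0.8 by both routes.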

Example

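| \(x\) | \(y\) | \(R_x\) | \(R_y\) | \(d=R_x-R_y\) | \(d^2\) |
|---|---|---|---|---|---|
| 25 | 40 | 4 | 5 | -1 | 1 |
| 21 | 50 | 5 | 2 | 3 | 9 |
| 27 | 54 | 2 | 1 | 1 | 1 |
| 26 | 43 | 3 | 3 | 0 | 0 |
| 31 | 42 | 1 | 4 | -3 | 9 |
| | | | | \(\sum d = 0\) | \(\sum d^2 = 20\) |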

Here the largest value in each column receives rank 1, and n = 5 (the number of paired observations). Then use the following formula:

\(r_s = 1- \frac{6 \sum d^2}{n(n^2-1)}\)

Or, \(r_s = 1- \frac{6 \times 20}{5(5^2-1)}\)

Or, \(r_s = 1- \frac{120}{5(25-1)}\)

Or, \(r_s = 1- \frac{120}{5 \times 24}\)

Or, \(r_s = 1- \frac{120}{120}\)

Or, \(r_s = 1- 1\)

Or, \(r_s = 0\). There is no rank correlation between x and y.
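The worked example can be reproduced step by step. The sketch below assumes, as the table does, that the largest value in each column gets rank 1; `descending_ranks` is a hypothetical helper that only gives correct ranks when no value is repeated.

```python
def descending_ranks(values):
    """Rank 1 goes to the largest value (assumes no repeated values)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

x = [25, 21, 27, 26, 31]
y = [40, 50, 54, 43, 42]

rx = descending_ranks(x)
ry = descending_ranks(y)
d = [a - b for a, b in zip(rx, ry)]   # d = R_x - R_y for each pair
n = len(x)
sum_d2 = sum(di ** 2 for di in d)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rx, ry, d, sum_d2, r_s)
```

The printed ranks, differences, \(\sum d^2 = 20\) and \(r_s = 0\) match the table and the hand calculation.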

b. When ranks are repeated

Formula

\(r_s = 1- \frac{6 \left[\sum d^2+\sum{\frac{m(m^2-1)}{12}}\right]}{n(n^2-1)}\)

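Where m is the number of observations sharing a repeated rank, and \(\sum{\frac{m(m^2-1)}{12}}\) is the correction factor for repeated ranks, summed over every group of tied values in both variables.

If any value is repeated, each of the tied values receives the average of the ranks they would otherwise occupy. For example, three values tied for positions 7, 8 and 9 each get rank 8, as in the following table: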

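| \(x\) | \(y\) | \(R_x\) | \(R_y\) | \(d=R_x-R_y\) | \(d^2\) |
|---|---|---|---|---|---|
| 18 | 46 | 15 | 11 | 4 | 16 |
| 78 | 81 | 3 | 2 | 1 | 1 |
| 46 | 38 | 11 | 14.5 | -3.5 | 12.25 |
| 37 | 44 | 13 | 13 | 0 | 0 |
| 47 | 38 | 8 | 14.5 | -6.5 | 42.25 |
| 56 | 47 | 5 | 10 | -5 | 25 |
| 82 | 75 | 1 | 4 | -3 | 9 |
| 47 | 56 | 8 | 9 | -1 | 1 |
| 46 | 63 | 11 | 7.5 | 3.5 | 12.25 |
| 28 | 71 | 14 | 5 | 9 | 81 |
| 46 | 63 | 11 | 7.5 | 3.5 | 12.25 |
| 75 | 78 | 4 | 3 | 1 | 1 |
| 81 | 95 | 2 | 1 | 1 | 1 |
| 47 | 45 | 8 | 12 | -4 | 16 |
| 50 | 67 | 6 | 6 | 0 | 0 |
| | | | | \(\sum d = 0\) | \(\sum d^2 = 230\) |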
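Average ranks are easy to get wrong by hand, so here is a small sketch that assigns them automatically. It assumes descending ranking (largest value = rank 1), as in the table; `average_ranks` is a hypothetical name. Applied to the x and y columns above, it reproduces the \(R_x\) and \(R_y\) columns, including 8 for the three 47s, 7.5 for the two 63s and 14.5 for the two 38s.

```python
def average_ranks(values):
    """Descending ranks (largest = 1); tied values share the mean of their positions."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1        # first position occupied by v
        count = order.count(v)            # how many times v appears (m)
        last = first + count - 1          # last position occupied by v
        ranks.append((first + last) / 2)  # mean of consecutive positions first..last
    return ranks

x = [18, 78, 46, 37, 47, 56, 82, 47, 46, 28, 46, 75, 81, 47, 50]
y = [46, 81, 38, 44, 38, 47, 75, 56, 63, 71, 63, 78, 95, 45, 67]
print(average_ranks(x))
print(average_ranks(y))
```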

Repeated Ranks

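  • Rank 8 in \(R_x\) is shared by three values (x = 47): \(m_1 = 3\)
  • Rank 11 in \(R_x\) is shared by three values (x = 46): \(m_2 = 3\)
  • Rank 7.5 in \(R_y\) is shared by two values (y = 63): \(m_3 = 2\)
  • Rank 14.5 in \(R_y\) is shared by two values (y = 38): \(m_4 = 2\)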

Calculate the correction factor:

\(\sum{\frac{m(m^2-1)}{12}}\)

Or, \(\frac{3(3^2-1)}{12}+\frac{3(3^2-1)}{12}+\frac{2(2^2-1)}{12}+\frac{2(2^2-1)}{12} = 2 + 2 + 0.5 + 0.5 = 5\)

Now

\(r_s = 1- \frac{6 \left[\sum d^2+\sum{\frac{m(m^2-1)}{12}}\right]}{n(n^2-1)}\)

Or, \(r_s = 1- \frac{6 \left[230+5\right]}{15(15^2-1)}\)

Or, \(r_s = 1- \frac{1410}{3360}\)

Or, \(r_s = 0.5804\). There is a moderate degree of positive correlation.
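The whole tied-rank calculation can be checked in a few lines. This sketch takes the \(R_x\) and \(R_y\) columns straight from the table, finds the m values by counting how often each rank appears within a column (this works because every tied group shares one average rank that no other observation can receive), and adds the correction factor to \(\sum d^2\) as in the formula. `tie_corrected_spearman` is a hypothetical name.

```python
from collections import Counter

# R_x and R_y columns copied from the table above
rank_x = [15, 3, 11, 13, 8, 5, 1, 8, 11, 14, 11, 4, 2, 8, 6]
rank_y = [11, 2, 14.5, 13, 14.5, 10, 4, 9, 7.5, 5, 7.5, 3, 1, 12, 6]

def tie_corrected_spearman(rank_x, rank_y):
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # m = number of observations sharing a repeated rank, per column
    tie_sizes = [m for ranks in (rank_x, rank_y)
                 for m in Counter(ranks).values() if m > 1]
    correction = sum(m * (m ** 2 - 1) / 12 for m in tie_sizes)
    r_s = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))
    return sum_d2, tie_sizes, correction, r_s

sum_d2, tie_sizes, correction, r_s = tie_corrected_spearman(rank_x, rank_y)
print(sum_d2, tie_sizes, correction, round(r_s, 4))
```

It reports \(\sum d^2 = 230\), tie sizes 3, 3, 2 and 2, a correction factor of 5, and \(r_s \approx 0.5804\), matching the hand calculation.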
