The chi-square (\(\chi^2\)) test is one of the most widely used non-parametric tests of statistical significance for bivariate tabular analysis with a contingency table. It can be applied to nominal or categorical data, but it is not suitable for ranked data.
Conditions to apply \(\chi^2 \) test
- The frequencies must be absolute counts, not relative frequencies
- The sample observations should be independent
- The total frequency should be reasonably large, that is, n > 50
- The expected frequency of each cell should not be less than 5
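As a rough illustration, the conditions that can be checked from the data alone might be tested as follows. This is a minimal sketch: the function name `check_chi_square_conditions`, its parameters, and the sample counts are hypothetical, and the expected frequencies are assumed to come from the usual row total times column total over grand total rule for a contingency table. Independence of the observations depends on how the data were collected, so it is not checked here.

```python
def check_chi_square_conditions(observed, min_total=50, min_expected=5):
    """Check the data-based conditions for a chi-square test on a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of each cell: row total * column total / grand total.
    expected = [rt * ct / n for rt in row_totals for ct in col_totals]
    return {
        # Absolute frequencies are non-negative whole numbers.
        "absolute_counts": all(
            isinstance(v, int) and v >= 0 for row in observed for v in row
        ),
        # Total frequency should exceed 50.
        "large_sample": n > min_total,
        # No expected frequency should fall below 5.
        "expected_large_enough": min(expected) >= min_expected,
    }


# Hypothetical 2 x 2 table of counts.
report = check_chi_square_conditions([[30, 20], [15, 35]])
print(report)
```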
Applications of \(\chi^2 \) test
- Test of independence
- Test of Goodness of Fit
- Test of Homogeneity
Formulae
\(\chi^2 = \sum \frac{(O - E)^2}{E} \)
Where,
O : Observed Frequency, E: Expected Frequency
\(E = \frac{RT \times CT}{GT} \)
Where, RT: Row Total for the individual frequency, CT: Column Total for the individual frequency, GT: Grand Total of all frequencies
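The two formulae above can be sketched in a few lines of Python. This is only an illustration: the function names and the observed counts are hypothetical, and the table is assumed to be a list of rows of absolute frequencies.

```python
def expected_frequencies(observed):
    """Return the expected count E = RT * CT / GT for every cell."""
    row_totals = [sum(row) for row in observed]        # RT
    col_totals = [sum(col) for col in zip(*observed)]  # CT
    grand_total = sum(row_totals)                      # GT
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square(observed):
    """Return the sum of (O - E)^2 / E over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts.
observed = [[20, 30], [30, 20]]
expected = expected_frequencies(observed)
statistic = chi_square(observed)
print(expected)
print(statistic)
```

With these counts every row and column total is 50 and the grand total is 100, so each expected frequency is 25 and each cell contributes \((\pm 5)^2 / 25 = 1\) to the statistic.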
1. Test of Independence
It is used to test the relationship between two categorical variables, where df = (r-1)(c-1), r is the number of rows and c is the number of columns. It uses a contingency table.
a. By Direct Method
| | Category A | | RT |
| Category B | a | b | a+b |
| | c | d | c+d |
| CT | a+c | b+d | GT: a+b+c+d |
| \(O\) | \(E = \frac{RT \times CT}{GT} \) | \(O-E\) | \((O-E)^2 \) | \(\frac{(O-E)^2}{E} \) |
| a | \(E_a = \frac{(a+b) \times (a+c)}{GT} \) | \(a-E_a\) | \((a-E_a)^2\) | \(\frac{(a-E_a)^2}{E_a}\) |
| b | \(E_b = \frac{(a+b) \times (b+d)}{GT} \) | \(b-E_b\) | \((b-E_b)^2\) | \(\frac{(b-E_b)^2}{E_b}\) |
| c | \(E_c = \frac{(c+d) \times (a+c)}{GT} \) | \(c-E_c\) | \((c-E_c)^2\) | \(\frac{(c-E_c)^2}{E_c}\) |
| d | \(E_d = \frac{(c+d) \times (b+d)}{GT} \) | \(d-E_d\) | \((d-E_d)^2\) | \(\frac{(d-E_d)^2}{E_d}\) |
| \(\sum O\) | \(\sum E\) | | | \(\sum \frac{(O - E)^2}{E} \) |
Test Statistic
\(\chi^2 = \sum \frac{(O - E)^2}{E} \)
Degree of Freedom
\(df = (r-1)(c-1) = (2-1)(2-1) = 1 \)
Level of Significance
Generally, use the 5% level of significance, \(\alpha = 0.05\)
Critical Value
Look up the critical value in a chi-square critical value table, using the level of significance (0.05) and the degrees of freedom (1).
Decision
If \(\chi^2_{cal} \leq \chi^2_{tab} \), accept \(H_0\) and reject \(H_1\); if \(\chi^2_{cal} > \chi^2_{tab} \), reject \(H_0\) and accept \(H_1\).
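The whole direct-method procedure, from the observed table to the decision, might look like this in Python. It is a sketch under stated assumptions: the function name `independence_test` and the observed counts are hypothetical, and the critical values are the standard table values at \(\alpha = 0.05\) for 1 to 4 degrees of freedom, hard-coded so that the example needs no external library.

```python
# Chi-square critical values at alpha = 0.05, copied from a standard table.
CRITICAL_VALUES_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def independence_test(observed):
    """Run the direct method on an r x c table of absolute frequencies."""
    rows, cols = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Sum (O - E)^2 / E over every cell, with E = RT * CT / GT.
    statistic = 0.0
    for i in range(rows):
        for j in range(cols):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - e) ** 2 / e

    df = (rows - 1) * (cols - 1)
    critical = CRITICAL_VALUES_005[df]  # only df 1..4 are tabulated here
    reject_h0 = statistic > critical
    return statistic, df, critical, reject_h0


# Hypothetical 2 x 2 table: a=30, b=20, c=15, d=35.
statistic, df, critical, reject_h0 = independence_test([[30, 20], [15, 35]])
print(round(statistic, 4), df, critical, reject_h0)
```

For these counts the expected frequencies are 22.5 and 27.5 in each row, the calculated value is about 9.09, and since that exceeds the tabulated 3.841 at df = 1, \(H_0\) is rejected.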
b. By Contingency Table
| a | b | a+b |
| c | d | c+d |
| a+c | b+d | N = a+b+c+d |
\(\chi^2 = \frac{N(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)} \)
If the value of any cell (a, b, c, or d) is less than 5, use the Yates correction.
Yates Correction
\(\chi^2 = \frac{N\left(|ad-bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}\)
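Both 2 x 2 formulae, with and without the Yates correction, can be sketched as a single Python function. The function name `chi_square_2x2`, the `yates` flag, and the cell counts below are hypothetical and chosen only for illustration.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table using the shortcut formula.

    With yates=True, N/2 is subtracted from |ad - bc| before squaring.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = diff - n / 2
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical table with one cell below 5, so the correction applies.
plain = chi_square_2x2(3, 12, 10, 5)
corrected = chi_square_2x2(3, 12, 10, 5, yates=True)
print(round(plain, 4), round(corrected, 4))
```

Here N = 30 and |ad - bc| = 105, so the uncorrected value is 30 × 105² / 49725 ≈ 6.65, while the corrected value is 30 × 90² / 49725 ≈ 4.89. The correction always lowers the statistic, which makes the test more conservative.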
2. Test of Goodness of Fit
It is used to test whether observed data match a specific distribution. It uses one categorical variable, where df = k-1 and k is the number of categories.
| x | x1 | x2 | x3 | x4 | x5 |
| f | f1 | f2 | f3 | f4 | f5 |
Observed Frequency (O) = f1, f2, f3, f4, f5
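Expected Frequency (E) = \(\frac{\sum O}{n}\), where n is the number of categories (the k in df = k-1). This assumes the hypothesised distribution gives every category the same expected frequency.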
| x | f(O) | f(E) | O-E | \((O-E)^2 \) | \(\frac{(O-E)^2}{E} \) |
| x1 | f1 | \(\frac{\sum O}{n}\) | | | |
| x2 | f2 | \(\frac{\sum O}{n}\) | | | |
| x3 | f3 | \(\frac{\sum O}{n}\) | | | |
| x4 | f4 | \(\frac{\sum O}{n}\) | | | |
| x5 | f5 | \(\frac{\sum O}{n}\) | | | |
| n=5 | \(\sum O\) | \(\sum E\) | | | \(\sum \frac{(O - E)^2}{E} \) |
Test Statistic
\(\chi^2 = \sum \frac{(O - E)^2}{E} \)
Degree of Freedom
\(df = k-1 = 5-1 = 4 \)
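A goodness-of-fit calculation along these lines could be sketched in Python as follows. The function name `goodness_of_fit_uniform` and the five observed frequencies are hypothetical; the expected frequency is assumed to be the same for every category, and 9.488 is the standard table value for df = 4 at \(\alpha = 0.05\).

```python
def goodness_of_fit_uniform(observed):
    """Chi-square goodness of fit against equal expected frequencies."""
    k = len(observed)              # number of categories
    expected = sum(observed) / k   # E = (sum of O) / n, same for every category
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1
    return statistic, df


# Hypothetical observed frequencies f1..f5.
observed = [18, 22, 20, 25, 15]
statistic, df = goodness_of_fit_uniform(observed)

critical = 9.488  # table value for df = 4 at alpha = 0.05
reject_h0 = statistic > critical
print(round(statistic, 4), df, reject_h0)
```

The total is 100, so E = 20 for each category and the statistic is (4 + 4 + 0 + 25 + 25) / 20 = 2.9. That is below 9.488, so \(H_0\) is accepted: these counts are consistent with equal frequencies.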
