Chi Squared Test - Bioinformatics Notes

The chi-squared (\(\chi^2\)) test is one of the most widely used non-parametric tests of statistical significance for bivariate tabular analysis with a contingency table. It can be applied to nominal or categorical data, but it cannot be used with ranked data.

Conditions to apply \(\chi^2 \) test

The frequencies must be absolute counts, not relative values such as proportions or percentages
The sample observations should be independent
The total frequency should be reasonably large, that is, n > 50
The expected frequency of each cell should not be less than 5

Applications of \(\chi^2 \) test

Test of independence
Test of Goodness of Fit
Test of Homogeneity

Formulae

\(\chi^2 = \sum \frac{(O - E)^2}{E} \)

Where,

O : Observed Frequency, E: Expected Frequency

\(E = \frac{RT \times CT}{GT} \)

Where, RT: Row Total of the row containing the cell, CT: Column Total of the column containing the cell, GT: Grand Total of all frequencies
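The expected-frequency formula can be sketched in a few lines of plain Python. This is only an illustration: the function name `expected_frequencies` and the 2 x 3 table of counts are made up for the example and are not part of the notes.

```python
def expected_frequencies(observed):
    """Return the expected count E = RT * CT / GT for every cell of a table."""
    row_totals = [sum(row) for row in observed]        # RT for each row
    col_totals = [sum(col) for col in zip(*observed)]  # CT for each column
    grand_total = sum(row_totals)                      # GT
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


# Hypothetical observed counts (2 rows, 3 columns)
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_frequencies(observed)
print(expected)  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
```

Each row of expected counts adds up to the same row total as the observed counts, which is a quick sanity check on the calculation.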

1. Test of Independence

It is used to test the relationship between two categorical variables, where df = (r-1)(c-1), r is the number of rows and c is the number of columns. It uses a contingency table.

a. By Direct Method

	Category A		RT
Category B	a	b	a+b
	c	d	c+d
CT	a+c	b+d	GT: a+b+c+d

\(O\)	\(E = \frac{RT \times CT}{GT} \)	\(O-E\)	\((O-E)^2 \)	\(\frac{(O-E)^2}{E} \)
a	\(E_a = \frac{(a+b) \times (a+c)}{GT} \)	\(a-E_a\)	\((a-E_a)^2\)	\(\frac{(a-E_a)^2}{E_a}\)
b	\(E_b = \frac{(a+b) \times (b+d)}{GT} \)	\(b-E_b\)	\((b-E_b)^2\)	\(\frac{(b-E_b)^2}{E_b}\)
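c	\(E_c = \frac{(c+d) \times (a+c)}{GT} \)	\(c-E_c\)	\((c-E_c)^2\)	\(\frac{(c-E_c)^2}{E_c}\)
d	\(E_d = \frac{(c+d) \times (b+d)}{GT} \)	\(d-E_d\)	\((d-E_d)^2\)	\(\frac{(d-E_d)^2}{E_d}\)
\(\sum O\)	\(\sum E\)			\(\sum \frac{(O - E)^2}{E} \)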

Test Statistics

\(\chi^2 = \sum \frac{(O - E)^2}{E} \)
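As a rough sketch, the direct method amounts to looping over the cells, computing E for each one and adding up \(\frac{(O-E)^2}{E}\). The function name `chi_square_direct` and the counts a = 20, b = 30, c = 30, d = 20 are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def chi_square_direct(observed):
    """Chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # E = RT * CT / GT
            chi2 += (o - e) ** 2 / e                         # (O - E)^2 / E
    df = (len(observed) - 1) * (len(observed[0]) - 1)        # (r - 1)(c - 1)
    return chi2, df


# Hypothetical 2 x 2 table: a = 20, b = 30, c = 30, d = 20
chi2_cal, df = chi_square_direct([[20, 30], [30, 20]])
print(chi2_cal, df)  # 4.0 1
```

Here every expected frequency is 25, so each of the four cells contributes \(\frac{(\pm 5)^2}{25} = 1\) to the total.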

Degree of Freedom

\(df = (r-1)(c-1) = (2-1)(2-1) = 1 \)

Level of Significance

Generally, the 5% level of significance is used: \(\alpha = 0.05\)

Critical Value

Use the chi-squared critical value table to find the critical value for the chosen level of significance (0.05) and degree of freedom (1). For these values the tabulated critical value is 3.841.

Decision

If \(\chi^2_{cal} \leq \chi^2_{tab} \), accept \(H_0\) (that is, do not reject it) and reject \(H_1\); if \(\chi^2_{cal} > \chi^2_{tab} \), reject \(H_0\) and accept \(H_1\).
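The decision rule can be sketched as a simple comparison. The snippet below assumes df = 1 and \(\alpha = 0.05\), for which the tabulated critical value is 3.841. The helper names `decide` and `p_value_df1` are hypothetical. The p-value helper uses the fact that, for 1 degree of freedom only, the upper-tail probability equals `erfc(sqrt(x / 2))`; it does not apply to other degrees of freedom.

```python
import math

CHI2_TAB_DF1_ALPHA_005 = 3.841  # tabulated critical value for df = 1, alpha = 0.05


def decide(chi2_cal, chi2_tab=CHI2_TAB_DF1_ALPHA_005):
    """Compare the calculated statistic with the tabulated critical value."""
    if chi2_cal <= chi2_tab:
        return "do not reject H0"
    return "reject H0"


def p_value_df1(chi2_cal):
    """Upper-tail probability of the chi-squared distribution, valid for df = 1 only."""
    return math.erfc(math.sqrt(chi2_cal / 2))


print(decide(4.0))                 # reject H0
print(decide(3.24))                # do not reject H0
print(round(p_value_df1(4.0), 4))  # 0.0455
```

Comparing the statistic with the critical value and comparing the p-value with \(\alpha\) lead to the same decision.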

b. By Contingency Table

a	b	a+b
c	d	c+d
a+c	b+d	N = a+b+c+d

\(\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} \)
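A minimal sketch of this shortcut formula for a 2 x 2 table is shown below. The function name `chi_square_2x2` and the counts are hypothetical. For a = 20, b = 30, c = 30, d = 20 every expected frequency is 25, so the direct method gives \(4 \times \frac{5^2}{25} = 4\), and the shortcut should agree.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-squared statistic for a 2 x 2 table using the shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts
print(chi_square_2x2(20, 30, 30, 20))  # 4.0
```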

If the value of any cell (a, b, c or d, or more than one of them) is less than 5, use the Yates correction.

Yates Correction

\(\chi^2 = \frac{N\left[|ad-bc| - \frac{N}{2}\right]^2}{(a+b)(c+d)(a+c)(b+d)}\)
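The effect of the correction can be illustrated with a small table. In this sketch the function name `chi_square_2x2_yates` and the counts (a = 3, b = 7, c = 8, d = 2) are hypothetical, and the uncorrected value is computed alongside for comparison.

```python
def chi_square_2x2_yates(a, b, c, d):
    """Chi-squared statistic for a 2 x 2 table with the Yates correction."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts with two cells below 5
a, b, c, d = 3, 7, 8, 2
n = a + b + c + d

uncorrected = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
corrected = chi_square_2x2_yates(a, b, c, d)

print(round(uncorrected, 3))  # 5.051
print(round(corrected, 3))    # 3.232
```

With df = 1 and \(\alpha = 0.05\) the tabulated critical value is 3.841, so for these made-up counts the uncorrected statistic exceeds it while the corrected one does not. The correction always makes the statistic smaller.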

2. Test of Goodness of Fit

It is used to test whether observed data match a specified distribution. It uses one categorical variable, where df = k-1 and k is the number of categories.

x	x1	x2	x3	x4	x5
f	f1	f2	f3	f4	f5

Observed Frequency (O) = f1, f2, f3, f4, f5

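Expected Frequency (E) = \(\frac{\sum O}{n}\), where n is the number of categories (here n = 5). This assumes that all categories are equally likely under \(H_0\).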

x	f(O)	f(E)	O-E	\((O-E)^2 \)	\(\frac{(O-E)^2}{E} \)
x1	f1	\(\frac{\sum O}{n}\)
x2	f2	\(\frac{\sum O}{n}\)
x3	f3	\(\frac{\sum O}{n}\)
x4	f4	\(\frac{\sum O}{n}\)
x5	f5	\(\frac{\sum O}{n}\)
n=5	\(\sum O\)	\(\sum E\)			\(\sum \frac{(O - E)^2}{E} \)

Test Statistics

\(\chi^2 = \sum \frac{(O - E)^2}{E} \)

Degree of Freedom

\(df = k-1 = 5-1 = 4 \)
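The goodness-of-fit calculation can be sketched as follows, assuming that every category has the same expected frequency \(\frac{\sum O}{n}\). The function name `chi_square_goodness_of_fit` and the five observed frequencies are hypothetical. The critical value 9.488 is the tabulated value for df = 4 at \(\alpha = 0.05\).

```python
def chi_square_goodness_of_fit(observed):
    """Chi-squared statistic and df, assuming equal expected frequencies."""
    k = len(observed)               # number of categories
    expected = sum(observed) / k    # E = (sum of O) / number of categories
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, k - 1


# Hypothetical observed frequencies f1..f5
observed = [18, 22, 20, 25, 15]
chi2_cal, df = chi_square_goodness_of_fit(observed)

CHI2_TAB_DF4_ALPHA_005 = 9.488  # tabulated critical value for df = 4, alpha = 0.05

print(round(chi2_cal, 3), df)              # 2.9 4
print(chi2_cal <= CHI2_TAB_DF4_ALPHA_005)  # True
```

Since the calculated value is below the critical value, these made-up frequencies are consistent with an equal distribution across the five categories.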
