Fisher's Exact Test - Bioinformatics Notes

This test is used to determine whether there is a nonrandom association between two categorical variables summarized in a contingency table, usually a 2×2 table. It is used when sample sizes are small and the assumptions of the chi-squared test are violated (the chi-squared approximation requires an expected frequency of at least 5 in each cell).

When to Use Fisher’s Exact Test?

When sample sizes are small (typically when the expected frequency in any cell of the table is less than 5).
When analyzing categorical data in a 2×2 contingency table.
When the chi-square test’s assumptions (large sample size) do not hold.
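The rule of thumb above can be checked in code. This is a minimal sketch, assuming the standard expected count under independence (row total × column total / N) and the threshold of 5 used in these notes. The function names `expected_counts` and `chi_square_ok` are hypothetical.

```python
from fractions import Fraction


def expected_counts(a, b, c, d):
    """Expected cell counts under independence: row total * column total / N."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[Fraction(r * k, n) for k in cols] for r in rows]


def chi_square_ok(a, b, c, d, min_expected=5):
    """True only if every expected count reaches min_expected (rule of thumb)."""
    return all(e >= min_expected for row in expected_counts(a, b, c, d) for e in row)


# Example table used later in these notes: a=1, b=6, c=4, d=1
print([[round(float(e), 2) for e in row] for row in expected_counts(1, 6, 4, 1)])
print(chi_square_ok(1, 6, 4, 1))  # False: every expected count is below 5
```

For the example table the expected counts are 35/12, 49/12, 25/12 and 35/12, all below 5, so Fisher's exact test is the better choice there.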

Steps to Perform Fisher’s Exact Test

Create a 2×2 contingency table with observed frequencies.
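Calculate the probability of obtaining the observed table using Fisher’s exact test formula.
Compute the p-value: the sum of the probabilities of all tables with the same row and column totals whose probability is equal to or lower than that of the observed table (this gives the two-sided p-value).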
Compare the p-value with the significance level (α, usually 0.05):
- If p ≤ α, reject the null hypothesis (evidence of association).
- If p > α, fail to reject the null hypothesis (no significant association).
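The four steps above can be sketched as a single function. This is a minimal sketch, assuming the two-sided rule from step 3 (sum every table with the same margins whose probability is at most that of the observed table). `table_prob` and `fisher_two_sided` are hypothetical names, and the 1e-9 relative tolerance for floating-point ties is an arbitrary choice.

```python
from math import comb


def table_prob(a, b, c, d):
    """Hypergeometric probability of a 2x2 table with its margins held fixed."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_two_sided(a, b, c, d, alpha=0.05):
    """Observed table -> P_o -> two-sided p-value -> decision at level alpha."""
    r1, r2, c1 = a + b, c + d, a + c
    p_obs = table_prob(a, b, c, d)
    # With margins fixed, every table is determined by its top-left cell x.
    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p = table_prob(x, r1 - x, c1 - x, r2 - (c1 - x))
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += p
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return p_value, decision


print(fisher_two_sided(1, 6, 4, 1))
```

If SciPy is installed, `scipy.stats.fisher_exact([[1, 6], [4, 1]])` should report the same two-sided p-value (about 0.0720), which makes a convenient cross-check.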

Hypothesis for Fisher’s Exact Test

Null Hypothesis (H₀): There is no association between the two categorical variables (they are independent).
Alternative Hypothesis (H₁): There is an association between the two categorical variables (they are dependent).

2×2 Contingency Table Format

A contingency table is used to summarize the frequency counts of two categorical variables:

	Category B₁	Category B₂	Total
A₁	a	b	a+b
A₂	c	d	c+d
Total	a+c	b+d	N

where:

a, b, c, and d are observed frequencies in each cell.
N is the total sample size.
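In practice the four counts usually come from raw paired observations, one label per variable for each sample. A minimal sketch, assuming each observation is a pair of string labels and that labels outside the two chosen levels should be ignored. `contingency_2x2` and the toy data are hypothetical.

```python
from collections import Counter


def contingency_2x2(var_a, var_b, a_levels, b_levels):
    """Count paired labels into cells a, b, c, d and attach the margins.
    Pairs whose labels are not among the chosen levels are ignored."""
    counts = Counter(zip(var_a, var_b))
    (a1, a2), (b1, b2) = a_levels, b_levels
    a, b = counts[(a1, b1)], counts[(a1, b2)]
    c, d = counts[(a2, b1)], counts[(a2, b2)]
    return {
        "cells": [[a, b], [c, d]],
        "row_totals": [a + b, c + d],
        "col_totals": [a + c, b + d],
        "N": a + b + c + d,
    }


# Toy data: one (A, B) label pair per sample
var_a = ["A1", "A1", "A2", "A2", "A1"]
var_b = ["B1", "B2", "B1", "B1", "B2"]
table = contingency_2x2(var_a, var_b, ("A1", "A2"), ("B1", "B2"))
print(table["cells"], table["N"])  # [[1, 2], [2, 0]] 5
```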

Mathematical Formula for Fisher’s Exact Test

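\(P = \frac{{}^{(a+b)}C_a \cdot {}^{(c+d)}C_c}{{}^{N}C_{(a+c)}}\), where \({}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}\) is the number of combinations of \(r\) items chosen from \(n\).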

Or,

\(P = \frac{(a+b)! \cdot (c+d)! \cdot (a+c)! \cdot (b+d)!}{N! \cdot a! \cdot b! \cdot c! \cdot d!}\)
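The two forms of the formula are algebraically identical, because \({}^{n}C_{r} = n!/(r!\,(n-r)!)\). A minimal check with exact rational arithmetic using Python's `math.comb`, `math.factorial` and `fractions.Fraction`; the function names `p_comb` and `p_fact` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def p_comb(a, b, c, d):
    """Combination form: C(a+b, a) * C(c+d, c) / C(N, a+c)."""
    n = a + b + c + d
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(n, a + c))


def p_fact(a, b, c, d):
    """Factorial form: (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!)."""
    n = a + b + c + d
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return Fraction(num, den)


# Both forms agree exactly on a few tables
for cells in [(1, 6, 4, 1), (0, 7, 5, 0), (3, 2, 1, 4)]:
    assert p_comb(*cells) == p_fact(*cells)
print(p_comb(1, 6, 4, 1))  # 35/792
```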

Interpretation of the Formula

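The test calculates the probability of observing the specific arrangement of the table, assuming that the row and column totals are fixed and the null hypothesis is true (this probability follows the hypergeometric distribution).
For a two-sided test, the p-value is the sum of the probabilities of all tables with the same margins whose probability is equal to or smaller than that of the observed table. For a one-sided test, it is the sum over the observed table and all tables more extreme in the direction of the alternative hypothesis.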
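Because the margins are fixed, every possible table is determined by its top-left cell a, so the full set of tables can be enumerated. A minimal sketch, assuming the margins of the worked example in these notes (rows 7 and 5, first column 5): it lists each table's probability, marks those counted in the two-sided p-value, and confirms that the probabilities sum to 1. `tables_with_margins` and `prob` are hypothetical helper names.

```python
from fractions import Fraction
from math import comb


def tables_with_margins(r1, r2, c1):
    """Yield every 2x2 table (a, b, c, d) with row totals r1, r2 and first-column total c1."""
    for a in range(max(0, c1 - r2), min(r1, c1) + 1):
        yield a, r1 - a, c1 - a, r2 - c1 + a


def prob(a, b, c, d):
    """Hypergeometric probability of a table with its margins fixed."""
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(a + b + c + d, a + c))


# Margins of the worked example: rows 7 and 5, first column 5 (N = 12)
observed = (1, 6, 4, 1)
p_obs = prob(*observed)
tables = list(tables_with_margins(7, 5, 5))

for t in tables:
    p = prob(*t)
    # Exact fractions, so ties with the observed probability compare exactly
    marker = "<= P_o, counted" if p <= p_obs else ""
    print(t, p, marker)

print("total:", sum(prob(*t) for t in tables))  # total: 1
```

Using `Fraction` instead of floats means no tolerance is needed when comparing a table's probability with \(P_o\).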

Test Statistic (One-Tailed)

\(P = P_k + P_{k-1} + P_{k-2} + P_{k-3} + \dots + P_o\)

where:

\(P\): p-value of Fisher’s exact test (one-tailed)
\(P_o\): probability of the observed contingency table (i.e., the exact table obtained from the data)
\(P_{k-1}, P_{k-2}, \dots\): probabilities of tables that are more extreme than the observed table, in the same direction
\(P_k\): probability of the most extreme table possible with the same row and column totals
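The sum \(P_k + \dots + P_o\) walks from the observed table to the most extreme one in a single direction. A minimal sketch, assuming the direction is chosen with a hypothetical `tail` parameter ("lower" moves a toward its minimum, "upper" toward its maximum); `one_tailed_p` and `prob` are hypothetical names.

```python
from fractions import Fraction
from math import comb


def prob(a, b, c, d):
    """Hypergeometric probability of a 2x2 table with fixed margins."""
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(a + b + c + d, a + c))


def one_tailed_p(a, b, c, d, tail="lower"):
    """P_o + P_{k-1} + ... + P_k: the observed table plus every table beyond it
    in one direction. tail="lower" moves a down to its minimum, "upper" up to its maximum."""
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    xs = range(lo, a + 1) if tail == "lower" else range(a, hi + 1)
    return sum(prob(x, r1 - x, c1 - x, r2 - c1 + x) for x in xs)


p = one_tailed_p(1, 6, 4, 1, tail="lower")
print(p, round(float(p), 4))  # 1/22 0.0455
```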

Example

	Category B₁	Category B₂	Total
A₁	1	6	7
A₂	4	1	5
Total	5	7	12

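Hypotheses:

Null: The proportions are equal (the proportion in Category B₁ is the same for A₁ and A₂).
Alternate: The proportion in Category B₁ is lower for A₁ than for A₂ (one-tailed).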

Test Statistics Calculation

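\(P_o = \frac{(a+b)! \cdot (c+d)! \cdot (a+c)! \cdot (b+d)!}{N! \cdot a! \cdot b! \cdot c! \cdot d!}\)

Or, \(P_o = \frac{7! \cdot 5! \cdot 5! \cdot 7!}{12! \cdot 1! \cdot 6! \cdot 4! \cdot 1!} = \frac{35}{792} \approx 0.0442\)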
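The arithmetic for \(P_o\) can be checked exactly with integer factorials, which avoids rounding until the final step. A minimal sketch using only Python's standard library:

```python
from fractions import Fraction
from math import factorial

# Observed example table
a, b, c, d = 1, 6, 4, 1
n = a + b + c + d

# Factorial form of Fisher's formula, kept as an exact fraction
p_o = Fraction(
    factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d),
    factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d),
)
print(p_o, round(float(p_o), 4))  # 35/792 0.0442
```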

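Next, change the cell values while keeping every row and column total constant to obtain the more extreme table: decrease a by 1, which increases b and c by 1 and decreases d by 1. Because a = 1, only one such step is possible, giving the most extreme table: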

	Category B₁	Category B₂	Total
A₁	0	7	7
A₂	5	0	5
Total	5	7	12

\(P_k = \frac{(a+b)! \cdot (c+d)! \cdot (a+c)! \cdot (b+d)!}{N! \cdot a! \cdot b! \cdot c! \cdot d!}\)

Or, \(P_k = \frac{7! \cdot 5! \cdot 5! \cdot 7!}{12! \cdot 0! \cdot 7! \cdot 5! \cdot 0!} = \frac{1}{792} \approx 0.00126\)
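The step of modifying the table while keeping the totals constant can be automated: moving one count out of a and d into b and c preserves every row and column total. A minimal sketch for the direction used in this example (decreasing a); `more_extreme_lower` and `prob` are hypothetical names.

```python
from fractions import Fraction
from math import comb


def prob(a, b, c, d):
    """Hypergeometric probability of a 2x2 table with fixed margins."""
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(a + b + c + d, a + c))


def more_extreme_lower(a, b, c, d):
    """Repeatedly apply (a-1, b+1, c+1, d-1), which keeps all row and column
    totals fixed, until a or d reaches 0; return the tables visited after the observed one."""
    tables = []
    while a > 0 and d > 0:
        a, b, c, d = a - 1, b + 1, c + 1, d - 1
        tables.append((a, b, c, d))
    return tables


for t in more_extreme_lower(1, 6, 4, 1):
    print(t, prob(*t))  # (0, 7, 5, 0) 1/792
```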

Final Test Statistics Calculation

\(P = P_o + P_k\)

Or, \(P = \frac{35}{792} + \frac{1}{792} = \frac{36}{792} \approx 0.0455\)

Decision:

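Since \(P = 0.0455 \le \alpha = 0.05\), reject the null hypothesis: the one-tailed test finds evidence of an association. Note that this P covers only one tail. For a two-tailed alternative (proportions are not equal), the p-value must also include the table a = 5, b = 2, c = 0, d = 5 from the opposite tail, whose probability \(\frac{21}{792} \approx 0.0265\) does not exceed \(P_o\). That gives \(P = \frac{57}{792} \approx 0.0720 > 0.05\), so a two-tailed test would not reject the null hypothesis.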
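A short check of the final numbers compares the one-tailed sum \(P_o + P_k\) and the two-tailed p-value (all tables with these margins whose probability does not exceed \(P_o\)) against α = 0.05. A minimal sketch using exact fractions; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

ALPHA = 0.05


def prob(a, b, c, d):
    """Hypergeometric probability of a 2x2 table with fixed margins."""
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(a + b + c + d, a + c))


p_o = prob(1, 6, 4, 1)  # observed table
p_k = prob(0, 7, 5, 0)  # most extreme table in the same direction
one_tailed = p_o + p_k

# All tables with row totals 7, 5 and column totals 5, 7 (a runs from 0 to 5)
all_tables = [(x, 7 - x, 5 - x, x) for x in range(6)]
two_tailed = sum(prob(*t) for t in all_tables if prob(*t) <= p_o)

for name, p in [("one-tailed", one_tailed), ("two-tailed", two_tailed)]:
    verdict = "reject H0" if p <= ALPHA else "fail to reject H0"
    print(f"{name}: P = {float(p):.4f} -> {verdict}")
```

This prints `one-tailed: P = 0.0455 -> reject H0` and `two-tailed: P = 0.0720 -> fail to reject H0`, which shows how the choice of tail changes the decision for this table.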