Regression

Regression is a statistical technique used to model the relationship between a dependent variable (target) and one or more independent variables (predictors). It helps in understanding how changes in the independent variables affect the dependent variable and is widely used for prediction and forecasting.

Types of Regression

A. Linear Regression

Linear regression models the relationship between the dependent variable and independent variable(s) using a straight-line equation: \(Y = \beta_0 + \beta_1 X + \epsilon \)

Where:

  • Y = Dependent variable
  • X = Independent variable
  • β0​ = Intercept
  • β1​ = Slope (coefficient)
  • ϵ = Residual Error

Types of Linear Regression:

  1. Simple Linear Regression – One independent variable.
  2. Multiple Linear Regression – More than one independent variable.

B. Polynomial Regression

  • A regression technique that models a nonlinear (curved) relationship between the variables using an nth-degree polynomial. The model is still linear in the coefficients β, so it can be fitted by ordinary least squares:

\(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \dots + \beta_n X^n + \epsilon\)

  • Used when data shows a curvilinear relationship.
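Because the polynomial model is linear in its coefficients, it can be fitted with the same least-squares normal equations used for linear regression. The following is a minimal sketch, assuming a degree-2 polynomial and made-up data; the helper names `solve` and `fit_polynomial` are hypothetical and not part of the original text.

```python
def solve(A, rhs):
    """Solve the square system A z = rhs by Gauss-Jordan elimination."""
    n = len(rhs)
    # Augmented matrix [A | rhs] in floating point
    M = [list(map(float, row)) + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_polynomial(x, y, degree):
    """Return least-squares coefficients [beta_0, ..., beta_degree]."""
    # Normal equations: sum_k beta_k * sum(x^(j+k)) = sum(y * x^j), j = 0..degree
    A = [[sum(xi ** (j + k) for xi in x) for k in range(degree + 1)]
         for j in range(degree + 1)]
    rhs = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(degree + 1)]
    return solve(A, rhs)


# Made-up data generated exactly from y = 1 + 2x + 3x^2 (no noise)
x = [-2, -1, 0, 1, 2]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

beta = fit_polynomial(x, y, degree=2)
print([round(b, 6) for b in beta])
```

Since the data are noise-free, the fitted coefficients should recover the generating values up to floating-point error; with real data they would only approximate the underlying curve.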

Applications of Regression

  • Finance: Stock price prediction, risk assessment.
  • Healthcare: Disease prediction, survival analysis.
  • Marketing: Customer segmentation, sales forecasting.
  • Economics: GDP prediction, demand forecasting.
  • Engineering: Quality control, reliability analysis.

Types of Linear Regression

a. Simple Linear Regression

Simple regression is a statistical technique that establishes a relationship between two variables: one independent variable (predictor) and one dependent variable (outcome). The purpose is to predict the value of the dependent variable based on the independent variable.

\(Y = a + b X + \epsilon \)

Simply, \(y = a + b x \)

Theoretical Fitted Regression Equation

\(y = a + b x \) ……………… i

The normal equations for a and b are:

\(\sum y = na + b \sum x \) ………….. ii

\(\sum yx = a \sum x + b \sum x^2 \) ………….. iii

Solve equations ii and iii for a and b, then substitute these values into the fitted regression equation below to estimate the dependent variable y for a given value of the independent variable x.

\(\hat{y} = a + bx \)
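The procedure above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name `fit_simple_regression` and the data are made up for the example. Eliminating a from equations ii and iii gives a closed-form expression for b, and equation ii then yields a.

```python
def fit_simple_regression(x, y):
    """Solve the two normal equations for a (intercept) and b (slope)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)

    # Eliminating a from equations ii and iii gives the slope
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Equation ii rearranged: a = (sum_y - b * sum_x) / n
    a = (sum_y - b * sum_x) / n
    return a, b


# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_simple_regression(x, y)
y_hat = [a + b * xi for xi in x]  # fitted values from y_hat = a + b x
print(f"y_hat = {a:.2f} + {b:.2f} x")  # prints "y_hat = 2.20 + 0.60 x"
```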

b. Multiple Linear Regression

Multiple regression is an extension of simple regression that involves two or more independent variables to predict a single dependent variable. It is used when multiple factors influence the dependent variable.

\(Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \epsilon \)

Solving Problem

Theoretical Fitted Regression Equation

\(y = a + b_1x_1 + b_2x_2 \) ……………… i

The normal equations for a, b1 and b2 are:

\(\sum y = na + b_1 \sum x_1 + b_2 \sum x_2 \) ………….. ii

\(\sum yx_1 = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1x_2 \) ………….. iii

\(\sum yx_2 = a \sum x_2 + b_1 \sum x_1x_2 + b_2 \sum x_2^2 \) ………….. iv

Solve equations ii, iii and iv for a, b1 and b2, then substitute these values into the fitted regression equation below to estimate the dependent variable y from the independent variables x1 and x2.

\(\hat{y} = a + b_1x_1 + b_2x_2\)
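With two predictors, the three normal equations form a 3x3 linear system in a, b1 and b2. The sketch below builds that system from the sums and solves it by Gauss-Jordan elimination. The function names and the data are hypothetical, chosen only to illustrate the steps; in practice a linear-algebra library would usually do the solving.

```python
def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [list(map(float, row)) + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (predictors may be collinear)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_two_predictor_regression(x1, x2, y):
    """Return (a, b1, b2) from the three normal equations."""
    n = len(y)

    def s(u, v):
        # Sum of element-wise products, e.g. s(x1, x2) = sum of x1 * x2
        return sum(ui * vi for ui, vi in zip(u, v))

    A = [
        [n,       sum(x1),   sum(x2)],    # equation ii
        [sum(x1), s(x1, x1), s(x1, x2)],  # equation iii
        [sum(x2), s(x1, x2), s(x2, x2)],  # equation iv
    ]
    rhs = [sum(y), s(y, x1), s(y, x2)]
    a, b1, b2 = solve_linear_system(A, rhs)
    return a, b1, b2


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_two_predictor_regression(x1, x2, y)
y_hat = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the example data contain no noise, the solution should reproduce the generating coefficients up to floating-point error.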

Residual (Error) \(\epsilon\)

\(\epsilon = Y - \hat{Y}\)

Sum of Square of Error (SSE or \(\sum \epsilon^2\)) = \(\sum(Y-\hat{Y})^2\)

Standard Error of Estimate

\(S_E = \sqrt{\frac{\sum(y-\hat{y})^2}{n-2}}\)

Or, \(S_E = \sqrt{\frac{\sum y^2 -a \sum y -b \sum xy}{n-2}}\)
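As a quick check that the two expressions for the standard error of estimate agree, the sketch below evaluates both on a small made-up data set. The values a = 2.2 and b = 0.6 are the least-squares estimates for this particular toy data (an assumption of the example, not figures from the original text).

```python
import math

# Made-up data; a and b are the least-squares estimates for these points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
n = len(x)

# Definition: square root of SSE / (n - 2)
y_hat = [a + b * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
se_definition = math.sqrt(sse / (n - 2))

# Shortcut: uses only raw sums, no fitted values needed
sum_y = sum(y)
sum_y2 = sum(yi ** 2 for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
se_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(round(se_definition, 4), round(se_shortcut, 4))  # both are about 0.894
```

The shortcut form is only valid when a and b are the least-squares estimates; for any other line the two expressions generally differ.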

The Coefficient of Determination \(r^2\)

The coefficient of determination \(r^2\) is a statistical measure that explains the proportion of variance in the dependent variable that is predictable from the independent variable(s). It is often used in correlation and regression analysis to assess the goodness of fit of a model.

Total Sum of Squares (SST):

\(SST = \sum(Y-\bar{Y})^2\)

Regression Sum of Squares (SSR):

\(SSR = \sum(\hat{Y}-\bar{Y})^2\)

Error Sum of Squares (SSE):

\(SSE = \sum(Y-\hat{Y})^2\)

\(SST = SSR + SSE\)

\(R^2 = \frac{SSR}{SST} = 1- \frac{SSE}{SST} = 1- \frac{\sum(y- \hat{y})^2}{\sum(y- \bar{y})^2}\)

Or, \(R^2 = \frac{a\sum{y}+b\sum{xy}-n(\bar{y})^2}{\sum{y^2}-n(\bar{y})^2} \)

Interpretation of \(r^2\)

  • \(r^2\) values range from 0 to 1:
    • \(r^2 = 1\): Perfect fit (all variation in Y is explained by X).
    • \(r^2 = 0\): No predictive power (none of the variation is explained).
    • \(0 < r^2 < 1\): Some portion of the variation in Y is explained by X.
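The decomposition of the total variation and the equivalent formulas for the coefficient of determination can be verified numerically. The following sketch uses made-up data whose least-squares line is ŷ = 2.2 + 0.6x (an assumption of the example); all variable names are illustrative.

```python
# Made-up data; a and b are the least-squares estimates for these points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
n = len(y)

y_bar = sum(y) / n
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

# Shortcut form using raw sums only
sum_y = sum(y)
sum_y2 = sum(yi ** 2 for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
r2_shortcut = (a * sum_y + b * sum_xy - n * y_bar ** 2) / (sum_y2 - n * y_bar ** 2)

print(round(sst, 4), round(ssr, 4), round(sse, 4))  # 6.0 3.6 2.4
print(round(r2_from_ssr, 4))                        # 0.6
```

Here 60% of the variation in y is explained by x. The identity SST = SSR + SSE holds because the line was fitted by least squares with an intercept.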

Testing the Significance of Regression Coefficients

In regression analysis, we estimate coefficients that describe the relationship between the dependent variable (Y) and independent variables (X). However, we need to test whether these coefficients are statistically significant, meaning whether they truly influence Y or if the observed relationship is due to random variation.

The significance of a regression coefficient is tested using the t-test (for individual coefficients) and the F-test (for the overall model).

Hypotheses:

  • Null Hypothesis (H0): b​=0 (The independent variable X has no significant effect on Y)
  • Alternative Hypothesis (H1): b≠0 (The independent variable X has a significant effect on Y)

If we reject H0​, we conclude that X significantly affects Y.
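For an individual slope in simple regression, the t-test can be sketched as follows. The text above does not spell out the test statistic, so the example assumes the standard form t = b / SE(b) with SE(b) = S_E / sqrt(Σ(x − x̄)²) and n − 2 degrees of freedom. The data are made up, and the critical value is taken from a standard t-table (two-tailed, α = 0.05, 3 degrees of freedom).

```python
import math

# Made-up data; a and b are the least-squares estimates for these points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
n = len(x)

# Standard error of estimate, S_E
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

# Standard error of the slope (assumed standard formula, see lead-in)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
se_b = s_e / math.sqrt(sxx)

t_stat = b / se_b
t_critical = 3.182  # t-table value: two-tailed, alpha = 0.05, df = n - 2 = 3
reject_h0 = abs(t_stat) > t_critical

print(round(t_stat, 4), reject_h0)  # 2.1213 False
```

With only five observations the slope is not significant at the 5% level, even though the fitted line explains a fair share of the variation; a small sample leaves few degrees of freedom.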

F Test (ANOVA)

ANOVA Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-statistic |
|---|---|---|---|---|
| Regression (Explained) | \(SSR = \sum(\hat{Y}-\bar{Y})^2\) | k | \(MSR=\frac{SSR}{k}\) | \(F = \frac{MSR}{MSE}\) |
| Residual (Unexplained, Error) | \(SSE = \sum(Y-\hat{Y})^2\) | n−k−1 | \(MSE = \frac{SSE}{n-k-1}\) | |
| Total Variation | \(SST = \sum(Y-\bar{Y})^2\) | n−1 | | |

Here k is the number of independent variables and n is the number of observations. If the F-statistic is significant, at least one independent variable contributes to explaining Y.

\(F = \frac{MSR}{MSE}\)

Decision Rule:

  1. Compute the F-statistic.
  2. Compare it with the critical F-value from the F-table at α=0.05 with (k, n−k−1) degrees of freedom.
  3. If \( F > F_{\text{critical}}\)​, reject H0​ (at least one predictor is significant).
  4. Otherwise, fail to reject H0 (the model is not significantly better than a model with no predictors).
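The decision rule can be illustrated with a small numerical sketch. The data are made up, a = 2.2 and b = 0.6 are the least-squares estimates for them, and the critical value 10.13 is the F-table entry for α = 0.05 with (1, 3) degrees of freedom. For other sample sizes or numbers of predictors the critical value has to be looked up again.

```python
# Made-up data; a and b are the least-squares estimates for these points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
n = len(y)
k = 1  # number of independent variables

y_bar = sum(y) / n
y_hat = [a + b * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares

msr = ssr / k            # mean square for regression
mse = sse / (n - k - 1)  # mean square error
f_stat = msr / mse

f_critical = 10.13  # F-table value: alpha = 0.05, df = (k, n - k - 1) = (1, 3)
reject_h0 = f_stat > f_critical

print(round(f_stat, 4), reject_h0)  # 4.5 False
```

Since 4.5 is below the critical value, the example fails to reject H0: with this little data, the model is not significantly better than one with no predictors.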

Data Table

| x | y | \(\hat{y}=a + bx\) | \((\hat{Y}-\bar{Y})^2\) | \((Y-\hat{Y})^2\) |
