Regression analysis is a set of statistical processes for estimating the relationship between a dependent variable and one or more independent variables. The basic question it addresses is this: when we observe an outcome, which factors affect it? There may be several such factors, and they may affect the outcome directly or indirectly. Depending on the number of independent variables in the model, linear regression analysis is divided into two types: simple linear regression and multiple linear regression.


Independent Variable: Usually denoted by x. It is the explanatory variable: it is not affected by the other variables in the model, and it is the cause of y or is thought to affect it.

Dependent Variable: Usually denoted by y. It is the variable that changes, or is affected (explained), depending on the variable x.

Assumptions of Regression Analysis

The regression model is based on the following assumptions.

  • The relationship between the independent variable and the dependent variable is linear.

Homoscedastic: Homoscedastic data have the same standard deviation across the different groups (or ranges) of the data. Regression assumes that the errors are homoscedastic.

Heteroscedastic: Heteroscedastic data have different standard deviations in different groups, which violates that assumption.

Autocorrelation: Autocorrelation refers to the degree of correlation between values of the same variable across different observations in the data. Regression assumes that the errors are not autocorrelated.
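As a rough illustration of how these ideas can be checked on the residuals of a fitted model, here is a minimal sketch. The helper names and the residual values are made up for illustration, and the two statistics are informal screening numbers rather than formal tests.

```python
from statistics import mean, pstdev

def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one that follows it."""
    m = mean(residuals)
    num = sum((a - m) * (b - m) for a, b in zip(residuals, residuals[1:]))
    den = sum((r - m) ** 2 for r in residuals)
    return num / den

def spread_ratio(residuals):
    """Std. dev. of the second half divided by that of the first half.
    A value far from 1 hints at heteroscedasticity."""
    half = len(residuals) // 2
    return pstdev(residuals[half:]) / pstdev(residuals[:half])

# Hypothetical residuals, ordered by the independent variable (not from the article).
residuals = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1, 0.4, -0.3]

print("lag-1 autocorrelation:", round(lag1_autocorrelation(residuals), 3))
print("spread ratio:", round(spread_ratio(residuals), 3))
```

A lag-1 autocorrelation near zero and a spread ratio near one are what the assumptions lead us to expect; large departures suggest the model should be examined more closely.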


The function we use to express this relationship is called the regression model. A model of the relationship is hypothesized, and estimates of the parameter values are used to develop an estimated regression equation. Various tests are then employed to determine if the model is satisfactory. If the model is deemed satisfactory, the estimated regression equation can be used to predict the value of the dependent variable given values for the independent variables.

A regression model relates Y to a function of X and β. Most regression models propose that Yi is a function of Xi and β, with εi representing an additive error term that may stand in for unmodeled determinants of Yi or random statistical noise: Yi = f(Xi, β) + εi
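To make the additive structure concrete, the sketch below generates data from Yi = f(Xi, β) + εi. The linear form chosen for f, the parameter values, and the Gaussian noise are all assumptions made for illustration, not something the model requires.

```python
import random

def f(x, beta):
    """Hypothetical linear choice of f: beta[0] + beta[1] * x."""
    return beta[0] + beta[1] * x

def simulate(xs, beta, noise_sd, seed=0):
    """Draw Y_i = f(X_i, beta) + e_i with Gaussian errors e_i."""
    rng = random.Random(seed)
    return [f(x, beta) + rng.gauss(0.0, noise_sd) for x in xs]

xs = [1.0, 2.0, 3.0, 4.0]
beta = (2.0, 0.5)  # illustrative parameter values, not from the article
ys = simulate(xs, beta, noise_sd=0.3)

# The error term is whatever is left after removing the systematic part f(X, beta).
errors = [y - f(x, beta) for x, y in zip(xs, ys)]
print(ys)
print(errors)
```

In practice β is unknown, and regression works in the opposite direction: it estimates β from the observed X and Y values.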

Steps To Conduct A Regression Analysis

Coefficient of Determination

The coefficient of determination, denoted R² or r², is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).

Calculating the coefficient of determination (R²)

R² = Explained Variation / Total Variation

   = Regression Sum of Squares (SSR) / Total Sum of Squares (SST)
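The ratio above can be sketched in a few lines of Python. The function name and the small data set are hypothetical; the fitted values correspond to the least squares line 2.2 + 0.6x for x = 1, ..., 5.

```python
def r_squared(y, y_hat):
    """R^2 = SSR / SST, where y_hat holds the fitted values.

    This ratio equals 1 - SSE/SST only when y_hat comes from a
    least squares fit that includes an intercept.
    """
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)         # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)     # explained variation
    return ssr / sst

# Hypothetical observations and their least squares fitted values.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

print(round(r_squared(y, y_hat), 3))
```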


Linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

Simple Linear Regression And Multiple Linear Regression

Simple linear regression is a statistical method for summarizing and studying the relationship between two continuous (quantitative) variables. In a cause-and-effect relationship, the independent variable is the cause and the dependent variable is the effect. Least squares linear regression predicts the value of a dependent variable y from the value of an independent variable x. Mathematically, the regression model is represented by the following equation: y = β0 + β1x + ε


  • x is the independent variable

β1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
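A minimal sketch of this formula in Python follows. The function name and the sample data are hypothetical, and the intercept is computed with the usual companion formula β0 = ȳ − β1 x̄.

```python
def fit_simple(x, y):
    """Least squares slope and intercept from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)

    beta1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    beta0 = sum_y / n - beta1 * sum_x / n   # y-bar minus beta1 times x-bar
    return beta0, beta1

# Hypothetical data, not from the article.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

beta0, beta1 = fit_simple(x, y)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
```

Note the parentheses around the numerator and the denominator: without them the expression would be evaluated in a different order and give a different number.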

An example of simple linear regression, which has one independent variable

Example: Linear regression of patients' age and blood pressure

A study involving 10 patients is conducted to investigate the relationship between patients' age and their blood pressure.

Table: calculating the linear regression of patient’s age and blood pressure

Calculating the means (x̄, ȳ):

x̄ = Σx / n = 491 / 10 = 49.1,  ȳ = Σy / n = 1410 / 10 = 141

Calculating the regression coefficients:

β1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)

β1 = (10 × 71566 − 491 × 1410) / (10 × 26157 − 491²)

β1 = 23350 / 20489 ≈ 1.140

β0 = ȳ − β1 x̄

β0 = 141 − 1.140 × 49.1, so β0 = 85.026
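The arithmetic can be reproduced directly from the sums quoted in the example (n = 10, Σx = 491, Σy = 1410, Σxy = 71566, Σx² = 26157). This is only a sketch; the variable names are my own.

```python
n = 10
sum_x, sum_y = 491, 1410
sum_xy, sum_x2 = 71566, 26157

numerator = n * sum_xy - sum_x * sum_y    # 23350
denominator = n * sum_x2 - sum_x ** 2     # 20489
beta1 = numerator / denominator

x_bar, y_bar = sum_x / n, sum_y / n

# The article rounds the slope to three decimals before computing the intercept.
beta1_rounded = round(beta1, 3)
beta0_from_rounded = y_bar - beta1_rounded * x_bar

# Intercept computed from the unrounded slope, for comparison.
beta0_unrounded = y_bar - beta1 * x_bar

print(f"beta1 = {beta1:.3f}")
print(f"beta0 (rounded slope)   = {beta0_from_rounded:.3f}")
print(f"beta0 (unrounded slope) = {beta0_unrounded:.3f}")
```

The two intercepts differ slightly (about 85.026 versus about 85.044) because rounding the slope first carries a small error into the intercept.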

Then substitute the regression coefficients into the regression model:

Estimated blood pressure (Ŷ) = 85.026 + 1.140 × age

Interpretation of the equation:

The constant (intercept) β0 = 85.026 is the estimated blood pressure at age zero.

The regression coefficient β1 = 1.140 indicates that as age increases by one year, the estimated blood pressure increases by 1.140.
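To see the interpretation in action, the estimated equation can be wrapped in a small function. The function name and the ages used are hypothetical; only the two coefficients come from the example.

```python
BETA0 = 85.026  # intercept from the worked example
BETA1 = 1.140   # slope from the worked example

def predict_blood_pressure(age):
    """Estimated blood pressure for a given age, using the fitted line."""
    return BETA0 + BETA1 * age

# Illustrative ages, not taken from the study data.
for age in (0, 50, 51):
    print(f"age {age}: {predict_blood_pressure(age):.3f}")

# One extra year of age changes the estimate by the slope.
print(f"difference: {predict_blood_pressure(51) - predict_blood_pressure(50):.3f}")
```

The estimate at age zero is simply the intercept. Since such an age is presumably far outside the ages of the patients studied, that value is best read as the anchor of the line rather than as a realistic prediction.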

Applying each patient's age to the regression model gives the estimated blood pressure (Ŷ), from which the coefficient of determination (R²) is calculated as follows:

Equation of ANOVA table for simple linear regression

Calculating the ANOVA table values for simple linear regression;

Calculating the coefficient of determination (R²)

R² = Explained Variation / Total Variation = SSR / SST

Then substitute the values from the ANOVA table:

R² = 2662.75 / 3284 ≈ 0.811

We can say that 81% of the variation in blood pressure is explained by age.
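The same numbers can be checked in a few lines. The SSR and SST values are the ones quoted from the ANOVA table; the cross-check at the end assumes SSR was obtained as β1² × Sxx with the rounded slope 1.140, which is my reading of the arithmetic and not something the article states.

```python
ssr = 2662.75   # regression sum of squares, from the ANOVA table
sst = 3284.0    # total sum of squares, from the ANOVA table

sse = sst - ssr          # residual (unexplained) sum of squares
r2 = ssr / sst

print(f"SSE = {sse:.2f}")
print(f"R^2 = {r2:.4f}")

# Cross-check (assumption): SSR = beta1^2 * Sxx, with Sxx = sum(x^2) - (sum(x))^2 / n.
n, sum_x, sum_x2 = 10, 491, 26157
sxx = sum_x2 - sum_x ** 2 / n
ssr_check = 1.140 ** 2 * sxx
print(f"beta1^2 * Sxx = {ssr_check:.2f}")
```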

Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a dependent variable (the target or criterion variable) based on the values of two or more independent variables (predictor or explanatory variables). Multiple regression allows you to determine the overall fit (variance explained) of the model and the relative contribution of each predictor to the total variance explained. For example, you might want to know how much of the variation in exam performance can be explained by revision time and lecture attendance as a whole, but also the relative contribution of each independent variable in explaining that variance. Mathematically, the multiple regression model is represented by the following equation: Y = β0 + β1X1 + … + βnXn + ε


  • X1 to Xn represent the independent variables.

In both cases, ε is an error term. When the model is written for a single observation, a subscript i (as in εi) indexes that observation.

Using the deviation method for two independent variables, where lowercase x1, x2 and y denote deviations from their means:

  • β1 = ((Σx1y)(Σx2²) − (Σx2y)(Σx1x2)) / ((Σx1²)(Σx2²) − (Σx1x2)²)
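Here is a minimal sketch of the deviation method for two predictors. The function name and the data are hypothetical, and the formulas for β2 and β0 are the standard companions of the β1 formula (β2 follows by swapping the roles of x1 and x2), which the text above does not spell out.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 via deviations from the means."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

    # Deviations from the means (the lowercase x1, x2, y of the formula).
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]

    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))

    det = s11 * s22 - s12 ** 2          # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det  # same formula with the roles swapped
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Because the data contain no noise, the fit should recover the generating coefficients up to floating-point error. With more than two predictors the closed-form expressions become unwieldy, and a matrix-based solver is the usual choice.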




