Machine Learning : Linear Regression

3 min read · Sep 3, 2023

What is Linear Regression ?

Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between a dependent variable and one or more independent features.

In its simplest form, it is a linear model fitted using the least squares method.

Improved forms of linear regression, such as ridge regression, lasso regression, and ridge regression with cross-validation, are usually preferred for relatively complex datasets.

How does Linear Regression work ?

Linear Regression is a statistical method used to find an association between a dependent (target) variable and one or more independent (predictor) variables.

It is used to predict an unknown continuous value based on its relationship with known values.

Given the plotted values of the independent and dependent variables, it works by finding the line that best captures the underlying trend, typically the line that minimizes the sum of the squared vertical distances to the points, so that it can predict an unknown value.
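To make "best line" concrete, here is a minimal sketch that scores two candidate lines by their sum of squared errors. The height/weight numbers and the helper name sum_squared_errors are made up for illustration; they are not taken from a real dataset.

```python
# Hypothetical toy data: heights (cm) and weights (kg), invented for illustration.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 66.0, 74.0, 80.0]


def sum_squared_errors(slope, intercept, xs, ys):
    """Sum of squared vertical distances between the line and the points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Two candidate lines: the one with the smaller error follows the trend more closely.
sse_a = sum_squared_errors(0.5, -20.0, heights, weights)    # a rough guess
sse_b = sum_squared_errors(0.72, -56.4, heights, weights)   # the least squares line for this data

print(f"candidate A error: {sse_a:.2f}")
print(f"candidate B error: {sse_b:.2f}")
```

Linear regression searches over all possible slopes and intercepts for the line with the smallest such error, rather than comparing a handful of guesses.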

A few examples are :

The weight of a person, with respect to their height.
The sales outcome of an organization, with respect to its expenditure on advertising/promotion.
Existing weather data can be used to provide a weather prediction.

What is the formula of Linear Regression ?

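The basic formula for Linear Regression for individual data points with one variable, taking x as the independent variable and y as the target variable, is:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where \beta_0 is the intercept, \beta_1 is the slope, and \varepsilon_i is the error term for the i-th data point.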

Types of Linear Regression

Simple linear regression

Simple linear regression (SLR) is a statistical model used when only one independent variable is present and its functional relationship with the outcome variable is linear.
Equation : $y = \beta_0 + \beta_1 x + \varepsilon$
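For the one-variable case, the least squares intercept and slope have a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below implements that in plain Python; the function name fit_simple_linear_regression and the ad-spend/sales numbers are hypothetical, chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: advertising spend vs. sales, in arbitrary units.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_linear_regression(ad_spend, sales)
predicted_sales = intercept + slope * 6.0  # prediction for an unseen spend of 6.0

print(f"intercept={intercept:.2f}, slope={slope:.2f}, prediction={predicted_sales:.2f}")
```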

Multiple linear regression

Multiple linear regression is a statistical model used for finding a relationship/association between a set of independent variables and a dependent variable.
Equation : $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$
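With several independent variables, the same least squares idea can be sketched with NumPy by adding a column of ones for the intercept and solving for all coefficients at once. The data below is synthetic, generated from a known relationship so the recovered coefficients can be checked; the variable names are illustrative only.

```python
import numpy as np

# Synthetic data built from y = 1 + 2*x1 + 3*x2 with no noise (an assumption for the demo).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so that the first coefficient acts as the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Solve the least squares problem: minimize ||X_design @ coefficients - y||^2.
coefficients, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slopes = coefficients[0], coefficients[1:]

print("intercept:", intercept)
print("slopes:", slopes)
```

Because the data is noise-free, the recovered intercept and slopes should match the values used to generate it, up to floating-point error.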

Linear Regression models in Python

In Python, linear regression can be implemented using the statsmodels and scikit-learn libraries :

statsmodels.regression.linear_model.OLS

OLS is a method of estimating the parameters of a linear regression model by minimizing the sum of the squared differences between the predicted and actual values.

Statsmodels OLS provides a detailed summary of the model, including information such as the coefficients, standard errors, t-values, and p-values for each predictor variable.

sklearn.linear_model.LinearRegression

The Scikit-Learn LinearRegression model is an ordinary least squares linear regression model: it estimates the model parameters by minimizing the sum of the squared differences between the predicted and actual values, and it works with one or many features.

Scikit-Learn Linear Regression, on the other hand, provides a simpler output that includes the coefficients and intercept of the model.
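A minimal sketch of the scikit-learn side, using a tiny made-up dataset that lies exactly on the line y = 1 + 2x so the fitted values are easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data on the exact line y = 1 + 2x; features must be a 2-D array.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

model = LinearRegression()  # fits an intercept by default
model.fit(X, y)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)

# Predict the target for an unseen input.
prediction = model.predict(np.array([[6.0]]))
print("prediction for x=6:", prediction[0])
```

The fitted model exposes its parameters through the coef_ and intercept_ attributes rather than a statistical summary table.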

In follow-up posts, we’ll see a sample linear regression project using the above models.