Machine Learning : Lasso Regression

Shivang Kainthola
3 min read · Dec 30, 2023

What is Lasso Regression?

Lasso, or Least Absolute Shrinkage and Selection Operator, regression is a form of regularized linear regression that applies L1 regularization to the OLS linear regression objective, penalizing the sum of the absolute values of the coefficients.

It is an alternative to Ridge Regression (click to read that article :-) ), which applies L2 regularization.

What is L1 regularization?

Regularization is a method of preventing a model from overfitting by adding a penalty to its cost function, restricting the model's complexity.

L1 regularization adds a penalty term, based on the sum of the absolute values of the model's coefficients, to the cost function during training. This discourages overfitting and promotes sparsity in the model.

The L1 penalty has the effect of shrinking the coefficients of the least influential features to exactly zero, effectively removing them from the model.

Lasso works best when a small number of variables have medium-to-large effects, and it requires cross-validation to find the optimal value of the regularization strength.


How is L1 regularization applied?

As in Ridge Regression, the strength of lasso regularization is controlled by a parameter, alpha (α), included in the equation.

The equation is as follows:

Cost = Σᵢ (yᵢ − ŷᵢ)² + α Σⱼ |βⱼ|

Here the first term is the ordinary least squares residual sum of squares, ŷᵢ is the model's prediction for observation i, and the second term is the L1 penalty on the coefficients βⱼ, weighted by alpha (α).
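To make the objective concrete, here is a minimal NumPy sketch; the `lasso_cost` function and the toy arrays are illustrative, not part of any library. Note that scikit-learn's own formulation scales the squared-error term by 1/(2 · n_samples), so its alpha sits on a slightly different scale.

```python
import numpy as np

def lasso_cost(X, y, beta, alpha):
    """Lasso objective: residual sum of squares plus the L1 penalty."""
    residuals = y - X @ beta
    rss = np.sum(residuals ** 2)               # OLS term
    l1_penalty = alpha * np.sum(np.abs(beta))  # L1 term
    return rss + l1_penalty

# Toy data: 3 observations, 2 features
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([2.0, 3.0, 5.0])
beta = np.array([1.2, 0.0])  # second coefficient already shrunk to zero

print(lasso_cost(X, y, beta, alpha=1.0))
```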

Correspondingly, when you implement this in Python with scikit-learn, you work with two main parameters:

  • alpha (α): a float corresponding to the α in the equation; it controls the extent of lasso regularization. The default value is 1.0.
  • max_iter: the maximum number of iterations for the coordinate descent solver (default 1000), as sketched below.
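As a quick illustration of how these parameters appear in scikit-learn's `Lasso` estimator (the values chosen here are arbitrary):

```python
from sklearn.linear_model import Lasso

# alpha scales the L1 penalty; max_iter caps the number of
# coordinate descent iterations before the solver stops.
model = Lasso(alpha=0.5, max_iter=10000)
```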

Implementing Lasso Regression in Python

Using scikit-learn and a few supporting libraries, let us see how we can implement lasso regression:
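One minimal way to do it is sketched below, using the diabetes dataset that ships with scikit-learn as a stand-in; the alpha value is arbitrary:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Load a small regression dataset bundled with scikit-learn
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit lasso with an illustrative regularization strength
lasso = Lasso(alpha=0.1, max_iter=10000)
lasso.fit(X_train, y_train)

# Lasso typically drives some coefficients exactly to zero,
# which is its built-in feature selection at work
print("Coefficients:", lasso.coef_)
print("Features zeroed out:", np.sum(lasso.coef_ == 0))
print("Test R^2:", r2_score(y_test, lasso.predict(X_test)))
```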

General advantages and disadvantages of Lasso regression

Advantages of Lasso Regression:

  1. Feature Selection: One of the primary advantages of Lasso regression is its ability to perform feature selection by driving the coefficients of irrelevant or less important features to exactly zero. This can be valuable when dealing with high-dimensional datasets.
  2. Simplicity: Lasso encourages sparsity in the model, leading to simpler and more interpretable models, especially when there are many features.
  3. Robustness to Collinearity: Lasso can still produce a usable model in the presence of multicollinearity (high correlation among predictor variables), because it selects one feature from a correlated group and shrinks the coefficients of the others to zero (though see disadvantage 2 below).

Disadvantages of Lasso Regression:

  1. Not Robust to Outliers: Lasso is sensitive to outliers in the data, and a single outlier can disproportionately influence the coefficients of the model.
  2. Unstable for Highly Correlated Features: When features are highly correlated, Lasso tends to arbitrarily select one of the correlated features and shrink the coefficients of the others to zero. The choice of which feature to keep may be unstable across different runs of the algorithm.
  3. Selection of Regularization Parameter: The performance of lasso depends on choosing the regularization parameter (α, sometimes written λ) well. Cross-validation is often used to find an optimal value, but this process can be computationally expensive for large datasets (see the sketch after this list).
  4. May Shrink Coefficients to Zero Too Aggressively: While feature sparsity can be an advantage, Lasso’s aggressive penalty on coefficients may lead to a risk of discarding potentially important variables, especially when the number of observations is small compared to the number of features.
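For the parameter-selection issue in point 3, scikit-learn's `LassoCV` automates the cross-validated search over alpha. A minimal sketch, again on the diabetes dataset (the candidate grid of alphas is arbitrary):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# Search a grid of candidate alphas with 5-fold cross-validation
lasso_cv = LassoCV(alphas=np.logspace(-3, 1, 50), cv=5, max_iter=10000)
lasso_cv.fit(X, y)

print("Best alpha:", lasso_cv.alpha_)
print("Nonzero coefficients:", np.sum(lasso_cv.coef_ != 0))
```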
