Machine Learning: Elastic Net Regression

Shivang Kainthola
Jan 24, 2024

Previously, we talked about lasso and ridge regression. If we were to combine the best of both of those worlds, we would get

Elastic Net Regression

  • It is a linear regression model that combines both L1 (Lasso) and L2 (Ridge) regularization techniques.
  • Designed to address some of the limitations of Lasso and Ridge regression by introducing a mixing parameter (`l1_ratio`) that controls the contribution of the L1 and L2 penalties.

This is visible in the objective function for elastic net regression, which minimizes:

(1/2n) Σᵢ (yᵢ − β0 − βᵀxᵢ)² + α ( l1_ratio · ∥β∥₁ + (1 − l1_ratio) · ∥β∥₂² )

where:

  • β0​ is the intercept term.
  • β is the vector of coefficients for the features.
  • xi​ is the feature vector for the ith observation.
  • yi​ is the target variable for the ith observation.
  • ∥β∥1​ is the L1 norm (sum of absolute values of coefficients).
  • ∥β∥₂² is the L2 norm squared (sum of squared values of coefficients).
  • α is the regularization parameter that controls the overall strength of the penalty.
  • `l1_ratio` is the mixing parameter that determines the ratio of L1 to L2 penalty. It ranges from 0 to 1: a value of 1 recovers pure Lasso, and a value of 0 recovers pure Ridge.

Why and when should you use it?

  • Use Lasso when you suspect that many features are irrelevant or redundant. Lasso performs feature selection by driving some coefficients to exactly zero.
  • Use Ridge when you have a high-dimensional dataset with multicollinearity among features. Ridge mitigates multicollinearity by adding a penalty proportional to the squared magnitudes of the coefficients, shrinking correlated coefficients together rather than arbitrarily picking one.
  • Use Elastic Net when you want a combination of L1 and L2 regularization.

Elastic Net combines the benefits of both Lasso and Ridge, providing a compromise between feature selection and handling multicollinearity.

It is useful when you have a dataset with a large number of features and some degree of multicollinearity, and you want a balance between sparsity and coefficient shrinkage.
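The difference in behavior is easy to see empirically. The sketch below (using synthetic data from scikit-learn's `make_regression`, with illustrative penalty settings, not tuned values) fits all three models and counts how many coefficients each one drives to exactly zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data: 100 samples, 20 features, only 5 truly informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)

models = {
    "Lasso": Lasso(alpha=1.0),
    "Ridge": Ridge(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    # Count coefficients driven exactly to zero by the penalty.
    zero_counts[name] = int(np.sum(model.coef_ == 0))
    print(f"{name}: {zero_counts[name]} of {model.coef_.size} "
          f"coefficients are exactly zero")
```

Ridge shrinks coefficients but never zeroes them out, Lasso zeroes many, and Elastic Net typically lands in between, which is exactly the compromise described above.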

Implementing Elastic Net Regression in Python:

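A minimal sketch using scikit-learn, assuming synthetic data in place of a real dataset: `ElasticNetCV` selects `alpha` and `l1_ratio` by cross-validation (the `l1_ratio` grid below is an illustrative choice), with features standardized first since the penalties are scale-sensitive.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (stand-in for a real dataset).
X, y = make_regression(n_samples=200, n_features=30, n_informative=8,
                       noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features, then let ElasticNetCV pick alpha and l1_ratio
# by 5-fold cross-validation over a small grid of mixing ratios.
model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, random_state=0),
)
model.fit(X_train, y_train)

enet = model.named_steps["elasticnetcv"]
print("Chosen alpha:", enet.alpha_)
print("Chosen l1_ratio:", enet.l1_ratio_)
print("Test R^2:", model.score(X_test, y_test))
```

Wrapping the scaler and estimator in a pipeline keeps the standardization inside the cross-validation folds, avoiding leakage from the test set into the fitted scale.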