Machine Learning : Elastic Net Regression
2 min read · Jan 24, 2024
Previously, we talked about lasso and ridge regression. If we were to combine the best of both of those worlds, we would get
Elastic Net Regression
- It is a linear regression model that combines both L1 (Lasso) and L2 (Ridge) regularization techniques.
- Designed to address some of the limitations of Lasso and Ridge regression by introducing a mixing parameter (l1_ratio) that controls the relative contribution of the L1 and L2 penalties.
This is visible in the objective function that elastic net regression minimizes:

min over (β0, β) of (1/2n) Σᵢ (yᵢ − β0 − xᵢᵀβ)² + α · l1_ratio · ∥β∥1 + (α/2) · (1 − l1_ratio) · ∥β∥2²

where:
- β0 is the intercept term.
- β is the vector of coefficients for the features.
- xi is the feature vector for the ith observation.
- yi is the target variable for the ith observation.
- ∥β∥1 is the L1 norm (sum of absolute values of coefficients).
- ∥β∥2² is the squared L2 norm (sum of squared values of coefficients).
- α is the regularization parameter that controls the overall strength of the penalty.
- l1_ratio is the mixing parameter that determines the ratio of L1 to L2 penalty. It ranges from 0 (pure Ridge) to 1 (pure Lasso).
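To make the objective concrete, here is a minimal sketch of it in NumPy. The function name and signature are my own; the penalty follows the scikit-learn convention, where the squared-error term is averaged over 2n observations.

```python
import numpy as np

def elastic_net_objective(beta0, beta, X, y, alpha, l1_ratio):
    """Elastic net objective: mean squared error term plus the
    mixed L1/L2 penalty controlled by alpha and l1_ratio."""
    n = len(y)
    residuals = y - beta0 - X @ beta
    mse_term = (residuals @ residuals) / (2 * n)
    l1_term = alpha * l1_ratio * np.sum(np.abs(beta))
    l2_term = 0.5 * alpha * (1 - l1_ratio) * np.sum(beta ** 2)
    return mse_term + l1_term + l2_term
```

Setting l1_ratio=1 recovers the Lasso objective and l1_ratio=0 recovers Ridge, which is exactly the "best of both worlds" framing above.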
Why and when should you use it?
- Use Lasso when you suspect that many features are irrelevant or redundant. Lasso performs feature selection by driving some coefficients to exactly zero.
- Use Ridge when you have a high-dimensional dataset with multicollinearity among features. Ridge helps to mitigate multicollinearity by adding a penalty term to the squared magnitudes of the coefficients.
- Use Elastic Net when you want a combination of L1 and L2 regularization.
Elastic Net combines the benefits of both Lasso and Ridge, providing a compromise between feature selection and handling multicollinearity.
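A quick sketch of fitting an elastic net model with scikit-learn's ElasticNet estimator. The data here is a toy example of my own, and the alpha and l1_ratio values are arbitrary illustrations, not tuned choices.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Toy data: y depends only on the first feature; the second is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# alpha sets overall penalty strength; l1_ratio mixes L1 vs. L2.
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)
print(model.coef_)  # first coefficient near 3, second shrunk toward 0
```

In practice, both alpha and l1_ratio are usually chosen by cross-validation (scikit-learn provides ElasticNetCV for exactly this).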