Machine Learning: Ridge Regression
What is Ridge Regression?
Ridge regression is a form of linear regression that applies L2 regularization to the Ordinary Least Squares (OLS) objective: in addition to minimizing the sum of squared residuals, it penalizes the sum of squares of the model's weights.
It is an alternative to Lasso Regression, which uses L1 regularization.
Now, what is L2 regularization?
Regularization is a method of preventing a model from overfitting by adding a penalty term to its objective, which restricts the model's complexity.
L2 regularization (based on the L2 norm) adds a penalty proportional to the sum of the squared weights, so large values in the weight vector w are penalized heavily.
Because the L2 penalty is sensitive to the scale of the features, ridge regression should not be applied without first normalizing (or standardizing) the data. (Read about that in the follow-up blogs.)
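As a minimal sketch of that preprocessing step (assuming a feature matrix X and target y are already loaded, and using alpha=1.0 purely as a placeholder), the scaler can be bundled with the estimator in a scikit-learn pipeline:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Standardize the features first, so the L2 penalty acts on
# coefficients that are all on a comparable scale.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
# model.fit(X, y) would then scale and fit in one step.
```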
How is L2 regularization applied?
L2 regularization is applied through a parameter, alpha (α), which is included in the equation and controls the strength of the penalty.
The higher the value of alpha, the stronger the regularization.
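To make the effect visible, here is a small sketch on synthetic data (the dataset, seed, and alpha values are all illustrative); the printed coefficients shrink towards zero as alpha grows:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 1.5, 0.5, -1.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)

# The same data fit with increasingly strong regularization:
# the coefficients are pulled closer to zero as alpha grows.
for alpha in [0.01, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>6}: {np.round(ridge.coef_, 3)}")
```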
When applying L2 regularization, the regression objective becomes (written here in scikit-learn's formulation, with w as the weight vector):
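$$\min_{w}\; \lVert y - Xw \rVert_2^2 \;+\; \alpha \lVert w \rVert_2^2$$

The first term is the ordinary least squares loss on the data; the second is the L2 penalty on the weights, scaled by alpha.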
How can we implement this in Python?
As with most classical statistical ML models, we can do so using Scikit-Learn. Let's implement ridge regression on a generic dataset and contrast it with plain OLS linear regression, step by step:
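The steps below are one way to do it: the dataset is synthetic (generated via make_regression), and alpha=10.0 is an illustrative choice rather than a tuned value.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score

# Step 1: create a generic, noisy regression dataset.
X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=42)

# Step 2: hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 3: standardize the features, fitting the scaler on training data only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 4: fit plain OLS linear regression as the baseline.
ols = LinearRegression().fit(X_train, y_train)

# Step 5: fit ridge regression with an L2 penalty.
ridge = Ridge(alpha=10.0).fit(X_train, y_train)

# Step 6: compare test-set R^2 and total coefficient magnitude.
print("OLS   R^2:", r2_score(y_test, ols.predict(X_test)))
print("Ridge R^2:", r2_score(y_test, ridge.predict(X_test)))
print("OLS   sum |coef|:", np.abs(ols.coef_).sum())
print("Ridge sum |coef|:", np.abs(ridge.coef_).sum())
```

Ridge's coefficients come out smaller in magnitude; whether its test R^2 beats OLS depends on the noise level and on alpha, which is why alpha is normally tuned with cross-validation (for instance via RidgeCV).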
General advantages and disadvantages
Advantages:
- Reduces the tendency to overfit
- Penalizes large coefficients
- Suitable for high-dimensional data
Disadvantages:
- Sensitive to feature scaling
- Can make individual features or coefficients harder to interpret
- Does not perform variable selection (coefficients are shrunk towards zero but, unlike with Lasso, rarely become exactly zero)
Since ridge regression limits interpretability and does not perform variable selection, these shortcomings should be kept in mind when applying it.