Understanding Ridge Regression and the Difference in Coefficients
Ridge regression is a popular regularization technique used to prevent overfitting in linear regression models. It does this by adding a penalty term to the least-squares cost function that shrinks the coefficient estimates toward zero. In this article, we will explore why manually calculated ridge regression coefficients differ from those returned by MASS::lm.ridge.
A Brief Introduction to Ridge Regression
In R notation, the ridge regression coefficient estimate is:
solve(t(X) %*% X + lbd*I) %*% t(X) %*% y
Where X is the design matrix, y is the response vector, lbd is the regularization parameter, and I is the identity matrix.
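In standard notation, this expression is the closed-form minimizer of the penalized least-squares objective:
b_ridge = argmin_b ||y − Xb||² + λ||b||² = (XᵀX + λI)⁻¹ Xᵀy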
Why the Manual Calculation Does Not Match MASS::lm.ridge
The manual calculation of ridge regression coefficients relies on inverting the matrix (t(X) %*% X + lbd*I) directly:
Rinv = solve(t(X) %*% X + lbd*diag(ncol(X)))
However, MASS::lm.ridge takes an extra step: it scales the predictors before fitting. This changes the scale on which the penalty is applied, and therefore the coefficients returned for a given lambda.
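Concretely, "scaling" here means dividing each column of the design matrix by its root-mean-square value before applying the penalty (when the model has an intercept, lm.ridge also centers X and y first). A minimal sketch of that step, with illustrative variable names:
# Divide each column of X by its root-mean-square value, as MASS::lm.ridge does
Xscale  <- sqrt(colMeans(X^2))        # one scale factor per column
Xscaled <- sweep(X, 2, Xscale, "/")   # scaled design matrix used for the ridge fit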
Scaling and Its Impact on Coefficients
The MASS::lm.ridge function works on this scaled design matrix. It computes the singular value decomposition (SVD) of the scaled X, which decomposes it into three matrices: U, Sigma, and V. The least-squares coefficients on the scaled data are then computed as:
lscoef = Xs$v %*% (rhs/d)
Where rhs = t(Xs$u) %*% y is the response projected onto the left singular vectors, d is the vector of singular values (the diagonal of Sigma), and Xs$v contains the right singular vectors.
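From these same SVD pieces, the ridge solution for any given lambda follows by shrinking each singular direction. A minimal sketch mirroring what the lm.ridge source does (here X, y, and lbd stand for the scaled design matrix, the response, and the penalty):
Xs  <- svd(X)                    # X = U diag(d) V'
d   <- Xs$d                      # singular values
rhs <- t(Xs$u) %*% y             # y projected onto the left singular vectors
ridgecoef <- Xs$v %*% (d * rhs / (d^2 + lbd))   # ridge coefficients for penalty lbd
lscoef    <- Xs$v %*% (rhs / d)                 # lbd = 0: ordinary least squares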
Why the Manual Calculation Ignores Scaling
The manual calculation applies the penalty lbd directly to the predictors on their original scale. Because the size of a coefficient, and therefore the size of its penalty, depends on the units of the corresponding predictor, penalizing before and after scaling are genuinely different problems: for the same lambda they yield different coefficients, as the short illustration below shows.
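A small illustration with made-up data (the names A, b, and ridge are purely illustrative): under ordinary least squares, rescaling a predictor simply rescales its coefficient, but under a ridge penalty the fit itself changes.
set.seed(1)
A  <- cbind(1, rnorm(50), rnorm(50))          # toy design matrix
b  <- A %*% c(1, 2, -1) + rnorm(50, 0, .3)    # toy response
A2 <- A; A2[, 2] <- A[, 2] * 10               # same data, second predictor in different units
ridge <- function(X, y, lbd) solve(t(X) %*% X + lbd * diag(ncol(X))) %*% t(X) %*% y
ridge(A, b, 5)[2] - 10 * ridge(A2, b, 5)[2]   # nonzero: the ridge fit depends on the units
ridge(A, b, 0)[2] - 10 * ridge(A2, b, 0)[2]   # essentially zero: OLS just rescales with the units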
Verifying the Scaling Portion of MASS::lm.ridge
To confirm that MASS::lm.ridge scales the data, we can inspect its source by typing MASS::lm.ridge (without parentheses) into the R console. The snippet below mirrors the relevant portion of that source, applied to the example data defined in the next section, with the scaling lines left commented out:
X = as.matrix(tb1 %>% select(x0, x1, x2))
n <- nrow(X); p <- ncol(X)
# Xscale <- drop(rep(1/n, n) %*% X^2)^0.5
# X <- X/rep(Xscale, rep(n, p))
Xs <- svd(X)
rhs <- t(Xs$u) %*% tb1$y
d <- Xs$d
lscoef <- Xs$v %*% (rhs/d)
The commented-out line Xscale <- drop(rep(1/n, n) %*% X^2)^0.5 shows that the function computes, for each column of X, the square root of the mean of its squared entries (its root-mean-square value) and then divides the column by it.
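We can also confirm from a fitted object that these are the scale factors being used: the current MASS source stores them in the $scales component of the returned object. A quick check using the example data defined in the next section (since the formula drops the intercept, no centering is involved):
fit <- MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data = tb1, lambda = 10)
fit$scales - sqrt(colMeans(X^2))   # should be numerically zero for all three columns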
Example Walkthrough
Let’s walk through a minimal, reproducible example:
library(tidyverse)
ridgeRegression = function(X, y, lbd) {
  Rinv = solve(t(X) %*% X + lbd*diag(ncol(X)))
  t(Rinv %*% t(X) %*% y)
}
# Generate some data:
set.seed(0)
tb1 = tibble(
  x0 = 1,
  x1 = seq(-1, 1, by=.01),
  x2 = x1 + rnorm(length(x1), 0, .1),
  y = x1 + x2 + rnorm(length(x1), 0, .5)
)
X = as.matrix(tb1 %>% select(x0, x1, x2))
# Sanity check: force ordinary linear regression
# and compare it with the built-in linear regression:
ridgeRegression(X, tb1$y, 0) - coef(summary(lm(y ~ x1 + x2, data=tb1)))[, 1]
# looks the same: -2.94903e-17 1.487699e-14 -2.176037e-14
# compare manual ridge regression to MASS ridge regression:
ridgeRegression(X, tb1$y, 10) - coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda = 10))
# noticeably different: -0.0001407148 0.003689412 -0.08905392
With lambda = 0 the penalty vanishes, so scaling has no effect and the manual calculation matches ordinary least squares. With lambda = 10, however, the manual calculation and MASS::lm.ridge produce noticeably different coefficients, because the two apply the penalty on different scales.
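To confirm that scaling accounts for the whole discrepancy, we can apply the same scaling by hand, run the manual formula on the scaled data, and then divide the resulting coefficients by the scale factors to return to the original units (this back-scaling is what coef() does for lm.ridge objects). A sketch reusing ridgeRegression and the objects defined above:
Xscale <- sqrt(colMeans(X^2))          # lm.ridge's scale factors (no centering: the formula has no intercept)
Xsc <- sweep(X, 2, Xscale, "/")        # scaled design matrix
manual_scaled <- ridgeRegression(Xsc, tb1$y, 10) / Xscale   # fit on scaled data, then undo the scaling
manual_scaled - coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda = 10))
# the differences should now be numerically zero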
Conclusion
Ridge regression is a powerful technique for preventing overfitting in linear regression models. The manual closed-form calculation is not wrong, but it applies the penalty to the predictors on their original scale, whereas MASS::lm.ridge scales the predictors before fitting and then transforms the coefficients back to the original units. For any nonzero lambda the two therefore solve differently scaled problems and return different coefficients. This article has demonstrated where that difference comes from and how to reconcile the two results.
Further Reading
For further reading on ridge regression, we recommend checking out:
- The scikit-learn documentation for the Ridge class.
- The Wikipedia page on ridge regression, which provides a comprehensive overview of the method and its history.
Code
The code used in this article is available on GitHub: https://github.com/username/ridge-regression
You can clone the repository or fork it to use the code for your own projects.
Last modified on 2024-08-11