Understanding Ridge Regression and the Difference in Coefficients
Ridge regression is a popular regularization technique used to prevent overfitting in linear regression models. It does this by adding a penalty term to the least-squares cost function that shrinks the coefficient estimates toward zero. In this article, we will explore why manually calculated ridge regression coefficients differ from those returned by MASS::lm.ridge.
A Brief Introduction to Ridge Regression
In R notation, the ridge regression coefficient estimate is:
solve(t(X) %*% X + lbd*I) %*% t(X) %*% y
Where X is the design matrix, y is the response vector, lbd is the regularization parameter, and I is the identity matrix.
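In standard notation, this expression is the closed-form minimizer of the penalized least-squares objective:
b_ridge = argmin_b ||y − Xb||² + λ||b||² = (XᵀX + λI)⁻¹ Xᵀy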
Why the Manual Calculation Does Not Match MASS::lm.ridge
The manual calculation of ridge regression coefficients relies on inverting the matrix (t(X) %*% X + lbd*I) directly:
Rinv = solve(t(X) %*% X + lbd*diag(ncol(X)))
However, MASS::lm.ridge takes an extra step: it scales the predictors before fitting. This changes the scale on which the penalty is applied, and therefore the coefficients returned for a given lambda.
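Concretely, "scaling" here means dividing each column of the design matrix by its root-mean-square value before applying the penalty (when the model has an intercept, lm.ridge also centers X and y first). A minimal sketch of that step, with illustrative variable names:
# Divide each column of X by its root-mean-square value, as MASS::lm.ridge does
Xscale  <- sqrt(colMeans(X^2))        # one scale factor per column
Xscaled <- sweep(X, 2, Xscale, "/")   # scaled design matrix used for the ridge fit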
Scaling and Its Impact on Coefficients
The MASS::lm.ridge function works on this scaled design matrix. It computes the singular value decomposition (SVD) of the scaled X, which decomposes it into three matrices: U, Sigma, and V. The least-squares coefficients on the scaled data are then computed as:
lscoef = Xs$v %*% (rhs/d)
Where rhs = t(Xs$u) %*% y is the response projected onto the left singular vectors, d is the vector of singular values (the diagonal of Sigma), and Xs$v contains the right singular vectors.
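From these same SVD pieces, the ridge solution for any given lambda follows by shrinking each singular direction. A minimal sketch mirroring what the lm.ridge source does (here X, y, and lbd stand for the scaled design matrix, the response, and the penalty):
Xs  <- svd(X)                    # X = U diag(d) V'
d   <- Xs$d                      # singular values
rhs <- t(Xs$u) %*% y             # y projected onto the left singular vectors
ridgecoef <- Xs$v %*% (d * rhs / (d^2 + lbd))   # ridge coefficients for penalty lbd
lscoef    <- Xs$v %*% (rhs / d)                 # lbd = 0: ordinary least squares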
Why the Manual Calculation Ignores Scaling
The manual calculation applies the penalty lbd directly to the predictors on their original scale. Because the size of a coefficient, and therefore the size of its penalty, depends on the units of the corresponding predictor, penalizing before and after scaling are genuinely different problems: for the same lambda they yield different coefficients, as the short illustration below shows.
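A small illustration with made-up data (the names A, b, and ridge are purely illustrative): under ordinary least squares, rescaling a predictor simply rescales its coefficient, but under a ridge penalty the fit itself changes.
set.seed(1)
A  <- cbind(1, rnorm(50), rnorm(50))          # toy design matrix
b  <- A %*% c(1, 2, -1) + rnorm(50, 0, .3)    # toy response
A2 <- A; A2[, 2] <- A[, 2] * 10               # same data, second predictor in different units
ridge <- function(X, y, lbd) solve(t(X) %*% X + lbd * diag(ncol(X))) %*% t(X) %*% y
ridge(A, b, 5)[2] - 10 * ridge(A2, b, 5)[2]   # nonzero: the ridge fit depends on the units
ridge(A, b, 0)[2] - 10 * ridge(A2, b, 0)[2]   # essentially zero: OLS just rescales with the units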
Verifying the Scaling Portion of MASS::lm.ridge
To confirm that MASS::lm.ridge scales the data, we can inspect its source by typing MASS::lm.ridge (without parentheses) into the R console. The snippet below mirrors the relevant portion of that source, applied to the example data defined in the next section, with the scaling lines left commented out:
X = as.matrix(tb1 %>% select(x0, x1, x2))
n <- nrow(X); p <- ncol(X)
# Xscale <- drop(rep(1/n, n) %*% X^2)^0.5
# X <- X/rep(Xscale, rep(n, p))
Xs <- svd(X)
rhs <- t(Xs$u) %*% tb1$y
d <- Xs$d
lscoef <- Xs$v %*% (rhs/d)
The commented-out line Xscale <- drop(rep(1/n, n) %*% X^2)^0.5 shows that the function computes, for each column of X, the square root of the mean of its squared entries (its root-mean-square value) and then divides the column by it.
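We can also confirm from a fitted object that these are the scale factors being used: the current MASS source stores them in the $scales component of the returned object. A quick check using the example data defined in the next section (since the formula drops the intercept, no centering is involved):
fit <- MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data = tb1, lambda = 10)
fit$scales - sqrt(colMeans(X^2))   # should be numerically zero for all three columns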
Example Walkthrough
Let’s walk through a minimal, reproducible example:
library(tidyverse)
ridgeRegression = function(X, y, lbd) {
  Rinv = solve(t(X) %*% X + lbd*diag(ncol(X)))
  t(Rinv %*% t(X) %*% y)
}
# Generate some data:
set.seed(0)
tb1 = tibble(
  x0 = 1,
  x1 = seq(-1, 1, by=.01),
  x2 = x1 + rnorm(length(x1), 0, .1),
  y = x1 + x2 + rnorm(length(x1), 0, .5)
)
X = as.matrix(tb1 %>% select(x0, x1, x2))
# Sanity check: force ordinary linear regression
# and compare it with the built-in linear regression:
ridgeRegression(X, tb1$y, 0) - coef(summary(lm(y ~ x1 + x2, data=tb1)))[, 1]
# looks the same: -2.94903e-17 1.487699e-14 -2.176037e-14
# compare manual ridge regression to MASS ridge regression:
ridgeRegression(X, tb1$y, 10) - coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda = 10))
# noticeably different: -0.0001407148 0.003689412 -0.08905392
With lambda = 0 the penalty vanishes, so scaling has no effect and the manual calculation matches ordinary least squares. With lambda = 10, however, the manual calculation and MASS::lm.ridge produce noticeably different coefficients, because the two apply the penalty on different scales.
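To confirm that scaling accounts for the whole discrepancy, we can apply the same scaling by hand, run the manual formula on the scaled data, and then divide the resulting coefficients by the scale factors to return to the original units (this back-scaling is what coef() does for lm.ridge objects). A sketch reusing ridgeRegression and the objects defined above:
Xscale <- sqrt(colMeans(X^2))          # lm.ridge's scale factors (no centering: the formula has no intercept)
Xsc <- sweep(X, 2, Xscale, "/")        # scaled design matrix
manual_scaled <- ridgeRegression(Xsc, tb1$y, 10) / Xscale   # fit on scaled data, then undo the scaling
manual_scaled - coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda = 10))
# the differences should now be numerically zero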
Conclusion
Ridge regression is a powerful technique for preventing overfitting in linear regression models. The manual closed-form calculation is not wrong, but it applies the penalty to the predictors on their original scale, whereas MASS::lm.ridge scales the predictors before fitting and then transforms the coefficients back to the original units. For any nonzero lambda the two therefore solve differently scaled problems and return different coefficients. This article has demonstrated where that difference comes from and how to reconcile the two results.
Further Reading
For further reading on ridge regression, we recommend checking out:
- The scikit-learn documentation for the Ridge class.
- The Wikipedia page on ridge regression, which provides a comprehensive overview of the method and its history.
Code
The code used in this article is available on GitHub: https://github.com/username/ridge-regression
You can clone the repository or fork it to use the code for your own projects.
Last modified on 2024-08-11