Vish Sangale
NOTE · April 2020

Regularization

Regularization is a fundamental technique in machine learning used to prevent overfitting by penalizing the complexity of a model. By adding a regularization term to the loss function, we encourage the model to learn simpler patterns that generalize better to unseen data.

L2 Regularization (Weight Decay)

L2 regularization, known as Ridge regression in the linear-regression setting and as weight decay in deep learning, adds the sum of the squares of the weights to the loss function, scaled by a strength hyperparameter \(\lambda\):

\[L_{reg} = L_{original} + \lambda \sum w^2\]
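A minimal NumPy sketch of this penalty for a linear model; `X`, `y`, `w`, and `lam` are illustrative names for the inputs, targets, weights, and regularization strength:

```python
import numpy as np

def l2_regularized_loss(w, X, y, lam):
    """Mean-squared-error loss plus the L2 penalty lam * sum(w**2)."""
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    return mse + lam * np.sum(w ** 2)

def l2_gradient(w, X, y, lam):
    """Gradient of the regularized loss; the penalty contributes 2 * lam * w,
    which shrinks every weight toward zero on each update."""
    n = len(y)
    return (2.0 / n) * X.T @ (X @ w - y) + 2.0 * lam * w
```

Note that the penalty's gradient is proportional to \(w\) itself, which is why L2 shrinks large weights most aggressively but rarely drives any weight exactly to zero.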

L1 Regularization

L1 regularization, known as Lasso regression in the linear-regression setting, adds the sum of the absolute values of the weights. Because the penalty's gradient does not shrink as a weight approaches zero, L1 tends to push many weights exactly to zero, producing sparse models:

\[L_{reg} = L_{original} + \lambda \sum |w|\]
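A corresponding NumPy sketch, using the same illustrative names as above; since \(|w|\) is not differentiable at zero, the example returns a subgradient:

```python
import numpy as np

def l1_regularized_loss(w, X, y, lam):
    """Mean-squared-error loss plus the L1 penalty lam * sum(|w|)."""
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(np.abs(w))

def l1_subgradient(w, X, y, lam):
    """A subgradient of the regularized loss. np.sign returns 0 at w = 0,
    which is a valid element of the subdifferential of |w| there."""
    n = len(y)
    return (2.0 / n) * X.T @ (X @ w - y) + lam * np.sign(w)
```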

Elastic Net (L1 + L2)

Elastic Net is a hybrid approach that combines both L1 and L2 penalties:

\[L_{reg} = L_{original} + \lambda_1 \sum |w| + \lambda_2 \sum w^2\]
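The combined objective is just the sum of the two penalties; continuing the same illustrative sketch, with `lam1` and `lam2` standing in for \(\lambda_1\) and \(\lambda_2\):

```python
import numpy as np

def elastic_net_loss(w, X, y, lam1, lam2):
    """Mean-squared-error loss plus both penalties:
    lam1 * sum(|w|) encourages sparsity, lam2 * sum(w**2) shrinks weights."""
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    return mse + lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w ** 2)
```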

Max Norm Regularization

Unlike the penalties above, Max Norm regularization is a hard constraint rather than an added loss term: it enforces an absolute upper bound \(c\) on the L2 norm of the incoming weight vector of every neuron:

\[\|w\|_2 \le c\]
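In practice this constraint is typically enforced by projecting the weights back onto the ball of radius \(c\) after each gradient update. A minimal NumPy sketch, assuming each row of `W` holds one neuron's incoming weights:

```python
import numpy as np

def max_norm_project(W, c):
    """Rescale any row of W whose L2 norm exceeds c back to norm c;
    rows already inside the ball are left unchanged."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))  # guard zero norms
    return W * scale
```

This projection would be called once after each optimizer step, so the weights never leave the feasible region.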

Summary: When to use which?