Weight Decay

· 109 words · 1 minute read
  • Weight decay is a way to make sure the models don’t overfit, how? easy, for a model to overfit, it has to have crazy values in the model…. I mean if all the values were small it would be really hard to create weird folds in space. So what if in the loss function we add a new term, this term will be the sum of all parameters in the model squared! that will give an incentive to the NN to have small values in the model. the weight decay is a factor of this summ
  • loss L will became L = old_loss + weight_decay * (sum(model_parameters) ** 2)