Adding regularization will often help to prevent overfitting (the high-variance problem).
1. Logistic regression
Recall the optimization objective minimized during training: $J(w,b)=\frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat{y}^{(i)},y^{(i)})$,
where $\mathcal{L}(\hat{y}^{(i)},y^{(i)})$ is the loss on a single training example.
$L_2$ regularization (most commonly used) adds a penalty term to the cost: $J(w,b)=\frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat{y}^{(i)},y^{(i)})+\frac{\lambda}{2m}\|w\|_2^2$,
where $\|w\|_2^2=\sum_{j=1}^{n_x}w_j^2=w^{T}w$.
Why do we regularize only the parameter $w$? Because $w$ is usually a high-dimensional parameter vector while $b$ is a single scalar; almost all of the parameters are in $w$ rather than $b$.
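A minimal numpy sketch of the $L_2$-regularized cost and its gradients for logistic regression (the function and variable names `w`, `b`, `X`, `Y`, `lambd` are illustrative, not from the original notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_regularized_cost_and_grads(w, b, X, Y, lambd):
    """X: (n_x, m) inputs, Y: (1, m) labels, w: (n_x, 1), b: scalar."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                                     # predictions, shape (1, m)
    cross_entropy = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    cost = cross_entropy + (lambd / (2 * m)) * np.sum(w ** 2)    # penalize w only
    dw = (X @ (A - Y).T) / m + (lambd / m) * w                   # extra (lambda/m) * w term
    db = np.sum(A - Y) / m                                       # b is not regularized
    return cost, dw, db
```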
$L_1$ regularization instead adds the term $\frac{\lambda}{2m}\|w\|_1$ to the cost,
where $\|w\|_1=\sum_{j=1}^{n_x}|w_j|$.
$w$ will end up being sparse; in other words, the $w$ vector will have a lot of zeros in it. This can help with compressing the model a little.
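A small sketch of the sparsity effect, using scikit-learn's $L_1$-penalized logistic regression (the synthetic data and hyperparameters below are placeholders chosen just for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))             # 200 examples, 50 features
y = (X[:, 0] - X[:, 1] > 0).astype(int)    # only 2 features actually matter

l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2_model = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

# L1 typically drives most weights to exactly zero; L2 only shrinks them.
print("non-zero weights with L1:", np.count_nonzero(l1_model.coef_))
print("non-zero weights with L2:", np.count_nonzero(l2_model.coef_))
```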
2. Neural network “Frobenius norm”
For a neural network, the regularized cost is $J(W^{[1]},b^{[1]},\dots,W^{[L]},b^{[L]})=\frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat{y}^{(i)},y^{(i)})+\frac{\lambda}{2m}\sum_{l=1}^{L}\|W^{[l]}\|_F^2$,
where $\|W^{[l]}\|_F^2=\sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}}\big(W_{ij}^{[l]}\big)^2$ is the squared Frobenius norm of the layer-$l$ weight matrix.
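A short sketch of computing this penalty term, assuming the weight matrices are stored in a dict like `parameters["W1"], ..., parameters["WL"]` (that layout is an assumption for illustration):

```python
import numpy as np

def frobenius_penalty(parameters, lambd, m, L):
    """Return (lambda / (2m)) * sum over layers of ||W[l]||_F^2."""
    total = sum(np.sum(np.square(parameters["W" + str(l)])) for l in range(1, L + 1))
    return (lambd / (2 * m)) * total

# cost = cross_entropy_cost + frobenius_penalty(parameters, lambd, m, L)
```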
$L_2$ regularization is also called weight decay: the gradient picks up an extra $\frac{\lambda}{m}W^{[l]}$ term, so the gradient-descent update becomes $W^{[l]} := \big(1-\frac{\alpha\lambda}{m}\big)W^{[l]} - \alpha\, dW^{[l]}_{\text{backprop}}$, i.e. the weights are multiplied by a factor slightly smaller than 1 on every step.
This keeps the weights $W$ from growing too large, which helps avoid overfitting.
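A sketch of one such update step, making the decay factor explicit (`dW_backprop`, `alpha`, `lambd`, `m` are placeholder names):

```python
def weight_decay_step(W, dW_backprop, alpha, lambd, m):
    # W := W - alpha * (dW_backprop + (lambda/m) * W)
    #    = (1 - alpha*lambda/m) * W - alpha * dW_backprop
    return (1 - alpha * lambd / m) * W - alpha * dW_backprop
```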
3. Inverted dropout
For each training example, a different random subset of hidden units can be dropped (zeroed out).
Inverted dropout (dropout must be applied in both the forward and backward passes):
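A minimal numpy sketch for layer 3's activations; `a3` and `keep_prob` are placeholders, and the same mask `d3` must be reused when back-propagating `da3`:

```python
import numpy as np

keep_prob = 0.8                               # probability that a unit is kept
a3 = np.random.randn(5, 10)                   # stand-in for layer-3 activations

d3 = np.random.rand(*a3.shape) < keep_prob    # random mask, True with prob keep_prob
a3 = a3 * d3                                  # zero out the dropped units
a3 = a3 / keep_prob                           # scale up so E[a3] is unchanged

# Backward pass uses the same mask and scaling:
# da3 = da3 * d3; da3 = da3 / keep_prob
```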
By dividing by keep_prob, the inverted dropout technique ensures that the expected value of $a^{[3]}$ stays the same. This makes test time easier because there is less of a scaling problem.
Dropout is not used at test time (no masking or rescaling is needed when making predictions, thanks to the inverted scaling during training).