Thank you for this wonderful benchmarking work. In several experiments, `wd=1.2e-6` is used. Could you please give some guidelines or a rule of thumb for choosing the weight decay hyperparameter?