@VolodyaCO brought this up at https://github.com/dwavesystems/dwave-pytorch-plugin/pull/47/files#r2577897489 .
Empirically, the current weight initialization scheme tends to trap the optimizer in poor local optima.
We should run a simple experiment to pick a better initialization scheme.
For what it's worth, I've found initializing weights to 0 to be quite robust.
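For reference, zero initialization is straightforward with `torch.nn.init.zeros_`. A minimal sketch, using a plain `nn.Linear` as a stand-in rather than the plugin's actual module:

```python
import torch
from torch import nn

# Stand-in layer; the real module in dwave-pytorch-plugin may differ.
layer = nn.Linear(8, 4)

# Zero-initialize weights and biases in place.
with torch.no_grad():
    nn.init.zeros_(layer.weight)
    nn.init.zeros_(layer.bias)
```

An experiment could compare this against the current scheme and a couple of standard alternatives (e.g. `nn.init.xavier_uniform_`) on the same training runs.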