@VolodyaCO brought this up at https://github.com/dwavesystems/dwave-pytorch-plugin/pull/47/files#r2577897489 .
Empirically, the current weight initialization scheme tends to trap the optimizer in poor local optima.
We should run a simple experiment to pick a better initialization scheme.
For what it's worth, I've found initializing weights to 0 to be quite robust.
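For reference, zero initialization is straightforward with `torch.nn.init.zeros_`. A minimal sketch, using a plain `nn.Linear` as a stand-in rather than the plugin's actual module:

```python
import torch
from torch import nn

# Stand-in layer; the real module in dwave-pytorch-plugin may differ.
layer = nn.Linear(8, 4)

# Zero-initialize weights and biases in place.
with torch.no_grad():
    nn.init.zeros_(layer.weight)
    nn.init.zeros_(layer.bias)
```

An experiment could compare this against the current scheme and a couple of standard alternatives (e.g. `nn.init.xavier_uniform_`) on the same training runs.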