Hi,
Thanks a lot for the port.
It seems that `example.py` is missing the scaling of the waveform into the [-256, 256] range that SoundNet expects:
https://github.com/cvondrick/soundnet/blob/fc1b3dd588f53e2bd27207b542aff420404e94c7/demo.lua#L23-L30
Currently, `torchaudio.load(path)` returns a waveform in [-1, 1] because of its default normalization, and that waveform is passed to the model without rescaling (`SoundNet_Pytorch/example.py`, lines 10 to 14 at f2d9ce0):

```python
wav, sr = torchaudio.load(audio_path)
print(wav.shape)

wav = wav.unsqueeze(1).unsqueeze(-1).repeat(1,1,8,1) # errors occur when the wav is too short
feats = model.extract_feat(wav)
```
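A minimal sketch of one possible fix, assuming the rest of `example.py` stays as it is: multiply the normalized waveform by 256 right after loading, which maps [-1, 1] onto the [-256, 256] range used in `demo.lua`. The helper `to_soundnet_range` and the constant `SOUNDNET_SCALE` are hypothetical names for illustration, not part of the port.

```python
import math

import torch

# Assumption: torchaudio.load returned a float waveform in [-1, 1];
# multiplying by 256 maps it onto SoundNet's expected [-256, 256] range.
SOUNDNET_SCALE = 256.0


def to_soundnet_range(wav: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: rescale a normalized float waveform for SoundNet."""
    if not wav.is_floating_point():
        raise TypeError("expected a normalized float waveform from torchaudio.load")
    return wav * SOUNDNET_SCALE


# In example.py this would sit right after loading, e.g.:
#   wav, sr = torchaudio.load(audio_path)
#   wav = to_soundnet_range(wav)
#   wav = wav.unsqueeze(1).unsqueeze(-1).repeat(1, 1, 8, 1)
#   feats = model.extract_feat(wav)

if __name__ == "__main__":
    # Synthetic stand-in for a loaded clip: 1 channel, 1 s of a 440 Hz tone at 22050 Hz.
    t = torch.arange(22050, dtype=torch.float32) / 22050
    wav = torch.sin(2 * math.pi * 440 * t).unsqueeze(0)
    scaled = to_soundnet_range(wav)
    print(scaled.min().item(), scaled.max().item())
```

Scaling in a separate step keeps `torchaudio.load`'s default behaviour untouched, so the waveform's shape and the later `unsqueeze`/`repeat` calls are unaffected.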
Also, here is a Google Colab notebook.
One can reproduce the issue simply by printing the min/max values after line 10 of `SoundNet_Pytorch/example.py` (at f2d9ce0):

```python
wav, sr = torchaudio.load(audio_path)
```
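A small sketch of that check, assuming `wav` is the tensor returned by `torchaudio.load`; `report_range` is a hypothetical helper name. With the current code, the printed extremes should stay within [-1, 1] rather than [-256, 256].

```python
from typing import Tuple

import torch


def report_range(wav: torch.Tensor) -> Tuple[float, float]:
    """Hypothetical helper: print and return the waveform's min and max values."""
    lo, hi = wav.min().item(), wav.max().item()
    print(f"min={lo:.4f} max={hi:.4f}")
    return lo, hi


# Usage in example.py, right after loading:
#   wav, sr = torchaudio.load(audio_path)
#   report_range(wav)
```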