
Input in example.py is not scaled up to [-256, 256] and remains to be in [-1, 1] #2

@v-iashin


Hi,

Thanks a lot for the port.

It seems that example.py misses the step that scales the waveform into the [-256, 256] range which SoundNet expects:
https://github.com/cvondrick/soundnet/blob/fc1b3dd588f53e2bd27207b542aff420404e94c7/demo.lua#L23-L30

Currently, `torchaudio.load(path)` returns a waveform in [-1, 1] because of its default normalization (`normalize=True`), so the model receives unscaled input:

```python
wav, sr = torchaudio.load(audio_path)
print(wav.shape)
wav = wav.unsqueeze(1).unsqueeze(-1).repeat(1, 1, 8, 1)  # errors occur when the wav is too short
feats = model.extract_feat(wav)
```
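A minimal fix could look like the following, assuming the port expects the same raw range as the Lua demo (the helper name `scale_for_soundnet` is mine, not from the repo):

```python
import torch

def scale_for_soundnet(wav: torch.Tensor) -> torch.Tensor:
    """Rescale a waveform from torchaudio's default [-1, 1] range to the
    [-256, 256] range that SoundNet's demo.lua feeds the network."""
    return wav * 256.0
```

i.e. `wav = scale_for_soundnet(wav)` right after `torchaudio.load(...)` and before `model.extract_feat(wav)`.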

Also, here is a Google Colab notebook.
One can reproduce the issue simply by printing the min/max values after

```python
wav, sr = torchaudio.load(audio_path)
```