Closed
Issue description
The PyTorch stft implementation, and therefore the pytorch/audio SPECTROGRAM transform, is roughly 10x slower on CPU than the librosa (NumPy-based) spectrogram computation.
Code example
if __name__ == "__main__":
    n_fft = 4096
    hop_length = 512
    n_mels = 128
    f_min = 512
    f_max = 8192

    import librosa
    import time
    import torch  # needed for torch.no_grad() and torch.from_numpy()
    from torchaudio.transforms import SPECTROGRAM

    sig, sr = librosa.load(librosa.util.example_audio_file())
    torch_spec_transform = SPECTROGRAM(ws=n_fft, hop=hop_length)

    with torch.no_grad():
        # librosa: 10 sequential STFTs of the same signal
        print("librosa: ", end="")
        start = time.time()
        for _ in range(10):
            spec = librosa.core.stft(sig, n_fft=n_fft, hop_length=hop_length)
        print(time.time() - start)
        print(spec.shape)

        # torch: one call on a batch of 10 copies of the same signal
        sig = torch.from_numpy(sig)
        sig = sig.expand(10, -1)
        print("torch: ", end="")
        start = time.time()
        spec = torch_spec_transform(sig)
        print(time.time() - start)
        print(spec.shape)
out:
Warning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
self.window = Variable(self.window, volatile=True)
librosa: 1.6748952865600586
(2049, 2647)
torch: 15.748203754425049
torch.Size([10, 2639, 2049])
System Info
python collect_env.py
Collecting environment information...
PyTorch version: 0.4.0
Is debug build: No
CUDA used to build PyTorch: None
OS: Fedora release 28 (Twenty Eight)
GCC version: (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)
CMake version: version 3.11.2
Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Versions of relevant libraries:
[pip3] numpy (1.14.3)
[pip3] torch (0.4.0)
[pip3] torchaudio (0.1)
[pip3] torchvision (0.2.1)
[conda] Could not collect
My CPU:
Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
with 16 GB RAM
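The torch timing above comes from a single call, so it may include one-time setup cost. A more robust measurement could use a warm-up call and take the best of several runs. The following is only a minimal sketch: `best_of` is a hypothetical helper, not part of librosa, torch or torchaudio, and the workload in the demo is a stand-in.

```python
import time


def best_of(fn, repeats=5, warmup=1):
    """Return the fastest wall-clock time (in seconds) of `repeats` calls to `fn`.

    `best_of` is a hypothetical helper for illustration only.
    """
    for _ in range(warmup):
        fn()  # warm-up call, not timed (absorbs one-time setup cost)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with the librosa or torch call being measured.
    elapsed = best_of(lambda: sum(i * i for i in range(100000)))
    print("best of 5: {:.6f} s".format(elapsed))
```

With the names from the code example, the two calls would be wrapped as `best_of(lambda: librosa.core.stft(sig, n_fft=n_fft, hop_length=hop_length))` and `best_of(lambda: torch_spec_tranform(sig))`. For a like-for-like comparison, both should process the same number of signals per call.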