
☢️ Audiomer ☢️

Audiomer: A Convolutional Transformer for Keyword Spotting
Accepted at AAAI 2022 DSTC Workshop


[ arXiv ] [ Previous SOTA ] [ Model Architecture ]

Pretrained Models: Google Drive

NOTE: This is a pre-print release; the code may contain bugs.

Usage

To reproduce the results in the paper, follow these instructions:

  • To download the Speech Commands v2 dataset, run: python3 datamodules/SpeechCommands12.py
  • To train Audiomer-S and Audiomer-L on all three datasets thrice, run: python3 run_expts.py
  • To evaluate a model on a dataset, run: python3 evaluate.py --checkpoint_path /path/to/checkpoint.ckpt --model <model type> --dataset <name of dataset> (a checkpoint-loading sketch in Python follows this list).
  • For example: python3 evaluate.py --checkpoint_path ./epoch=300.ckpt --model S --dataset SC20
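
If you prefer to evaluate from Python directly, the following is a minimal sketch. `AudiomerClassifier` and the `models` import path are hypothetical stand-ins for this repo's actual LightningModule; `load_from_checkpoint` is standard pytorch_lightning API.

```python
# Minimal evaluation sketch. NOTE: "AudiomerClassifier" and the "models"
# module are hypothetical names; substitute the repo's actual LightningModule.
import torch
from models import AudiomerClassifier  # hypothetical import path

# load_from_checkpoint is standard pytorch_lightning API for restoring weights.
model = AudiomerClassifier.load_from_checkpoint("./epoch=300.ckpt")
model.eval()

with torch.no_grad():
    waveform = torch.randn(1, 16000)  # one second of 16 kHz raw audio
    logits = model(waveform)
    print(logits.argmax(dim=-1))      # predicted keyword index
```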

Results

Performer Conv-Attention

TL;DR: We augment 1D ResNets with Performer attention over the raw audio waveform.
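
To make the idea concrete, here is a minimal sketch (not the paper's exact block) of a residual 1D convolution followed by Performer self-attention, using performer_pytorch's SelfAttention module. All layer choices and hyperparameters below are illustrative assumptions, not the published architecture.

```python
# Sketch of a Conv-Attention-style block: a residual 1D convolution over the
# waveform, then residual Performer (linear-complexity) self-attention.
import torch
import torch.nn as nn
from performer_pytorch import SelfAttention

class ConvAttentionBlock(nn.Module):
    """Illustrative block; layer sizes are assumptions, not the paper's."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
        )
        # Performer self-attention over the time axis.
        self.attn = SelfAttention(dim=channels, heads=heads, causal=False)

    def forward(self, x):                 # x: (batch, channels, time)
        x = x + self.conv(x)              # residual convolution
        t = x.transpose(1, 2)             # (batch, time, channels) for attention
        t = t + self.attn(t)              # residual Performer attention
        return t.transpose(1, 2)          # back to (batch, channels, time)

# Usage on a raw-audio-like tensor:
x = torch.randn(2, 64, 160)               # (batch, channels, frames)
y = ConvAttentionBlock(64)(x)
print(y.shape)                             # torch.Size([2, 64, 160])
```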


System requirements

  • NVIDIA GPU with CUDA
  • Python 3.6 or higher
  • pytorch_lightning
  • torchaudio
  • performer_pytorch
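
A quick way to confirm the environment is ready (a sketch, assuming the packages above are installed under their usual PyPI names):

```python
# Environment sanity check: verifies a CUDA GPU is visible and that the
# required packages import. Package names assume their usual PyPI spellings.
import torch
assert torch.cuda.is_available(), "An NVIDIA GPU with CUDA is required"

import pytorch_lightning
import torchaudio
import performer_pytorch  # noqa: F401  (import check only)

print("torch", torch.__version__)
print("pytorch_lightning", pytorch_lightning.__version__)
print("torchaudio", torchaudio.__version__)
```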
