
This is a modified version of the DIFFRIR model that supports training with multiple source positions. Single-source training still works without any changes. Multi-source training has been tested exclusively with RIRs. Additional metrics have been added for validation. trace_mp.py is an improved version of trace.py that precomputes reflection paths using multiprocessing. Instructions for using DIFFRIR in multi-source mode can be found in the Multi-Source Training and Evaluation section below. Modified by Luka Fehrmann, 2025.

Hearing Anything Anywhere - CVPR 2024

Code for the DIFFRIR model presented in Hearing Anything Anywhere. Please contact Mason Wang at masonlwang32 at gmail dot com for any inquiries or issues.

Mason Wang¹ | Ryosuke Sawata¹,² | Samuel Clarke¹ | Ruohan Gao¹,³ | Shangzhe Wu¹ | Jiajun Wu¹

¹Stanford, ²Sony AI, ³University of Maryland, College Park

Organization

HRIRs - the SADIE dataset of Head-Related Impulse Responses, which are used to render binaural audio.

example_trajectories - three notebooks that use trajectory.py to generate the example trajectories shown on the website: a hallway, a dampened room, and a virtual speaker rotation example. Also contains audio files you can simulate in the room.

models - weights for pretrained models in each of the four base subdatasets.

precomputed - folder of precomputed reflection paths for all datasets, traced up to each dataset's default reflection order.

rooms - information on the geometry of each room; also contains dataset.py, which is used for loading data.

binauralize.py - tools used for binaural rendering

config.py - used to link the dataset directories (see Linking the Dataset below)

evaluate.py - tools used to evaluate renderings and render music

metrics.py - loss functions and evaluation metrics

render.py - the DIFFRIR renderer, used to render RIRs.

train.py - training script; trains a DIFFRIR renderer on the specified dataset, saves its outputs, and evaluates it.

trajectory.py - used for rendering trajectories, e.g., simulating walking through a room while audio is playing

Downloading our Dataset

The dataset can be downloaded from Zenodo: https://zenodo.org/records/11195833

The dataset used for multi-source training (including RIRs.npy, xyzs.npy, and precomputed reflection paths) can be downloaded from Zenodo: https://zenodo.org/records/16738230

Linking the Dataset

config.py contains a list of paths to the data directories for different subdatasets. Each data directory should contain RIRs.npy, xyzs.npy, and so on.

Before using DIFFRIR, you will need to edit config.py so that these paths point to the correct directories on your machine.
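For reference, a minimal sketch of what the edited paths might look like is below. The variable names are illustrative placeholders, not the actual names defined in config.py; keep the names config.py already uses and only change the paths to match your machine.

```python
# Illustrative sketch only -- keep the variable names that config.py already
# defines and point each one at the matching subdataset directory on disk.
classroomBase_path = "/data/hearinganythinganywhere/classroomBase"
dampenedBase_path = "/data/hearinganythinganywhere/dampenedBase"
hallwayBase_path = "/data/hearinganythinganywhere/hallwayBase"
complexBase_path = "/data/hearinganythinganywhere/complexBase"
```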

Rendering Trajectories

There are three example notebooks in the example_trajectories directory that show you how to generate realistic, immersive audio in a room.
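Conceptually, rendering audio along a trajectory amounts to convolving dry source audio with the RIR rendered at each listener position and blending between positions. The sketch below illustrates only the convolution step with SciPy; render_rir is a hypothetical stand-in, not the trajectory.py API, so see the notebooks for the actual workflow.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_at_position(dry_audio, rir):
    """Convolve dry (anechoic) audio with a rendered RIR to place it in the room."""
    wet = fftconvolve(dry_audio, rir)
    return wet[: len(dry_audio)]

# Hypothetical usage -- render_rir stands in for however the trained model
# renders an RIR at a given listener position (x, y, z):
# wet_audio = simulate_at_position(dry_audio, render_rir(model, xyz=(1.0, 2.0, 1.5)))
```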

Training and Evaluation

The three necessary arguments to the training script train.py are:

  1. The path where the model's weights and renderings should be saved.
  2. The name of the dataset (e.g. "classroomBase") as specified in rooms/dataset.py.
  3. The path to the directory of pretraced reflection paths (included as part of this GitHub repo), which should be precomputed/<dataset_name>.

For example, to train and evaluate DIFFRIR on the Classroom Base dataset, simply run:

python train.py models/classroomBase classroomBase precomputed/classroomBase

In the above example:

  1. The weights and training losses of the model will be saved in models/classroomBase.
  2. models/classroomBase/predictions will contain the predicted RIRs for the monaural locations in the dataset, the predicted music renderings, and the predicted binaural RIRs and music for the binaural datapoints in the dataset.
  3. models/classroomBase/predictions will contain (N,) numpy arrays specifying the per-datapoint error for monaural RIR rendering.
  4. models/classroomBase/predictions will contain (N,K) numpy arrays specifying the per-datapoint, per-song error for monaural music rendering.
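
To summarize these outputs, the error arrays can be loaded directly with NumPy. The file names below are illustrative; check models/classroomBase/predictions for the exact names written during evaluation.

```python
import numpy as np

# Per-datapoint monaural RIR errors, shape (N,) -- filename is illustrative.
rir_errors = np.load("models/classroomBase/predictions/rir_errors.npy")
print("mean RIR error:", rir_errors.mean())

# Per-datapoint, per-song music errors, shape (N, K) -- filename is illustrative.
music_errors = np.load("models/classroomBase/predictions/music_errors.npy")
print("mean error per song:", music_errors.mean(axis=0))
```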

Multi-Source Training and Evaluation

To train the model on a dataset containing multiple source positions (e.g., "shoebox_multi_src6_ti3_id"), the precomputed reflection paths must be specified in the same order as the sources are listed in the dataset file, rooms/shoebox_multi_src6_ti3_id.py.

For the given example, simply run:

python train.py models/shoebox_multi_src6_ti3_id shoebox_multi_src6_ti3_id precomputed/shoebox_L01 precomputed/shoebox_L11 precomputed/shoebox_L21 precomputed/shoebox_L31 precomputed/shoebox_L41 precomputed/shoebox_L51
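
Because the order of the precomputed directories must match the source order in the dataset file, a quick pre-flight check can save a failed run. The sketch below is not part of the repo; it only verifies that every directory you intend to pass exists and reports the count, which should equal the number of sources.

```python
import os

# Hypothetical pre-flight check: confirm each precomputed-path directory exists.
precomputed_dirs = [f"precomputed/shoebox_L{i}1" for i in range(6)]
missing = [d for d in precomputed_dirs if not os.path.isdir(d)]
if missing:
    raise FileNotFoundError(f"missing precomputed directories: {missing}")
print(f"found all {len(precomputed_dirs)} precomputed directories; "
      "this count should match the number of sources in the dataset file")
```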

Tracing Paths

The precomputed directory contains traced paths for all of the subdatasets used, but in case you would like to retrace them (perhaps to a different reflection order), you can use trace.py:

python trace.py precomputed/classroomBase classroomBase

The above command will trace the classroomBase dataset to its default reflection order(s), and save the results in precomputed/classroomBase.

For faster computation of the reflection paths, the multiprocessing version trace_mp.py can be used instead:

python trace_mp.py precomputed/classroomBase classroomBase
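
trace_mp.py parallelizes the tracing across processes. The general pattern, assuming tracing is independent per listener position, is sketched below; trace_one_position is a hypothetical stand-in for the per-position routine in trace.py, not its actual API.

```python
import multiprocessing as mp

def trace_one_position(xyz):
    # Hypothetical stand-in for the per-listener-position tracing in trace.py;
    # it would return the reflection paths traced for this position.
    ...

def trace_all(listener_xyzs, n_workers=8):
    """Trace reflection paths for every listener position using a process pool."""
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(trace_one_position, listener_xyzs)
```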

Citation

@InProceedings{hearinganythinganywhere2024,
  title={Hearing Anything Anywhere},
  author={Mason Wang and Ryosuke Sawata and Samuel Clarke and Ruohan Gao and Shangzhe Wu and Jiajun Wu},
  booktitle={CVPR},
  year={2024}}

