This is a modified version of the DiffRIR model that supports training with multiple source positions. Single-source training still works without any changes. Multi-source training has been tested exclusively with RIRs, and additional metrics have been added for validation. trace_mp.py is an improved version of trace.py for precomputing reflection paths, with multiprocessing support. Instructions for using DiffRIR in multi-source mode can be found in the section "Multi-Source Training and Evaluation" below. Modified by Luka Fehrmann, 2025.
Project Page | Video | Paper | Data
Code for the DIFFRIR model presented in Hearing Anything Anywhere. Please contact Mason Wang at masonlwang32 at gmail dot com for any inquiries or issues.
Mason Wang1 | Ryosuke Sawata1,2 | Samuel Clarke1 | Ruohan Gao1,3 | Shangzhe Wu1 | Jiajun Wu1
1Stanford, 2SONY AI, 3University of Maryland, College Park
- HRIRs - the SADIE dataset of Head-Related Impulse Responses (HRIRs), used to render binaural audio.
- example_trajectories - three notebooks that use trajectory.py to generate the example trajectories shown on the project website: a hallway, a dampened room, and a virtual speaker rotation. Also contains audio files you can simulate in the room.
- models - weights for pretrained models on each of the four base subdatasets.
- precomputed - precomputed reflection paths for all datasets, traced up to their default order.
- rooms - information on the geometry of each room; also contains dataset.py, which is used for loading data.
- binauralize.py - tools used for binaural rendering.
- config.py - paths to the dataset directories; edit these to match your machine.
- evaluate.py - tools used to evaluate renderings and render music.
- metrics.py - loss functions and evaluation metrics.
- render.py - the DIFFRIR renderer, used to render RIRs.
- train.py - training script; trains a DIFFRIR renderer on the specified dataset, saves its outputs, and evaluates it.
- trajectory.py - used for rendering trajectories, e.g., simulating walking through a room while audio is playing.
The dataset can be downloaded from Zenodo: https://zenodo.org/records/11195833
The dataset used for multi-source training (including RIRs.npy, xyzs.npy, and precomputed reflection paths) can be downloaded from Zenodo: https://zenodo.org/records/16738230
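After downloading, a quick NumPy check can confirm that the arrays load and line up. The shapes in the comments are illustrative assumptions; verify them against your own copy:

```python
import numpy as np

# Illustrative sanity check (shapes below are assumptions, not guarantees).
rirs = np.load("path/to/classroomBase/RIRs.npy")  # e.g., (N, T): N measurements, T samples each
xyzs = np.load("path/to/classroomBase/xyzs.npy")  # e.g., (N, 3): xyz position of each measurement

print(rirs.shape, xyzs.shape)
assert rirs.shape[0] == xyzs.shape[0], "each RIR should pair with one measurement position"
```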
config.py contains a list of paths to the data directories for different subdatasets. Each data directory should contain RIRs.npy, xyzs.npy, and so on.
Before using DIFFRIR, you will need to edit config.py so that these paths point to the correct directories on your machine.
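As a rough illustration, the edit might look like the sketch below; the actual variable names in config.py may differ, so match whatever the file already uses:

```python
# Sketch of a config.py edit (variable and path names here are assumptions;
# use the names that actually appear in config.py).
classroomBase_path = "/home/you/data/classroomBase"  # directory with RIRs.npy, xyzs.npy, ...
```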
There are three example notebooks in the example_trajectories directory that show you how to generate realistic, immersive audio in a room.
The three necessary arguments to the training script train.py are:
- The path where the model's weights and renderings should be saved.
- The name of the dataset (e.g., "classroomBase"), as specified in rooms/dataset.py.
- The path to the directory of pretraced reflection paths (included as part of this GitHub repo), which should be precomputed/<dataset_name>.
For example, to train and evaluate DIFFRIR on the Classroom Base dataset, simply run:
python train.py models/classroomBase classroomBase precomputed/classroomBase
In the above example:
- The weights and training losses of the model will be saved in models/classroomBase.
- The predicted RIRs for the monaural locations in the dataset, the predicted music renderings, and the predicted binaural RIRs and music for the binaural datapoints in the dataset will be saved in models/classroomBase/predictions.
- models/classroomBase/predictions will also contain (N,) numpy arrays specifying the per-datapoint error for monaural RIR rendering, and (N, K) numpy arrays specifying the per-datapoint, per-song error for monaural music rendering.
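If you want to aggregate these error arrays yourself, they load directly with NumPy. The file names below are hypothetical placeholders; list the predictions directory to find the actual ones:

```python
import numpy as np

# Hypothetical file names; check models/classroomBase/predictions for the real ones.
rir_errors = np.load("models/classroomBase/predictions/rir_errors.npy")      # shape (N,)
music_errors = np.load("models/classroomBase/predictions/music_errors.npy")  # shape (N, K)

print(f"mean RIR error over {rir_errors.shape[0]} datapoints: {rir_errors.mean():.4f}")
print("mean music error per song:", music_errors.mean(axis=0))
```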
To train the model on a dataset containing multiple source positions (e.g., "shoebox_multi_src6_ti3_id"), the paths to the precomputed reflection paths must be specified in the same order as the sources are listed in the dataset file, located at rooms/shoebox_multi_src6_ti3_id.py.
For the given example, simply run:
python train.py models/shoebox_multi_src6_ti3_id shoebox_multi_src6_ti3_id precomputed/shoebox_L01 precomputed/shoebox_L11 precomputed/shoebox_L21 precomputed/shoebox_L31 precomputed/shoebox_L41 precomputed/shoebox_L51
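Note that the pairing is purely positional: the i-th precomputed directory is matched to the i-th source listed in the dataset file. A minimal sketch of that correspondence (the source names are assumed for illustration):

```python
# The source order below is an assumption for illustration; it must mirror
# the order in rooms/shoebox_multi_src6_ti3_id.py.
sources = ["L01", "L11", "L21", "L31", "L41", "L51"]
precomputed_dirs = [f"precomputed/shoebox_{name}" for name in sources]

for src, path_dir in zip(sources, precomputed_dirs):
    print(f"source {src} <- reflection paths from {path_dir}")
```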
The precomputed directory contains traced paths for all of the subdatasets used, but in case you would like to retrace (perhaps to a different order), you can use trace.py:
python trace.py precomputed/classroomBase classroomBase
The above command will trace the classroomBase dataset to its default reflection order(s), and save the results in precomputed/classroomBase.
For faster computation of the reflection paths, you can instead use the modified version trace_mp.py, which distributes the tracing across multiple processes:
python trace_mp.py precomputed/classroomBase classroomBase
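The speedup comes from tracing independent units of work in parallel worker processes. The snippet below is a generic sketch of that pattern using Python's standard multiprocessing module, not the actual trace_mp.py implementation; trace_one_source is a hypothetical stand-in for the tracing routine:

```python
from multiprocessing import Pool

def trace_one_source(args):
    """Hypothetical stand-in: trace all reflection paths from one source."""
    source_idx, max_order = args
    # ... expensive, independent tracing work for this source would go here ...
    return source_idx

if __name__ == "__main__":
    jobs = [(i, 5) for i in range(6)]  # e.g., 6 sources, traced up to order 5
    with Pool(processes=6) as pool:
        for done in pool.imap_unordered(trace_one_source, jobs):
            print(f"finished tracing source {done}")
```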
@InProceedings{hearinganythinganywhere2024,
  title={Hearing Anything Anywhere},
  author={Mason Wang and Ryosuke Sawata and Samuel Clarke and Ruohan Gao and Shangzhe Wu and Jiajun Wu},
  booktitle={CVPR},
  year={2024}
}