This repository is the official implementation of our paper:
Takao Kawamura, Daisuke Niizumi, and Nobutaka Ono, “What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model,” [arXiv:2602.15307](https://arxiv.org/abs/2602.15307)
We provide the framework to analyze general-purpose audio models at the neuron level using Audio Activation Probability Entropy (AAPE).
This project relies on EVAR. Please follow the detailed instructions in Install.md to set up the evaluation environment and download datasets.
Run the notebooks to calculate neuron activations and compute conditional activation probabilities.
Example: Open and run `{model_name}_{dataset_name}.ipynb` (e.g., `M2D_ESC50.ipynb`).
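As a rough illustration of what these notebooks compute (the helper name and the over-zero activation criterion are assumptions, suggested by the `over_zero.pt` output file, not the repository's exact code):

```python
import torch

def conditional_activation_probability(acts, labels, num_classes):
    """P(neuron activation > 0 | class), shaped (num_neurons, num_classes)."""
    over = (acts > 0).float()                        # (num_samples, num_neurons)
    probs = torch.zeros(acts.shape[1], num_classes)
    for c in range(num_classes):
        probs[:, c] = over[labels == c].mean(dim=0)  # fraction of class-c clips firing
    return probs

# Toy check: neuron 0 fires on both classes, neuron 1 only on class 1.
acts = torch.tensor([[1.0, -1.0],
                     [2.0,  3.0]])
labels = torch.tensor([0, 1])
probs = conditional_activation_probability(acts, labels, 2)
```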
This notebook generates the following outputs:
- For SSL ViT (M2D): `{task_name}_analysis/`
- For SL ViT: `{task_name}_vit/`
Generated files:
- `over_zero.pt` — conditional activation probabilities
- `classes.txt` — list of class names
- `features.pt` — extracted embeddings
- `gts.pt` — ground-truth labels
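The outputs can then be consumed in plain PyTorch. The directory name, tensor shapes, and class names below are illustrative assumptions (a real run uses the directories listed above); the stand-in files exist only so the sketch runs end to end:

```python
import os
import tempfile

import torch

task_dir = tempfile.mkdtemp()  # in practice: e.g. an "{task_name}_analysis" directory

# Stand-in files with assumed shapes; the notebooks write the real ones.
torch.save(torch.rand(100, 768), os.path.join(task_dir, "features.pt"))    # embeddings
torch.save(torch.randint(0, 50, (100,)), os.path.join(task_dir, "gts.pt")) # labels
with open(os.path.join(task_dir, "classes.txt"), "w") as f:
    f.write("dog_bark\nrain\n")

features = torch.load(os.path.join(task_dir, "features.pt"))
gts = torch.load(os.path.join(task_dir, "gts.pt"))
with open(os.path.join(task_dir, "classes.txt")) as f:
    classes = [line.strip() for line in f if line.strip()]
```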
Run the notebooks to compute AAPE and analyze class-specific neurons.
Open and run `Analyze_ClassSpecificNeurons.ipynb`.
This notebook generates the following results across multiple domains:
Task-level neuron statistics
- `task_level_neuron_statistics_analysis.csv` — SSL ViT (M2D)
- `task_level_neuron_statistics_vit.csv` — SL ViT
Cross-task analysis
- `cross_task-surge-nspitch.pdf` — Octave
- `cross_task-esc50-gise51.pdf` — Acoustic event
- `cross_task-vc1-cremad_gender.pdf` — Gender
Within-task analysis
- `within_task-gtzan.pdf` — Music genre
- `within_task-cremad.pdf` — Emotion
- `within_task-voxforge.pdf` — Language
The detailed analysis procedure is described in the paper.
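As a rough illustration only (see the paper for the exact definition), AAPE can be sketched as the Shannon entropy of a neuron's normalized class-conditional activation probabilities; low entropy then suggests a class-specific neuron:

```python
import torch

def aape(probs, eps=1e-12):
    """Sketch of Audio Activation Probability Entropy.

    probs: (num_neurons, num_classes) conditional activation probabilities.
    Normalizes each neuron's probabilities over classes, then takes the
    Shannon entropy of that distribution.
    """
    p = probs / probs.sum(dim=-1, keepdim=True).clamp_min(eps)
    return -(p * (p + eps).log()).sum(dim=-1)

# Neuron 0 fires on a single class (entropy ~0, class-specific);
# neuron 1 fires uniformly on 4 classes (entropy log 4, class-agnostic).
probs = torch.tensor([[1.00, 0.00, 0.00, 0.00],
                      [0.25, 0.25, 0.25, 0.25]])
h = aape(probs)
```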
Run the notebook to evaluate how deactivating class-specific neurons affects downstream classification performance.
Open and run `Ablations.ipynb`.
In this experiment:
- A classifier is first trained on frozen embeddings.
- Neurons in the feature extractor are selectively deactivated.
- The test performance of the trained classifier is evaluated under neuron deactivation.
Note: Due to potential differences in hardware (e.g., GPU vs. CPU), the results may vary across environments, even when random seeds are fixed.
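The deactivation step above can be sketched with a PyTorch forward hook; the layer, neuron indices, and tensor layout here are placeholders (the real ones come from the AAPE analysis and the M2D/ViT architecture), not the repository's exact implementation:

```python
import torch
import torch.nn as nn

def deactivate(neuron_idx):
    """Forward hook that zeroes the given neuron indices in a layer's output."""
    def hook(module, inputs, output):
        output = output.clone()      # avoid modifying autograd buffers in place
        output[..., neuron_idx] = 0.0
        return output                # a returned tensor replaces the layer output
    return hook

layer = nn.Linear(8, 16)             # placeholder for a block in the feature extractor
handle = layer.register_forward_hook(deactivate([0, 3]))
out = layer(torch.randn(2, 8))       # neurons 0 and 3 are silenced in the output
handle.remove()                      # restore normal behavior afterwards
```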
.
├─ README.md
├─ Install.md
├─ {model_name}_{dataset_name}.ipynb
├─ Analyze_ClassSpecificNeurons.ipynb
├─ Ablations.ipynb
├─ add_to_evar/
│  └─ le_eval_only.py
├─ evar/            (created by following Install.md)
├─ common.py
└─ common_analyze.py
If you use this code in your research, please cite:
@misc{kawamura2026what,
title = {What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model},
author = {Takao Kawamura and Daisuke Niizumi and Nobutaka Ono},
year = {2026},
eprint = {2602.15307},
archivePrefix = {arXiv},
primaryClass = {eess.AS},
url = {https://arxiv.org/abs/2602.15307},
}

The AAPE identification logic implemented in `common.py` is inspired by and partially based on the neuron identification methodology proposed in:
Tang et al., “Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models,” ACL 2024.
@inproceedings{tang2024language,
title = {Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models},
author = {Tianyi Tang and Wenyang Luo and Haoyang Huang and Dongdong Zhang and Xiaolei Wang and Xin Zhao and Furu Wei and Ji-Rong Wen},
booktitle = {Proc. Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year = {2024},
pages = {5701--5715},
doi = {10.18653/v1/2024.acl-long.309},
}

This project is released under the MIT License.