onolab-tmu/AAPE
What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model

This repository is the official implementation of our paper:

Takao Kawamura, Daisuke Niizumi, and Nobutaka Ono, “What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model,” arXiv preprint arXiv:2602.15307. https://arxiv.org/abs/2602.15307

We provide the framework to analyze general-purpose audio models at the neuron level using Audio Activation Probability Entropy (AAPE).

🚀 Reproduction Steps

1. Setup Environment

This project relies on EVAR. Please follow the detailed instructions in Install.md to set up the evaluation environment and download datasets.

2. Compute Conditional Activation Probabilities

Run the notebooks to calculate neuron activations and compute conditional activation probabilities.

Example: Open and run {model_name}_{dataset_name}.ipynb (e.g., M2D_ESC50.ipynb). This notebook generates the following outputs:

  • For SSL ViT (M2D): {task_name}_analysis/
  • For SL ViT: {task_name}_vit/

Generated files:

  • over_zero.pt — conditional activation probabilities
  • classes.txt — list of class names
  • features.pt — extracted embeddings
  • gts.pt — ground-truth labels
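As a rough illustration of what over_zero.pt contains, the snippet below computes per-class conditional activation probabilities, i.e. P(activation > 0 | class), from toy data. The variable names and this exact definition are assumptions for illustration, not the repository's code (which stores the result as a PyTorch tensor):

```python
import numpy as np

# Toy stand-in for neuron activations: (num_samples, num_neurons).
# In the repository these come from the audio model's hidden units.
rng = np.random.default_rng(0)
activations = rng.standard_normal((6, 4))
labels = np.array([0, 0, 1, 1, 2, 2])  # class index per sample

num_classes = int(labels.max()) + 1
# Conditional activation probability: fraction of samples of class c
# for which each neuron's activation is positive.
over_zero = np.stack([
    (activations[labels == c] > 0).mean(axis=0)
    for c in range(num_classes)
])  # shape: (num_classes, num_neurons)

print(over_zero.shape)  # (3, 4)
```

Each row is one class; each column is one neuron, so a column with most of its mass in a single row hints at a class-specific neuron.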

3. Compute AAPE and Identify Class-Specific Neurons

Run the notebooks to compute AAPE and analyze class-specific neurons. Open and run Analyze_ClassSpecificNeurons.ipynb.

This notebook generates the following results across multiple domains:

Task-level neuron statistics

  • task_level_neuron_statistics_analysis.csv — SSL ViT (M2D)
  • task_level_neuron_statistics_vit.csv — SL ViT

Cross-task analysis

  • cross_task-surge-nspitch.pdf — Octave
  • cross_task-esc50-gise51.pdf — Acoustic event
  • cross_task-vc1-cremad_gender.pdf — Gender

Within-task analysis

  • within_task-gtzan.pdf — Music genre
  • within_task-cremad.pdf — Emotion
  • within_task-voxforge.pdf — Language

The detailed analysis procedure is described in the paper.
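The paper gives the precise definition of AAPE; as a minimal sketch, following the LAPE-style entropy of Tang et al. (2024) that common.py is stated to build on, one can normalize each neuron's conditional activation probabilities across classes and take the entropy of that distribution. All names below are assumptions for illustration:

```python
import numpy as np

def aape(over_zero, eps=1e-12):
    """Entropy per neuron of the class-normalized activation probabilities.

    over_zero: (num_classes, num_neurons) conditional activation probabilities.
    Low entropy -> the neuron fires mostly for one class (class-specific);
    high entropy -> the neuron fires broadly across classes.
    """
    p = over_zero / (over_zero.sum(axis=0, keepdims=True) + eps)
    return -(p * np.log(p + eps)).sum(axis=0)

probs = np.array([
    [0.90, 0.30],   # class 0
    [0.05, 0.35],   # class 1
    [0.05, 0.35],   # class 2
])
ent = aape(probs)
# Neuron 0 is concentrated on class 0, so its entropy is lower than neuron 1's.
print(ent)
```

Class-specific neurons would then be identified by thresholding this entropy (the exact selection criterion is described in the paper).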

4. Examine the Effects of Deactivating Class-Specific Neurons

Run the notebook to evaluate how deactivating class-specific neurons affects downstream classification performance. Open and run Ablations.ipynb.

In this experiment:

  1. A classifier is first trained on frozen embeddings.
  2. Neurons in the feature extractor are selectively deactivated.
  3. The test performance of the trained classifier is evaluated under neuron deactivation.

Note: Due to potential differences in hardware (e.g., GPU vs. CPU), the results may vary across environments, even when random seeds are fixed.
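Step 2 above can be sketched as follows, assuming "deactivating" a neuron means zeroing its activation before it reaches the frozen-embedding classifier; the function and names are illustrative, not the notebook's implementation:

```python
import numpy as np

def deactivate(activations, neuron_indices):
    """Return a copy of `activations` with the selected neurons zeroed out."""
    masked = activations.copy()
    masked[:, neuron_indices] = 0.0  # "turn off" the chosen neurons
    return masked

acts = np.ones((2, 5))          # toy activations: (num_samples, num_neurons)
out = deactivate(acts, [1, 3])  # e.g. indices of class-specific neurons
print(out)
# [[1. 0. 1. 0. 1.]
#  [1. 0. 1. 0. 1.]]
```

Comparing the classifier's test accuracy with and without this masking quantifies how much it relies on the deactivated neurons.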

🛠️ Repository Structure

.
├── README.md
├── Install.md
├── {model_name}_{dataset_name}.ipynb
├── Analyze_ClassSpecificNeurons.ipynb
├── Ablations.ipynb
├── add_to_evar/le_eval_only.py
├── evar/  (created by following Install.md)
├── common.py
└── common_analyze.py

Citation

If you use this code in your research, please cite:

@misc{kawamura2026what,
  title         = {What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model},
  author        = {Takao Kawamura and Daisuke Niizumi and Nobutaka Ono},
  year          = {2026},
  eprint        = {2602.15307},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  url           = {https://arxiv.org/abs/2602.15307},
}

Acknowledgements

The AAPE identification logic implemented in common.py is inspired by and partially based on the neuron identification methodology proposed in:

Tang et al., “Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models,” ACL 2024.

@inproceedings{tang2024language,
  title         = {Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models},
  author        = {Tianyi Tang and Wenyang Luo and Haoyang Huang and Dongdong Zhang and Xiaolei Wang and Xin Zhao and Furu Wei and Ji-Rong Wen},
  booktitle     = {Proc. Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year          = {2024},
  pages         = {5701--5715},
  doi           = {10.18653/v1/2024.acl-long.309},
}

License

This project is released under the MIT License.
