This library is for applying trained models to data.
We provide a few Python notebooks for efficient transfer learning, as suggested in Feature Embeddings from Large-Scale Acoustic Bird Classifiers Enable Few-Shot Transfer Learning.
The full workflow is illustrated in a Colab tutorial. This tutorial can be used with Google Colab's free tier, requiring no software installation, though a (free) Google account is required. The notebook can be copied and adapted to work with your own data, stored in Drive.
For local installation and use of the base Python notebooks, we recommend using a Linux machine (e.g., Ubuntu) with a moderate GPU. Our continuous integration tests install and run on Linux, so that is your best bet for compatibility. Some users have had success using the Windows Subsystem for Linux (WSL), or using Docker and virtual machines hosted in the cloud. Anecdotally, installation on OS X is difficult.
The classifier workflow has two or three steps. We work with an /embedding model/, a large quantity of /unlabeled data/ and a usually-smaller set of /labeled data/.
We first need to compute /embeddings/ of the target unlabeled audio. The
unlabeled audio is specified by one or more 'globs' of files like:
/my_home/audio/*/*.flac. Any audio formats readable by Librosa should be fine.
We provide embed_audio.ipynb to do so. This creates a dataset of embeddings
in a directory of your choice, along with a configuration file.
Computing embeddings can take a while for large datasets;
we suggest using a machine with a GPU. For truly massive datasets (terabytes
or more), we provide a Beam pipeline via embed.py which can run on a cluster.
Setting this up may be challenging, however; feel free to get in touch if you
have questions.
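Before launching a long embedding job, it can help to check which files your globs actually match. The following is a minimal sketch using only the Python standard library; the helper name expand_globs is hypothetical and is not part of this library.

```python
import glob


def expand_globs(patterns):
    """Return the sorted, de-duplicated list of files matched by the globs."""
    matched = set()
    for pattern in patterns:
        # glob.glob expands shell-style wildcards such as /my_home/audio/*/*.flac
        matched.update(glob.glob(pattern))
    return sorted(matched)
```

For example, expand_globs(['/my_home/audio/*/*.flac']) lists every .flac file exactly one directory below /my_home/audio. An empty result usually means the pattern or the path needs adjusting.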
Once the unlabeled audio is embedded, you can use agile_modeling.ipynb
to search for interesting audio and create a classifier. Starting from a clip
(or Xeno-Canto id, or URL for an audio file), you can search for similar audio
in the unlabeled data.
By providing a label and clicking on relevant results, you will start amassing
a set of labeled data.
You can also add labeled data manually. The labeled data is stored in a simple 'folder-of-folders' format: each class is given a folder, whose name is the class. (Explicit negatives can be put in an 'unknown' folder.) This makes it easy to add additional examples. It is recommended to add examples with length matching the /window size/ of the embedding model (5 seconds for Perch, or 3 seconds for BirdNET).
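As a rough illustration of the folder-of-folders layout, the sketch below walks such a directory and pairs each audio file with its class name (the name of the containing folder). The function name list_labeled_examples and the extension filter are assumptions made for this example; the notebooks load labeled data through their own utilities.

```python
from pathlib import Path


def list_labeled_examples(root, extensions=('.wav', '.flac')):
    """Return (label, path) pairs for a folder-of-folders dataset.

    Each immediate sub-folder of `root` is treated as one class, and the
    folder name is used as the label (including 'unknown' for negatives).
    """
    examples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # Ignore stray files at the top level.
        for audio_path in sorted(class_dir.iterdir()):
            if audio_path.suffix.lower() in extensions:
                examples.append((class_dir.name, audio_path))
    return examples
```

Adding a new example is then just a matter of copying an audio file into the folder for its class.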
From there, the notebook will build a small classifier using the embeddings of the labeled audio. The classifier can then be run on the unlabeled data. Hand-labeling results will allow you to feed new data into the labeled dataset, and iterate quickly.
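To give a sense of what a small classifier on top of embeddings can look like, here is a minimal sketch of a binary logistic-regression classifier trained with plain gradient descent in NumPy. This is an illustration under stated assumptions, not the training code used by the notebook: the function names, the hyperparameters, and the choice of logistic regression are all assumptions for this example.

```python
import numpy as np


def train_linear_classifier(embeddings, labels, num_steps=500, learning_rate=0.5):
    """Fit a binary logistic-regression classifier on fixed embeddings.

    embeddings: array of shape [num_examples, embedding_dim].
    labels: array of shape [num_examples] with values 0 or 1.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    weights = np.zeros(embeddings.shape[1])
    bias = 0.0
    for _ in range(num_steps):
        logits = embeddings @ weights + bias
        probs = 1.0 / (1.0 + np.exp(-logits))
        error = probs - labels  # Gradient of the log-loss w.r.t. the logits.
        weights -= learning_rate * (embeddings.T @ error) / len(labels)
        bias -= learning_rate * error.mean()
    return weights, bias


def predict_logits(embeddings, weights, bias):
    """Score embeddings; positive logits indicate the positive class."""
    return np.asarray(embeddings, dtype=np.float64) @ weights + bias
```

Because the embedding model is frozen, only a weight vector and a bias are learned here, which is why training on a handful of labeled examples can be fast enough to iterate interactively.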
The analysis.ipynb notebook provides additional tools for analyzing data with
a pre-trained classifier, as developed in agile_modeling.ipynb. It can be
used to run detections over new data, estimate total call density, and
evaluate the real-world model quality.
Install the main repository, following the instructions in the main README.md, and check that the tests pass.
Separately, install Jupyter. Once Jupyter and Chirp are installed, use the command line to navigate to the directory where Chirp is installed, and launch the notebook with:
poetry run jupyter notebook \
--NotebookApp.allow_origin='https://colab.research.google.com' \
--port=8888 --NotebookApp.port_retries=0
This starts the notebook server. A link to the notebook should appear in the terminal output; open this in a web browser.
Once in the Jupyter interface, navigate to chirp/inference, and get started
with embed_audio.ipynb.
Note that you can use Google Colab to get a nicer notebook layout. Copy the
notebook file into Google Drive and open it with Colab. Then use the
Connect to a Local Runtime option to connect to your Jupyter notebook server.
To allow simple substitution of different models, we provide an EmbeddingModel
interface, with a variety of implementations for common models (the model zoo).
This is described in further detail in chirp/projects/zoo/README.md.
The embed.py script contains a Beam pipeline for running an EmbeddingModel
on a large collection of input audio. The pipeline produces a database of examples
in TFRecord format.
Example configurations can be found in the chirp/configs/inference directory.
Currently configuration is handled by supplying a config name on the command
line (one of raw_soundscapes, separate_soundscapes or
birdnet_soundscapes). The corresponding configuration file in
chirp/configs/inference can be edited to provide the model location and
glob matching pattern for the target wav files.
The embedding script also includes a dry_run option which processes a single
file at random using the chosen configuration. This is useful for ensuring that
the model and data are configured properly before launching a large job.
Step-by-step:
- Run the main repository's installation instructions.
- Download and extract the Perch model from TFHub: https://tfhub.dev/google/bird-vocalization-classifier/
- Adjust the inference raw_soundscapes config file:
  - Fill in config.source_file_patterns with the path to some audio files, e.g.: config.source_file_patterns = ['/my/drive/*.wav']
  - Fill in the model_checkpoint_path with the path of the model downloaded from TFHub.
  - Fill in config.output_dir with the path where you would like to write the outputs, e.g.: config.output_dir = '/my/drive/embeddings'
  - Adjust config.shard_length_s and config.num_shards_per_file according to your target data. We produce work-units for each audio file by breaking each file into parts according to these config values: Setting shard_length_s to 60 means each work unit will handle 60 seconds of audio from a given file. Setting num_shards_per_file to 15 will then produce a work-unit for each of the first 15 minutes of the audio. If the audio is less than 15 minutes, these extra work units will just do nothing. If the audio is more than 15 minutes long, the extra audio will not be used.
- From the terminal, change directory to the main chirp repository, and use poetry to run the embed.py script:
  poetry run python chirp/inference/embed.py --
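The effect of shard_length_s and num_shards_per_file described in the steps above can be sketched as follows. This only mirrors the arithmetic in the description; it is not the pipeline's implementation, and the function name work_units is hypothetical.

```python
def work_units(audio_length_s, shard_length_s=60.0, num_shards_per_file=15):
    """Return the (start_s, end_s) ranges covered by one file's work units.

    Work units that would start past the end of the audio are skipped here;
    in the description above they exist but do nothing.
    """
    units = []
    for shard_index in range(num_shards_per_file):
        start_s = shard_index * shard_length_s
        if start_s >= audio_length_s:
            break  # All remaining work units would be empty.
        end_s = min(start_s + shard_length_s, audio_length_s)
        units.append((start_s, end_s))
    return units
```

With the default values, a 10-minute file yields 10 non-empty work units, while a 1-hour file yields 15 and the audio after the 15-minute mark is ignored. Increase num_shards_per_file if your recordings are longer than shard_length_s * num_shards_per_file seconds.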
The agile modeling python notebooks heavily rely on the code in this directory. The three parts of the workflow are embedding, search, and classification. The latter two parts generally require knowing where the embeddings are, how to join embeddings with their source audio, and how to display examples to users in Colab.
Embedding is handled by utilities in embed_lib.py. When embeddings are
computed, a configuration file is written beside the embeddings which indicates
the embedding model used and the audio file globs which were embedded.
For subsequent steps, we coordinate activity with search/bootstrap.py. First,
we create a bootstrap.BootstrapConfig, which collects info on the embedding
model, location of embeddings files, and audio glob (for connecting embeddings
with their source audio). The bootstrap.BootstrapConfig is then used to create
a bootstrap.BootstrapState, which includes an instantiated copy of the
embedding model, and is also used to create certain objects which depend heavily
on the configuration, such as the TensorFlow Dataset of embeddings, or an
iterator over audio files corresponding to embeddings.
Brute-force search is handled by search/search.py. This is optimized for
fast execution, and is fairly adaptable to new situations, such as searching
with a classifier, finding examples at a specific distance from the query,
or selecting random examples.
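The idea behind brute-force search can be illustrated with a short NumPy sketch: score every embedding against the query, then keep the k best. The function name top_k_search and the use of the inner product as the score are assumptions for this example; the library's own search code may use different scoring functions and data structures.

```python
import numpy as np


def top_k_search(query, embeddings, k=5):
    """Return (indices, scores) of the k embeddings that best match the query.

    query: array of shape [embedding_dim].
    embeddings: array of shape [num_embeddings, embedding_dim].
    Assumes k >= 1 and at least one embedding.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    scores = embeddings @ np.asarray(query, dtype=np.float64)
    k = min(k, len(scores))
    # argpartition selects the k largest scores without a full sort...
    top = np.argpartition(-scores, k - 1)[:k]
    # ...and only those k candidates are then sorted, best first.
    top = top[np.argsort(-scores[top])]
    return top, scores[top]
```

Swapping the scoring line for another function (for example, classifier logits, or the negative absolute difference from a target distance) gives the other search modes mentioned above without changing the top-k selection.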
The search.TopKSearchResults object maintains a list of search.SearchResult
objects; both classes are also important for display.
Users may provide data in a 'folder-of-folders' format. The
classify/data_lib.py file contains utilities for loading embeddings of
labeled data into memory in a MergedDataset object. This object contains
everything needed for training small classifiers on top of embeddings.
Actual small-model training code is contained in classify/classify.py.
Displaying examples to the user generally requires connecting embeddings with
source audio. The bootstrap.BootstrapState is responsible for providing an
iterator (via search_results_audio_iterator) which iterates over a
search.TopKSearchResults object, attaching audio to each result. The iterator
provides results in the same order as they appear in the
search.TopKSearchResults object.
Each search.SearchResult object may have IPython widgets attached to it, such
as label buttons. These are used for obtaining user-provided labels, etc.