Simple training tutorial

The objective of this tutorial is to show you the basics of the library and how it can be used to simplify the audio processing pipeline.

This page is generated from the corresponding Jupyter notebook, which can be found in this folder.

To install the library, uncomment and run this cell:

# !pip install fastaudio

COLAB USERS: Before you continue and import the library, go to the Runtime menu and select Restart runtime.

from fastai.vision.all import *
from fastaudio.core.all import *
from fastaudio.augment.all import *
from fastaudio.ci import skip_if_ci

ESC-50: Dataset for Environmental Sound Classification

# The first time this runs, it will download the dataset (~650 MB)
path = untar_data(URLs.ESC50, dest="ESC50")

The audio files are inside the audio/ subfolder:

(path/"audio").ls()
(#2000) [Path('ESC50/master/audio/5-198891-A-8.wav'),Path('ESC50/master/audio/3-128512-A-47.wav'),Path('ESC50/master/audio/4-234879-A-6.wav'),Path('ESC50/master/audio/3-100024-A-27.wav'),Path('ESC50/master/audio/5-263831-A-6.wav'),Path('ESC50/master/audio/1-22804-A-46.wav'),Path('ESC50/master/audio/2-117615-A-48.wav'),Path('ESC50/master/audio/5-221518-A-21.wav'),Path('ESC50/master/audio/2-43802-A-42.wav'),Path('ESC50/master/audio/5-194899-D-3.wav')...]
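Before building the pipeline, it can help to inspect a single clip. Below is a minimal sketch, assuming the download above completed; AudioTensor.create loads a file through torchaudio, and the sr and duration attributes are assumed from fastaudio's AudioTensor:

at = AudioTensor.create((path/"audio").ls()[0])  # arbitrary choice of clip
at.sr, at.duration, at.shape
# ESC-50 clips are 5-second mono recordings at 44.1 kHz,
# so expect roughly (44100, 5.0, torch.Size([1, 220500]))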

There's also a meta/ folder with metadata about all the files and their labels:

(path/"meta").ls()
(#2) [Path('ESC50/master/meta/esc50.csv'),Path('ESC50/master/meta/esc50-human.xlsx')]

Opening the metadata file

df = pd.read_csv(path/"meta"/"esc50.csv")
df.head()
   filename           fold  target  category        esc10  src_file  take
0  1-100032-A-0.wav      1       0  dog             True     100032     A
1  1-100038-A-14.wav     1      14  chirping_birds  False    100038     A
2  1-100210-A-36.wav     1      36  vacuum_cleaner  False    100210     A
3  1-100210-B-36.wav     1      36  vacuum_cleaner  False    100210     B
4  1-101296-A-19.wav     1      19  thunderstorm    False    101296     A
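ESC-50 contains 2000 clips: 50 classes with 40 clips each, pre-arranged into 5 folds of 400 clips. Plain pandas calls confirm this:

df['category'].nunique()                # 50 classes
df['fold'].value_counts().sort_index()  # folds 1-5, 400 clips each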

DataBlock and DataLoaders preparation

# Helper function to split the data
def CrossValidationSplitter(col='fold', fold=1):
    "Split `items` (supposed to be a dataframe) by fold in `col`"
    def _inner(o):
        assert isinstance(o, pd.DataFrame), "ColSplitter only works when your items are a pandas DataFrame"
        col_values = o.iloc[:,col] if isinstance(col, int) else o[col]
        valid_idx = (col_values == fold).values.astype('bool')
        return IndexSplitter(mask2idxs(valid_idx))(o)
    return _inner
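As a quick sanity check, calling the splitter on the dataframe should put the 400 clips of fold 1 in the validation set and the rest in training:

train_idx, valid_idx = CrossValidationSplitter(fold=1)(df)
len(train_idx), len(valid_idx)  # expect (1600, 400)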

Creating the Audio to Spectrogram transform from a predefined config.

cfg = AudioConfig.BasicMelSpectrogram(n_fft=512)
a2s = AudioToSpec.from_cfg(cfg)
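During training this transform runs on whole batches, but as a Transform it can also be applied eagerly to a single AudioTensor to see what it produces. A sketch, reusing the clip loaded earlier (the exact time dimension depends on the clip length and STFT settings):

at = AudioTensor.create((path/"audio").ls()[0])
sg = a2s(at)
sg.shape  # [channels, n_mels, time], e.g. [1, 128, ...] with torchaudio's default n_mels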

Creating the DataBlock

auds = DataBlock(blocks = (AudioBlock, CategoryBlock),  
                 get_x = ColReader("filename", pref=path/"audio"), 
                 splitter = CrossValidationSplitter(fold=1),
                 batch_tfms = [a2s],
                 get_y = ColReader("category"))
dbunch = auds.dataloaders(df, bs=64)
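Peeking at one collated batch (standard fastai API) shows the tensors the model will see: a batch of single-channel spectrograms plus one label per item.

x, y = dbunch.one_batch()
x.shape, y.shape  # e.g. (torch.Size([64, 1, 128, ...]), torch.Size([64]))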

Visualizing one batch of data. Notice that the title of each Spectrogram is the corresponding label.

dbunch.show_batch(figsize=(10, 5))

Learner and Training

While creating the learner, we need to indicate that our input spectrograms only have one channel. Besides that, it's the usual vision learner.

learn = cnn_learner(dbunch, 
            resnet18,
            n_in=1,  # <- This is the only audio specific modification here
            loss_func=CrossEntropyLossFlat(),
            metrics=[accuracy])
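Under the hood, n_in=1 makes fastai rebuild the first convolution of the pretrained ResNet to accept single-channel input. You can inspect the result; the indexing below assumes cnn_learner's usual body/head layout:

learn.model[0][0]  # e.g. Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), ...)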
@skip_if_ci
def run_learner():
    # epochs are a bit longer due to the chosen melspectrogram settings
    learn.fine_tune(10)

# We only validate the model when running in CI
run_learner()
epoch  train_loss  valid_loss  accuracy  time
0      4.600243    2.509200    0.337500  00:06

epoch  train_loss  valid_loss  accuracy  time
0      2.468132    2.036752    0.475000  00:06
1      1.899496    1.583315    0.595000  00:06
2      1.355151    1.360205    0.635000  00:06
3      0.934699    1.210813    0.667500  00:06
4      0.641157    1.146413    0.685000  00:06
5      0.441391    1.131422    0.695000  00:06
6      0.296554    1.146941    0.692500  00:06
7      0.208189    1.139945    0.710000  00:06
8      0.149129    1.123868    0.692500  00:06
9      0.113467    1.133862    0.685000  00:06

The first table is the single frozen epoch that fine_tune runs before unfreezing; the second covers the 10 unfrozen epochs.
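If you ran the training loop locally (the @skip_if_ci decorator above alters what runs on CI), getting a prediction for a single clip looks like the sketch below. Because get_x reads the "filename" column, learn.predict is given a dataframe row rather than a bare path:

row = df.iloc[0]
pred_class, pred_idx, probs = learn.predict(row)
pred_class  # the predicted category for this clip, e.g. 'dog'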