Simple training tutorial

The objective of this tutorial is to show you the basics of the library and how it can be used to simplify the audio processing pipeline.

This page is generated from the corresponding Jupyter notebook, which can be found in this folder.

To install the library, uncomment and run this cell:

# !pip install fastaudio

COLAB USERS: Before you continue and import the library, go to the Runtime menu and select Restart runtime.

from fastai.vision.all import *
from fastaudio.core.all import *
from fastaudio.augment.all import *
from fastaudio.ci import skip_if_ci

ESC-50: Dataset for Environmental Sound Classification

# The first time this runs, it will download the dataset (~650 MB)
path = untar_data(URLs.ESC50, dest="ESC50")

The audio files are inside the audio/ subfolder:

(path/"audio").ls()
(#2000) [Path('ESC50/master/audio/5-198891-A-8.wav'),Path('ESC50/master/audio/3-128512-A-47.wav'),Path('ESC50/master/audio/4-234879-A-6.wav'),Path('ESC50/master/audio/3-100024-A-27.wav'),Path('ESC50/master/audio/5-263831-A-6.wav'),Path('ESC50/master/audio/1-22804-A-46.wav'),Path('ESC50/master/audio/2-117615-A-48.wav'),Path('ESC50/master/audio/5-221518-A-21.wav'),Path('ESC50/master/audio/2-43802-A-42.wav'),Path('ESC50/master/audio/5-194899-D-3.wav')...]
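Before building the pipeline, it can help to inspect a single clip. Below is a minimal sketch, assuming the download above completed; AudioTensor.create loads a file through torchaudio, and the sr and duration attributes are assumed from fastaudio's AudioTensor:

at = AudioTensor.create((path/"audio").ls()[0])  # arbitrary choice of clip
at.sr, at.duration, at.shape
# ESC-50 clips are 5-second mono recordings at 44.1 kHz,
# so expect roughly (44100, 5.0, torch.Size([1, 220500]))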

There's also a meta/ folder with metadata about all the files and their labels:

(path/"meta").ls()
(#2) [Path('ESC50/master/meta/esc50.csv'),Path('ESC50/master/meta/esc50-human.xlsx')]

Opening the metadata file

df = pd.read_csv(path/"meta"/"esc50.csv")
df.head()
   filename           fold  target  category        esc10  src_file  take
0  1-100032-A-0.wav      1       0  dog             True     100032     A
1  1-100038-A-14.wav     1      14  chirping_birds  False    100038     A
2  1-100210-A-36.wav     1      36  vacuum_cleaner  False    100210     A
3  1-100210-B-36.wav     1      36  vacuum_cleaner  False    100210     B
4  1-101296-A-19.wav     1      19  thunderstorm    False    101296     A
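ESC-50 contains 2000 clips: 50 classes with 40 clips each, pre-arranged into 5 folds of 400 clips. Plain pandas calls confirm this:

df['category'].nunique()                # 50 classes
df['fold'].value_counts().sort_index()  # folds 1-5, 400 clips each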

DataBlock and DataLoaders preparation

# Helper function to split the data
def CrossValidationSplitter(col='fold', fold=1):
    "Split `items` (supposed to be a dataframe) by fold in `col`"
    def _inner(o):
        assert isinstance(o, pd.DataFrame), "ColSplitter only works when your items are a pandas DataFrame"
        col_values = o.iloc[:,col] if isinstance(col, int) else o[col]
        valid_idx = (col_values == fold).values.astype('bool')
        return IndexSplitter(mask2idxs(valid_idx))(o)
    return _inner
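As a quick sanity check, calling the splitter on the dataframe should put the 400 clips of fold 1 in the validation set and the rest in training:

train_idx, valid_idx = CrossValidationSplitter(fold=1)(df)
len(train_idx), len(valid_idx)  # expect (1600, 400)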

Creating the Audio to Spectrogram transform from a predefined config.

cfg = AudioConfig.BasicMelSpectrogram(n_fft=512)
a2s = AudioToSpec.from_cfg(cfg)
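During training this transform runs on whole batches, but as a Transform it can also be applied eagerly to a single AudioTensor to see what it produces. A sketch, reusing the clip loaded earlier (the exact time dimension depends on the clip length and STFT settings):

at = AudioTensor.create((path/"audio").ls()[0])
sg = a2s(at)
sg.shape  # [channels, n_mels, time], e.g. [1, 128, ...] with torchaudio's default n_mels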

Creating the DataBlock

auds = DataBlock(blocks = (AudioBlock, CategoryBlock),  
                 get_x = ColReader("filename", pref=path/"audio"), 
                 splitter = CrossValidationSplitter(fold=1),
                 batch_tfms = [a2s],
                 get_y = ColReader("category"))
dbunch = auds.dataloaders(df, bs=64)
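Peeking at one collated batch (standard fastai API) shows the tensors the model will see: a batch of single-channel spectrograms plus one label per item.

x, y = dbunch.one_batch()
x.shape, y.shape  # e.g. (torch.Size([64, 1, 128, ...]), torch.Size([64]))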

Visualizing one batch of data. Notice that the title of each Spectrogram is the corresponding label.

dbunch.show_batch(figsize=(10, 5))

Learner and Training

While creating the learner, we need to indicate that our input spectrograms only have one channel. Besides that, it's the usual vision learner.

learn = cnn_learner(dbunch, 
            resnet18,
            n_in=1,  # <- This is the only audio specific modification here
            loss_func=CrossEntropyLossFlat(),
            metrics=[accuracy])
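Under the hood, n_in=1 makes fastai rebuild the first convolution of the pretrained ResNet to accept single-channel input. You can inspect the result; the indexing below assumes cnn_learner's usual body/head layout:

learn.model[0][0]  # e.g. Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), ...)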
@skip_if_ci
def run_learner():
    # epochs are a bit longer due to the chosen melspectrogram settings
    learn.fine_tune(10)

# We only validate the model when running in CI
run_learner()
epoch  train_loss  valid_loss  accuracy  time
0      4.600243    2.509200    0.337500  00:06

epoch  train_loss  valid_loss  accuracy  time
0      2.468132    2.036752    0.475000  00:06
1      1.899496    1.583315    0.595000  00:06
2      1.355151    1.360205    0.635000  00:06
3      0.934699    1.210813    0.667500  00:06
4      0.641157    1.146413    0.685000  00:06
5      0.441391    1.131422    0.695000  00:06
6      0.296554    1.146941    0.692500  00:06
7      0.208189    1.139945    0.710000  00:06
8      0.149129    1.123868    0.692500  00:06
9      0.113467    1.133862    0.685000  00:06

The first table is the single frozen epoch that fine_tune runs before unfreezing; the second covers the 10 unfrozen epochs.
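If you ran the training loop locally (the @skip_if_ci decorator above alters what runs on CI), getting a prediction for a single clip looks like the sketch below. Because get_x reads the "filename" column, learn.predict is given a dataframe row rather than a bare path:

row = df.iloc[0]
pred_class, pred_idx, probs = learn.predict(row)
pred_class  # the predicted category for this clip, e.g. 'dog'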