Simple training tutorial
The objective of this tutorial is to show you the basics of the library and how it can be used to simplify the audio processing pipeline.
This page was generated from the corresponding Jupyter notebook, which can be found in this folder.
To install the library, uncomment and run this cell:
# !pip install fastaudio
COLAB USERS: Before you continue and import the lib, go to the Runtime menu and select Restart Runtime.
from fastai.vision.all import *
from fastaudio.core.all import *
from fastaudio.augment.all import *
from fastaudio.ci import skip_if_ci
ESC-50: Dataset for Environmental Sound Classification
# The first time this runs it will download a dataset that is ~650 MB
path = untar_data(URLs.ESC50, dest="ESC50")
The audio files are inside the audio/ subfolder:
(path/"audio").ls()
And there's another folder, meta/, with metadata about all the files and their labels:
(path/"meta").ls()
Opening the metadata file
df = pd.read_csv(path/"meta"/"esc50.csv")
df.head()
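The fold column is what makes cross-validation straightforward here: ESC-50 ships with a predefined 5-fold split, and category holds the label for each clip. A quick sanity check with plain pandas (not part of the original pipeline, just for inspection):

# Inspect the predefined folds and the labels
df["fold"].value_counts()   # 5 folds of 400 clips each
df["category"].nunique()    # 50 classes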
Datablock and Dataloader preparation
# Helper function to split the data
def CrossValidationSplitter(col='fold', fold=1):
    "Split `items` (supposed to be a DataFrame) by fold in `col`"
    def _inner(o):
        assert isinstance(o, pd.DataFrame), "CrossValidationSplitter only works when your items are a pandas DataFrame"
        col_values = o.iloc[:, col] if isinstance(col, int) else o[col]
        valid_idx = (col_values == fold).values.astype('bool')
        return IndexSplitter(mask2idxs(valid_idx))(o)
    return _inner
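Before wiring it into the DataBlock, the splitter can be called directly on the metadata DataFrame as a sanity check; it returns train and validation index lists, with the validation set being every row whose fold matches the chosen one. A minimal sketch using only the objects defined above:

# Sanity check: fold 1 becomes the validation set, the other folds the training set
splits = CrossValidationSplitter(fold=1)(df)
len(splits[0]), len(splits[1])   # 1600 training rows, 400 validation rows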
Creating the Audio to Spectrogram transform from a predefined config.
cfg = AudioConfig.BasicMelSpectrogram(n_fft=512)
a2s = AudioToSpec.from_cfg(cfg)
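The transform can also be applied eagerly to a single clip to see what it produces. The sketch below assumes fastaudio's AudioTensor.create for loading a file and that the resulting spectrogram can be displayed with show(); it is only for inspection and is not needed for training:

# Load one clip and turn it into a mel spectrogram
fname = (path/"audio").ls()[0]
audio = AudioTensor.create(fname)
sgram = a2s(audio)
print(sgram.shape)   # (channels, n_mels, time frames)
sgram.show()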
Creating the Datablock
auds = DataBlock(blocks=(AudioBlock, CategoryBlock),
                 get_x=ColReader("filename", pref=path/"audio"),
                 splitter=CrossValidationSplitter(fold=1),
                 batch_tfms=[a2s],
                 get_y=ColReader("category"))
dbunch = auds.dataloaders(df, bs=64)
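It can be worth grabbing a single batch to confirm the tensor shapes before training; with the config above the spectrograms are single-channel images. A minimal check using fastai's one_batch:

# Check the shapes of one batch: (bs, channels, n_mels, time) and (bs,)
xb, yb = dbunch.one_batch()
xb.shape, yb.shape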
Visualizing one batch of data. Notice that the title of each Spectrogram is the corresponding label.
dbunch.show_batch(figsize=(10, 5))
Learner and Training
While creating the learner, we need to indicate that our input spectrograms only have one channel. Besides that, it's the usual vision learner.
learn = cnn_learner(dbunch,
                    resnet18,
                    n_in=1,  # <- This is the only audio specific modification here
                    loss_func=CrossEntropyLossFlat(),
                    metrics=[accuracy])
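Optionally, fastai's learning-rate finder can suggest a maximum learning rate before fine-tuning. This is a standard fastai step rather than anything fastaudio specific; the valley attribute on the suggestion is available in recent fastai versions:

# Optional: pick a learning rate with the LR finder and pass it to fine_tune
suggested = learn.lr_find()
# learn.fine_tune(10, base_lr=suggested.valley)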
@skip_if_ci
def run_learner():
    # epochs are a bit longer due to the chosen melspectrogram settings
    learn.fine_tune(10)

# We only validate the model when running in CI
run_learner()
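Once training has run, the learner can be evaluated on the held-out fold and saved for inference. The sketch below uses fastai's standard validate and export calls; the export filename is just an example:

# Evaluate on the validation fold and save the trained learner
loss, acc = learn.validate()
print(f"Validation loss: {loss:.3f}, accuracy: {acc:.3f}")
learn.export("esc50_resnet18.pkl")   # example filename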