Introduction to Fastaudio

In this notebook, we go through some of the basic API building blocks in fastaudio, including:

Loading audio
Preprocessing audio by removing silence and resampling
Applying transforms directly to the audio signal, such as random cropping, shifting, or adding noise
Transforming spectrograms with time and frequency masking (SpecAugment) or calculating deltas

To access all of the transforms available, you only need one import:

from fastaudio.augment.all import *

from fastaudio.core.all import *
from fastai.data.all import *

We can load a test dataset using the untar_data function from fastai:

speakers = untar_data(URLs.SAMPLE_SPEAKERS10)
speakers = get_audio_files(speakers)
speakers[0]

Then we can create an audio tensor, which we can view and listen to using the show() method:

audio = AudioTensor.create(speakers[0])
audio.show()

<AxesSubplot:>

If we want to see what the spectrogram for that audio looks like, we can create an AudioToSpec transform:

spectrogram = AudioToSpec.from_cfg(AudioConfig.Voice())(audio)
spectrogram.show()

<AxesSubplot:>
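To make the idea of a spectrogram concrete, here is a minimal, library-free sketch of a short-time Fourier transform. It is not how AudioToSpec is implemented: the function name magnitude_spectrogram and the parameters n_fft and hop are hypothetical, and the output is a plain linear-frequency magnitude spectrogram rather than whatever AudioConfig.Voice() produces.

```python
import cmath
import math

def magnitude_spectrogram(samples, n_fft=16, hop=8):
    """Return a list of frames, each holding n_fft // 2 + 1 DFT magnitudes."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(n_fft)]
        bins = []
        for k in range(n_fft // 2 + 1):
            # Direct DFT of one windowed frame (slow, but easy to read).
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                        for n, x in enumerate(frame))
            bins.append(abs(coeff))
        frames.append(bins)
    return frames

# A sine that completes 4 cycles per 16-sample frame should peak in bin 4.
tone = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
spec = magnitude_spectrogram(tone)
peak_bins = [frame.index(max(frame)) for frame in spec]
print(len(spec), peak_bins)
```

Each frame is one column of the image that show() draws: time runs across frames and frequency runs across bins.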

Now let's go through some of the built-in augmentations in the library.

Silence can be removed easily:

tfm = RemoveSilence()
tfm(audio).show()
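As a rough illustration of what silence removal means, the sketch below trims quiet samples from both ends of a clip. The helper trim_silence and its amplitude threshold are assumptions made for this example; RemoveSilence may detect silence differently and supports its own options.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    # Keep everything between the first and last loud sample, quiet gaps included.
    return samples[loud[0]:loud[-1] + 1]

signal = [0.0, 0.001, 0.0, 0.4, -0.3, 0.002, 0.5, 0.0, 0.0]
trimmed = trim_silence(signal)
print(trimmed)  # [0.4, -0.3, 0.002, 0.5]
```

Note that the quiet sample in the middle of the clip survives here, because only the edges are trimmed.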

Or you can crop the signal to 500 ms:

tfm = ResizeSignal(duration=500)
tfm(audio).show()

<AxesSubplot:>
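The arithmetic behind a duration-based resize can be sketched without any library: convert milliseconds to a sample count, then crop or pad. The helper resize_signal below is hypothetical. It always keeps the start of the clip and pads with zeros at the end, which may differ from the crop window and padding mode that ResizeSignal uses.

```python
def resize_signal(samples, sample_rate, duration_ms, pad_value=0.0):
    """Crop or pad samples so the clip lasts exactly duration_ms."""
    target = int(sample_rate * duration_ms / 1000)  # milliseconds -> sample count
    if len(samples) >= target:
        return samples[:target]  # assumption: keep the start of the clip
    return samples + [pad_value] * (target - len(samples))

sr = 16000
clip = [0.1] * sr  # one second of audio at 16 kHz
short = resize_signal(clip, sr, 500)
print(len(short))  # 8000
```

Fixing every clip to the same length like this is what lets clips of different durations be stacked into one batch.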

Now for the spectrogram.

Masking is easy:

tfm = MaskFreq(num_masks=3, size=5)
tfm(spectrogram).show()

<AxesSubplot:>
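Conceptually, frequency masking overwrites a few randomly placed horizontal bands of the spectrogram. The sketch below works on a nested list whose rows are frequency bins. The function mask_freq, its rng argument, and the choice to fill with the spectrogram mean are assumptions for illustration, not a description of how MaskFreq is implemented.

```python
import random

def mask_freq(spec, num_masks=1, size=2, rng=random):
    """Return a copy of spec with num_masks bands of `size` rows set to the mean."""
    n_bins = len(spec)
    values = [v for row in spec for v in row]
    fill = sum(values) / len(values)  # assumption: fill masked bands with the mean
    masked = [row[:] for row in spec]  # copy so the input stays untouched
    for _ in range(num_masks):
        start = rng.randint(0, n_bins - size)  # bands may overlap
        for r in range(start, start + size):
            masked[r] = [fill] * len(masked[r])
    return masked

# 8 frequency bins x 4 time frames; row i holds the value i.
spec = [[float(i)] * 4 for i in range(8)]
masked = mask_freq(spec, num_masks=3, size=2, rng=random.Random(0))
n_masked_rows = sum(row == [3.5] * 4 for row in masked)
print(n_masked_rows)
```

Time masking is the same idea applied to columns instead of rows.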

You can also compose multiple transforms using Pipeline from fastcore, the library that fastai builds on:

from fastcore.transform import Pipeline

tfms = Pipeline([MaskFreq(), MaskTime()])
tfms(spectrogram).show()

<AxesSubplot:>
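The core idea of composing transforms is plain function composition: feed the output of each step into the next. The compose helper below is a hypothetical stand-in that shows only that idea; fastcore's Pipeline does more than this (for example, it can also decode), so treat it as a sketch.

```python
from functools import reduce

def compose(*funcs):
    """Return a function that applies funcs from left to right."""
    def composed(x):
        return reduce(lambda acc, f: f(acc), funcs, x)
    return composed

def double(x):
    return x * 2

def add_one(x):
    return x + 1

pipeline = compose(double, add_one)
print(pipeline(3))  # 7
```

Order matters: compose(add_one, double) would give a different result for the same input.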

For examples of loading a dataset and training a model, check out the tutorial notebooks.