Introduction to Fastaudio

In this notebook, we will go through some of the basic API building blocks in fastaudio, including:

  • Loading audio
  • Preprocessing audio by removing silence and resampling
  • Applying transforms directly to the audio signal, such as random cropping, shifting, or adding noise
  • Transforming spectrograms with time and frequency masking (SpecAugment) and calculating Delta

To access all of the transforms available, you only need the following imports:

from fastaudio.augment.all import *
from fastaudio.core.all import *
from fastai.data.all import *

We can load a test dataset using the untar_data function from fastai:

speakers = untar_data(URLs.SAMPLE_SPEAKERS10)
speakers = get_audio_files(speakers)
speakers[0]

Then we can create an audio tensor, which we can view and listen to using the show() method:

audio = AudioTensor.create(speakers[0])
audio.show()

If we want to see what the spectrogram for that audio looks like, we can create an AudioToSpec transform:

spectrogram = AudioToSpec.from_cfg(AudioConfig.Voice())(audio)
spectrogram.show()
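
To give a feel for what a spectrogram contains, here is a minimal sketch that computes a plain magnitude spectrogram with a naive short-time Fourier transform in pure Python. It is not how AudioToSpec is implemented, and it skips any mel or decibel scaling a library may apply. The function name magnitude_spectrogram and the parameters n_fft and hop are assumptions made for this illustration.

```python
import cmath
import math


def magnitude_spectrogram(samples, n_fft=64, hop=32):
    """Naive STFT: returns frames[t][k], the magnitude of bin k in frame t.

    Hypothetical helper for illustration only (not part of fastaudio).
    """
    # Periodic Hann window to reduce leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        # Only the non-negative frequencies are needed for a real signal.
        for k in range(n_fft // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# A 1000 Hz tone sampled at 8000 Hz should peak in bin 1000 / (8000 / 64) = 8.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of the result is one time frame and each column one frequency bin, which is the same time-by-frequency grid that the masking transforms later in this notebook operate on.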

Now let's go through some of the built-in augmentations in the library.

Silence can be removed easily:

tfm = RemoveSilence()
tfm(audio).show()
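
As a rough intuition for silence removal, the sketch below trims leading and trailing samples whose magnitude falls under a threshold. It is an assumption-laden illustration in pure Python, not the algorithm RemoveSilence uses; the name trim_silence and the threshold and pad parameters are hypothetical.

```python
def trim_silence(samples, threshold=0.01, pad=0):
    """Drop leading/trailing samples quieter than `threshold`.

    Hypothetical helper for illustration only (not part of fastaudio).
    `pad` keeps that many extra samples on each side of the loud region.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole signal is below the threshold
    start = max(loud[0] - pad, 0)
    end = min(loud[-1] + 1 + pad, len(samples))
    return samples[start:end]


signal = [0.0, 0.001, 0.5, -0.4, 0.0, 0.3, 0.002, 0.0]
trimmed = trim_silence(signal)  # [0.5, -0.4, 0.0, 0.3]
```

Note that this sketch only trims the edges: the quiet sample in the middle of the signal is kept.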

Or you can crop the signal to 500 ms (the duration is given in milliseconds):

tfm = ResizeSignal(duration=500)
tfm(audio).show()
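
The arithmetic behind resizing to a duration can be sketched as follows: convert milliseconds to a sample count using the sample rate, then crop or pad to that length. This is a simplified illustration, not fastaudio's implementation. The function resize_signal is hypothetical, and keeping the start of the signal and padding with zeros are assumptions of this sketch.

```python
def resize_signal(samples, sample_rate, duration_ms, pad_value=0.0):
    """Crop or pad `samples` so it lasts exactly `duration_ms` milliseconds.

    Hypothetical helper for illustration only (not part of fastaudio).
    """
    # Number of samples that corresponds to the requested duration.
    target = duration_ms * sample_rate // 1000
    if len(samples) >= target:
        return samples[:target]  # assumption: keep the start of the signal
    # Too short: pad the end up to the target length.
    return samples + [pad_value] * (target - len(samples))


# At 16 kHz, 500 ms corresponds to 8000 samples.
one_second = [0.1] * 16000
half_second = resize_signal(one_second, sample_rate=16000, duration_ms=500)
padded = resize_signal([1.0, 2.0], sample_rate=4, duration_ms=1000)  # [1.0, 2.0, 0.0, 0.0]
```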

Now for the spectrogram. Frequency masking is easy:

tfm = MaskFreq(num_masks=3, size=5)
tfm(spectrogram).show()
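
Conceptually, frequency masking blanks out a few horizontal bands of the spectrogram. The sketch below shows the idea on a nested list. It is not MaskFreq's implementation: the function mask_freq, the rng argument, and the choice of 0.0 as the fill value are assumptions made for this illustration.

```python
import random


def mask_freq(spec, num_masks=1, size=2, mask_value=0.0, rng=None):
    """Return a copy of `spec` with `num_masks` bands of `size` rows masked.

    `spec` is a list of frequency rows, each a list of time frames.
    Hypothetical helper for illustration only (not part of fastaudio).
    """
    rng = rng or random.Random()
    n_freq = len(spec)
    masked = [row[:] for row in spec]  # copy so the input is left untouched
    for _ in range(num_masks):
        # Pick a random band that fits entirely inside the spectrogram.
        start = rng.randint(0, n_freq - size)
        for f in range(start, start + size):
            masked[f] = [mask_value] * len(masked[f])
    return masked


demo_spec = [[1.0] * 4 for _ in range(8)]  # 8 frequency bins, 4 time frames
demo_masked = mask_freq(demo_spec, num_masks=1, size=3, rng=random.Random(0))
masked_rows = [i for i, row in enumerate(demo_masked) if all(v == 0.0 for v in row)]
```

Time masking is the same operation applied along the other axis, blanking columns instead of rows. With several masks the bands may overlap, so the number of masked rows can be smaller than num_masks times size.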

You can also compose multiple transforms using Pipeline from fastcore, the library that fastai builds on:

from fastcore.transform import Pipeline

tfms = Pipeline([MaskFreq(), MaskTime()])
tfms(spectrogram).show()
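
At its core, composing transforms means feeding the output of one into the next. The sketch below shows that idea with plain function composition. It only models the chaining, not the additional behaviour of fastcore's Pipeline, and the compose helper is hypothetical.

```python
from functools import reduce


def compose(*funcs):
    """Return a function that applies `funcs` one after another, left to right.

    Hypothetical helper for illustration only (not fastcore's Pipeline).
    """
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def double(x):
    return x * 2


def increment(x):
    return x + 1


pipeline = compose(double, increment)
result = pipeline(5)  # double first, then increment: 11
```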

For examples of loading a dataset and training a model, check out the tutorial notebooks.