Introduction to Fastaudio
In this notebook, we walk through some of the basic API building blocks in fastaudio, including:
- Loading audio
- Preprocessing audio by removing silence and resampling
- Applying transforms directly to the audio signal, such as random cropping, shifting, or adding noise
- Transforming spectrograms with time and frequency masking (SpecAugment) and calculating Delta
To access all of the available transforms, you only need these imports:
from fastaudio.augment.all import *
from fastaudio.core.all import *
from fastai.data.all import *
We can load a test dataset using the untar_data function from fastai:
speakers = untar_data(URLs.SAMPLE_SPEAKERS10)
speakers = get_audio_files(speakers)
speakers[0]
Then we can create an audio tensor, which we can view and listen to using the show() method:
audio = AudioTensor.create(speakers[0])
audio.show()
If we want to see what the spectrogram of that audio looks like, we can create an AudioToSpec transform:
spectrogram = AudioToSpec.from_cfg(AudioConfig.Voice())(audio)
spectrogram.show()
Now let's go through some of the built-in augmentations in the library.
Silence can be removed easily:
tfm = RemoveSilence()
tfm(audio).show()
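To give a feel for what silence removal does to a signal, here is a minimal pure-Python sketch. It is not fastaudio's implementation: `trim_silence` is a hypothetical helper, and both the fixed amplitude threshold and the choice to trim only the leading and trailing silence are assumptions made for illustration.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    # Indices of every sample that counts as "not silent" (hypothetical rule).
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole signal is silent
    # Keep everything between the first and last loud sample, inclusive.
    return samples[loud[0]:loud[-1] + 1]


signal = [0.0, 0.001, 0.5, -0.3, 0.0, 0.2, 0.0, 0.0]
print(trim_silence(signal))  # [0.5, -0.3, 0.0, 0.2]
```

Note that the quiet sample in the middle is kept, because this sketch only trims the edges.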
Or you can crop the signal to 500 ms (the duration is given in milliseconds):
tfm = ResizeSignal(duration=500)
tfm(audio).show()
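Because the duration is given in milliseconds, the resulting number of samples depends on the sample rate. The sketch below shows that arithmetic in plain Python. It is an illustration only, not fastaudio's code: `resize_signal` is a hypothetical helper, and cropping from the start and zero-padding at the end are assumptions.

```python
def resize_signal(samples, sample_rate, duration_ms):
    """Crop or zero-pad a list of samples to duration_ms milliseconds."""
    # Convert the duration from milliseconds to a sample count.
    target = int(sample_rate * duration_ms / 1000)
    if len(samples) >= target:
        return samples[:target]  # assumption: keep the start of the signal
    # Assumption: pad short signals with zeros at the end.
    return samples + [0.0] * (target - len(samples))


one_second = [0.1] * 16000  # one second of audio at 16 kHz
clip = resize_signal(one_second, sample_rate=16000, duration_ms=500)
print(len(clip))  # 8000
```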
Now for the spectrogram.
Masking is easy:
tfm = MaskFreq(num_masks=3, size=5)
tfm(spectrogram).show()
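Conceptually, frequency masking overwrites a few bands of adjacent frequency bins across all time frames. The following pure-Python sketch illustrates the idea on a nested list. It is not how fastaudio implements MaskFreq: `mask_freq` is a hypothetical helper, and filling the masked band with 0.0 is an assumption.

```python
import random


def mask_freq(spec, num_masks=1, size=2, mask_value=0.0, rng=None):
    """Return a copy of spec (rows = frequency bins, columns = time frames)
    with num_masks bands of `size` consecutive rows set to mask_value."""
    rng = rng or random.Random()
    n_bins = len(spec)
    masked = [row[:] for row in spec]  # copy so the input is left untouched
    for _ in range(num_masks):
        # randint is inclusive on both ends, so the band always fits.
        start = rng.randint(0, n_bins - size)
        for r in range(start, start + size):
            masked[r] = [mask_value] * len(masked[r])
    return masked


spec = [[1.0] * 4 for _ in range(8)]  # 8 frequency bins, 4 time frames
masked = mask_freq(spec, num_masks=1, size=2, rng=random.Random(0))
print(sum(1 for row in masked if all(v == 0.0 for v in row)))  # 2
```

Time masking is the same operation applied along the other axis, blanking consecutive columns instead of rows.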
And you can compose multiple transforms using Pipeline from fastcore, the library that fastai builds on:
from fastcore.transform import Pipeline
tfms = Pipeline([MaskFreq(), MaskTime()])
tfms(spectrogram).show()
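At its core, composing transforms means feeding the output of one into the next. The sketch below shows that idea with plain functions. It is only an analogy: `compose` is a hypothetical helper, and the real Pipeline does more than this simple chaining.

```python
from functools import reduce


def compose(*funcs):
    """Return a function that applies funcs one after another, left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def double(x):
    return x * 2


def add_one(x):
    return x + 1


pipeline = compose(double, add_one)
print(pipeline(3))  # 7
```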
For examples of loading a dataset and training a model, check out the tutorial notebooks.