Audio Classification with the Perl API

The MXNet forum limits new users to two links per post, so please see the README that accompanies the code for all links.

This project implements a CNN that classifies audio recordings as either voice or data transmissions.

The MXNet Perl API is used to classify audio files (currently two categories). Results so far are good: with my simple requirements and minimal test data, classification is 100% correct.

The input is radio transmissions (.wav files), each containing either a human speaking or a data transmission. Previously I did this classification with the SoX voice detection function, which had a much lower success rate.

Unlike Gluon Audio, which uses librosa to extract MFCCs, I create spectrograms (PNG image files) as input to the network. I would like to use the Gluon Audio approach, but it currently depends on librosa, which is Python-only. Gluon Audio mentions the MXNet FFT operator on CPU as a possible future replacement for this dependency, so hopefully that can be used at some point.

Although machine learning is probably overkill for my requirements, I plan to expand the categories and capabilities in the future.

It would be great if this helps anyone the way the examples listed below helped me. I am open to any feedback.

To create training data

WAV file -> extract middle second -> generate spectrogram PNG

Currently ffmpeg is used to generate spectrograms outside the training process. Training data is created by a separate program that uses metadata from a database and audio files from disk. Spectrograms are generated like so:

/usr/bin/ffmpeg -i audio.wav -lavfi showspectrumpic=s=100x50:scale=log:legend=off audio.png
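
The "extract middle second" step is not shown in the command above, so here is a minimal sketch of how the whole WAV -> middle second -> PNG pipeline might look in a single shell function. This is an assumption, not the program actually used: the helper names middle_start and wav_to_spectrogram are hypothetical, the duration is read with ffprobe, and the one-second window is selected with ffmpeg's -ss and -t input options.

```shell
#!/bin/sh
# Hypothetical helper: print the start offset (in seconds) of the middle
# one-second window of a clip with the given duration.
# Clips shorter than one second start at 0.
middle_start() {
    awk -v d="$1" 'BEGIN { s = (d - 1) / 2; if (s < 0) s = 0; printf "%.3f\n", s }'
}

# Hypothetical helper: write a spectrogram of the middle second of $1 to $2.
# Assumes ffprobe and ffmpeg are on PATH.
wav_to_spectrogram() {
    src=$1
    dst=$2
    # ffprobe prints the duration in seconds, e.g. 5.000000
    dur=$(ffprobe -v error -show_entries format=duration \
          -of default=noprint_wrappers=1:nokey=1 "$src") || return 1
    start=$(middle_start "$dur")
    # Seek to the middle window, read one second, render the spectrogram
    # with the same filter settings as the command above.
    ffmpeg -y -ss "$start" -t 1 -i "$src" \
        -lavfi showspectrumpic=s=100x50:scale=log:legend=off "$dst"
}
```

It could then be called as wav_to_spectrogram audio.wav audio.png. Keeping the offset arithmetic in its own function makes it easy to check without running ffmpeg.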

The spectrograms should be placed in the folder structure described in the ImageFolderDataset documentation.
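
ImageFolderDataset expects one subfolder per category under a root directory, with the images for that category inside it. As a rough sketch, a spectrogram could be filed like this; the helper name place_spectrogram, the root folder name train, and the label names voice and data are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy a spectrogram into <root>/<label>/ so that the
# label is taken from the folder name.
#   $1 = dataset root, $2 = category label, $3 = PNG file
place_spectrogram() {
    root=$1
    label=$2
    png=$3
    mkdir -p "$root/$label" && cp "$png" "$root/$label/"
}
```

Calling place_spectrogram train voice a.png and place_spectrogram train data b.png would leave train/voice/a.png and train/data/b.png on disk, which is one folder per category.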

Dependencies

  • MXNet pull request against ImageFolderDataset
  • ffmpeg

Based on these examples

  • Sergey Kolychev’s mnist.pl
  • Sergey Kolychev’s Machine learning in Perl, Part3
  • Eryk Wdowiak’s MXNet in Perl

Hi @John,

Thanks for sharing your project. Spectrogram => CNN is indeed a sound approach to audio classification, and I am glad you’ve had good success with it and the Perl API. Good luck with your project.