Adding new array operations to Csound II: the Mel-frequency filterbank

As I discussed in my previous post, part of this project involves selecting a number of useful operations to implement in Csound as part of its array opcode collection. We have looked at the components necessary for the implementation of Mel-frequency cepstrum coefficient (MFCC) analysis, and in this post I will discuss the final missing piece: the Mel-frequency filterbank (MFB).

The word filterbank might be a little misleading in this context, as we will not necessarily implement a complete filter. Instead, we will design a set of weighting curves to be applied to the power spectrum. From each of these we will obtain an average value, which will be the output of the MFB at a given centre frequency. From this perspective, the complete filter is actually made up of the power spectrum analysis and the MFB proper.

So what we need to do is the following:

  1. Find L evenly-spaced centre frequencies on the Mel scale (within a minimum and maximum range).
  2. Construct L overlapping triangular curves, centred at each of these Mel-scale frequencies.
  3. Apply each of these curves to the power spectrum and average the result. These averages are the outputs of the filterbank.

The power spectrum input comes as a sequence of equally-spaced bins. So, to achieve the first step, we need to convert to and from the Mel scale, and to establish which bins correspond to the centre frequencies of each triangular curve. We will show how this is done using the Python language as an example.

The following function converts from a frequency in Hz to a Mel-scale frequency.

import pylab as pl

def f2mel(f):
  # convert a frequency in Hz to the Mel scale: 1125 * ln(1 + f/700)
  return 1125.*pl.log(1.+f/700.)

[Figure: the Mel scale plotted against frequency in Hz]
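
As a quick illustration (the frequency range chosen here is arbitrary), a plot of this mapping, similar to the figure above, can be produced with:

f = pl.linspace(0., 20000., 1000)   # frequencies from 0 Hz to 20 kHz
pl.plot(f, f2mel(f))                # nearly linear at low frequencies, logarithmic above ~700 Hz
pl.xlabel('frequency (Hz)')
pl.ylabel('Mels')
pl.show()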

With this function, we can convert our start and end frequencies to Mel values and linearly space the L filter centre frequencies between them. From these Mel values, we can get the corresponding power spectrum bins using

def mel2bin(m,N,sr):
  # invert the Mel conversion, then find the corresponding power spectrum bin
  f = 700.*(pl.exp(m/1125.) - 1.)
  return  int(f/(sr/(2*N)))

where m is the Mel frequency, N is half the DFT size used (i.e. the number of analysis bins) and sr is the sampling rate. A list of bin numbers can then be created, associating each of the L Mel centre frequencies with a bin number, as in the sketch below.
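
A minimal sketch of this step is shown here; the parameter values (10 bands over 0 Hz to 8 kHz, a 4096-point DFT and a 44.1 kHz sampling rate) are just hypothetical choices for illustration:

L, minf, maxf, dft, sr = 10, 0., 8000., 4096, 44100.
mels = pl.linspace(f2mel(minf), f2mel(maxf), L+2)   # L centre frequencies plus the two range edges
bins = [mel2bin(m, dft//2, sr) for m in mels]       # bin number for each Mel value
print(bins)

The first and last entries mark the edges of the frequency range; the L values in between are the centre bins of the triangular curves constructed in step 2.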

Step 2 is effectively based on creating ramps that connect the bins in this list. The following figure demonstrates the idea for L=10 and N=4096 (2048 bins):

[Figure: overlapping triangular weighting curves for L=10, N=4096]
Each triangle starts at a Mel frequency in the list, rises to the next, and decays to the following one (frequencies are quantised to bin centres). To obtain the output for each filter, we weight the bin values (spectral powers) by these curves and then output the average value for each band.

The Python code for the MFB operation is shown below:

def MFB(input,L,min,max,sr):
  """
  From a power spectrum in input, creates an array 
  consisting of L values containing
  its MFB, from a min to a max frequency sampled 
  at sr Hz.
  """
  N = len(input)                       # number of power spectrum bins
  start = f2mel(min)
  end = f2mel(max)
  incr = (end-start)/(L+1)
  # L+2 bin positions: the L band centres plus the two range edges
  bins = pl.zeros(L+2, dtype=int)
  for i in range(0,L+2):
    bins[i] = mel2bin(start,N-1,sr)
    start += incr
  output = pl.zeros(L)
  for i in range(0,L):
    sum = 0.0
    start = bins[i]                    # band start
    mid = bins[i+1]                    # band centre (triangle peak)
    end = bins[i+2]                    # band end
    incr =  1.0/(mid - start)
    decr =  1.0/(end - mid)
    # rising slope of the triangular curve
    g = 0.0
    for bin in input[start:mid]:
      sum += bin*g
      g += incr
    # falling slope of the triangular curve
    g = 1.0
    for bin in input[mid:end]:
      sum += bin*g
      g -= decr
    # average of the weighted powers in this band
    output[i] = sum/(end - start) 
  return output
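
As a usage sketch (the test signal, window size and band count here are arbitrary choices for illustration, not part of the Csound implementation), the filterbank can be fed a power spectrum obtained from pylab's FFT routines:

sr = 44100.
t = pl.arange(4096)/sr
sig = pl.cos(2*pl.pi*440.*t) + 0.5*pl.cos(2*pl.pi*880.*t)   # simple two-partial test tone
power = abs(pl.rfft(sig))**2          # power spectrum of one analysis frame
out = MFB(power, 128, 0., sr/2, sr)   # 128-band filterbank from 0 Hz up to the Nyquist frequency
pl.plot(out)
pl.show()

With a harmonic input like this, the band outputs should show peaks around the partial frequencies, much as in the flute example below.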

We can demonstrate the use of the MFB by plotting the output of an L=128 filterbank applied to an N=4096 full-spectrum magnitude analysis of a flute tone.
[Figure: MFB output (L=128, N=4096) for a flute tone]

We can see how the MFB clearly identifies the signal harmonics. Of course, the original application we had in mind (MFCCs) is significantly different from this one, but this example shows the kind of output we should expect from the MFB.

Evolving Neural Networks for Cross-adaptive Audio Effects

I’m Iver Jordal and this is my first blog post here. I have studied music technology for approximately two years and computer science for almost five years. During the last 6 months I’ve been working on a specialization project which combines cross-adaptive audio effects and artificial intelligence methods. Øyvind Brandtsegg and Gunnar Tufte were my supervisors.

A significant part of the project has been about developing software that automatically finds interesting mappings (neural networks) from audio features to effect parameters. One thing that the software is capable of is making one sound similar to another sound by means of cross-adaptive audio effects. For example, it can process white noise so it sounds like a drum loop.

Drum loop (target sound):

White noise (input sound to be processed):

Since the software uses algorithms that are based on random processes to achieve its goal, the output varies from run to run. Here are three different output sounds:

These three sounds are basically white noise that has been processed with distortion and a low-pass filter. The effect parameters were controlled dynamically in a way that made the output sound like the drum loop (the target sound).

The software I developed is open source and can be obtained here:

https://github.com/iver56/cross-adaptive-audio

It includes an interactive tool that visualizes output data and lets you listen to the resulting sounds. It looks like this:

[Figure: screenshot of the interactive visualization tool]
For more details about the project and the inner workings of the software, check out the project report:

Evolving Artificial Neural Networks for Cross-adaptive Audio (PDF, 2.5 MB)

Abstract:

Cross-adaptive audio effects have many applications within music technology, including for automatic mixing and live music. The common methods of signal analysis capture the acoustical and mathematical features of the signal well, but struggle to capture the musical meaning. Together with the vast number of possible signal interactions, this makes manual exploration of signal mappings difficult and tedious. This project investigates Artificial Intelligence (AI) methods for finding useful signal interactions in cross-adaptive audio effects. A system for doing signal interaction experiments and evaluating their results has been implemented. Since the system produces lots of output data in various forms, a significant part of the project has been about developing an interactive visualization tool which makes it easier to evaluate results and understand what the system is doing. The overall goal of the system is to make one sound similar to another by applying audio effects. The parameters of the audio effects are controlled dynamically by the features of the other sound. The features are mapped to parameters by using evolved neural networks. NeuroEvolution of Augmenting Topologies (NEAT) is used for evolving neural networks that have the desired behavior. Several ways to measure fitness of a neural network have been developed and tested. Experiments show that a hybrid approach that combines local Euclidean distance and Nondominated Sorting Genetic Algorithm II (NSGA-II) works well. In experiments with many features for neural input, Feature Selective NeuroEvolution of Augmenting Topologies (FS-NEAT) yields better results than NEAT.

The Analyzer and MIDIator plugins

… so with all these DAW examples, “where are the plugins?” you might ask. Well, the most up-to-date versions will always be available in the code repo on GitHub. But I’ve also uploaded precompiled versions of the plugins for Windows and OSX. To install them, just unzip the archive and put the plugins somewhere in your VST plugin search path (typically /Program Files/Vstplugins on Windows and /Library/Audio/Plugins/VST on OSX). You also need to install Cabbage to make them work. You can find the latest Cabbage versions here.

Simple analyzer-modulator setup for Reaper

[Screenshot: simple analyzer-modulator setup in Reaper]
Simple example Reaper project with Analyzer and MIDIator. The spectral flux of the input signal will control Reverb Room Size. Since we use only one input and it modulates its own reverb, the setup is not cross-adaptive, but rather just an adaptive effect. The basic setup can be extended by adding more input channels (and Analyzers) as needed. We do not need to add more MIDIators until we’ve used up all of its modulator channels.

Following up on the recent Ableton Live set, here’s a simple analyzer-modulator project for Reaper. The routing of signals is simpler and more flexible in Reaper, so we do not have the clutter of an extra channel to enable MIDI out; rather, we can select the MIDI hardware output in the routing dialog for the MIDIator track. In Reaper, the MIDIator plugin will start processing all by itself (no need to open its editing window to kick it to life). You need to enable a virtual MIDI device for input (remember to also enable input for control messages) and output. This is done in Reaper preferences / MIDI hardware settings.

The analyzer does not send audio through, so we still need to use two input tracks: one for the analyzer and one for the actual audio input for processing. The MIDIator is set up to map the spectral flux of Analyzer channel 1 to MIDI controller 11, channel 1. The audio input is sent to a Reverb channel, and we’ve mapped the MIDI controller (11 on channel 1) to the Room size control of the reverb.

Simple analyzer-modulator setup for Ableton Live

[Screenshot: simple analyzer-modulator setup in Ableton Live]
Simple example Live set with Analyzer and MIDIator. The spectral flux of the input signal will control Reverb Decay Time. Since we use only one input and it modulates its own reverb, the setup is not cross-adaptive, but rather just an adaptive effect. The basic setup can be extended by adding more input channels (and Analyzers) as needed. We do not need to add more MIDIators until we’ve used up all of its modulator channels.

I’ve created a simple Live set to show how to configure the analyzer and MIDIator in Ableton Live. There are some small snags and peculiarities (read on), but basically it runs ok.

The analyzer will not let audio through, so we use two audio input tracks: one track for the actual audio processing (just a send to a reverb in our case), and one track for the analyzer. The MIDI processing also requires two tracks in Live; this is because Live assumes that a MIDI plugin will output audio, so it disables the MIDI out routing when a plugin is present on a track. This is easily solved by creating a second MIDI track and selecting MIDIator as the MIDI input to that track. From this track we can route MIDI out of Live. You will want to set the MIDI out to a virtual MIDI device (e.g. loopMIDI on Windows, the IAC bus on OSX). Enable MIDI input from the virtual MIDI device (you can enable Track, Sync and Remote for a MIDI input; we want to enable Remote).

In our example setup, we’ve enabled one modulator on the MIDIator. This is set to receive spectral flux from Analyzer 1 and send the modulator data to MIDI channel 1, controller 11. We’ve mapped this to Reverb Decay Time on the effect return track. Input sounds with high flux (which means the sound is probably a bit noisy) will get a long reverb. Sounds with low flux (probably a tonal or stable sound) will get a short reverb.

On my computer (Windows), I need to open the MIDIator editing window to force it to start processing. Take care *not* to close the MIDIator window, as it will somehow stop processing when the window is closed (this only happens in Ableton Live; I’m not sure why). To get rid of the window, just click on something in another track. This will hide the MIDIator window without disabling it.