Audio effect: Liveconvolver3

The convolution audio effect is traditionally used to sample a room in order to create artificial reverb. Others have used it extensively for creative purposes, for example convolving guitars with angle grinders and trains. The technique normally requires recording a sound, then analyzing it, and finally loading the analyzed impulse response (IR) into an effect before it can be used. Liveconvolver3 lets you sample the impulse response live and start convolving even before the recording is finished.

In the context of the crossadaptive project, convolution can be a nice way of imprinting the characteristics of one audio source on another. The live sampling of the IR is necessary to facilitate using it in an improvised manner, reacting immediately to what is played here and now.

There are some aesthetic challenges, namely how to avoid everything turning into a (somewhat beautiful) mush. This is because in convolution every sample of one sound is multiplied with every sample of the other sound. If we sample a long melodic line as the IR, a mere click of the tongue on the other audio channel will fire the whole melodic segment once. Several clicks will create separate echoes of the melody, and a continuous sound will create literally thousands of echoes. What is nice is that only frequencies that the two signals have in common will come out of the process. So a light whisper will create a high frequency whispering melody (with the long IR described above), while a deep and resonant drone will only let the matching (spectral) parts of the IR through. Since the IR contains a recording not only of spectral content but also of its evolution over time, it can lend spectrotemporal morphing features from one sound to another. To reduce the mushiness of the processed sound, we can enhance the transients and reduce the sustained parts of the input sound. Even though this kind of (exaggerated) transient designer processing might sound artificial on its own, it can work well in the context of convolution. The current implementation, Liveconvolver3, does not include this kind of transient processing, but we have done this earlier so it will be easy to add.
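To make the "every click fires the whole IR" behaviour concrete, here is a minimal sketch of direct convolution in plain Python. It is only an illustration and not the code used in the plugin; the three-sample `ir` is a hypothetical stand-in for a recorded melodic line.

```python
def convolve(signal, ir):
    """Direct convolution: every input sample triggers a scaled copy of the IR."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, x in enumerate(signal):
        if x == 0.0:
            continue  # silent samples contribute nothing
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out


# Hypothetical, very short stand-in for a recorded IR.
ir = [1.0, 0.5, 0.25]

# Two clicks, five samples apart: two separate copies of the IR come out.
clicks = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
echoes = convolve(clicks, ir)

# A sustained sound: one copy of the IR per sample, all overlapping.
sustained = [1.0] * 8
mush = convolve(sustained, ir)
```

In `echoes` the two copies of the IR are still clearly separated, while in `mush` the overlapping copies add up to a nearly constant level, which is the smearing described above.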

There are also some technical challenges to using this technique in a live setting. These are related to amplitude control, and to the risk of feedback when playing on larger speaker systems. The feedback risk occurs because we are taking a spectral snapshot (the impulse response) of the room we are currently playing in (well, of an instrument in that room, but nevertheless, the room is there), and then we process sound coming from (another source in) the same room. The output of the process will enhance those frequencies that the two sources have in common, hence the characteristics of the room (and the speaker system) will be amplified, and this generally creates a risk of feedback. Once we have unwanted feedback with convolution, it will also generally take a while (a few seconds) to get rid of, since the nature of the process creates a reverb-like tail on every sound. To reduce the risk of feedback we apply a very small frequency shift to the convolver output. This is not usually perceptible, but it disturbs the feedback chain sufficiently to significantly reduce the feedback potential.
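The text does not say how the frequency shift is implemented in the plugin. One common way to do it is single-sideband shifting: build the analytic signal (negative frequencies removed), then multiply by a complex exponential. The sketch below assumes that approach and uses a naive DFT to stay dependency-free; the function names and the test tone are hypothetical, and the 2 Hz shift on a tiny 64-sample buffer is exaggerated so the result is easy to inspect.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, fine for short buffers."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def analytic_signal(x):
    """Complex version of x with its negative frequencies removed."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue              # leave DC and Nyquist untouched
        elif k < (n + 1) // 2:
            spectrum[k] *= 2      # double the positive frequencies
        else:
            spectrum[k] = 0       # drop the negative frequencies
    # Inverse DFT back to the time domain.
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def frequency_shift(x, shift_hz, sample_rate):
    """Move every frequency component of x up by shift_hz."""
    a = analytic_signal(x)
    return [(a[t] * cmath.exp(2j * math.pi * shift_hz * t / sample_rate)).real
            for t in range(len(x))]


# Toy example: an 8 Hz tone at a 64 Hz sample rate, shifted up by 2 Hz.
sample_rate = 64
tone = [math.cos(2 * math.pi * 8 * t / sample_rate) for t in range(64)]
shifted = frequency_shift(tone, 2.0, sample_rate)

magnitudes = [abs(c) for c in dft(shifted)[:32]]
peak_bin = magnitudes.index(max(magnitudes))  # bin index equals Hz in this setup
```

Unlike pitch shifting, this moves all components by the same number of Hz, so each pass through the speaker, room and microphone loop lands on a slightly different frequency instead of reinforcing the same one.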

The challenge of overall amplitude control can be tackled by using the sum of all amplitudes in the IR as a normalization factor. This works reasonably well, and is how we do it in the liveconvolver. One obvious exception is the case where the IR and the input sound contain overlapping strong resonances (or single lone notes). Then we will get a lot of energy in those overlapping frequency regions, and very little else. We will work on algorithms to attempt normalization in these cases as well.
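As a rough illustration of the normalization idea, the sketch below divides the convolver output by the sum of the absolute sample values of the IR. Taking "sum of all amplitudes" to mean the sum of absolute values is an assumption, and the function names and sample values are hypothetical. With this reading, the output peak can never exceed the input peak.

```python
def normalization_gain(ir):
    """Gain factor 1 / sum(|ir|); returns 0.0 for an empty or silent IR."""
    total = sum(abs(h) for h in ir)
    return 1.0 / total if total > 0.0 else 0.0


def convolve(signal, ir):
    """Direct convolution, kept deliberately simple."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out


# Hypothetical IR whose absolute values sum to 2.0.
ir = [1.0, -0.5, 0.25, -0.25]
gain = normalization_gain(ir)

# Worst case: a full-scale input whose signs line up with the IR.
worst_case_input = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
normalized = [gain * y for y in convolve(worst_case_input, ir)]
peak = max(abs(y) for y in normalized)
```

Here `peak` reaches exactly the input peak of 1.0, which is the worst case this normalization protects against; ordinary material will usually come out quieter.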

The effect

liveconvolver3_reaper_setup
Liveconvolver3 in an example setup in Reaper. Note the routing of the source signals to the two inputs of the effect (aux sends with pan).

The effect uses two separate audio inputs, one for the impulse response sampling, and one for the live input to be convolved. We have made it as a stereo effect, but do not expect it to convolve a stereo input. It also creates a mono output in the current implementation (the same signal on both stereo outputs). In the figure we see two input sources. Track 1 receives external audio and routes it to an aux send to the liveconvolver track, panned left so that it will enter only input 1 of the effect. Track 2 receives external audio and similarly routes it to an aux send to the liveconvolver track, but panned right so that the audio is only sent to input 2 of the effect.

The effect itself has controls for input level, highpass filtering (hpFreq), lowpass filtering (lpFreq) and output volume (convVolume). These controls basically do what their names say. Then we have controls to set the start time (IR_start) of the impulse response (allowing us to skip a certain number of seconds into the recording), and the impulse response length (IR_length), determining how many seconds of the IR recording we want to use. There are also controls for fading the IR in and out. Without fading, we might experience clicks and pops in the output. The partition length sets the partition size of the partitioned convolution; higher settings require less CPU but also make the effect respond more slowly. Usually just leave this at the default of 2048.

The big green button IR_record enables recording of an impulse response. The current max duration is 5.9 seconds at 44.1 kHz sampling rate. If the maximum duration is exceeded during recording, the recording simply stops and is treated as complete. The convolution process will keep running while recording, using parts of the newly recorded IR as they become available. The IR_release knob controls the amount of overlap between the new instances of convolution created during recording. When recording is done, we fall back to using just one instance again.

The switch_inputs button lets us (surprise!) switch the two inputs, exchanging which one feeds the IR recording and which one feeds the convolver. If you want to convolve a source with itself, you would first record an IR and then switch the inputs, so that the same source is convolved with its own (previously recorded) IR. Finally, to reduce the potential for audio feedback, the f_shift control can be adjusted. This shifts the entire output upwards by the amount selected. Usually around 1 Hz is sufficient. Extreme settings will create artificial sounding effects and cascading delays.
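The following sketch shows what the start, length and fade controls described above amount to when cutting an IR out of a recording. It is an illustration only: the function and argument names are hypothetical and merely echo IR_start and IR_length, and the linear fade shape is an assumption. A small helper also shows how long one partition lasts at a given sample rate.

```python
def ir_segment(recording, sample_rate, start_s, length_s, fade_in_s, fade_out_s):
    """Cut an IR out of a recording and apply linear fades at both ends."""
    start = int(round(start_s * sample_rate))
    end = min(len(recording), start + int(round(length_s * sample_rate)))
    segment = list(recording[start:end])
    n = len(segment)
    fade_in = min(n, int(round(fade_in_s * sample_rate)))
    fade_out = min(n, int(round(fade_out_s * sample_rate)))
    for i in range(fade_in):
        segment[i] *= i / fade_in              # ramp up from silence
    for i in range(fade_out):
        segment[n - 1 - i] *= i / fade_out     # ramp down to silence
    return segment


def partition_duration_ms(partition_length, sample_rate):
    """Duration of one convolution partition in milliseconds."""
    return 1000.0 * partition_length / sample_rate


# Toy example: a 5 second "recording" of constant level at a 10 Hz sample rate.
recording = [1.0] * 50
segment = ir_segment(recording, 10, start_s=1.0, length_s=2.0,
                     fade_in_s=0.4, fade_out_s=0.4)

default_partition_ms = partition_duration_ms(2048, 44100)  # roughly 46 ms
```

Without the fades, the segment would jump straight from silence to full level at its first sample and back again at its last, and those jumps are what show up as clicks and pops.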

Installation

The effect is written in the audio programming language Csound, and compiled into a VST plugin using a tool called Cabbage. The actual program code is just a small text file (a csd) that you can download here.

You will need to download Cabbage (the bleeding edge version can be found here), then open the csd file in Cabbage and export it as a plugin effect. Put the exported plugin somewhere in your VST path so that your favourite DAW can find it. Then you’re all set.

cabbage_export_liveconvolver
Export as plugin effect in Cabbage

Routing in other hosts

As a short update: it occurred to me that some users might find it complicated to translate the Reaper routing setup to other hosts. I know a lot of people are using Ableton Live, so here’s a screenshot of how to route for the liveconvolver in Live:

liveconvolver3_live_setup
Example setup with the liveconvolver in Live

Note that

  • The aux sends are “post” (otherwise the sound would not go through the pan pot, and we need that).
  • Because the sends are post, the volume fader has to be up. We will probably not want to hear the direct unprocessed sound, so the “Audio To” selector on the channels is set to “Sends only”.
  • Both input channels send to the same effect.
  • The two input channels are panned hard left (ch 1) and hard right (ch 2).
  • The monitor selector for the channels is set to “in”, activating the input regardless of arm/recording.

With all that set up, you can hit “IR_record” and record an IR (of the sound you have on channel 1). The convolver effect will be applied to the sound on channel 2.

 

Evolving Neural Networks for Cross-adaptive Audio Effects

I’m Iver Jordal and this is my first blog post here. I have studied music technology for approximately two years and computer science for almost five years. During the last 6 months I’ve been working on a specialization project which combines cross-adaptive audio effects and artificial intelligence methods. Øyvind Brandtsegg and Gunnar Tufte were my supervisors.

A significant part of the project has been about developing software that automatically finds interesting mappings (neural networks) from audio features to effect parameters. One thing that the software is capable of is making one sound similar to another sound by means of cross-adaptive audio effects. For example, it can process white noise so it sounds like a drum loop.

Drum loop (target sound):

White noise (input sound to be processed):

Since the software uses algorithms that are based on random processes to achieve its goal, the output varies from run to run. Here are three different output sounds:

These three sounds are basically white noise that has been processed with distortion and a low-pass filter. The effect parameters were controlled dynamically in a way that made the output sound like the drum loop (target sound).
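To give a feel for what "mapping audio features to effect parameters" means, here is a minimal sketch of a tiny one-layer network that turns two features of the target sound into a distortion amount and a low-pass cutoff. Everything specific in it is hypothetical: the feature choice, the hand-picked weights, the parameter ranges and the function names are for illustration only, whereas in the actual software the networks are found by evolution.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def map_features_to_parameters(features, weights, biases):
    """One-layer network: each output is a sigmoid of a weighted feature sum."""
    return [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(weights, biases)]


def scale(value, low, high):
    """Map a value in the range 0..1 to an effect parameter range."""
    return low + value * (high - low)


# Hypothetical weights: loudness drives distortion, brightness drives cutoff.
weights = [[4.0, 0.0],
           [0.0, 6.0]]
biases = [-2.0, -3.0]

# Hypothetical per-frame features of the target: [loudness, brightness], 0..1.
target_frames = [[0.9, 0.8],   # a drum hit
                 [0.1, 0.1]]   # the quiet gap between hits

parameters = []
for frame in target_frames:
    drive, cutoff = map_features_to_parameters(frame, weights, biases)
    parameters.append((scale(drive, 0.0, 1.0), scale(cutoff, 50.0, 12000.0)))
```

Each analysis frame of the target yields a fresh pair of parameter values, so the effects applied to the noise open up on the drum hits and close down between them.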

This software that I developed is open source, and can be obtained here:

https://github.com/iver56/cross-adaptive-audio

It includes an interactive tool that visualizes output data and lets you listen to the resulting sounds. It looks like this:

visualization-screenshot

For more details about the project and the inner workings of the software, check out the project report:

Evolving Artificial Neural Networks for Cross-adaptive Audio (PDF, 2.5 MB)

Abstract:

Cross-adaptive audio effects have many applications within music technology, including for automatic mixing and live music. The common methods of signal analysis capture the acoustical and mathematical features of the signal well, but struggle to capture the musical meaning. Together with the vast number of possible signal interactions, this makes manual exploration of signal mappings difficult and tedious. This project investigates Artificial Intelligence (AI) methods for finding useful signal interactions in cross-adaptive audio effects. A system for doing signal interaction experiments and evaluating their results has been implemented. Since the system produces lots of output data in various forms, a significant part of the project has been about developing an interactive visualization tool which makes it easier to evaluate results and understand what the system is doing. The overall goal of the system is to make one sound similar to another by applying audio effects. The parameters of the audio effects are controlled dynamically by the features of the other sound. The features are mapped to parameters by using evolved neural networks. NeuroEvolution of Augmenting Topologies (NEAT) is used for evolving neural networks that have the desired behavior. Several ways to measure fitness of a neural network have been developed and tested. Experiments show that a hybrid approach that combines local euclidean distance and Nondominated Sorting Genetic Algorithm II (NSGA-II) works well. In experiments with many features for neural input, Feature Selective NeuroEvolution of Augmenting Topologies (FS-NEAT) yields better results than NEAT.
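The abstract does not spell out how the Euclidean distance fitness is computed, so the following is only one plausible reading and not necessarily what the report does: compare the feature vectors of the target and the output frame by frame, average the distances, and turn the result into a score where 1.0 means identical features. The function name and the example feature values are hypothetical.

```python
import math


def frame_distance_fitness(target_frames, output_frames):
    """Fitness in the range 0..1 from the mean per-frame Euclidean distance."""
    distances = [
        math.sqrt(sum((t - o) ** 2 for t, o in zip(target, output)))
        for target, output in zip(target_frames, output_frames)
    ]
    mean_distance = sum(distances) / len(distances)
    return 1.0 / (1.0 + mean_distance)


# Hypothetical two-feature frames for a target sound and two candidates.
target = [[3.0, 4.0], [1.0, 1.0]]
perfect = [[3.0, 4.0], [1.0, 1.0]]
poor = [[0.0, 0.0], [1.0, 1.0]]

perfect_score = frame_distance_fitness(target, perfect)
poor_score = frame_distance_fitness(target, poor)
```

A score like this gives an evolutionary algorithm a single number to rank candidate networks by: the closer the processed sound's features track the target's, the higher the fitness.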