Cross adaptive session with 1st year jazz students, NTNU, March 7-8

This is a description of a session with first year jazz students at NTNU recorded March 7 and 8. The session was organized as part of the ensemble teaching that is given to jazz students at NTNU, and was meant to take care of both the learning outcomes from the normal ensemble teaching, and also aspects related to the cross adaptive project.


Håvard Aufles, Thea Ellingsen Grant, Erlend Vangen Kongstorp, Rino Sivathas, Øyvind Frøberg Mathisen, Jonas Enroth, Phillip Edwards Granly, Malin Dahl Ødegård and Mona Thu Ho Krogstad.

Processing musician:

Trond Engum

Video documentation:

Andreas Bergsland

Sound technician:
Thomas Henriksen


Video digest from the session:


Based on our earlier experiences with bleeding between microphones we located instruments in separate rooms. Since there was quit a big group of different performers it was important that changing set-up took as little time as possible. There was also prepared a system set-up beforehand based on the instruments in use. To gain an understanding of the project from the performer side as early in the process as possible we used the same four step chronology when introducing the performers to the set-up.

  1. Start with individual instruments trying different effects through live processing and decide together with the performers what effects most suitable to add to their instrument.
  2. Introducing the analyser and decide, based on input form the performers, which methods best suited for controlling different effects from their instrument.
  3. Introducing adaptive processing were one performer is controlling the effects on the other, and then repeat vice versa.
  4. Introducing cross-adaptive processing were all previous choices and mappings are opened up for both performers.


Session report:

Day 1. Tuesday 7th March

Trumpet and drums

Sound example 1: (Step 1) Trumpet live processed with two different effects, convolution (impulse response from water) and overdrive.


The performer was satisfied with the chosen effects, also because the two were quite different in sound quality. The overdrive was experienced as nice, but he would not like to have it present all the time. We decided to save these effects for later use on trumpet, and be aware of dynamic control on the overdrive.


Sound example 2: (Step 1) Drums live processed with dynamically changing delay and a pitch shift 2 octaves down. The performer found the chosen effects interesting, and the mapping was saved for later use.


Sound example 3: (Step 1) Before entering the analyser and adaptive processing we wanted to try playing together with the effects we had chosen to see if they blended well together. The trumpet player had some problems with hearing the drums during the performance, felt as they were a bit in the background. We found out that the direct sound of the drums was a bit low in the mix, and this was adjusted. We discussed that it is possible to make the direct sound of both instruments louder or softer depending what the performer wants to achieve.


Sound example 4. (Step 2/3) For this example we entered into the analyser using transient density on drums. This was tried out by showing the analyser at the same time as doing an accelerando on drums. This was then set up as an adaptive control from drums on the trumpet. For control, the trumpet player had a suggestion that the more transient density the less convolution effect was added to the trumpet (less send to a convolution effect with a recording of water). The reason for this was that it could make more sense to have more water on slow ambient parts than on the faster hectic parts. At the same time he suggested that the opposite should happen when adding overdrive to the trumpet by transient density meaning that the more transient density the more overdrive on the trumpet. During the first take a reverb was added to the overdrive in order to blend the sound more into the production. It felt like the dynamical control over the effects was a bit difficult because the water disappeared to easily, and the overdrive was introduced to easily. We agreed to fine-tune the dynamical control before doing the actual test that is present as sound example 4.


Sound example 5: For this example we changed roles and enabled the trumpet to control the drums (adaptive processing). We followed a suggestion from the trumpet player and used pitch as an analyses parameter. We decided to use this to control the delay effect on the drums. Low notes produced long gaps between delays, whereas high notes produced small gap between delays. This was maybe not the best solution for getting good dynamical control, but we decide to keep this anyway.


Sound example 6: Cross adaptive performance using the effects and control mappings introduced in example 4 and 5. This was a nice experience for the musicians. Even though it still felt a bit difficult to control it was experienced as musical meaningful. Drummer: “Nice to play a steady grove, and listen to how the trumpet changed the sound of my instrument”.


Vocals and piano

Sound example 7: We had now changed the instrumentation over to vocals and piano, and we started with a performance doing live processing on both instruments. The vocals were processed using two different effects using a delay, and convolution through a recording of small metal parts. The piano was processed using an overdrive and convolution through water.


Sound example 8: Cross adaptive performance where the piano was analysed by rhythmical consonance controlling the delay effect on vocals. The vocal was analysed by transient density controlling the convolution effect on the piano. Both musicians found this difficult, but musically meaningful. Sometimes the control aspect was experienced as counterintuitive to the musical intention. Pianist: It felt like there was a 3rd musician present.


Saxophone self-adaptive processing

Sound example 9: We started with a performance doing live processing to familiarize the performer with the effects. The performer found the augmentation of extended techniques as clicks and pops interesting since this magnified “small” sounds.


Sound example 10: Self-adaptive processing performances where the saxophone was analysed by transient density and then used to control two different convolution effects (recording of metal parts and recording of a cymbal). The first one resulting in a delay effect the second as a reverb. The higher transient density in the analyses the more delay and less reverb and vice versa. The performer experienced the quality of the effects quit similar so we removed the delay effect.


Sound example 11: Self-adaptive processing performances using the same set-up but changing the delay effect to overdrive. The use of overdrive on saxophone did not bring anything new to the table the way it was set up since the acoustic sound of the instrument could sound similar to the effect when putting in strong energy.


Day 2. Wednesday 8th March


Saxophone and piano

Sound example 12: Performance with saxophone and live processing, familiarizing the performer with the different effects and then choose which of the effects to bring further into the session. Performer found this interesting and wanted to continue with reverb ideas.


Sound example 13: Performance with piano and live processing. The performer especially liked the last part with the delays – Saxophonist: “It was like listening to the sound under water (convolution with water) sometimes, and sometimes like listening to an old radio (overdrive)”. Piano wanted to keep the effects that were introduced.


Sound example 14: Adaptive processing, controlling delay on saxophone from the piano by using analyses of the transient density. The higher transient density, the larger gap between delays on the saxophone. The saxophone player found it difficult to interact since the piano had a clean sound during performance. The piano on the other hand felt in control over the effect that was added.


Sound example 15: Adaptive processing using saxophone to control piano. We analyzed the rhythmical consonance on saxophone. The higher degree of consonance, the more convolution effect (water) was added to piano and vice versa. Saxophone didn’t feel in control during performance, and guessed it was due to not holding a steady rhythm over a longer period. The direct sound of the piano was also a bit loud in the mix making the added effect a bit low in the mix. Piano felt that saxophone was in control, but agreed to the point that the analyses was not able to read to the limit because of the lack of a steady rhythm over a longer time period.


Sound example 16: Crossadptive performance using the same set-up as in example 14 and 15. Both performers felt in control, and started to explore more of the possibilities. Interesting point when the saxophone stops to play since the rhythmical consonance analyses will make a drop as soon as it starts to read again. This could result in strong musical statements.


Sound example 17: Crossadaptive performance keeping the same setting but adding rms analyses on the saxophone to control a delay on the piano (the higher rms the less delay and vice versa).


Vocals and electric guitar

Sound example 18: Performance with vocals and live processing. Vocalist: “It is fun, but something you need to get use to, needs a lot of time”.


Sound example 19: Performance with Guitar and live processing. Guitarist: “Adapted to the effects, my direct sound probably sounds terrible, feel that I`m loosing my touch, but feels complementary and a nice experience”.


Sound example 20: Performance with adaptive processing. Analyzing the guitar using rms and transient density. The higher transient density the more delay added to the vocal, and higher rms the less reverb added to the vocal. Guitar: I feel like a remote controller and it is hard to focus on what I play sometimes. Vocalist: “Feels like a two dimensional way of playing”.


Sound example 21: Performance with adaptive processing. Controlling the guitar by vocals. Analyzing the rhythmical consonance on the vocal to control the time gap between delays inserted on the guitar. Higher rhythmical consonance results in larger gaps and vice versa. The transient density on vocal controls the amount of pitch shift added to the guitar. The higher transient density the less volume is sent to the pitch shift.


Sound example 22: Performance with cross adaptive processing using the same settings as in sound example 20 and 21.

Vocalist: “It is another way of making music, I think”. Guitarist: “I feel control and I feel my impact, but musical intention really doesn’t fit with what is happening – which is an interesting parameter. Changing so much with doing so little is cool”.


Observation and reflections

The sessions has now come to a point were there is less time used on setting up and figuring out how the functionality in the software works, and more time used on actual testing. This is an important step taking in consideration working with musicians that are introduced to the concept the first time. A good stability in software and separation between microphones makes the workflow much more effective. It still took some time to set up everything the first day due to two system crashes, the first one related to the midiator, the second one related to video streaming.


Since preparing the system beforehand there was a lot of reuse both concerning analyzing methods and the choice of effects. Even though there were a lot of reuse on the technical side the performances and results has a large variety in expressions. Even though this is not surprising we think it is an important aspect to be reminded of during the project.


Another technical workaround that was discussed concerning the analyzing stage was the possibility to operate with two different microphones on the same instrument. The idea is then to use one for reading analyses, and one for capturing the “total” sound of the instrument for use in processing. This will of course depend on which analyzing parameter in use, but will surely help for a more dynamical reading in some situations both concerning bleeding, but also for closer focus on wanted attributes.


The pedagogical approach using the four-step introduction was experienced as fruitful when introducing the concept to musicians for the first time. This helped the understanding during the process and therefor resulted in more fruitful discussions and reflections between the performers during the session. Starting with live processing says something about possibilities and flexible control over different effects early in the process, and gives the performers a possibility to be a part of deciding aesthetics and building a framework before entering the control aspect.


Quotes from the the performers:

Guitarist: “Totally different experience”. “Felt best when I just let go, but that is the hardest part”. “It feels like I’m a midi controller”. “… Hard to focus on what I’m playing”. “Would like to try out more extreme mappings”

Vocalist: “The product is so different because small things can do dramatic changes”. “Musical intention crashes with control”. “It feels like a 2-dimensional way of playing”

Pianoist: “Feels like an extra musician”




Convolution experiments with Jordan Morton

Jordan Morton is a bassist and singer, she regularly performs using both instruments combined. This provides an opportunity to explore how the liveconvolver can work when both the IR and the live input are generated by the same musician. We did a session at UCSD on February 22nd. Here are some reflections and audio excerpts from that session.

General reflections

As compared with playing with live processing, Jordan felt it was more “up to her” to make sensible use of the convolution instrument. With live processing being controlled by another musician, there is also a creative input from another source. In general, electronic additions to the instrument can sometimes add unexpected but desirable aspects to the performance. With live convolution where she is providing both signals, there is a triple (or quadruple) challenge: She needs to decide what to play on the bass, what to sing, explore how those two signals work together when convolved, and finally make it all work as a combined musical statement. It appears this is all manageable, but she’s not getting much help from the outside. In some ways, working with convolution could be compared to looping and overdubs, except the convolution is not static. One can overlay phrases and segments by recording them as IR’s, while shaping their spectral and temporal contour with the triggering sound (the one being convolved with the IR).
Jordan felt it easier to play bass through the vocal IR than the other way around. She tend to lead with the bass when playing acoustic on bass + vocals. The vocals are more an additional timbre added to complete harmonies etc with the bass providing the ground. Maybe the instrument playing through the IR has the opportunity of more actively shaping the musical outcome, while the IR record source is more a “provider” of an environment for the other to actively explore?
In some ways it can seem easier to manage the roles roles (of IR provider and convolution source) as one person than splitting the incentive among two performers. The roles becomes more separated when they are split between different performers than when one person has both roles and then switches between them. When having both roles, it can be easier to explore the nuances of each role. Possible to test out musical incentives by doing this here and then this there, instead of relying on the other person to immediately understand (for example to *keep* the IR, or to *replace* it *now*).

Technical issues

We explored transient triggered IR recording, but had a significant acoustic bleed from bass into the vocal microphone, which made clean transient trigging a bit difficult. A reliable transient triggered recording would be very convenient, as it would allow the performer to “just play”. We tried using manual triggering, controlled by Oeyvind. This works reliably but involves some guesswork as to what is intended to be recorded. As mentioned earlier (e.g. in the first Olso session), we could wish for a foot pedal trigger or other controller directly operated by the performer. Hey it’s easy to do, let’s just add one for next time.
We also explored continuous IR updates based on a metronome trigger. This allows periodic IR updates, in a seemingly streaming fashion. Jordan asked for an indication of the metronomic tempo for the updates, which is perfectly reasonable and would be a good idea to do (although had not been implemented yet). One distinct difference noted when using periodic IR updates is that the IR is always replaced. Thus, it is not possible to “linger” on an IR and explore the character of some interesting part of it. One could simulate such exploration by continuously re-recording similar sounds, but it might be more fruitful to have the ability to “hold” the IR, preventing updates while exploring one particular IR. This hold trigger could reasonably also be placed on a footswitch or other accessible control for the performer.

Audio excerpts

jordan1 jordan1

Take 1: Vocal IR, recording triggered by transient detection.


jordan2 jordan2

Take 2: Vocal IR, manually triggered recording 


jordan3 jordan3

Take 3: Vocal IR, periodic automatic trigger of IR recording.


jordan4 jordan4

Take 4: Vocal IR, periodic automatic trigger of IR recording (same setup as for take 3)


jordan5 jordan5

Take 5: Bass IR, transient triggered recording. Transient triggering worked much cleaner on the bass since there was less signal bleed from voice to bass than vice versa.

Crossadaptive session NTNU 12. December 2016


Trond Engum (processing musician)

Tone Åse (vocals)

Carl Haakon Waadeland (drums and percussion)

Andreas Bergsland (video)

Thomas Henriksen (sound technician)

Video digest from session:

Session objective and focus:

The main focus in this session was to explore other analysing methods than used in earlier sessions (focus on rhythmic consonance for the drums, and spectral crest on the vocals). These analysing methods were chosen to get a wider understanding of their technical functionality, but also their possible use in musical interplay. In addition to this there was an intention to include the sample/hold function for the MIDIator plug-in. The session was also set up with a large screen in the live room to monitor the processing instrument to all participants at all times. The idea was to democratize the processing musician role during the session to open up for discussion and tuning of the system as a collective process based on a mutual understanding. This would hopefully communicate a better understanding of the functionality in the system, and how the musicians individually can navigate within it through their musical input. At the same time this also opens up for a closer dialog around choice of effects and parameter mapping during the process.

Earlier experiences and process

Following up on experiences documented through earlier sessions and previous blog posts, the session was prepared to avoid the most obvious shortcomings. First of all, separation between instruments to avoid bleeding through microphones was arranged by placing vocals and drums in separate rooms. Bleeding between microphones was earlier affecting both the analysed signals and effects. The system was prepared to be as flexible as possible beforehand containing several effects to map to, flexibility in this context meaning the possibility to do fast changes and tuning the system depending on the thoughts from the musicians. Since the group of musicians remained unchanged during the session this flexibility was also seen as a necessity to go into details and more subtle changes both in the MIDIator and the effects in play to reach common aesthetical intentions.

Due to technical problems in the studio (not connected with the cross adaptive set up or software) the session was delayed for several hours resulting in shorter time than originally planned. We therefore made a choice to concentrate only on rhythmic consonance (referred to as rhythmical regularity in the video) as analysing method for both drums and vocals. The method we used to familiarize with this analysing tool was that we started with drums trying out different playing techniques with both regular and irregular strokes while monitoring the visual feedback from the analyser plug-in without any effect. Regular strokes in this case resulting in high stable value, irregular strokes resulting in low value.


Figure 1. Consonance (regularity) visualized in the upper graph.

What became evident was that when the input stopped, the analyser stayed at the last measured value, and in that way could act as a sort of sample/hold function on the last value and in that sense stabilise a setting in an effect until an input was introduced again. Another aspect was that the analysing method worked well for regularity in rhythm, but had more unpredictable behaviour when introducing irregularity.

After learning the analyser behaviour this was further mapped to a delay plugging as an adaptive effect on the drums. The parameter controlled the time range of 14 delays resulting in larger delay time range the more regularity, and vice versa.

After fine-tuning the delay range we agreed that the connection between the analyser, MIDIator and choice of effect worked musically in the same direction. (This was changed later in the session when trying out cross-adaptive processing).

The same procedure was followed when trying vocals, but then concentrating the visual monitoring mostly on the last stage of the chain, the delay effect. This was experienced as more intuitive when all settings were mapped since the musician then could interact visually with the input during performance.

Cross-adaptive processing.

When starting the cross-adaptive recording everyone had followed the process, and tried out the chosen analysing method on own instruments. Even though the focus was mainly on the technical aspects the process had already given the musicians the possibility to rehearse and get familiar with the system.

The system we ended up with was set up in the following way:

Both drums and vocals was analysed by rhythmical consonance (regularity). The drums controlled the send volume to a convolution reverb and a pitch shifter on the vocals. The more regular drums the less of the effects, the less regular drums the more of the effects.

The vocals controlled the time range in the echo plugin on the drums. The more regular pulses from the vocal the less echo time range on the drums, the less regular pulses from the vocals the larger echo time range on the drums.

Sound example (improvisation with cross adaptive setup): 


Adding new array operations to Csound II: the Mel-frequency filterbank

As I have discussed before in my previous post, as part of this project we have been selecting a number of useful operations to implement in Csound, as part of its array opcode collection. We have looked at the components necessary for the implementation of the Mel-frequency cepstrum coefficient (MFCC) analysis and in this post I will discuss the Mel-frequency filterbank as the final missing piece.

The word filterbank might be a little misleading in this context, as we will not necessarily implement a complete filter. We will design a set of weighting curves that will be applied to the power spectrum. From each one of this we will obtain an average value, which will be the output of the MFB at a given centre frequency. So from this perspective, the complete filter is actually made up of the power spectrum analysis and the MFB proper.

So what we need to do is the following:

  1. Find L evenly-space centre frequencies in the Mel scale (within a minimum and maximum range).
  2. Construct L overlapping triangle-shape curves, centred at each Mel-scale frequency.
  3. Apply each one of these curves to the power spectrum and averaging the result. These will be the outputs of the filterbank.

The power spectrum input comes as sequence of equally-spaced bins. So, to achieve the first step, we need to convert to/from the Mel scale, and also to be able to establish which bins will be equivalent to the centre frequencies of each triangular curve. We will show how this is done using the Python language as an example.

The following function converts from a frequency in Hz to a Mel-scale frequency.

import pylab as pl
def f2mel(f):
  return 1125.*pl.log(1.+f/700.)


With this function, we can convert our start and end Mel values and linearly space the L filter centre frequencies. From these L Mel values, we can get the power spectrum bins using

def mel2bin(m,N,sr):
  f = 700.*(pl.exp(m/1125.) - 1.)
  return  int(f/(sr/(2*N)))

where m is the Mel frequency, N is the DFT size used and sr is the sampling rate. A list of bin numbers can be created, associating each L Mel centre frequency with a bin number

Step 2 is effectively based on creating ramps that will connect each bin in the list above. The following figure demonstrates the idea for L=10, and N=4096 (2048 bins)

Each triangle starts at a Mel frequency in the list, rises to the next, and decays to the following one (frequencies are quantised to bin centres). To obtain the output for each filter we weigh the bin values (spectral powers) by these curves and then output the average value for each band.

The Python code for the MFB operation is shown below:

def MFB(input,L,min,max,sr):
  From a power spectrum in input, creates an array 
  consisting of L values containing
  its MFB, from a min to a max frequency sampled 
  at sr Hz.
  N = len(input)
  start = f2mel(min)
  end = f2mel(max)
  incr = (end-start)/(L+1)
  bins = pl.zeros(L+2)
  for i in range(0,L+2):
    bins[i] = mel2bin(start,N-1,sr)
    start += incr
  output = pl.zeros(L)
  i = 0
  for i in range(0,L):
    sum = 0.0
    start = bins[i]
    mid = bins[i+1]
    end = bins[i+2]
    incr =  1.0/(mid - start)
    decr =  1.0/(end - mid)
    g = 0.0
    for bin in input[start:mid]:
      sum += bin*g
      g += incr
    g = 1.0
    for bin in input[mid:end]:
      sum += bin*g
      g -= decr
    output[i] = sum/(end - start) 
  return output

We can demonstrate the use of the MFB  by plotting the output of a N=4096, L=128 full-spectrum magnitude analysis of a flute tone.

We can see how the MFB identifies clearly the signal harmonics. Of course, the original application we had in mind (MFCCs) is significantly different from this one, but this example shows what kinds of outputs we should expect from the MFB.

Evolving Neural Networks for Cross-adaptive Audio Effects

I’m Iver Jordal and this is my first blog post here. I have studied music technology for approximately two years and computer science for almost five years. During the last 6 months I’ve been working on a specialization project which combines cross-adaptive audio effects and artificial intelligence methods. Øyvind Brandtsegg and Gunnar Tufte were my supervisors.

A significant part of the project has been about developing software that automatically finds interesting mappings (neural networks) from audio features to effect parameters. One thing that the software is capable of is making one sound similar to another sound by means of cross-adaptive audio effects. For example, it can process white noise so it sounds like a drum loop.

Drum loop (target sound):

White noise (input sound to be processed):

Since the software uses algorithms that are based on random processes to achieve its goal, the output varies from run to run. Here are three different output sounds:

These three sounds are basically white noise that have been processed by distortion and low-pass filter. The effect parameters were controlled dynamically in a way that made the output sound like the drum loop (target sound).

This software that I developed is open source, and can be obtained here:

It includes an interactive tool that visualizes output data and lets you listen to the resulting sounds. It looks like this:

For more details about the project and the inner workings of the software, check out the project report:

Evolving Artificial Neural Networks for Cross-adaptive Audio (PDF, 2.5 MB)


Cross-adaptive audio effects have many applications within music technology, including for automatic mixing and live music. The common methods of signal analysis capture the acoustical and mathematical features of the signal well, but struggle to capture the musical meaning. Together with the vast number of possible signal interactions, this makes manual exploration of signal mappings difficult and tedious. This project investigates Artificial Intelligence (AI) methods for finding useful signal interactions in cross-adaptive audio effects. A system for doing signal interaction experiments and evaluating their results has been implemented. Since the system produces lots of output data in various forms, a significant part of the project has been about developing an interactive visualization tool which makes it easier to evaluate results and understand what the system is doing. The overall goal of the system is to make one sound similar to another by applying audio effects. The parameters of the audio effects are controlled dynamically by the features of the other sound. The features are mapped to parameters by using evolved neural networks. NeuroEvolution of Augmenting Topologies (NEAT) is used for evolving neural networks that have the desired behavior. Several ways to measure fitness of a neural network have been developed and tested. Experiments show that a hybrid approach that combines local euclidean distance and Nondominated Sorting Genetic Algorithm II (NSGA-II) works well. In experiments with many features for neural input, Feature Selective NeuroEvolution of Augmenting Topologies (FS-NEAT) yields better results than NEAT.