MIDIator mix methods

The first instances of the MIDIator could sum two analysis signals, with separate scaling and sign inversion for each one. Recently we’ve added two new methods for mixing those two signals, so this post explains how they work and which problems they are intended to solve.

For all mix methods there is a MIDI output configuration to the right. All modules output MIDI controller values. We can set the MIDI channel and controller number, enable/disable the module, and also enter notes about the mapping/use of the module. Notes will be saved with the project.

Do take care, however: for your own records, take a screenshot of your settings and save it with your project. The plugins can and will change during further development of the project. If this leads to changes in the GUI configuration (i.e. the number of user interface elements), there is a high probability that not everything will be recalled correctly. In that case you must reconstruct your settings from the (previously saved) screenshot.

Add

Each of the two signals has separate filtering, with separate settings for the rise and fall times. The two signals are scaled (with a scale range from -1 to +1) and then added together. If both signals are scaled positively, each of them affects the output positively. If one is scaled negatively and the other positively, more complex interactions between them will form. For example, with rms (amplitude) scaled negatively while transient density is scaled positively, the output will increase with high transient density, but only if we are not playing loud.

midiator_add
Example of the “add” mix method. Higher amplitude (rms) will decrease the output, while higher transient density will increase the output. This means that soft fast playing will produce the highest possible output with these settings.
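
As a rough illustration of the signal flow (not the plugin’s actual implementation), here is a minimal Python sketch of the “add” method. The function names, the simple one-pole smoothing coefficients standing in for the plugin’s rise/fall times, and the clipping to a normalized 0..1 output are all assumptions made for the example.

```python
# Minimal sketch of the "add" mix method (illustration only, not the plugin code).
# Each analysis signal is smoothed with its own rise/fall behaviour, scaled in
# the range -1..+1, summed, and clipped to a normalized output.

def smooth(prev, target, rise_coef, fall_coef):
    # One-pole lag; simple coefficients stand in for the plugin's rise/fall times.
    coef = rise_coef if target > prev else fall_coef
    return prev + coef * (target - prev)

def mix_add(sig1, sig2, state, scale1, scale2, rise=0.2, fall=0.05):
    # Process one control-rate value per input; 'state' holds the filter memories.
    state[0] = smooth(state[0], sig1, rise, fall)
    state[1] = smooth(state[1], sig2, rise, fall)
    out = scale1 * state[0] + scale2 * state[1]
    return min(max(out, 0.0), 1.0)  # assumed normalized 0..1 output

# rms scaled negatively, transient density positively:
# over time, soft and fast playing pushes the output towards its maximum.
state = [0.0, 0.0]
rms, density = 0.2, 0.9
print(mix_add(rms, density, state, scale1=-1.0, scale2=1.0))
```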

Abs_diff

This somewhat cryptic term refers to the absolute difference between two signals. We can use it to create an interaction model between two signals where the output goes high only if the two analysis signals are very different. For example, if we analyze amplitude (rms) from two different musicians, the resulting signal will be low as long as they both play in the same dynamic register. If one plays loud while the other plays soft, the output will be high, regardless of which of the two plays loud. It could of course also be applied to two different analysis signals from the same musician, for example the difference between pitch and the spectral centroid.

midiator_absdiff
Example of the abs_diff mix method. Here we take the difference in amplitude between two different acoustic sources. The higher the difference, the higher the output, regardless of which of the two inputs is loudest.
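
A minimal sketch of the abs_diff behaviour, with the per-signal filtering and scaling omitted; the function name and the example values are just illustrations.

```python
def mix_abs_diff(sig1, sig2):
    # Output is high only when the two analysis signals differ.
    return abs(sig1 - sig2)

# rms from two musicians: similar dynamics give a low output,
# very different dynamics give a high output, regardless of who plays loud.
print(mix_abs_diff(0.80, 0.75))  # 0.05 -> both play at a similar level
print(mix_abs_diff(0.80, 0.10))  # 0.70 -> one loud, one soft
```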

Gate

The gate mix method can be used to turn things on or off. It can also be used to enable/disable the processing of another MIDIator module, effectively acting as a sample and hold gate. The two input channels are now used for different purposes: one channel turns the gate on, the other channel turns the gate off. Each channel has a separate activation threshold (and a selection of whether the signal must cross the threshold moving upwards or downwards to activate). For simple purposes, this can act like a Schmitt trigger, also termed hysteresis in some applications. This can be used to reduce jitter noise in the output, since the activation and deactivation thresholds can be different.

midiator_gate_same
Gate mix method used to create a Schmitt trigger. The input signal must go higher than the activation threshold to turn on. Then it will stay on until the input signal crosses the lower deactivation threshold.

It can also be used to create more untraditional gates. A simple variation lets us create a gate that is activated only if the input signal is within a specified band. To do this, the activation threshold must be lower than the deactivation threshold, like this:

midiator_gate_same_band
Band-activated gate. The gate will be activated once the signal crosses the (low) activation threshold. Then it will be turned off once the signal crosses the (higher) deactivation threshold in an upward direction. To activate again, the signal must go lower than the activation threshold.

The up/down triggers can be adjusted to fine tune how the gate responds to the input signal. For example, looking at the band-activated gate above: If we change the deactivation trigger to “down”, then the gate will only turn off after the signal has been higher than the deactivation threshold and then is moving downwards.
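
To summarize the gate logic described so far, here is a minimal Python sketch with separate activation/deactivation thresholds and crossing directions. The class layout and the simple crossing test are assumptions for illustration, not the plugin’s implementation; the same code covers both the Schmitt trigger and the band-activated variant.

```python
class Gate:
    def __init__(self, on_thresh, on_dir, off_thresh, off_dir):
        self.on_thresh, self.on_dir = on_thresh, on_dir
        self.off_thresh, self.off_dir = off_thresh, off_dir
        self.state = False
        self.prev_on = self.prev_off = 0.0

    @staticmethod
    def crossed(prev, cur, thresh, direction):
        # True if the signal crossed the threshold in the given direction.
        if direction == "up":
            return prev < thresh <= cur
        return prev > thresh >= cur

    def process(self, on_sig, off_sig):
        # One control-rate step; the two inputs may be the same or different signals.
        if not self.state and self.crossed(self.prev_on, on_sig,
                                           self.on_thresh, self.on_dir):
            self.state = True
        elif self.state and self.crossed(self.prev_off, off_sig,
                                         self.off_thresh, self.off_dir):
            self.state = False
        self.prev_on, self.prev_off = on_sig, off_sig
        return self.state

# Schmitt trigger: same signal on both inputs, activation above deactivation.
schmitt = Gate(on_thresh=0.6, on_dir="up", off_thresh=0.3, off_dir="down")
for x in [0.1, 0.7, 0.5, 0.2]:
    print(schmitt.process(x, x))   # False, True, True, False (hysteresis)

# Band-activated gate: activation below deactivation, both crossed upwards.
band = Gate(on_thresh=0.2, on_dir="up", off_thresh=0.7, off_dir="up")
```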

So far, we’ve only looked at examples where the two input signals to the gate are the same signal. Since the two input signals can be different (they can even come from two different acoustic sources), highly intricate gate behaviour can be constructed. Even though the conception of such signal-interdependent gates can be complex (deciding which signals could interact in a meaningful way), the actual operation of the gate is technically no different. Just for the sake of the example, here’s a gate that will turn on if the transient density goes high, and then turn off when the pitch goes high. To activate the gate again, the transient density must first go low, then high.

midiator_gate_different
Gate with different activation and deactivation signals. Transient density will activate the gate, while pitch will deactivate it.

Sample and hold

The gate mix method can also affect the operation of another MIDIator module. This is currently hardcoded so that it only affects the next module (the one right below the gate). This means that when the gate is on, the next MIDIator module works as normal, but when the gate is turned off, the module retains the value it had reached at the moment the gate was turned off. In traditional signal processing terms: sample and hold. To enable this function, turn on the button labeled “s/h”.

midiator_gate_sh
The topmost of these two modules acts as a sample and hold gate for the lower module. The lower module maps amplitude to positively affect the output value, but is only enabled when the topmost gate is activated. The situation in the figure shows the gate enabled.
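
A minimal sketch of the s/h behaviour, assuming the gate state and the lower module’s output are available as per-control-sample values:

```python
def sample_and_hold(gate_on, module_out, held):
    # While the gate is on, track the module's output; while off, hold the last value.
    if gate_on:
        return module_out, module_out
    return held, held

held = 0.0
for gate_on, value in [(True, 0.3), (True, 0.6), (False, 0.9), (False, 0.1)]:
    out, held = sample_and_hold(gate_on, value, held)
    print(out)  # 0.3, 0.6, then 0.6 held while the gate is off
```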

 

Evolving Neural Networks for Cross-adaptive Audio Effects

I’m Iver Jordal and this is my first blog post here. I have studied music technology for approximately two years and computer science for almost five years. During the last 6 months I’ve been working on a specialization project which combines cross-adaptive audio effects and artificial intelligence methods. Øyvind Brandtsegg and Gunnar Tufte were my supervisors.

A significant part of the project has been about developing software that automatically finds interesting mappings (neural networks) from audio features to effect parameters. One thing that the software is capable of is making one sound similar to another sound by means of cross-adaptive audio effects. For example, it can process white noise so it sounds like a drum loop.

Drum loop (target sound):

White noise (input sound to be processed):

Since the software uses algorithms that are based on random processes to achieve its goal, the output varies from run to run. Here are three different output sounds:

These three sounds are basically white noise that has been processed by distortion and a low-pass filter. The effect parameters were controlled dynamically in a way that made the output sound like the drum loop (target sound).

The software I developed is open source and can be obtained here:

https://github.com/iver56/cross-adaptive-audio

It includes an interactive tool that visualizes output data and lets you listen to the resulting sounds. It looks like this:

visualization-screenshot
For more details about the project and the inner workings of the software, check out the project report:

Evolving Artificial Neural Networks for Cross-adaptive Audio (PDF, 2.5 MB)

Abstract:

Cross-adaptive audio effects have many applications within music technology, including for automatic mixing and live music. The common methods of signal analysis capture the acoustical and mathematical features of the signal well, but struggle to capture the musical meaning. Together with the vast number of possible signal interactions, this makes manual exploration of signal mappings difficult and tedious. This project investigates Artificial Intelligence (AI) methods for finding useful signal interactions in cross-adaptive audio effects. A system for doing signal interaction experiments and evaluating their results has been implemented. Since the system produces lots of output data in various forms, a significant part of the project has been about developing an interactive visualization tool which makes it easier to evaluate results and understand what the system is doing. The overall goal of the system is to make one sound similar to another by applying audio effects. The parameters of the audio effects are controlled dynamically by the features of the other sound. The features are mapped to parameters by using evolved neural networks. NeuroEvolution of Augmenting Topologies (NEAT) is used for evolving neural networks that have the desired behavior. Several ways to measure fitness of a neural network have been developed and tested. Experiments show that a hybrid approach that combines local euclidean distance and Nondominated Sorting Genetic Algorithm II (NSGA-II) works well. In experiments with many features for neural input, Feature Selective NeuroEvolution of Augmenting Topologies (FS-NEAT) yields better results than NEAT.

Mixing with Gary

During our week in London we had some sessions with Gary Bromham, first at the Academy of Contemporary Music in Guildford on June 7th, then at QMUL later in the week. We wanted to experiment with cross-adaptive techniques in a traditional mixing session, using our tools/plugins within a Logic session to work similarly to traditional sidechaining, but with the expanded palette of analysis and modulator mappings enabled by the tools developed in the project. Initially we tried to set this up with Logic as the DAW. It kind of works, but seems utterly unreliable. Logic would not respond to learned controller mappings after we closed the session and reopened it. It does receive the MIDI controller signal (and can re-learn) but in all cases refuses to respond to the received automation control. In the end we abandoned Logic altogether and went for our safe, always-does-the-job Reaper.

As the test session for our experiments we used Sheryl Crow’s “Soak up”, using stems for the backing tracks and the vocals.

 2016_6_soakup mix1 pitchreverb

Example 1: Vocal pitch to reverb send and reverb decay time.

 2016_6_soakup mix2 experiment

Example 2: Vocal pitch as above. Adding vocal spectral flux to control the hi-cut frequency for the rest of the band. Rhythmic analysis (transient density) of the backing track controls a peaking EQ sweep on the vocals, creating a sweeping effect somewhat like a phaser. This is all a bit odd taken together, but useful as a controlled experiment in polyphonic cross-adaptive modulation.

* The first thing Gary asked for was to process one track according to the energy in a specific frequency band of another. For example: “if I remove 150Hz on the bass drum, I want it to be added to the bass guitar”. Now, it is not so easy to analyze what is missing, but easier to analyze what is there. So we thought of another thing to try: sibilants (e.g. S’es) on the vocals can be problematic when sent to reverb or delay effects. Since we don’t have a multiband envelope follower (yet), we tried to analyze for spectral flux or crest, then use that control signal to duck the reverb send for the vocals (see the sketch after this list).

* We had some latency problems relating to pitch tracking of the vocals: the modulator signal arrived a bit late to precisely control the reverb size for the vocals. The tracking is ok, but the effect responds *after* the high pitch. This was solved by delaying the vocal *after* the point where it is sent to the analyzer, and then also delaying the direct vocal signal and the rest of the mix accordingly.

* Trond idea for later: Use vocal amp to control bitcrush mix on drums (and other programmed tracks)

* Trond idea for later: Use vocal transient density to control delay setting (delay time, … or delay mix)

* Bouncing the mix: bouncing does not work, as we need the external modulation processing (analyzer and MIDIator) to also be active. Logic seems to disable “external effects” (here Reaper running via Jack, like an outboard effect in a traditional setting) when bouncing.

* Something good: Pitch controlled reverb send works quite well musically, and is something one would not be able to do without the crossadaptive modulation techniques. Well, it is actually just adaptive here (vocals controlling vocals, not vocals controlling something else).

* Notable: do not try to fix (old) problems, but try to be creative and find new applications/routings/mappings. For example, the initial ideas from Gary were related to common problems in a mixing situation, problems that one can already fix (with de-essers or similar).

* Trond: It is unfamiliar in a post production setting to hear the room size change, as one is used to static effects in the mix.

* It would be convenient if we could modulate the filtering of a control signal depending on analyzed features too. For example changing the rise time for pitch depending on amplitude.

* It would also be convenient to have the filter times as sync’ed values (e.g. 16th notes) relative to the master tempo.
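
Related to the sibilance ducking mentioned in the first item above, here is a minimal sketch of ducking a reverb send with an analysis signal (spectral flux or crest standing in for a sibilance detector); the names and values are assumptions.

```python
def ducked_send(base_send, sibilance, depth=0.8):
    # Reduce the send level as the sibilance estimate (0..1) rises.
    s = min(max(sibilance, 0.0), 1.0)
    return base_send * (1.0 - depth * s)

print(ducked_send(0.5, 0.0))  # 0.5 -> full send on non-sibilant material
print(ducked_send(0.5, 1.0))  # 0.1 -> strongly ducked on an "S"
```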

FIX:

– Add multiband rms analysis.

– check the roundtrip latency of the analyzer-modulator, i.e. the time it takes from when an audio signal is sent until the modulator signal comes back.

– add modulation targets (e.g. rise time). This most probably just works, but we need to open the MIDI feed back into Reaper.

– add sync to the filter times. Cabbage reads bpm from the host, so this should also be relatively straightforward (see the sketch below).
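
For the tempo-synced filter times, the conversion itself is simple. A sketch, assuming the bpm value is read from the host as Cabbage allows; the function name is illustrative:

```python
def synced_time(bpm, note_value=1/16):
    # Duration in seconds of a note value at the given tempo (1/4 = one beat).
    beat = 60.0 / bpm
    return beat * (note_value / 0.25)

print(synced_time(120, 1/16))  # 0.125 s for a 16th note at 120 bpm
```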

 

 

Seminar and meetings at Queen Mary University of London

June 9th and 10th we visited QMUL and met Joshua Reiss and his eminent colleagues there. We were very well taken care of and had a pleasant and interesting stay. On June 9th we gave a seminar presenting the project and discussing related issues with a group of researchers and students. The seminar was recorded on video, to be uploaded to QMUL’s YouTube channel. The day after we had a meeting with Joshua, going into more detail. We also got to meet several PhD students and got insight into their research.

Seminar discussion

Here are some issues that were touched upon in the seminar discussion:

* Analyze gestures inherent in the signal, e.g. a crescendo, and use this as a trigger to turn some process on or off, flip a preset, etc. We could also analyze for very specific patterns, like a melodic fragment, but it is probably better to try to find gestures that can be performed in several different ways, so that the musician can have freedom of expression while providing a very clear interface for controlling the processes.

* Analyze features related to specific instruments. It is easier to find analysis methods that extract very specific features than to ask “how can we analyze this to extract something interesting?”. This is perhaps a lesson for us to be a bit more specific about what we want to extract. It is somewhat opposite to our current exploratory effort of just trying to learn how the currently implemented analysis signals work and what we can get from them.

* Look for deviations from a quantized value. For example pitch deviations within a semitone, and rhythmic deviations from a time grid.

* Semantic spaces. Extract semantic features from the signal, could be timbral descriptors, mood, directions etc. Which semantics? Where to take the terminology from? We should try to develop examples of useful semantics, useful things to extract.

* Semantic descriptors are not necessarily a single point in a multidimensional space; they are more like a blob, an area. Interpolation between these blobs may not be linear in all cases. We don’t have to use all implied dimensions, so we can actually just select features/semantics/descriptors that will give us the possibility of linear interpolation. At least in those situations where we need to interpolate…

* Look at old speech codecs (open source): excitation/resonator models, LPC. These are time-domain techniques, so they will be really fast/low-latency.

* Cepstral techniques can also be used to separate resonator and exciter, the smoothed cepstrum being the resonator. Take the smoothed cepstrum and subtract it from the full cepstrum to get the excitation (see the sketch at the end of this list).

* The difficulty of control: the challenge to the performer, limiting the musical performance, inhibiting the natural ways of interaction. This is a recurring issue, and something we want to handle carefully. It is also really what the project is about: creating *new* ways of interaction.

* The performer will adapt to imperfections of the analysis. Normally, the signals analyzed in MIR are not “aware” that they are being analyzed; they are static and prerecorded. In our case, the performer, being aware of how the analysis method works and what it responds to, can adapt the playing to trigger the analysis method in highly controllable ways. This way the cross-adaptivity is not only technical, related to the control of parameters, but adaptive in relation to how the performer shapes her phrases and, in turn, what she selects to play.

* Measuring collective features. Features of the mix. Each instrument contributes equally to the modulator signal. One signal can push others down or single them out. Relates to game theory. What is the most favorable behavior over time: suppress others or negotiate and adapt.
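
Regarding the cepstral separation mentioned above, here is a minimal sketch; the window and the lifter length are arbitrary assumptions.

```python
import numpy as np

def cepstral_split(frame, lifter=32):
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)      # real cepstrum of the frame
    smoothed = cepstrum.copy()
    smoothed[lifter:-lifter] = 0.0        # keep only low quefrencies ~ resonator
    excitation = cepstrum - smoothed      # the remainder ~ excitation
    return smoothed, excitation

resonator, excitation = cepstral_split(np.random.randn(2048))
```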

Meeting with Joshua

* Josh mentioned a few researchers whose work might interest us. Brecht De Man: intelligent audio switcher. Emmanouil Chourdakis: feature-based reverb. Dave Ronan: groups, stems, automatic mixing. Vincent Verfaille: effect classification and A-DAFX. Bryan Pardo: interfaces for music performance/production, visualization, semantics, machine learning. Pedro Pestana: sound engineer, best practices (PhD), automatic mixing. Ryan Stables: SAFE plugins.

* Issues relating to publishing our work and getting an audience for it. As we could in some respects claim to be creating a new field, building a community for it might be essential for further use of our research. Increase visibility, and promote also via QMUL press and NTNU info. Connected fields include Human-Computer Interaction, New Interfaces for Musical Expression, and the Audio Engineering Society.

* QMUL has considerable experience with evaluation studies, user experience tests, listening tests etc. Some of this may be beneficial as a perspective on our otherwise experiential approach.

* Collective features (meaning individual signals in relation to each other and to the ensemble mix): Masking (spectral overlap). Onset times in relation to other instruments, lagging. Note durations, percussive/sustained etc.

* We currently use spectral crest, but crest is also useful in the time domain for rhythmic analysis (rhythmic density, percussiveness, dynamic range). It will work better with a loudness matching curve (dB).

* A time-domain filterbank is faster than FFT; use logarithmically spaced bands.

* FFT of the time domain amp envelope

* Separate silence from noise. Automatic gain control. Automatic calibration of noise floor (use peak to average measure to estimate what is background noise and what is actual signal)

* Look for patterns: playing the same note, also pitch classes (octaves), also collectively (between instruments).

* Use a log-frequency spectrum with amplitudes in dB, then compute centroid/skew/flux etc. (see the sketch below).

* How to collaborate with others at QMUL, adapting their plugins. Port the techniques to Csound, or re-implement ours in C++? For the prototype and experimentation stage, maybe modify their plugins to output just control signals? Describe our framework clearly so their code can be plugged in.
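
Related to the log-frequency/dB item above, a minimal sketch of a spectral centroid computed that way; the 60 dB window and other scaling choices are assumptions.

```python
import numpy as np

def log_centroid(frame, sr=44100):
    # Magnitude spectrum of a windowed frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    keep = freqs > 20.0                                         # skip DC / sub-audio bins
    mags_db = 20.0 * np.log10(spectrum[keep] + 1e-12)
    mags_db = np.maximum(mags_db - mags_db.max() + 60.0, 0.0)   # 60 dB window (assumed)
    log_f = np.log2(freqs[keep])
    return np.sum(log_f * mags_db) / (np.sum(mags_db) + 1e-12)  # centroid in log2(Hz)

print(log_centroid(np.random.randn(2048)))  # centroid of a white-noise frame
```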