Session UCSD 14. February 2017


Session objective

The session objective was to explore the live convolver: how it can affect our playing together and how it can be used. New convolver functionality for this session is the ability to trigger IR update via transient detection, as opposed to manual triggering or periodic metro-triggered updates. The transient triggering is intended to make the IR updating more intuitive and to provide a closer interaction between the two performers. We also did some quick exploration of adaptive effects processing (not cross-adaptive, just auto-adaptive). The crossadaptive interactions can sometimes be complex. One way to familiarize ourselves with the analysis methods and the modulation mappings could be to allow musicians to explore how these are directly applied to their own instrument.

Kyle Motl: bass
Oeyvind Brandtsegg: convolver/singer/tech/camera

Live convolver

Several takes were done, experimenting with manual and transient-triggered IR recording. We switched between the role of “recording/providing the impulse response” and that of “playing through, or on, the resulting convolver”. Reflections on these two distinct performative roles were particularly fruitful and to some degree surprising. Technically, the two sound sources of audio convolution are equal: it does not matter which way the convolution is done (one sound with the other, or vice versa), the output sound will be the same. However, our liveconvolver does treat the two signals slightly differently, since one is buffered and used as the IR, while the other signal is directly applied as input to the convolver. The buffering can be updated at any time, in such a fashion that no perceptible extra delay occurs due to that part of the process. Still, the update needs to be triggered somehow. Some of the difference in roles occurs due to the need for (and complications of) the triggering mechanism, but perhaps the deepest difference occurs due to something else. There is a performative difference between the action of providing an impulse response for the other one to use, and the action of directly playing through the IR left by the other. Technically, the difference is minute, due to the streamlined and fast IR update. Perhaps the sounding result will even be perceptually indistinguishable for an outside listener. Still, the feeling for the performer is different within those two roles. We noted that one might naturally play different types of things, different kinds of musical gestures, in the two different roles. This inclination can be overcome by intentionally doing what would belong to the other role, but it seems the intuitive reaction to the role is different in each case.
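To make the mechanics concrete, here is a minimal sketch of the kind of process described: block-based convolution where the IR buffer can be replaced on the fly, with recording of a new IR started by a simple energy-based transient detector. This is an illustration in Python with assumed names and parameter values, not the actual realtime implementation used in the project.

```python
import numpy as np

BLOCK = 512      # samples per processing block (assumed)
IR_LEN = 8192    # length of the impulse response buffer (assumed)
THRESH = 4.0     # transient = short-term energy jumps by this factor (assumed)

class LiveConvolver:
    """Sketch of a convolver whose IR is re-recorded on detected transients."""

    def __init__(self):
        self.ir = np.zeros(IR_LEN)
        self.ir[0] = 1.0                  # start as a pass-through "IR"
        self.tail = np.zeros(IR_LEN - 1)  # overlap-add remainder between blocks
        self.rec = None                   # IR record buffer while recording
        self.prev_energy = 1e-9

    def transient(self, block):
        # Naive onset detector: trigger when block energy jumps sharply.
        energy = float(np.mean(block ** 2)) + 1e-9
        hit = energy > THRESH * self.prev_energy
        self.prev_energy = energy
        return hit

    def process(self, ir_source, through):
        """ir_source: the signal sampled as IR; through: the signal convolved."""
        if self.rec is None and self.transient(ir_source):
            self.rec = np.zeros(0)        # transient detected: start recording
        if self.rec is not None:
            self.rec = np.concatenate([self.rec, ir_source])
            if len(self.rec) >= IR_LEN:      # buffer full: swap in the new IR;
                self.ir = self.rec[:IR_LEN]  # the old tail rings out naturally
                self.rec = None
        # Direct convolution for clarity; a realtime version would use
        # partitioned FFT convolution to keep latency and CPU load down.
        wet = np.convolve(through, self.ir)   # BLOCK + IR_LEN - 1 samples
        out = wet[:BLOCK] + self.tail[:BLOCK]
        self.tail = np.concatenate([self.tail[BLOCK:], np.zeros(BLOCK)]) + wet[BLOCK:]
        return out
```

Note how the IR swap only changes what subsequent input is convolved with; the tail of the previous IR rings out undisturbed, which is why the update causes no perceptible extra delay.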

Video: a brief glimpse into the session environment.


Audio: conv1_mix

Take 1: IR recorded from vocals, with a combination of manual and transient triggering. The bass is convolved with the live vocal IR. No direct (dry) signals were recorded, only the convolver output. Later takes in the session also recorded the direct sound from each instrument, which makes it easier to identify the different contributions to the convolution. This take serves more as a starting point from where we continued working.


Audio: conv2_mix

Take 2: Switched roles, so IR is now recorded from the bass, and the vocals are convolved with this live updated IR. The IR updates were triggered by transient detection of the bass signal.


Audio: conv3_mix

Take 3: As for take 2, the IR is recorded from the bass. We changed the bass mic to try to reduce feedback, and adjusted the transient triggering parameters so that IR recording would be more responsive.


Video: Reflections on IR recording, on the roles of providing the IR as opposed to being convolved by it.

Kyle noticed that he would play different things when recording the IR than when playing through an IR recorded from the vocals. Recording an IR, he would play more percussive impulses, while playing through the IR he would explore the timbre with more sustained sounds. In part, this might be an effect of the transient triggering, as he would have to play a transient to start the recording. Because of this we also did one recording with manually triggered IR recording, with Kyle intentionally exploring more sustained sounds as the source for the IR. This seems to even out the difference (between recording the IR and playing through it) somewhat, but there is still a performatively different feeling between the two modes.
When having the role of “IR recorder/provider”, one can be very active and continuously replace the IR, or leave it “as is” for a while, letting the other musician explore the potential in this current IR. Being more active and continuously replacing the IR allows for a closer musical interaction, responding quickly to each other. Still, the IR is segmented in time, so the “IR provider” can only leave bits and pieces for the other musician to use, while the other musician can directly project his sounds through the impulse responses left by the provider.


Audio: conv4_mix

Take 4: IR is recorded from the bass. Manual triggering of the IR recording (controlled by a button), to explore the use of more sustained impulse responses.


Video: Reflections on manually triggering the IR update, and on the specifics of transient triggered updates.


Audio: conv5_mix

Take 5: Switching roles again, so that the IR is now provided by the vocals and the bass is convolved. Transient triggered IR updates, so every time a vocal utterance starts, the IR is updated. Towards the end of the take, the potential for faster interaction is briefly explored.


Video: Reflections on vocal IR recording and on the last take.

Convolution sound quality issues

The nature of convolution will sometimes create a muddy-sounding audio output. The process will dampen high frequency content and emphasize lower frequencies. Areas of spectral overlap between the two signals will also be emphasized, and this can create a somewhat imbalanced output spectrum. As the temporal features of each sound are also “smeared” by the other sound, this additionally contributes to the potential for a cloudy mush. It is well known that brightening the input sounds prior to convolution can alleviate some of these problems. Further refinements have been done recently by Donahue, Erbe and Puckette in the ICMC paper “Extended Convolution Techniques for Cross-Synthesis”. Although some of the proposed techniques do not allow realtime processing, the broader ideas can most certainly be adapted. We will explore this further potential for refinement of our convolver technique.
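As a simple illustration of the brightening idea (a sketch, not the method from the paper): a first-order pre-emphasis filter applied to both inputs before convolving, so that the product of two dark spectra does not become doubly dark.

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    """First-order high-frequency boost: y[n] = x[n] - a * x[n-1]."""
    y = x.astype(float).copy()
    y[1:] -= a * x[:-1]
    return y

def bright_convolve(sig, ir, a=0.95):
    # Brighten both inputs before convolving; since convolution multiplies
    # the two spectra, the low-frequency emphasis would otherwise be squared.
    return np.convolve(pre_emphasis(sig, a), pre_emphasis(ir, a))
```

A matching de-emphasis could be applied after the convolution if the combined treble boost becomes too strong.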

As can be heard in the recordings from this session, there is also a significant feedback potential when using convolution in a live environment where the IR is sampled in the same room as it is directly applied. The recordings were made with both musicians listening to the convolver output over speakers in the room. If we had been using headphones, the feedback would not have been a problem, but we wanted to explore the feeling of playing with it in a real/live performance setting. Oeyvind would control simple highpass and lowpass filtering of the convolver output during performance, and thus had a rudimentary means of manually reducing feedback. Still, once unwanted resonances are captured by the convolution system, they will linger for a while in the system output. Nothing has been done to repair or reduce the feedback in these recordings; we keep it as a strong reminder that it is something that needs to be fixed in the performance setup. Possible solutions consist of exploring traditional feedback reduction techniques, but it could also be possible to do an automatic equalization based on the accumulated spectral content of the IR. This latter approach might also help output scaling and general spectral balance, since already prominent frequencies would have less potential to create strong resonances.
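A sketch of the latter idea, with assumed parameters: derive an inverse gain curve from the magnitude spectrum of the current IR, so that the frequencies where the convolver resonates most strongly are attenuated at the output.

```python
import numpy as np

def ir_compensation_gains(ir, nfft=4096, max_cut_db=12.0):
    """Per-bin output gains that cut where the IR spectrum is strongest."""
    mag = np.abs(np.fft.rfft(ir, nfft))
    mag_db = 20 * np.log10(mag / (mag.max() + 1e-12) + 1e-6)
    # The strongest bin (0 dB) gets the full cut, bins more than max_cut_db
    # below the peak are left untouched, with a linear slope in between.
    gain_db = -np.clip(mag_db, -max_cut_db, 0.0) - max_cut_db
    return 10 ** (gain_db / 20.0)

def apply_gains(block, gains, nfft=4096):
    # Crude spectral filtering of one output block; a real implementation
    # would use overlap-add or a time-domain filter to avoid edge artifacts.
    spec = np.fft.rfft(block, nfft)
    return np.fft.irfft(spec * gains, nfft)[:len(block)]
```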

Adaptive processing

As a way to investigate and familiarize ourselves with the different analysis features and the modulation mappings of these signals, we tried to work on auto-adaptive processing. Here, features of the audio input affect the effect processing of the same signal. The performer can then more closely interact with the effects and explore how different playing techniques are captured by the analysis methods.


Audio: cut_ad_dly1

Adaptive take 1: Delay effect with spectral shift. Short (constant) delay time, like a slapback delay or comb filter. Envelope crest controls the cutoff frequency of a lowpass filter inside the delay loop. Spectral flux controls the delay feedback amount. Transient density controls a frequency shifter on the delay line output.
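A sketch of how such feature-to-parameter mappings can look (names and ranges are made up here; in our actual setup the scaling is done by the MIDIator plugin):

```python
def scale(value, lo, hi, curve=1.0):
    """Map a normalized 0..1 feature into [lo, hi], with optional shaping."""
    v = min(max(value, 0.0), 1.0) ** curve
    return lo + v * (hi - lo)

def map_delay_params(features):
    # Mapping for adaptive take 1: crest -> filter cutoff,
    # flux -> delay feedback, transient density -> frequency shift.
    return {
        "lpf_cutoff_hz": scale(features["env_crest"], 200.0, 8000.0, curve=2.0),
        "feedback":      scale(features["spectral_flux"], 0.0, 0.9),
        "shift_hz":      scale(features["transient_density"], 0.0, 200.0),
    }
```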


Audio: cut_ad_rvb1

Adaptive take 2: Reverb. Rms (amplitude) controls reverb size. Transient density controls the cutoff frequency of a highpass filter applied after the reverb, so that higher density playing will remove low frequencies from the reverb. Envelope crest controls a similarly applied lowpass filter, so that more dynamic playing will remove high frequencies from the reverb.


Audio: cut_ad_hadron1

Adaptive take 3: Hadron. Granular processing where the effect has its own multidimensional mapping from input controls to effect parameters. The details of the mapping are more complex. The resulting effect is that we have 4 distinctly different effect processing settings, where the X and Y axes of a 2D control surface provide a weighted interpolation between these 4 settings. Transient density controls the X axis, and envelope crest controls the Y axis. A live excerpt of the control surface is provided in the video below.
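The 2D interpolation can be sketched as follows (a simplified stand-in for Hadron's actual state mixing): four parameter sets sit at the corners of a unit square, and the X/Y position gives a bilinear weighting between them.

```python
import numpy as np

def interpolate_presets(x, y, p00, p10, p01, p11):
    """x, y in 0..1; each p is an array with one value per effect parameter."""
    return ((1 - x) * (1 - y) * p00 + x * (1 - y) * p10
            + (1 - x) * y * p01 + x * y * p11)

# In this take, transient density drove x and envelope crest drove y:
# params = interpolate_presets(transient_density, env_crest, A, B, C, D)
```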

Video of the Hadron Particle Synthesizer control surface controlled by bass transient density and envelope crest.

Some comments on analysis methods

The simple analysis parameters, like rms amplitude and transient density, work well on all (most) signals. However, other analysis dimensions (e.g. spectral flux, pitch, etc.) have a more inconsistent relation between signal and analysis output when used on different types of signals. They will perform well on some instrument signals and less reliably on others. Many of the current analysis signals have been developed and tuned with a vocal signal, and many of them do not work so consistently on, for example, a bass signal. Due to this, the auto-adaptive control (as shown in this session) is sometimes a little bit “flaky”. The auto-adaptive experiments seem a good way to discover such irregularities and inconsistencies in the analyzer output.

Still, we also have a dawning realization that musicians can thrive with some “liveliness” in the control output. Some surprises and quick turns of events can provide energy and creative input for a performer. We saw this also in the Trondheim session where rhythm analysis was explored, and in the discussion of this in the follow-up seminar. There, Oeyvind stated that the output of the rhythm analyzer was not completely reliable, but the musicians stated they were happy with the kind of control it gave, and that it felt intuitive to play with. Even though the analysis sometimes fails or misinterprets what is being played, the performing musician will react to whatever the system gives. This is perhaps even more interesting (for the musician), says Kyle. It creates some sort of tension, something not entirely predictable. This unpredictability is not the same as random noise. There is a difference between something truly random and something very complex (like one could say about an analysis system that misinterprets the input). The analyzer would react the same way to an identical signal, but give disproportionately large variance in the output due to small variances in the input. Thus it is a nonlinear, complex response from the analyzer. In the technical sense it is controllable and predictable, but it is very hard to attain precise and determined control on a real-world signal. The variations and spurious misinterpretations create a resistance for the performer, something that creates energy and drive.




Seminar 16. December

Philosophical and aesthetical perspectives

–report from meeting 16/12 Trondheim/Skype

Andreas Bergsland, Trond Engum, Tone Åse, Simon Emmerson, Øyvind Brandtsegg, Mats Claesson

The performers’ experiences of control:

In the last session (Trondheim December session) Tone and Carl Haakon (CH) worked with rhythmic regularity and irregularity as parameters in the analysis. They worked with the same kind of analysis, and the same kind of mapping from analysis to effect parameter. After first trying the opposite, they ended up with: regularity = less effect, irregularity = more. They also included a sample/hold/freeze effect in one of the exercises. Øyvind commented on how Tone in the video stated that she thought it would be hard to play with so little control, but that she experienced that they worked intuitively with this parameter, which he found an interesting contradiction. Tone also expressed in the video that on the one hand she would sometimes hope for some specific musical choices from CH (“I hope he understands”), but on the other hand that she “enjoyed the surprises”. These observations became a springboard for a conversation about a core issue in the project: the relationship between control and surprise, or between controlling and being controlled. We try to point here to the degree of specific and conscious intentional control, as opposed to “what just happens” due to technological, systemic, or accidental reasons. The experience from the Trondheim December session was that the musicians preferred what they experienced as an intuitive connection between input and outcome, and that this facilitated the process in the sense that they could “act musically”. (This “intuitive connection” is easily related to Simon’s comment about “making ecological sense” later in this discussion.) Mats commented that in the first Oslo session the performers stated that they felt a similarity to playing with an acoustic instrument. He wondered if this experience had to do with the musicians’ involvement in the system setup, while Trond pointed out that the Trondheim December session and Oslo session were pretty similar in this respect. A further discussion about what “control”, “alienation” and “intuitive playing” can mean in these situations seems appropriate.

Aesthetic and individual variables

This led to a further discussion about how we should be aware that the need for generalising and categorising – which is necessary at some point to actually be able to discuss matters – can lead us to overlook important variable parameters such as:

  • Each performer’s background, skills, working methods, aesthetics and preferences
  • That styles and genres relate differently to this interplay

A good example of this is Maja’s statement in the Brak/Rug session that she preferred the surprising, disturbing effects, which gave her new energy and ideas. Tone noted that this is very easy to understand when you have heard Maja’s music, and even easier if you know her as an artist and person. It can be looked upon as a contrast to Tone/CH, who seek a more “natural” connection between action and sounding result; in principle they want the technology to enhance what they are already doing. But, as cited above, Tone commented that this is not the whole truth. Surprises are also welcome in the Tone/Carl Haakon collaboration.

Simon underlined, because of these variables, the need to pin down in each session what actually happens, and not necessarily set up dialectical pairs. Øyvind pointed out, on the other hand, the need to lay out possible extremes and oppositions to create some dimensions (and terms) along which language can be used to reflect on the matters at hand.

Analysing or experiencing?

Another individual variable, both as audience and performer, is the need to analyse, to understand what is happening in the perceiving of a performance. One example brought up related to this was Andreas’ experience of his change of audience perspective after he studied ear training. This new knowledge led him to take an analysing perspective, wanting to know what happened in a composition when performed. He also said: “as an audience you want to know things, you analyse”. Simon referred to “The Inner Game of Tennis” as another example: how it is possible to stop appreciating playing tennis because you become too occupied analysing the game – thinking of the previous shot rather than clearing the mind ready for the next. Tone pointed at the individual differences between performers, even within the same genre (like harmonic jazz improvisation) – some are very analytic, also in the moment of performing, while others are not. This also goes for the various groups of audiences, some are analytic, some are not – and there is also most likely a continuum between the analytic and the intuitive musician/audience. Øyvind mentioned experiences from presenting the crossadaptive project to several audiences over the last few months. One of the issues he would usually present is that it can be hard for the audience to follow the crossadaptive transformations, since it is an unfamiliar mode of musical (ex)change. However, some of the simpler examples he then played (e.g. amplitude controlling reverb size and delay feedback) yielded the response that they were not hard to follow. One of the places where this happened was Santa Barbara, where Curtis Roads commented that he thought it quite simple and straightforward to follow. Then again, in the following discussion, Roads also conceded that it was simple because the mapping between analysis parameter and modulated effect was known. Most likely it would be much harder to deduce what the connection was by listening alone, since the connection (mapping) can be anything. Crossadaptive processing may be a complicated situation, not easy to analyse either for the audience or the performer. Øyvind pointed towards differences in parameters, as we had also discussed collectively: that some were more “natural” than others, like the balance between amplitude and effect, while some are more abstract, like the balance between noise and tone, or regular/irregular rhythms.

Making ecological sense/playing with expectations

Simon pointed out that we have a long history of connections to sound: some connections are musically intuitive because we have used them perhaps for thousands of years, they make ‘ecological’ sense to us. He referred to Eric Clarke’s “Ways of Listening: An Ecological Approach to the Perception of Musical Meaning” (2005) and William Gaver’s “What in the World Do We Hear?: An Ecological Approach to Auditory Event Perception” (1993). We come with expectations towards the world, and one way of making art is playing with those expectations. In modernist thinking there is a tendency to think of all musical parameters as equal – or at least equally organised – which may easily undermine their “ecological validity” – although that need not stop the creation of ‘good music’ in creative hands.

Complexity and connections

So, if the need for conscious analysis and understanding will vary between musicians, is this the same for the experienced connection between input and output? And what about the difference between playing and listening as part of the process, or just listening, either as a musician, musicologist, or an audience member? For Tone and Carl Haakon it seemed like a shared experience that playing with regularity/non-regularity felt intuitive for both – while this was actually hard for Øyvind to believe, because he knew the current weakness in how he had implemented the analysis. Parts of the rhythmic analysis methods implemented are very noisy, meaning they produce results that sometimes can have significant (even huge) errors in relation to a human interpretation of the rhythms being analysed. The fact that the musicians still experienced the analyses as responding intuitively is interesting, and it could be connected to something Mats said later on: “the musicians listen in another way, because they have a direct contact with what is really happening”. So, perhaps, while Tone & CH experienced that some things really made musical sense, Øyvind focused on what didn’t work – which would be easier for him to hear? So how do we understand this, and how is the analysis connected to the sounding result? Andreas pointed out that there is a difference between hearing and analysing: you can learn how the sound behaves and work with that. It might still be difficult to predict exactly what will happen. Tone’s comment here was that you can relate to a certain unpredictability and still have a sort of control over some larger “groups of possible sound results” that you can relate to as a musician. There is not only an urge to “make sense” (= to understand and “know” the connection) but also an urge to make “aesthetical sense”.

With regards to the experienced complexity, Øyvind also commented that the analysis of a real musical signal is in many ways a gross simplification, and by trying to make sense of the simplification we might actually experience it as more complex. The natural multidimensionality of the experienced sound is lost, due to the singular focus on one extracted feature. We are reinterpreting the sound as something simpler. An example mentioned was vibrato, a complex input requiring a complex analysis, which could in some analyses be reduced to a simple “more or less” dimension. This issue also relates to the needs of our project to construct new methods of analysis, so that we can try to find analysis dimensions that correspond to some perceptual or experiential features.

Andreas commented “It is quite difficult to really know what is going on without having knowledge of the system and the processes. Even simple mappings can be difficult to grasp only by ear”. Trond reminded us after the meeting about the further complexity that was perhaps not so present in our discussion: we do not improvise with only one parameter “out of control” (adaptive processing). In the cross adaptive situation someone else is processing our own instrument, so we do not have full control over this output, and at the same time we do not have any control over what we are processing, the input (cross-adapting), which in both cases could represent an alienation and perhaps a disconnection from the input-result relation. And of course the experience of control is also connected to “understanding” the processing analysis you are working with.

The process of interplay:

Øyvind referred to Tone’s experience of a musical “need” during the Trondheim session, expressed as “I hope he understands…”, when she talked about the processes in the interplay. This was pointing at how you realise during the interplay that you have very clear musical expectations and wishes towards the other performer. This is not in principle different from a lot of other musical improvising situations. Still, because you are dependent on the other’s response in a way that defines not only the wholeness, but your own part in it, this thought seemed to be more present than is usual in this type of interplay.

Tools and setup

Mats commented that very many of the effects used are about room size, and that he felt this had some – to him – unwanted aesthetical consequences. Øyvind responded that he wanted to start with effects that are easy to control and easy to hear the control of. Delay feedback and reverb size are such effects. Mats also suggested that it was an important aesthetical choice not to have effects all the time, and thereby have the possibility to hear the instrument itself. So to what extent should you be able to choose? We discussed the practical possibilities here: some of the musicians (for example Bjørnar Habbestad) have suggested a foot pedal, where the musician could control the degree to which their actions will inflict changes on the other musician’s sound (or, the other way around, control the degree to which other forces can affect their own sound). Trond suggested one could also have control over the signal/output level for the effects, adjusting the balance between processed and unprocessed sound. As Øyvind commented, these types of control could be a pedagogical tool for rehearsing with the effect, turning the processing on and off to understand the mapping better. The tools are of course partly defining the musician’s balance between control, predictability and alienation. Connected to this, we had a short discussion regarding amplified sound in general: the instrumental sound coming from a speaker located elsewhere in the room could in itself already represent an alienation. Simon referred to the Lawrence Casserley/Evan Parker principle of “each performer’s own processor”, and the situation before the age of the big PA, where the electronic sound could be localised to each musician’s individual output. We discussed possibilities and difficulties with this in a cross adaptive setting: which signal should come out of your speaker? The processed sound of the other, or the result of the other processing you? Or both? And then what would the function be – the placement of the sound is already disturbed.


New in this session was the use of the rhythmical analysis. This is very different from all other parameters we have implemented so far. Other analyses relate to the immediate sonic character, but rhythmic analysis tries to extract some temporal features, patterns and behaviours. Since much of the music played in this project is not based on a steady pulse, and even less confined to a regular grid (meter), the traditional methods of rhythmic analysis will not be appropriate. Traditionally one will find the basic pulse, then deduce some form of meter based on the activity, and after this is done one can relate further activity to this pulse and meter. In our rhythmical analysis methods we have tried to avoid the need to first determine pulse and meter, and rather looked into the immediate time relationships between neighbouring events. This gives much less support for any hypothesis the analyser might have about the rhythmical activity, but also allows much greater freedom of variation (stylistically, musically) in the input. Øyvind is really not satisfied with the current status of the rhythmic analysis (even if he is the one mainly responsible for the design), but he was eager to hear how it worked when used by Tone and Carl Haakon. It seems that live use by real musicians allowed the weaknesses of the analyses to be somewhat covered up. The musicians reported that they felt the system responded quite well (and predictably) to their playing. This indicates that, even if refinements are much needed, the current approach is probably a useful one. One thing that we can say for sure is that some sort of rhythmical analysis is an interesting area of further exploration, and that it can encode some perceptual and experiential features of the musical signal in ways that make sense to the performers. And if it makes sense to the performers, we might guess that it has the possibility of making sense to the listener as well.

Andreas: How do you define regularity (e.g. in claves-based musics)? How much “less regular” is that than a steady beat?

Simon: If you ask a difficult question with a range of possible answers this will be difficult to implement within the project.

As a follow-up to the refinement of rhythmic analysis, Øyvind asked: how would *you* analyze rhythm?

Simon: I wouldn’t analyze rhythm. For example, the timeline in African music: a guiding pulse that is not necessarily performed and may exist only in the performer’s head. (This relates directly to Andreas’s next point – Simon later withdrew the idea that he would not analyse rhythm and acknowledged its usefulness in performance practice.)

Andreas: Rhythm is a very complex phenomenon, which involves multiple interconnected temporal levels, often hierarchically organised. Perceptually, we have many ongoing processes involving present, past and anticipations about future events. It might be difficult to emulate such processes in software analysis. Perhaps pattern recognition algorithms can be good for analysing rhythmical features?

Mats: What is rhythm? In our examples, gesture may be more useful than rhythm.

Øyvind: Rhythm is repeatability, perhaps? Maybe we interpret this in the second after.

Simon: No, I think we interpret virtually at the same time.

Tone: I think of it as a bodily experience first and foremost. (Thinking about this in retrospect, Tone adds: The impulses – when they are experienced as related to each other – create a movement in the body. I register that through the years (working with non-metric rhythms in free improvisation) there is less need for a periodical set of impulses to start this movement. (But when I look at babies and toddlers, I recognise this bodily reaction to irregular impulses.) I recognise what Andreas says – the movement is triggered by the expectation of more to come. (Think about the difference in your body when you wait for the next impulse to come (anticipation) and when you know that it is over…))

Andreas: When you listen to rhythms, you group events on different levels and relate them to what was already played. The grouping is often referred to as «chunking» (psychology). Thus, it works both on an immediate level (now) as well as on a more overarching level (bar, subsection, section), because we have to relate what we hear to what came earlier. You can simplify or make it complex.

Seminar 21. October

We were a combination of physically present and online contributors to the seminar. Joshua Reiss and Victor Lazzarini participated via online connection; present together in Trondheim were Maja S.K. Ratkje, Siv Øyunn Kjenstad, Andreas Bergsland, Trond Engum, Sigurd Saue and Øyvind Brandtsegg.

Instrumental extension, monitoring, bleed

We started the seminar by hearing from the musicians how it felt to perform during Wednesday’s session. Generally, Siv and Maja expressed that the analysis and modulation felt like an extension of their original instrument. There were issues raised about the listening conditions, and how it can be difficult to balance the treatments with the acoustic sound. One additional issue in this respect when working cross-adaptively (as compared to e.g. the live processing setting) is that there is no musician controlling the processing, so the processed sound is perhaps a little bit more “wild”. In a live processing setting, the musician controlling the processing will attempt to ensure a musically shaped phrasing and control that is at the current stage not present in the crossadaptive situation. Maja also reported acoustic bleed from the headphones to the feature extraction for her sound. With this kind of sensitivity to crosstalk, the need for source separation (as discussed earlier) is again put to our attention. Adjusting the signal gating (noise floor threshold for the analyzer) is not sufficient in many situations, and raising the threshold also lowers the analyzer’s sensitivity to soft playing. Some analysis methods are more vulnerable than others, but it is safe to say that none of the analysis methods are really robust against noise or bleed from external signals.
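For reference, the kind of gating discussed is simply a noise-floor threshold on the analyzer input, along these lines (threshold value assumed); raising it rejects more bleed, but also mutes soft playing.

```python
import numpy as np

def gate(block, threshold_db=-45.0):
    """Zero out analysis input blocks below the noise-floor threshold."""
    rms_db = 20 * np.log10(np.sqrt(np.mean(block ** 2)) + 1e-12)
    return block if rms_db > threshold_db else np.zeros_like(block)
```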

Interaction scenarios as “assignments” for the performers

We discussed the different types of mapping (of features to modulators), which the musicians also called “assignments”, as it was experienced as a given task to perform: utilizing certain expressive dimensions in specific ways to control the timbre of the other instrument. This is of course true. Maja expressed that she was most intrigued by the mappings that felt “illogical”, and that illogical was good. By illogical, she means mappings that do not feel natural or follow intuitive musical energy flows; things that break up the regular musical attention, and break up the regular flow from idea to expression. One example mentioned was the use of pitch to control reverberation size. For Maja (for many, perhaps for most musicians), pitch is such a prominent parameter in the shaping of a musical statement that it is hard to play when some external process interferes with the intentional use of pitch. The linking of pitch to some timbral control is such an interference, because it creates potential conflict between the musically intended pitch trajectory and the musically intended timbral control trajectory. An interaction scenario (or in effect, a mapping from features to modulators to effects) can in some respects be viewed as a composition, in that it sets a direction for the performance. In many ways it is similar to the text scores of the experimental music of the 60s, where the actual actions or events unfolding are perhaps not described, but more a description of an idea of how the musicians may approach the piece. For some, this may be termed a composition, others might use the term score. In any case it dictates or suggests some aspects of what the performers might do, and as such constitutes an external implication on the performance.

Analysis, feature extraction

Some of our analysis methods are still a bit flaky, i.e. we see spurious outliers in their output that are not necessarily caused by perceptible changes in the signal being analyzed. One example of this is the rhythm consonance feature, where we try to extract a measure of rhythmic complexity by measuring neighboring delta times between events and looking for simple ratios between these. The general idea is that the simpler the ratio, the simpler the rhythm. The errors sneak in as part of the tolerance for human deviation in rhythmic performance, where one may clearly identify one intended pattern, while the actual measured delta times can deviate by more than a musical time division (for example, when playing a jazzy “swing 8ths” triplet pattern, which may be performed somewhere between equal 8th notes, a triplet pattern, or even towards a dotted 8th plus a 16th, and in special cases a double dotted 8th plus a 32nd). When looking for simple integer relationships, small deviations in phrasing may lead to large changes in the algorithm’s output: for example 1:1 for a straight repeating 8th pattern, 2:1 for a triplet pattern and 3:1 for a dotted 8th plus 16th pattern, re the jazz swing 8ths. See also this earlier post for more details if needed. As an extreme example, think of a whole note at a slow tempo (1:1) versus an accent on the last 16th note of a measure (giving a 15:16 ratio). These deviations create single values with a high error. Some common ways of dampening the effect of such errors would be lowpass filtering, exponential moving average, or median filtering. One problem in the case of rhythmic analysis is that we only get new values for each new event, so the “sampling rate” is variable and also very low. This means that any filtering has the consequence of making the feature extraction respond very slowly (and also with a latency that varies with the number of events played), something that we would ideally try to avoid.
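The sensitivity can be illustrated with a toy version of the ratio detection (not the analyzer's actual code): snap the ratio of neighbouring delta times to the nearest simple fraction, and read the size of its terms as complexity.

```python
from fractions import Fraction

def rhythm_ratio(delta_a, delta_b, max_denominator=16):
    """Snap the ratio of two inter-onset intervals to a simple fraction."""
    frac = Fraction(delta_a / delta_b).limit_denominator(max_denominator)
    complexity = frac.numerator + frac.denominator  # e.g. 1:1 -> 2, 15:16 -> 31
    return frac, complexity

print(rhythm_ratio(0.50, 0.50))  # (Fraction(1, 1), 2): even 8ths, low complexity
print(rhythm_ratio(0.48, 0.52))  # (Fraction(12, 13), 25): heard as even 8ths,
                                 # but a 4% timing deviation explodes the output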

Stabilizing extraction methods

In the light of the above, we discussed possible methods for stabilizing the feature extraction methods to avoid the spurious large errors. One conceptual difficulty is differentiating between a mistake and an intended expressive deviation. More importantly, to differentiate between different possible intended phrasings. How do we know what the intended phrasing is without embedding too many assumptions in our analysis algorithm? For rhythm, it seems we could do much better if we first deduce the pulse and the meter, but we need to determine what our (performed) phrasings are to be sure of the pulse and meter, so it turns into a chicken-and-egg problem. Some pulse and meter detection algorithms maintain several potential alternatives, giving each alternative a score for how well it fits the observed data. This is a good approach, assuming we want to find the pulse and meter. Much of the music we will deal with in this project does not have a strong regular pulse, and it most surely does not have a stable meter. Let’s put the nitty-gritty details of this aside for a moment, and just assume that we need some kind of stabilizing mechanism. As Josh put it, a restricted form of the algorithm.
Disregarding the actual technical difficulties, let’s say we want some method of learning what the musician might be up to: what is intended, what is a mistake, and what are expressive deviations from the norm. Some sort of calibration to the normal, average, or regularly occurring behavior. We could then track change during the session, as change relative to the norm established in the training. Now, assuming we could actually build such an algorithm, when should it be calibrated (put into learn mode)? Should we continuously update the norm, or just train once and for all? If training before the performance (as some sort of sound check), we might fail miserably because the musician might do wildly different things in the “real” musical situation compared to when “just testing”. Also, if we continuously update the norm, then our measures are always drifting, so something that was measured as “soft” in the beginning of the performance might be indicated as something else entirely by the end of the performance. Even though we listen to musical form as relative change, we might also as listeners recognize when “the same” happens again later, e.g. activity in the same pitch register, the same kind of rhythmic density, the same kind of staccato abrupt complex rhythms etc. With a continuously updated norm, we might classify the same as something different. Regarding the attempt to define something as the same, see also the earlier post on philosophical implications.

Still, with all these reservations, it seems necessary to attempt creating methods for measuring relative change. This can perhaps be used as a restricted form, as Joshua suggests, or in any case as an interesting variation on the extracted features we already have. It would extend the feature output formats of absolute value, dynamic range, and crest. In some ways it is related to dynamic range (e.g. the relative change would to some degree be high when the dynamic range is high, but then again the relative change would have a more slowly moving reference, and it would also be able to go negative). As a reference for the relative change, we could use a long time average, a model of expectation, an assumption of the current estimate (maintaining several candidates, as with pulse and meter induction), or a normal distribution and standard deviation. These long term estimates have been used with success in A-DAFx (adaptive audio effects) for live applications.
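A minimal sketch of such a relative-change feature, assuming the simplest of the reference candidates (an exponential moving average as the long-term norm):

```python
class RelativeChange:
    """Feature value relative to a slowly updated long-term norm."""

    def __init__(self, rate=0.001):
        self.rate = rate   # how quickly the norm drifts towards the signal
        self.norm = None

    def update(self, value):
        if self.norm is None:
            self.norm = value              # calibrate on the first value
        rel = value - self.norm            # goes negative below the norm
        self.norm += self.rate * (value - self.norm)  # exponential moving average
        return rel
```

Note that this inherits exactly the drifting-norm problem discussed above: with a continuously updated norm, “the same” input later in the performance yields a different output.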

Josh mentioned the possibility of doing A/B comparison of the extraction methods with some tests at QMUL. We’ll discuss this further.

Display of extracted features

When learning (an instrument), multimodal feedback can significantly reduce the time taken to gain proficiency. When learning how the different feature extraction methods work, and how they react to intentional expressive changes in the signal, visual feedback might be useful. Another means of feedback could be sonifying the analysis (which is what we do already, but perhaps made more pointed and simple). This could be especially useful when working with the gate mix method, as the gate will give no indication to the performer that it will soon open, whereas a sonification of the signal could aid the performer in learning the range of the feature and getting an intuitive feel for when the gate will open. Yet another means of feedback is giving the performer the ability to adjust the scaling of the feature-to-modulator mapping. In this way, it would act somewhat like giving the performer the ability to tune the instrument, ensuring that it reacts dynamically to the ranges of expression utilized by this performer. Though not strictly a feedback technique, we could still treat it as a kind of familiarization aid, in that it acts as a two-way process between performer and instrument. The visualization and the performer-adjustable controls could be implemented as a small GUI component running on portable devices like a cellphone or tablet. The signals can be communicated from the analyzer and MIDIator via OSC, and the selection of features to display can be controlled by the assignments of features in the MIDIator. A minimal set of controls can be exposed, and the effects of these mirrored in the plugins (MIDIator). Some musicians may prefer not to have such visual feedback. Siv voiced concern that she would not be so interested in focusing visually on the signals. This is a very valid concern for performance. Let us assume it will not be used during actual performance, but as a tool for familiarization during early stages of practice with the instrument.
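Since the communication layer is OSC anyway, such a display component only needs a simple feature feed. A sketch using the python-osc package (device address, port and OSC paths are made up here):

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("192.168.1.50", 9000)  # hypothetical tablet address

def send_features(features):
    """Forward current analyzer feature values to the display device."""
    for name, value in features.items():
        client.send_message("/analyzer/" + name, float(value))

send_features({"rms": 0.42, "transient_density": 0.7, "env_crest": 0.31})
```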

Theoretical/philosophical issues regarding the development of the project

Skype with Solveig Bøe, Simon Emmerson and Øyvind Brandtsegg.

The starting point and main focus of our conversation was the session that took place 20.-21. September in Studio Olavskvartalet, NTNU, which is the theme of a former post on this blog. That post also contains audio takes and a video where the musicians discuss their experiences. Øyvind told us about what had been going on, and about the reactions and reflections of the participants and himself. Simon had seen the video and listened to the takes, and Solveig will look at the material and try to use Merleau-Ponty’s language to describe the interactions going on.


One interesting case is when two (to make the system simple) musicians control each other, where for example the louder an instrument plays, the less effect of some kind on the other instrument, and the louder the other instrument plays, the larger the effect on the first. Simon had noted in the documentation that the participating musicians mentioned this (way of controlling each other) could lead to confusion. An analogy was made to playing football with two footballs, where the ball changes colour each time someone touches it. While not entirely similar in timing and pace, the image sheds some light on the added complexity. We discussed how this confusion could be reduced and how this could create interesting music. There is a learning curve involved, and this must be tied to getting the musicians to know each other’s interaction patterns well, learning (by doing) how effects are created in each other’s sounds. Learning by listening and doing. But also by getting visual technical feedback? We all noted that the video documentation from the session made it particularly easy to gain insight into the session. More so than listening to the audio without video. There is an aspect of seeing the physical action, and also seeing the facial and bodily expression of the performers. Naturally, this in itself is not particular to our project; it can also be said of many recordings of performances. Still, since the phenomenon of cross-adaptiveness in the interaction can be quite hard to grasp fully, this extra dimension of perception seems to help us engage when reviewing the documentation. That said, the video on the blog post was also very effectively edited by Andreas to extract significant points of interest and points of reflection. The editing also affects the perception of the event significantly, of course.

With that perspective, of the extra sensory dimension letting us engage more readily in the perception and understanding of the event, how could this be used to aid the learning process (for the performers)? Some of the performers also noted in the session interviews that some kind of visual feedback would be nice to have. Simon thought that visual feedback of the monitoring (of the sound) could be of good help. Øyvind also agrees to this, while at the same time stressing that it could result in making the interactions between the musicians even more taxing, because they now also have to look at the screen, in addition to each other. Commonly in computer music performance, there has been a concern to get the performer’s focus away from the computer screen, as it in many cases has been detrimental to the experience both for the performer and the audience. Providing a visual interface could lead to similar problems. Then again, the visual interface could potentially be used with success during the learning stage, to speed up the understanding of how the instrument (i.e. the full system of acoustic instrument plus electronic counterparts) works.

We discussed the confusion felt when the effects seemed difficult to control, the feeling of “ghostly transformations” taking place, the disturbances of the awareness of themselves and their “place”. Could confusion here be viewed as something not entirely negative too? Could some degree of confusion be experienced in terms of a pregnant complexity, stimulating curiosity and leading to deeper engagement? How could we enable this flipping of sign for the confusion, making it more of a positive feature than a negative disorientation? Perhaps one aspect of this process is to provide just enough traction for the perception (in all senses) to hold on to the complex interactions. One way of reducing confusion would be to simplify the interactions. Then again, the analysis dimensions in this session were already simplified (just using amplitude and event density), and the affected modulation (relative balance between effects) was also relatively simple. With respect to getting traction for perception to grasp the situation, experimentation with a visual feedback interface is definitely something that needs to be explored.


According to Merleau-Ponty, interactions with others are variations in the same matter; something is the same for all participants, even if – and it always is – experienced from different viewpoints. What is the same in this type of interaction? Solveig said that to interact successfully there has to be something that is the same for all of the performers; they have to be directed towards something that is the same. But what could be a candidate for “the same” in a session where the participants interact with each other’s interactions? The session seems to be an evolving unity encompassing the perceptions and performances of the participants as they learn how one’s instruments work in the “network” of instruments. Simon pointed to the book ‘Sonic Virtuality – Sound as Emergent Perception’ (Mark Grimshaw and Tom Garner – OUP 2015), which argues that no two sounds are ever the same. Neurophysiologically, sounds are experienced differently in each brain, so we can’t really say that we hear the same sound. Solveig’s response was that the same is the situation where the participants create the total sound. In this situation one is placed, even if displaced by the effects created on one’s produced sounds. Displacement became a theme that we tried to reflect upon: the being “there”, not “here”, by one being controlled by the other instrument(s). The dialectic between dislocation and relocation is important in this connection. Dislocation of the sound could feel like dislocation of oneself. How does amplification (generally in electroacoustic and electronic music) change the perspectives of oneself? How do we perceive the sound of a saxophone and the music performed on it differently when it is amplified, so that the main acoustic impression of the instrument comes to us through speakers? The speakers are usually not positioned in the same location as the acoustic instrument, and even if they were, the acoustic radiation patterns of the speakers would radically differ from the sound coming from the acoustic instrument. In our day and age, this type of sound reproduction and amplification is so common that we sometimes forget how it affects perception of the event. With unprocessed, as-clean-as-possible or “natural” sound reproduction, the perceptual effect is still significant. With processed sound even more so, and with the crossadaptive interactions the potential for dislocation and disconnection is manifold. As we (the general music loving population) have learned to love and connect to amplified and processed musics, we assume a similar process needs to take place for our new means of interaction. Similarly, the potential for intentionally exploring the dislocation effects for expressive purposes can also be a powerful resource.

Sound is in the brain, but also in the haptic, visual and auditory – in general, sensual – space. Phenomenologically, what is going on in this space is the most interesting, according to Solveig, but the perspective from neuroscience is also something that could bring productive insights to the project.

We returned to the question of monitoring: How much information should the performers get? What should they be able to control? Visualization of big data could help in the interaction and the interventions, but which form should the visualization have? Øyvind showed us an example of visualization from the Analyzer plugin developed in the project. Here, the different extracted features are plotted in three dimensions (x, y and colour) over time. It provides a way of getting insight into how the actual performed audio signal and the resulting analysis correlate. It was developed as a means of evaluating and selecting which features should be included in the system, but can potentially also be used directly by the performer trying to learn how the instrumental actions result in control signals.
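The kind of plot described could be mocked up as below (synthetic data for illustration; the Analyzer's actual rendering differs): time on x, one feature on y, and a second feature mapped to colour.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 10, 500)           # 10 seconds of analysis frames
pitch = 0.5 + 0.3 * np.sin(0.7 * t)   # stand-ins for two extracted features
flux = np.abs(np.sin(2.3 * t))

plt.scatter(t, pitch, c=flux, cmap="viridis", s=8)
plt.xlabel("time (s)")
plt.ylabel("pitch (normalized)")
plt.colorbar(label="spectral flux")
plt.show()
```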


Skype on philosophical implications

Skype with Solveig Bøe, Simon Emmerson and Øyvind Brandtsegg.

Our intention for the conversations was to start sketching out some of the philosophical implications of our project. Partly as a means to understand what we are doing and what it actually means, and partly as a means of informing our further work, choosing appropriate directions and approaches. Before we got to these overarching issues, we also discussed some implications of the very here and now of the project. We come back to the overarching issues in the latter part of this post.


Initially, we commented on rhythm analysis not based on the assumption of an underlying constant meter or pulse. The work of Ken Fields was mentioned, on internet performance and the nature of time. Also John Young’s writing on form and phrase length (in Trevor Wishart’s music), and Ambrose Seddon’s chapter on time structures in Emmerson/Landy’s recent book “Expanding the Horizon of Electroacoustic Music Analysis”. In live performance there is also George Lewis’ rhythm analysis (or event analysis) in his interactive pieces (notably Voyager).

Score as guide

In the context of Oeyvind being at UCSD, where one of their very strong fields of competence is contemporary music performance, he thought loosely about a possible parallel between a composed score and the cross-adaptive performance setting. One could view the situation designed for/around the performers in a cross-adaptive setting as a composition, in terms of it setting a direction for the interplay and thus also posing certain limits (or obstacles/challenges) to what the performer can and cannot do. Against the possible analogy between the traditional score and our crossadaptive performance scenario, one could flag the objection that a classical performer does not so much just follow the dos and don’ts described in a score, but rather uses the score as a means to engage with the composer’s intention. This may or may not apply to the musics created in our project, as we strive to not limit ourselves to specific musical styles, while still leaning solidly towards a quite freely improvised expression in a somewhat electroacoustic timbral environment. In some of our earlier studio sessions, we noted that there are two competing modes of attention:

  • To control an effect parameter with one’s audio signal, or
  • to respond musically to the situation

… meaning that what the performer chooses to play would traditionally be based on an intended musical statement to contribute to the current musical situation, whereas now she might rather play something that (via its analyzed features) will modulate the sound of another instrument in the ensemble. This new something, being a controller signal rather than a self-contained musical statement, might actually express something of its own in contradiction to the intended (and assumedly achieved) effect it has on changing the sound of another instrument. Now on top of this consider that some performers (Simon was quoting the as yet unpublished work of postgraduate and professional bass clarinet performer Marij van Gorkom) experience the use of electronics as perhaps disturbing the model of engagement with the composer’s intention through the written score. This being something that one needs to adjust to and accommodate on top of the other layers of mediation. Then we might consider our crossadaptive experiments as containing the span of pain to love in one system.

…and then to philosophy

A possible perspective suggested by Oeyvind was a line connecting Kant-Heidegger-Stiegler, possibly starting offside but nevertheless somewhere to start. The connection to Kant is his notion of the noumenal world (where the things in themselves are) and the phenomenal world (which is where we can experience and to some extent manipulate the phenomena). Further, to Heidegger with his thoughts on the essence of technology. Rather than viewing technology as a means to an end (instrumental), or as a human activity (anthropological), he refers to a bringing-forth, a bringing out of concealment into unconcealment. Heidegger then sees technology as a threat to this bringing-forth (of truth), bypassing or denying human active participation (with the real events, i.e. the revealing of truth). Again, he suggests that art can be a way of navigating this paradox and actively sidestepping the potential threat of technology. Then onto Stiegler, with his view of technics as organized inorganic matter, somewhat with an immanent drive or dynamic of its own, possibly constituting an impediment to socialization, individuation and intersubjectivization. In hindsight (after today’s discussion), this is perhaps a rather gray, or bleak, approach to the issues of technology in the arts. In the context of our project, it was intended to portray the difficulty of controlling something with something else, the reaching into another dimension while acting from the present one.

Solveig countered with a more phenomenological approach, bringing in Merleau-Ponty’s perspectives on technology as an extension of the body, of communication and communion, of learning and experiencing together, acting on a level below language. On this level the perceptive and the corporeal – including extensions that are incorporated into the body schema – are intertwined and form a unity. Now this seems like a path for further investigation. One could easily say that we are learning to play a new instrument, or learning to navigate a new musical environment, so our current endeavors must be understood as baby steps in this process (as also reflected on earlier), and we are this intertwined unity taking form. In this context, Merleau-Ponty’s ideas about the learning of language – as something that takes place by learning a play of roles, where understanding how to change between roles is an essential part of the learning process – also come into view as appropriate.

Simon also reported on another research student project: cellist Audrey Riley was a long-time member of the Merce Cunningham ensemble and is engaged in examining a (post-Cage) view of performance – including ideas of ‘clearing the mind’ of thought and language – as they might ‘get in the way’.  This relates to the idea of ‘communion’ where there is a sense of wholeness and completeness.  This also leads us to Wittgenstein’s ideas that both in art and philosophy there is something that is unutterable, that may just be demonstrated.

How does this affect what we do?

We concluded our meeting with a short discussion on how the philosophical aspects can be set into play to aid our further work in the project. Is it just high-and-wide rambling, or do we actually make use of it? The setting up of some dimensions of tension, or oppositions (that may sometimes not be oppositions but rather different perspectives on the same thing), may be of help in practically approaching the problems we face. Like the assumed opposition of communion – disturbance: do the crossadaptive interventions into the communication between performers act as a disturbance, or can they be a way of enabling communion? Can we increase the feeling of connectivity? We can also use these lines of thought to describe what happened (in an improvisation), or to try to convey the implications of the project as a whole. Most important, however, is probably the growth of a vocabulary for reflection, through the probing of our own work in the light of the philosophical implications.