Project start meeting in Trondheim

Kickoff

Monday June 6th we had a project start meeting with the NTNU based contributors: Andreas Bergsland, myself, Solveig Bøe, Trond Engum, Sigurd Saue, Carl Haakon Waadeland and Tone Åse. This gave us the opportunity to present the current state of affairs and our regular working methods to Solveig. Coming from philosophy, she has not taken part in our earlier work on live processing. As the last few weeks have been relatively rich in development, this also gave us a chance to bring all of the team up to speed. Me and Trond also gave a live demo of a simple crossadaptive setup where vocals control delay time and feedback on the guitar, while the guitar controls reverb size and hi-freq damping for the vocal. We had discussions and questions interspersed within each section of the presentation. Here’s a brief recounting of issues wwe touched upon.

Roles for the musician

The role of the musician in crossadaptive interplay has some extra dimensions when compared to a regular acoustic performance situation. A musician will regularly formulate her own musical expression and relate this to what the other musician is playing. On top of this comes the new mode of response created by live processing, where the instrument’s sound constantly changes due to the performative action of a live processing musician. In the cross-adaptive situation, these changes are directly controlled by the other musicians’ acoustic signal, so the musical response is two-fold: responding to the expression and responding to the change in own sound. As these combine, we may see converging or diverging flow of musical energy between the different incentives and responses at play. Additionally, her own actions will influence changes on the other musician’s sound, so the expressive is also two-fold; creating the (regular) musical statement and also considering how the changes inflicted on the other’s sound will affect both how the other one sounds and how that affects their combined effort. Indeed, yes, this is complex. Perhaps a bit more complex that we had anticipated. The question was raised if we do this only to make things difficult for ourselves. Quite justly. But we were looking for ways to intervene in the regular musical interaction between performers, to create yet unheard ways of playing together. It might appear complex just now because we have not yet figured out the rules and physics of this new situation, and it will hopefully become more intuitive over time. Solveig voiced it like we put the regular modes of perception in parenthesis. For good or for bad, I think she may be correct.

Simple testbeds

It seems wise to initially set up simplified interaction scenarios, like the vocal/reverb guitar/delay example we tried in this meeting. It puts emphasis on exploring the combinatorial parameter modulation space. Even with a simple situation of extracting two features for each sound source, controlling two parameters on each other’s sound, the challenges to the musical interaction is prominent. Controlling two features of one’s own sound, to modulate the other’s processing is reasonably manageable while also concentrating on the musical expression.

Interaction as composition

An interaction scenario can be thought of as a composition. In this context we may define a composition as something that guides the performers to play in a certain way (think of the text pieces from the 60’s for example, setting the general mood or each musician’s role while allowing a fair amount of freedom for the performer as no specific events are notated). As the performers formulate their musical expression to act as controllers just as much as to express an independent musical statement, the interaction mode has some of the same function as a composition has in written music. Namely to determine or to guide what the performers will play. In this setting, the specific performative action is freely improvised, but the interaction mode emphasizes certain kinds of action to such an extent that the improvisation is in reality not really free at all, but guided by the possibilities (the affordance [https://en.wikipedia.org/wiki/Affordance]) of the system. The intervention into the interaction also sheds light on regular musical interaction. We become acutely aware of what we normally do to influence how other musicians play together with us. Then this is changed and we can reflect on both the old (regular, unmodulated) kind of interaction and the new crossadaptive mode.

Feature extraction, performatively relevant features

Extracting musically salient features is a Big Question. What is musically relevant? Carl Haakon suggested that some feature related to the energy could be interesting. Energy, for the performer can be induced into the musical statement in several ways. It could be rhythmic activity, loudness, timbre, and other ways of expressing energetic performance. As such it could be a feature taking input from several mathematical descriptions. It could also be a feature allowing a certain amount of expressive freedom for the performer, as energy can be added by several widely different performative gestures, leaving some sort of independence from having to do very specific actions in order to trigger the control of the destination parameter. Mapping the energy feature to a destination parameter that results in a more rich and energetic sound could lead to musically convergent behavior, and conversely, controlling a parameter that makes the resulting sound more sparse could create musical and interactive tension. In general, it might be a good idea to use such higher level analysis. This simplifies the interaction for the musician, and also creates several alternative routes to inflict a desired change in the sound. The option to create the same effect by several independent routes/means, also provides the opportunity for doing so with different kinds of side effects (like in regular acoustic playing too), e.g. creating energy in this manner or that manner gives very different musical results but in general drives the music in a certain direction.
Machine learning (e.g. via neural networks) could be one way of extracting such higher level features, different performance situations, different distinct expressions of a performer. We could expect some potential issues of recalibration due to external conditions, slight variations in the signal due to different room, miking situation etc. Will we need to re-learn the features for each performance, or could we find robust classification methods that are not so sensitive to variations between instruments and performance situations?

Meta mapping, interpolations between mapping situations

Dynamic mappings, allowing the musicians to change the mapping modes during different sections of the performed piece. If the interaction mode becomes limited or “worn out” after a while of playing, the modulation mappings could be gradually changed. This can be controlled by an external musician or sound engineer, or it can be mapped to yet other layers of modulations. So, a separate feature of the analyzed sound is mapped to a modulator changing the mappings (the preset or the general modulation “situation” or “system state”) of all other modulators, creating a layered meta-mapping-modulator configuration. At this point this is just an option, still too complex for our initial investigation. It brings attention to the modulator mapping used in the Hadron Particle Synthesizer, where a simple X-Y pad is used to interpolate between different states of the instrument. Each state containing modulation routing and mapping in addition to parameter values. The current Hadron implementation allows control over 209 parameters and 54 modulators via a simple interface. This enables a simplified multidimensional control in Hadron. Maybe the cross-adaptive situation can be thought as somehow similar. The instrumental interface of Hadron behaves in highly predictable ways, but it is hardly possible to decode intellectually, one has to interact by intuitive control and listening.

Listening/monitoring

The influence of the direct/unprocessed sound; With acoustic instruments, the direct sound from the instrument will be heard clearly in addition to the processed sound. In our initial experiments, we’ve simplified this by using electric guitar and close miked vocals. We mostly hear the result of the effects processing. Still, the analysis of features is done on the dry signal. This creates a situation where it may be hard to distinguish what features controls which modulations, because the modulation source is not heard clearly as a separate entity in the sound image. It is easy to mix the dry sound higher, but then we hear less of the modulations. It is also possible to allow the modulated sound be the basis of analysis (creating the possibility for even more complex cross-adaptive feedback modulations as the signals can affect each other’s source for analysis). Then this would possibly make it even harder for the musicians to have intentional control over the analyzed features and thus the modulations. So, the current scheme is, if not the final answer, it is a reasonable starting point.

Audience and fellow musicians’ perception of the interplay

How will the audience perceive this? Our current project does not focus on this question, but it is still relevant to visit it briefly. It also relates to expectations, to schooling of the listener. Do we want the audience to know? Does knowledge of the modulation interaction impede the (regular) appreciation of the musical expression? One could argue that a common symphony orchestra concert-goer does not necessarily know the written score, or have analyzed the music, but appreciates it as an aesthetic object on its own terms. The mileage may vary, some listeners know more and are more invested in details and tools of the trade. Still, the music itself does not require knowledge of how it is made to be appreciated. For a schooled listener, and also for ourselves, we can hope to be able to play with expectations with in the crossadaptive technique. Andreas mentions that listening to live crossadaptive processing as we demonstrated it is like listening to an unfamiliar spoken language, trying to extract meaning. There might be some frustration over not understanding how it works. Also, expectations of the fantastic new interaction mode and not hearing it can lead to disappointment. Using it as just another means of playing together, another parameter of musical expression alleviates this somewhat. The listener does not have to know, but will probably get the opportunity for an extra layer of appreciation with understanding of the process. In any case, our current research project does not directly concern the audience’s appreciation of the produced music. We are currently at a very basic stage of exploration and we need to experiment at a much lower level to sketch out how this kind of musics can work before starting to consider how (or even if) it can be appreciated by the audience.