Multi-camera recording and broadcasting

Audio and video documentation is often an important component of projects that analyse or evaluate musical performance and/or interaction. This is also the case in the Cross Adaptive project, where every session was to be recorded in video and multi-track audio, and this material subsequently organized and edited into a 2-5 minute digest within two days. However, when documenting sessions in this project we face the challenge of having several musicians located in one room (the studio recording room), often facing each other rather than being oriented in the same direction. An ordinary single-camera recording would thus have difficulties including all musicians in one image. Our challenge, then, has been to find a way of doing this that has high enough quality to detect salient aspects of the musical performances, that keeps both the video streams and the audio in sync, but that isn't too expensive (since the budget for this work has been limited) or too complicated to handle. In this blog post I will present some of the options I have considered, both in terms of hardware and software, and present some of the tests I have done in the process.

Commercial hardware/software solutions
One of the things we considered in the first investigative phase was to check the commercial market for multi-camera recording solutions. After some initial searches on the web I made a few inquiries, which resulted in two quotes for two different systems:

1. A system based on the Matrox VS4 Recorder pro (http://www.matrox.com/video/en/products/vs4/vs4recorderpro/)
2. StreamPix 7, multi-camera version (https://www.norpix.com/products/streampix/streampix.php).

Both included all hardware and software needed for in-sync multi-camera recording, except for the cameras themselves. The StreamPix quote also included a computer with suitable specs. Both quotes were immediately ruled out since the prices were well beyond our budget (> $5000).

I also have to mention that after having set up a system for the first session in September 2016, we came across the Apollo multi-camera recorder/switcher, which looks like a very lightweight and convenient way of doing multi-camera recording:

https://www.convergent-design.com/apollo

At $2,995 (the price listed on the US web site), it might be a possibility to consider in the future, even if it is still in the high end of our budget range.

Using PCs and USB3 cameras
A project currently running at NTNU (in collaboration with NRK and UNINETT) is Nettmusikk (Net Music). This project is testing solutions for high-resolution, low-latency streaming of audio and video over the internet. The project has bought four USB3 cameras (Point Grey Grasshopper, GS3-U3-41C6C-C) and four dual-boot (Linux/Windows) PCs with high-performance specs. These cameras deliver a very high quality image with low latency. However, one challenge is that the USB3 standard is still not so well developed that cameras can "plug-and-play" on all platforms; specifically, the software and drivers for the Point Grey cameras are proprietary, and therefore won't provide images for all software out-of-the-box.

After doing some research on the Point Grey camera solution, we found that there were several issues that made this option unsuitable and/or impractical.
1. To get full sync between cameras, they needed to have an optical strobe sync, which required a master/slave configuration with all cameras in the same room.
2. There was no software that could do multi-camera recording from USB3 out-of-the-box. Rather, Point Grey provided an SDK (FlyCapture) with working examples that allowed us to build our own applications. While this was an option we considered, it still looked like it demanded a fair bit of programming work (C++).
3. Because the cameras stream and record raw video, we would also need very fast SSDs to record four video files at once. If we were recording four video streams at 8-bit 1920×1080 @ 30 fps, that would be 248.8 MB/s (62.2 MB/s x 4) of data (a quick data-rate calculation follows below). This would also fill up hard drive space pretty fast.
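A back-of-the-envelope sketch in Python showing where these figures come from, assuming the 8 bits per pixel quoted above:

width, height, fps, bytes_per_pixel, cameras = 1920, 1080, 30, 1, 4
per_camera = width * height * fps * bytes_per_pixel / 1e6   # ~62.2 MB/s per camera
total = per_camera * cameras                                # ~248.8 MB/s for four cameras
per_hour = total * 3600 / 1e3                               # ~896 GB per hour of recording
print(round(per_camera, 1), round(total, 1), round(per_hour))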

USB2 HD cameras
It turned out that one of the collaborators in the Nettmusikk project had access to four USB2 HD cameras. This was attractive since the USB2 protocol is a cross-platform standard that provides plug-and-play and usually has few compatibility issues. The quality of these Tandberg PrecisionHD cameras was far from professional production quality, but they had reasonably clear images. When viewing the camera image on a computer, there was a barely noticeable latency. Here is a review from The Gadgeteer:

http://the-gadgeteer.com/2010/03/26/tandberg-precisionhd-usb-video-camera-review/

Without high-quality lenses, however, the possibilities for adjusting zoom, angle and light sensitivity were non-existent, and particularly the latter proved to be an issue in the sessions. We especially ran into problems if direct light sources were in the image, so careful placement of light sources was necessary. Also, it was sometimes difficult to get enough distance between the cameras and the performer since the room wasn't very big, and an adjustable lens would probably have helped here. Furthermore, the cameras have an autofocus function that sometimes got confused in the first session, making the image blurry for a second or two until it was appropriately adjusted. Lastly, the cameras had a foot suitable for table placement, but no possibility of fastening them to a standard stand, like the Point Grey cameras have. I therefore had to use duct tape to fasten the cameras, which worked OK, but looked far from professional.

A second concern was whether these cameras could provide images over long cable lengths. What I learned here was that they would operate perfectly with a 5 m USB extension if they were connected to a powered hub at the end that provided the cameras with sufficient power. It turned out that two cameras per hub did work, but might sometimes introduce some latency and drop-outs. Thus, taking the signal from four cameras via four hubs and extension cords into the four USB ports on the PC seemed like the best solution (although we only used three hubs in the first session). That gave us fairly high flexibility in terms of placing the cameras.

Mosaic image
Due to the very high amounts of data involved in dealing with multiple cameras, using a composite/mosaic image gathering all four camera images into one seemed like a possible solution. Admittedly, it would mean going from HD to HD/4 per image. Still, this option had the great advantage of making post-production switching between cameras superfluous, and since the goal in this project was to produce a 2-5 minute digest in two days, this seemed like a very attractive option. The question was then how to collate and sync four images into one without any glitches or hiccups. In the following, I will present the solutions that were considered and tested:

1. ffmpeg (https://www.ffmpeg.org/)
ffmpeg is open source command-line software that can do a number of different operations on audio and video streams and files: recording, converting, filtering, resampling, muxing, demuxing, etc. It is highly flexible and customizable, at the cost of a steep learning threshold and cumbersome operation, since everything has to be run from the command line. The flexibility, and the fact that it is free, still made it an option to consider.

ffmpeg can easily be installed on OSX from pre-built binaries. On Windows, however, it has to be built from source with the mingw-w64 project (see https://trac.ffmpeg.org/wiki/CompilationGuide/MinGW). This seemed like a bit of work, but at the time it still sounded like a viable option.

After some initial tests based on examples in the ffmpeg wiki (https://trac.ffmpeg.org/wiki) I was able to run the following script as a test on OSX:

ffmpeg -f avfoundation -framerate 30 -pix_fmt:0 uyvy422 -i "0" -f avfoundation -framerate 30 -pix_fmt:0 uyvy422 -i "1" -f avfoundation -framerate 30 -pix_fmt:0 uyvy422 -i "0" -f avfoundation -framerate 30 -pix_fmt:0 uyvy422 -i "0" -filter_complex "nullsrc=size=640x480 [base]; [0:v] setpts=PTS-STARTPTS, scale=320x240 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=320x240 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=320x240 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=320x240 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=320 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=240 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=320:y=240" -t 5 -target film-dvd -y ~/Desktop/output.mpg

The script recorded four streams of video, one from an external web camera and three from the internal FaceTime HD camera. However, the individual images were far from in sync, and seemed to lose the stream and/or lag behind by several seconds at a time. This initial test, the sync issues and the prospect of a time-consuming build process on Windows made me abandon the ffmpeg solution.

2. Processing
Using Processing, described as a "flexible software sketchbook and a language for learning how to code within the context of the visual arts" (https://processing.org/), seemed like a more promising path. Again I relied on examples I found on the web and put together a sketch that seemed to do what we were after (I also had to install the video library to be able to run the sketch).

import processing.video.*;

Capture camA;
Capture camB;
Capture camC;
String[] cameras;

void setup() {
  size(1280, 720);
  cameras = Capture.list();
  println("Available cameras:");
  for (int i = 0; i < cameras.length; i++) {
    println(cameras[i], i);
  }

  // Choose cameras with the appropriate number from the list and corresponding resolution.
  // Here I had to look in the printout list to find the correct numbers to enter.
  camA = new Capture(this, 640, 360, cameras[18]);
  camB = new Capture(this, 640, 360, cameras[3]);
  camC = new Capture(this, 640, 360, cameras[33]);
  camA.start();
  camB.start();
  camC.start();
}

void draw() {
  // Tile the three camera images in a 2x2 grid (one slot left empty).
  image(camA, 0, 0, 640, 360);
  image(camB, 640, 0, 640, 360);
  image(camC, 0, 360, 640, 360);
}

void captureEvent(Capture c) {
  // Read a new frame from whichever camera has one available.
  if (c == camA) {
    camA.read();
  } else if (c == camB) {
    camB.read();
  } else if (c == camC) {
    camC.read();
  }
}

This sketch nicely gathered the images from three cameras on my Mac, with little latency (approx. 50-100 ms). This made me opt to go further and port the sketch to Windows. After installing Processing on the Windows PC and running the sketch there, however, I could only get one image out at a time. My hypothesis was that the problem came from all the different cameras having the same number in the list of drivers. This problem made me abandon Processing in search of simpler solutions.

3. IP Camera Viewer
After some more searching, I came across IP Camera Viewer (http://www.deskshare.com/ip-camera-viewer.aspx), a free and lightweight application for different Windows versions, with support for over 2000 cameras according to their web site. After some initial tests, I found that this application was a quick and easy solution to what we wanted in the project. I could easily gather up to four camera streams in the viewer, and the quality seemed good enough to capture details of the performers. It was also very easy to set up and use, and seemed stable and robust. Thus, this solution turned out to be what we used in our first session, and it gave us results that were good enough in quality for performance analysis.

4. VLC (http://www.videolan.org/vlc/)
The leader of the Nettmusikk project, Otto Wittner, made a VLC script to produce a mosaic image, albeit with only two images:

# Webcam no 1
new ch1 broadcast enabled
setup ch1 input v4l2:///dev/video0:chroma=MJPG:width=320:height=240:fps=30
setup ch1 output #mosaic-bridge{id=1}

# Webcam no 2
new ch2 broadcast enabled
setup ch2 input v4l2:///dev/video1:chroma=MJPG:width=320:height=240:fps=30
setup ch2 output #mosaic-bridge{id=2}

# Background surface (image)
new bg broadcast enabled
setup bg input bg.png
# Add mosaic on top of image, encode everything, display the result as well as stream udp
setup bg output #transcode{sfilter=mosaic{width=640,height=480,cols=2,rows=1,position=1,order="1,2",keep-picture=enabled,align=1},vcodec=mp4v,vb=5000,fps=30}:duplicate{dst=display,dst=udp{dst=streamer.uninett.no:9400}}
setup bg option image-duration=-1  # Make background image be streamed forever

#Start everything
control bg play
control ch1 play
control ch2 play

While it seemed to work well on Linux with two images, the fact that we already had a solution in place made it natural not to pursue this option further at the time.

5. Other software
Among the other software I found that could gather several images into one was CaptureSync (http://bensoftware.com/capturesync/). This software is for Mac only, and was therefore not tested.

Screen capture and broadcast
Another important function that wasn't covered by IP Camera Viewer was recording. Another search uncovered many options here, most of them directed towards game capture. The first of these I tested was OBS, Open Broadcaster Software (https://obsproject.com/). This open source software turned out to do exactly what we wanted and more, so this became the solution we used at the first session. The software was easily set up to record from the desktop at a 30 fps frame rate and close to HD quality, with the output set to .mp4. There was, however, a noticeable lag between audio and video, with the video lagging behind by approximately 50-100 ms. This was corrected in post-production editing, but could potentially have been solved by delaying the audio stream until it was in sync. There were also occasional clicks in the audio stream, but I did not have time to figure out whether this was caused by the software or the (internal) sound device we used for recording. We will do more tests on this matter later. The recordings were started with a clap to allow for post-sync.

Surprisingly, OBS was very easy to set up for live-streaming. I used my own YouTube account, where I set up the Stream Now option (https://support.google.com/youtube/answer/2853700?hl=en). This gave me a key code that I could copy into OBS before simply pressing the "Start streaming" button. The stream had a lag of about 10 seconds, but had good quality and consistency. Thereby we had easily set up something we only considered as optional to begin with. Øyvind was very pleased to be able to follow the last part of the first session from his breakfast table in San Diego!

 

Live stream from session with jazz students

Custom video markup solution
Øyvind wrote a small command-line program in Python that could generate a time-coded list of markers to go with the video. By pressing a number, one could indicate how many seconds before the key press a comment should be inserted, relative to a custom-set timer. This made it possible to easily synchronize with the clock that IP Camera Viewer printed on top of the video images. Moreover, it allowed rating the significance of each marker. Although it required a little bit of practice, I found it useful for making notes during the session, as well as when going through the post-session interview videos. One possible way of improving it would be to have a program that could merge and order (in time) comments made either in different run-throughs or by different commentators.
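Øyvind's actual program is not reproduced here, but a minimal sketch of the same idea could look something like this (a hypothetical reimplementation in Python; the names and the exact key interface are my own assumptions):

import time

def marker_tool():
    # Hypothetical sketch: a digit places a marker that many seconds back in time,
    # relative to a timer started in sync with the video clock.
    input("Press Enter to start the timer (in sync with the video clock)...")
    t0 = time.time()
    markers = []
    while True:
        entry = input("seconds back (digit), or q to quit: ").strip()
        if entry == "q":
            break
        if not entry.isdigit():
            continue
        timestamp = time.time() - t0 - int(entry)   # marker time relative to timer start
        comment = input("comment: ")
        rating = input("significance (1-5): ")
        markers.append((timestamp, rating, comment))
    for ts, rating, comment in sorted(markers):
        mins, secs = divmod(int(max(ts, 0)), 60)
        print("%02d:%02d  [%s]  %s" % (mins, secs, rating, comment))

if __name__ == "__main__":
    marker_tool()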

Editing
After two days of recording, the videos were to be edited down to a single five-minute video. After some testing with free software like iMovie (Mac) and Movie Maker (Windows), I abandoned both options due to their limited options and unintuitive use. After a little bit of searching I discovered Filmora from Wondershare (http://filmora.wondershare.com/), which I tried out as a free demo before deciding to buy it ($29.90 for a 1-year license). In my view, it was lightweight, had sufficient options for simple editing, and was quick and easy to use.

Conclusions
We ended up with a multi-camera recording and live-streaming solution that was easy to use and very cheap to set up. The only expenses we have had so far have been USB extension cords and hubs, as well as the Filmora editor, which was also cheap. Although we do not have our own cameras yet, the price of new USB2 cameras would not make a big dent in the budget if we need to buy four of them. Moreover, finding free software that gave us what we wanted out-of-the-box was a huge relief after initial tests with software like ffmpeg, Processing and VLC.

Session 19. October 2016

Location: Kjelleren, Olavskvartalet, NTNU, Trondheim

Participants: Maja S. K. Ratkje, Siv Øyunn Kjenstad,  Øyvind Brandtsegg, Trond Engum, Andreas Bergsland, Solveig Bøe, Sigurd Saue, Thomas Henriksen

Session objective and focus

Although the trio BRAG RUG has experimented with crossadaptive techniques in rehearsals and concerts during the last year, this was the first dedicated research session on the issues specific to these techniques. The musicians were familiar with the interaction model of live processing, so electronic modification of the instrumental sound in real time was familiar ground. Notably, the basic concepts of feature extraction and the use of these features as modulators for the effects processing were also familiar to the musicians before the session. Still, we opted to start with very basic configurations of features and modulators as a means of warming up. The warm-up is valid both for the performative attention and the listening mode, and is also a way of verifying that the technical setup works as expected.
The session objective and focus, simply put, is to further explore crossadaptive interactions in practice. As of now, the technical solutions work well enough to start making music, and since the issues that arise during musical practice are many-faceted, we have opted not to put any specific limitations or preconceived directions on how these explorations should be made.

As preparation for the session we (Oeyvind) had prepared a set of possible configurations (sets of features mapped to sets of effects controllers). As the musicians were familiar with the concepts and techniques, we expected that a larger set of possible situations to explore would be needed. As it turned out, we spent more time investigating the simpler situations, so we did not get to test all of the prepared interaction situations. The extra time spent on the simpler situations was not due to any unforeseen problems; rather, there were interesting issues to explore even in the simpler situations, and it seemed more valuable to explore those fully than to thinly test a large set of situations. With this in mind, we could easily have continued our explorations for several more days.
The musicians' previous experience with said techniques provided an effective basis for communication around the ideas, issues and problems. The musicians' ideas were clearly expressed, and easily translated into modifications of the mapping. With regard to the current status of the technical tools, we were quickly able to modify the mappings based on ad hoc suggestions during the sessions, for example changing the types of features (by having a plethora of usable features readily extracted), inverting, scaling, combining different features, gating, etc. We did not get around to testing the new modulator mapping of "absolute difference" at all, so we note that as a future experiment. One thing Oeyvind notes as still being difficult is changing the modulation destination. This is presently done by changing MIDI channels and controller numbers, with no reference to what these controller numbers are mapped to. This also partly relates to the Reaper DAW lacking a single place to look up active modulators. It seems there is no known way for our MIDIator to poll the DAW for the names of the modulator destinations (the effect parameter ultimately being controlled). Our MIDIator has a "notes" field that allows us to write any kind of notes, intended for jotting down destination parameter names. One obvious shortcoming of this is that it relies solely on manual (human) updates. If one changes the controller number without updating the notes, there will be a discrepancy, perhaps even more confusing than not having any notes at all.

Takes

Session 19.10.16 take 1 brak_rug
  1. [numbered as take 1] Amplitude (rms_dB) on voice controls reverb time (size) on drums. Amplitude (rms_dB) on drums controls delay feedback on vocals. In both cases, higher amplitude means more (longer, bigger) effect.

Initial comments: When the effect is big at high amplitude, the action is very parallel, with the energy flowing in only one direction (intensity-wise). In this particular take, the effects modulation was a bit on/off, not responding to the playing in a nuanced manner. Perhaps the threshold for the reverb was also too low, so it responded too much (even to low-amplitude playing). It was a bit unclear for the performers how much control they had over the modulations.
The use of two time-based effects at the same time tends to melt things together, working in the same direction/dimension (when is this wanted or unwanted in a musical situation?).

Reflection when listening to take 1 after the session: Although we have now experienced two sessions where the musicians preferred controlling the effects in an inverted manner (not like here in take 1, but like we do later in this session, e.g. take 3), when listening to the take it sometimes feels like a natural extension of the performance to simply have more effect when the musicians "lean into it" and play loud. Perhaps we could devise a mapping strategy where the fast response of the system is to reduce the effect (e.g. the reverb time), and then, if the energy (e.g. the amplitude) remains high over an extended duration (longer than, say, 30 seconds), the effect increases again. This kind of mapping strategy suggests that we could create quite intricate interaction schemes with just some simple feature extraction (like an amplitude envelope), and that this could be used to create intuitive and elaborate musical forms ("form" here defined as the evolution of effects mapping automation over time).
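As a thought experiment, such a dual-time-scale mapping could be sketched roughly like this (illustrative Python only, not the actual analyzer/MIDIator code; the time constants are assumptions):

import math

def smooth(prev, new, time_constant, dt):
    # One-pole lowpass (exponential smoothing) of a control signal.
    coeff = math.exp(-dt / time_constant)
    return coeff * prev + (1.0 - coeff) * new

def dual_timescale_map(rms_frames, dt=0.05):
    # Loud playing first ducks the effect (fast follower), but if the energy stays
    # high for an extended time (slow follower, ~30 s) the effect comes back up.
    fast, slow, out = 0.0, 0.0, []
    for rms in rms_frames:                      # rms assumed normalized to 0..1
        fast = smooth(fast, rms, 0.2, dt)       # fast response, ~200 ms
        slow = smooth(slow, rms, 30.0, dt)      # long-term energy, ~30 s
        reverb_time = (1.0 - fast) + slow       # duck quickly, recover slowly
        out.append(min(reverb_time, 1.0))
    return out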

session-19-10-16-take-1-2-brak_rug
  1. [numbered as take 1.2] Same mappings as in take 1, with some fine-tuning: more direct sound on the drums; less reverb on the drums; shorter release on the drums' amplitude envelope (the vocal delay feedback stops sooner when the drums go from loud to soft); more nuanced mapping of the vocal amplitude tracking to reverb size.
    session-19-10-16-take-1-3-brak_rug
  2. [numbered as take 1.3] Inverted mapping, using the same features and the same effects parameters as in takes 1 and 2. Here, louder drums mean less feedback on the vocal delay; similarly, louder vocals give shorter reverb time on the drums.

There was a consensus among all the people involved in the session that this inverted mapping is "more interesting". It also opens the possibility for some nice timbral gestures: when the sound level goes from loud to quiet, the room opens up and prolongs the small quiet sounds. Even though this can hardly be described as a "natural" situation (it is not something we can find acoustically in nature), it provides a musical tension of opposing forces, and also a musical potential for exploring spatiality in relation to dynamics.

 

Session 19.10.16 take 2 brak_rug
  1. [numbered as take 2] Similar to the mapping for take 3, but with a longer release time, meaning the effect will take longer to come back after having been reduced.

The interaction between the musicians, and their intentional playing of the effects control, is tangible in this take. One can also hear the drummer's shouts of approval here and there (0:50, 1:38).

Session 19.10.16 take 3 brak_rug
  1. [numbered as take 3] Changing the features extracted to control the same effects: still controlling reverb time (size) and delay feedback, but now using transient density on the drums to control delay feedback for the vocals, and the pitch of the vocals to control reverb size for the drums. Faster drum playing means more delay feedback on the vocals. Higher vocal pitch means longer reverberation on the drums.

Maja reacted quite strongly to this mapping, because pitch is a very prominent and defining parameter of the musical statement. Linking it to control a completely different parameter brings something very unorthodox into the musical dialogue. She was very enthusiastic about this opportunity, being challenged to exploit this uncommon link. It is not natural or intuitive at all, and thus hard work for the performer, as one's intuitive musical response and expression is contradicted directly. She made an analogy to a "score", like written scores for experimental music. In the project we have earlier noted that the design of interaction mappings (how features are mapped to effects) can be seen as compositions. It seems we think in the same way about this, even though the terminology is slightly different. For Maja, a composition involves form (over time), while a score can to a larger extent describe a situation. Oeyvind, on the other hand, uses "composition" to describe any imposed guideline put on the performer, anything that instructs the performer in any way. Regardless of the specifics of terminology, we note that the pitch-to-reverb-size mapping was effective in creating a radical or unorthodox interaction pattern.
One could also wish to test this kind of mapping in the inverted manner, letting transient density on the drums give less feedback, and lower pitches give larger reverbs. At this moment in the session, we decided to spend the last hour trying something entirely different, just to also touch upon other kinds of effects.

Session 19.10.16 take 4 brak_rug
  1. [numbered as take 4] Convolution and resonator effects. We tested a new implementation of the live convolver. Here, the impulse response for the convolution is live sampled, and one can start convolving before the IR recording is complete. An earlier implementation was the subject of another blog post; the current implementation is less prone to clicks during recording and considerably less CPU intensive. The amplitude of the vocals is used to control a gate (Schmitt trigger) that in turn controls the live sampling: when the vocal signal is stronger than the selected threshold, recording starts, and recording continues as long as the vocal signal is loud. The signal to be recorded is the sound of the drums. The vocal signal is used as the convolution input, so any utterance on the vocals gets convolved with the recent recording of the drums.
    In this manner, the timbre of the drums is triggered (excited, energized) by the actions of the vocals. Similarly, we have a resonator capturing the timbral character of the vocals, and this resonator is triggered/excited/energized by the sound of the drums. The transient density of the drums is run through a gate (Schmitt trigger), similar to the one used on the vocal amplitude: fast playing on the drums triggers recording of the timbral resonances of the voice. The resonator sampling (one could also say resonator tuning) is a one-shot process triggered each time the drums have been playing slowly and then change to faster playing.
    Summing up the mapping: vocal amplitude triggers IR sampling (using the drums as source, applying the effect to the vocals); drum transient density triggers vocal resonance tuning (applying the effect to the drums). A sketch of the gating logic follows below.
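A minimal Schmitt-trigger gate on an analysis signal might look like this in Python (hypothetical thresholds, not the exact values used in the session):

class SchmittTrigger:
    # Hysteresis gate: turns on above on_threshold, off below off_threshold.
    # Conceptually this is how vocal amplitude (or drum transient density)
    # gated the live IR recording / resonator tuning in this take.
    def __init__(self, on_threshold=-30.0, off_threshold=-40.0):
        self.on_threshold = on_threshold      # illustrative dB values
        self.off_threshold = off_threshold
        self.state = False

    def process(self, value):
        if not self.state and value > self.on_threshold:
            self.state = True                 # rising edge: start recording
        elif self.state and value < self.off_threshold:
            self.state = False                # falling edge: stop recording
        return self.state

gate = SchmittTrigger()
print([gate.process(v) for v in [-60, -25, -28, -35, -45, -50]])
# -> [False, True, True, True, False, False]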

The musicians very much liked playing with the resonator effect on the drums. It gave a new dimension to taking features from the other instrument and applying them to one's own. In some ways it resembles "unmoderated" (non-crossadaptive) improvisation, as these two specific musicians also commonly borrow material from each other in that setting. The convolution process, on the other hand, seemed a bit more static, somewhat like one-shot playback of a sample (although it is not, it is similar). The convolution process can also be viewed as getting delay patterns from the drum playing. Somehow, the exact patterns do not differ so much when fed a dense signal, so changes in the delay patterns are not so dramatic. This can be exploited by intentionally playing in such a way as to reveal the delay patterns. Still, that is a quite specific and narrow musical situation. Another thing one might expect to get from IR sampling is capturing "the full sound" of the other instrument. Perhaps the drums are not dramatically different/dynamic enough to release this potential, or perhaps the drum playing could be adapted to the IR sampling situation, putting more emphasis on different sonic textures (wooden sounds, boomy sounds, metallic sounds, clicky sounds). In any case, in our experiments in this session we did not seem to release the full expected potential of the convolution technique.

Comments:

Most of the specific comments are given for each take above, but here are some general remarks for the session:

Status of tools:
We seem to have more options (in the tools and toys) than we have time to explore,  which is a good indication that we can perhaps relax the technical development and focus more on practical exploration for a while.  It also indicates that we could (perhaps should) book practical sessions over several consecutive days, to allow more in-depth exploration.

Making music:
It also seems that we are well on the way to actually making music with these techniques, even though there is still quite a bit of resistance in them. What is good is that we have found some situations that work well musically (for example amplitude inversely affecting room size). We can then expand and contrast these with new interaction situations where we experience more resistance (for example, as we see when using pitch as a controller). We also see that even the successful situations appear static after some time, so the urge to manually control and shape the result (like in live processing) is clearly apparent. We can easily see that this is what we will probably do when playing with these techniques outside the research situation.

Source separation:
During this studio situation, we put the vocals and drums in separate rooms to avoid audio bleed between the instruments. This made for a clean and controlled input signal to the feature extractor, and we were able to better determine the quality of the feature extraction. Earlier sessions done with both instruments in the same room gave much more crosstalk/bleed, and we would sometimes be unsure if the feature extractor worked correctly. As of now it seems to work reasonably well. Two days after this session, we also demonstrated the techniques on stage with amplification via PA. This proved (as expected) very difficult due to a high degree of bleed between the different signals. We should consider making a crosstalk-reduction processor. This might be implemented as a simple subtraction of one time-delayed spectrum from the other, introducing additional latency to the audio analysis. Still, the effects processing for the audio output need not be affected by the additional latency, only the analysis. Since many of the things we want to control change relatively slowly, the additional latency might be bearable.
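A very rough sketch of the crosstalk-reduction idea (Python/numpy, per analysis frame; the bleed gain and the time alignment are assumptions, and only the analysis path would see this processing):

import numpy as np

def reduce_bleed(target_spectrum, bleed_spectrum, bleed_gain=0.3):
    # Subtract a scaled, time-aligned estimate of the other instrument's magnitude
    # spectrum from the target before feature extraction. The audio sent to the
    # effects chain is left untouched; only the analysis gets the extra latency.
    cleaned = target_spectrum - bleed_gain * bleed_spectrum
    return np.maximum(cleaned, 0.0)           # magnitudes cannot go negative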

Video digest

 

Session 20.–21. September 2016

Location: Studio Olavskvartalet, NTNU, Trondheim

Participants:

Trond Engum, Andreas Bergsland, Tone Åse, Thomas Henriksen, Oddbjørn Sponås, Hogne Kleiberg, Martin Miguel Almagro, Simen Bjørkhaug, Ola Djupvik, Sondre Ferstad, Björn Petersson, Emilie Wilhelmine Smestad, David Anderson, Ragnhild Fangel

Session objective and focus:

This post is a description of a session with 3rd-year jazz students at NTNU. It was the first session following the intended procedural framework for documentation of sound and video in the cross adaptive processing project. The session was organized as part of the ensemble teaching that is given to jazz students at NTNU, and was meant to take care of both the learning outcomes of the normal ensemble teaching and aspects related to the cross adaptive project. This first approach was meant as a pilot for the processing musician and for the video and documentation aspect. It can also be seen as a test of how musicians not directly involved in the project react and reflect within the context of cross adaptive processing. We chose to keep the processing system as simple as possible to make the technical aspect of the session as understandable as possible for all parties involved. To achieve this we only used analyses of RMS and transient density on the instruments, to link conventional instrumental performance closely to how the effects were controlled. As an example: the louder you play, the more (or less) of one effect is heard, or the higher the transient density, the more (or less) of one effect is heard. All instruments were set up to control the amount of processed signal on two different effects each, giving the system the potential to use up to four different effects at the same time. The processing musician had the possibility to adjust the balance between the different processed signals during performance. The direct signal from the instruments was also kept as part of the mix to reduce alienation from the acoustic sound of the instrument, since this was a first attempt for those involved. Since the musicians, besides performing on their instruments, also indirectly perform the sound production, we chose to follow up on the premise of a shared listening strategy used by the T-Emp ensemble (https://www.researchcatalogue.net/view/48123/53026). The basic idea is that if everyone hears the same mix, they will adjust individually, and consequently globally, to the sound image as a whole through their instrumental and sound production output.
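Conceptually, each analysis feature drove the wet amount of one effect roughly as in this simplified sketch (hypothetical Python with made-up ranges, not the actual analyzer/MIDIator code):

def feature_to_controller(value, in_min, in_max, invert=False):
    # Scale an analysis feature (e.g. rms_dB or trans_dens) to a 0-127 controller
    # value driving the wet level of one effect.
    norm = (value - in_min) / (in_max - in_min)
    norm = max(0.0, min(1.0, norm))           # clamp to the chosen input range
    if invert:                                # "the louder you play, the less effect"
        norm = 1.0 - norm
    return int(round(norm * 127))

# e.g. drum loudness -> overdrive amount on piano (more is more),
#      drum transient density -> reverb amount on piano (more is less):
overdrive_amount = feature_to_controller(-18.0, in_min=-40.0, in_max=-6.0)
reverb_amount = feature_to_controller(4.0, in_min=0.5, in_max=8.0, invert=True)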

Sound examples:

Apart from normalisation, all sound examples presented in this post are unprocessed and unedited in post-production, meaning that they have the same mix between instruments and processing as the musicians and observers heard during recording.
20.09.2016

Take 1 – Session 20.09.16 take 1 drums_piano.wav

Drums and piano

The drums control how much overdrive and reverb is added to the piano. The analysis parameters used on the drums are rms_dB and trans_dens: the louder the drums play, the more overdrive on the piano, and the higher the transient density, the less reverb on the piano. The take has some issues with bleed between microphones since both performers are in the same room. There was also a need to fine-tune the behaviour of both effects through the rise and fall parameters in the MIDIator plug-in during the take.

Even though it would be natural to presume that more is more in musical interplay (for example, that the louder you play, the more effect you get), the performers experienced that "more is less" was more convenient for the reverb in this particular setting. It felt natural for the musicians that when the drummer played faster (higher transient density), the amount of reverb on the piano decreased. After this take we had a system crash. As a consequence the whole setup was lost, and a new setup was built from scratch for the rest of the session.

Take 2 – Session 20.09.16 take 2 drums_piano.wav

Drums and piano

The piano controls how much echo and reverb are added to the drums: the higher the transient density, the more echo on the drums, and the louder the piano, the more reverb on the drums.

Take 3 – Session 20.09.16 take 3 drums_piano.wav

The piano controls how much echo and reverb are added to the drums: the higher the transient density, the more echo on the drums, and the louder the piano, the more reverb on the drums. The loudness of the drums controls the overdrive on the piano (the louder, the more overdrive), and the transient density controls the reverb: the higher the transient density, the less reverb on the piano.

 

21.09.2016

Due to the technical challenges on the first day, we made some adjustments to the system in order to avoid the most obvious obstacles. We changed from condenser to dynamic microphones to decrease the amount of bleed between them (bleed affects the analyser plug-in and also adds the same effect to both instruments, as experienced earlier in the project). The reason for the system crash was identified in the Reaper setup, and a new template was also made in Ableton Live. We chose to keep the setup simple during the session to clarify the musical results (only two analysis parameters controlling two different effects on each instrument).

Take 1 – Session 21.09.16 take 1 drums_guitar.wav

Percussion and guitar 

The percussion controlled the amount of effect on the guitar, using rms_dB and trans_dens as analysis parameters: the louder the percussion, the more reverb on the guitar, and the higher the transient density of the percussion, the more echo on the guitar. This setup was experienced as uncomfortable by the performers, especially when loud playing resulted in more reverb. There was still an issue with bleed even though we had changed to dynamic microphones.

Take 2 – Session 21.09.16 take 2 drums_guitar.wav

We agreed upon some changes suggested by the musicians. We used the same analysis parameters on the percussion, with the percussion controlling the amount of effect on the guitar (rms_dB and trans_dens): the louder the percussion, the less overdrive on the guitar, and the higher the transient density of the percussion, the more echo on the guitar. This resulted in a more comfortable relationship between the instruments and effects from the performers' perspective.

Take 3 – Session 21.09.16 take 3 drums_guitar.wav

This setup was constructed to interact both ways through the effects applied to the instruments. We used the same analysis parameters on the percussion as before, with the percussion controlling the amount of effect on the guitar (rms_dB and trans_dens): the louder the percussion, the less overdrive on the guitar, and the higher the transient density of the percussion, the more echo on the guitar. We used the same analysis parameters on the guitar as on the percussion (rms_dB and trans_dens): the louder the guitar, the more overdrive on the percussion, and the higher the transient density of the guitar, the more reverb on the percussion. During this session the percussionist also experimented with the distance to the microphones to test out movement and the proximity effect in connection with the effects. The trans_dens analysis on the guitar did not work as dynamically as expected due to some technical issues during the take.

Take 4 – Session 21.09.16 take 4 harmonica_bass.wav

Double bass and harmonica

We started out by letting the harmonica control the effects on the double bass, based on analyses of rms_dB and trans_dens: the more harmonica volume, the less overdrive on the bass, and the higher the transient density of the harmonica, the more echo on the bass. This first take was not very successful due to bleed between the (dynamic) microphones. There was also an issue with the fine-tuning in the MIDIator concerning the dynamic range sent to the effects. This resulted in an experience that the effects were turned on and off rather than changed dynamically over time, which was the original intention. There was also a contradiction between musical intention and effects (the harmonica playing louder reduced the overdrive on the bass).

Take 5 – Session 21.09.16 take 5 harmonica_bass.wav

Based on suggestions from the musicians, we set up a system controlling effects both ways. The harmonica still controlled the effects on the double bass based on analyses of rms_dB and trans_dens, but now we tried the following: the more harmonica volume, the more overdrive on the bass, and the higher the transient density, the more echo on the bass. The bass used two instances of rms_dB analysis to control the effects on the harmonica: the louder the bass, the less reverb on the harmonica, and the louder the bass, the more overdrive on the harmonica. This setup created a much more "intuitive" musical approach from the performers' perspective. The function with the harmonica controlling the bass overdrive worked against the performers' musical intention, and this functionality was removed during the take. This should probably not have been done during a take, but the mapping was experienced as a bit confusing by the instrumentalists in this setting. This take shows how dependent the musicians are on each other in order to "activate" the effects, and also the relationship and potential between musical energy and the effects that follow it.

Take 6 – Session 21.09.16 take 6 vocals_saxophone.wav

Vocals and Saxophone

In this setup the saxophone controls the effects on the vocals based on two instances of rms_dB analysis: the louder the saxophone, the less reverb on the vocals, and the louder the saxophone, the more echo is added to the vocals. The first attempt had some problems with the balance between dry and wet signal in the mix from the control room. The vocal input was lower compared to the sources present earlier that day. As a consequence, the saxophone player was unable to hear both the direct signal of the voice and the effects he added to it. We tried to compensate for this by boosting the volume of the echo during the take, but without any noticeable result in the performance. Another challenge in this take was a result of the choice of effects: both reverb and echo can sound quite similar on long vocal notes, which blurred the experienced amount of control over the effects. This pinpoints the importance of good listening conditions for the musicians in this project, especially when working with more subtle changes in the effects.

Take 7 – Session 21.09.16 take 7 vocals_saxophone.wav

Before recording a new take we adjusted the balance between wet and dry sound in the control room. We then did a second take with the same settings as take 6. It was clear that the performers took more control over the situation because of the better listening conditions. The performers expressed a wish to practise with the setup beforehand. Another aspect the performers asked for was that the system should open up for subtle changes in the effects while still keeping the potential to be more radical, depending on the instrumental input. The saxophone player also mentioned that it could be interesting to add harmonies as one of the effects, since both instruments are monophonic.

Take 8 – Session 21.09.16 take 8 vocals_saxophone.wav

The last take of the day also let the vocals control the effects on the saxophone. The saxophone kept the same system as in the two former takes: the louder the saxophone, the less reverb on the vocals, and the louder the saxophone, the more echo added to the vocals. The vocals were analysed by rms_dB and trans_dens: the louder the vocals, the more overdrive on the saxophone, and the higher the transient density of the vocals, the more echo on the saxophone. Both performers experienced this system as meaningful, both musically and in terms of control over the effects. They communicated that there was a clear connection between the musical intention and the sounding result.

Comments:

All the involved musicians communicated that the experiment was meaningful, even though it was a different way of interacting in musical interplay. The performers want to try this more with other setups and effects. Even though this was the first attempt for both the performers and the processing musician, we achieved some promising musical results.

Technical issues to be solved:

  • There are still some technical issues to be solved in the software (first of all avoiding system crashes during sessions). Bleed between microphones is also an issue that needs to be attended to (this challenge will be further magnified in a live setting).
  • On/off control versus dynamic control: we need more time together with the performers to rehearse and fine-tune the effects.
  • The performers need more time together with the processing musician to rehearse before takes, in order to familiarize themselves with the effects and how they affect them.
  • The performers' experience of being in control/not being in control of the effects.

Other thoughts from the performers:

  • Suggestions about visual feedback and the possibility to make adjustments directly on the effects themselves.
  • It could be interesting to focus just as much on removing attributes from the effects through instrumental control as on adding them (adding more effects is often the first choice when working with live electronics).
  • Agree upon an aesthetic framework before setting up the system, involving all performers.

 

Link to a video with highlights from the session:

https://youtu.be/PqgWUzG8B9c

Mixing with Gary

During our week in London we had some sessions with Gary Bromham, first at the Academy of Contemporary Music in Guildford on June 7th, then at QMUL later in the week. We wanted to experiment with cross-adaptive techniques in a traditional mixing session, using our tools/plugins within a Logic session to work similarly to traditional sidechaining, but with the expanded palette of analysis and modulator mappings enabled by the tools developed in the project. Initially we tried to set this up with Logic as the DAW. It kind of works, but seems utterly unreliable: Logic would not respond to learned controller mappings after we closed the session and reopened it. It does receive the MIDI controller signal (and can re-learn), but in all cases refuses to respond to the received automation control. In the end we abandoned Logic altogether and went for our safe, always-does-the-job Reaper.

As the test session for our experiments we used Sheryl Crow's "Soak Up", using stems for the backing tracks and the vocals.

2016_6_soakup mix1 pitchreverb

Example 1: Vocal pitch to reverb send and reverb decay time.

2016_6_soakup mix2 experiment

Example 2: Vocal pitch as above. In addition, vocal spectral flux controls the hi-cut frequency for the rest of the band, and rhythmic analysis (transient density) of the backing track controls a peaking EQ sweep on the vocals, creating a sweeping effect somewhat like a phaser. This is all somewhat odd taken together, but useful as a controlled experiment in polyphonic crossadaptive modulation.

* The first thing Gary asked for was to process one track according to the energy in a specific frequency band of another. For example: "if I remove 150 Hz on the bass drum, I want it to be added to the bass guitar". Now, it is not so easy to analyze what is missing, but easier to analyze what is there. So we thought of another thing to try: sibilants (e.g. S's) on the vocals can be problematic when sent to reverb or delay effects. Since we don't have a multiband envelope follower (yet), we tried to analyze for spectral flux or crest, then use that control signal to duck the reverb send for the vocals.
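A sketch of that ducking idea (illustrative Python; in the session this was done with the analyzer and MIDIator rather than code, and the threshold and duck range are assumptions):

def duck_reverb_send(base_send_db, flux, flux_threshold=0.6, max_duck_db=12.0):
    # Reduce the vocal reverb send when spectral flux (a rough sibilance cue)
    # exceeds a threshold. flux is assumed normalized to 0..1.
    excess = max(0.0, flux - flux_threshold) / (1.0 - flux_threshold)
    return base_send_db - excess * max_duck_db

print(duck_reverb_send(-10.0, 0.9))   # a strong 's' (flux 0.9) ducks a -10 dB send to -19 dB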

* We had some latency problems relating to the pitch tracking of the vocals, with the modulator signal arriving a bit late to precisely control the reverb size for the vocals. The tracking is OK, but the effect responds *after* the high pitch. This was solved by delaying the vocal *after* the point where it is sent to the analyzer, and then also delaying the direct vocal signal and the rest of the mix accordingly.

* Trond idea for later: Use vocal amp to control bitcrush mix on drums (and other programmed tracks)

* Trond idea for later: Use vocal transient density to control delay setting (delay time, … or delay mix)

* Bouncing the mix: Bouncing does not work, as we need the external modulation processing (analyzer and MIDIator) to also be active. Logic seems to disable the “external effects” (like Reaper here running via Jack, like an outboard effect in a traditional setting) when bouncing.

* Something good: Pitch controlled reverb send works quite well musically, and is something one would not be able to do without the crossadaptive modulation techniques. Well, it is actually just adaptive here (vocals controlling vocals, not vocals controlling something else).

* Notable: do not try to fix (old) problems, but try to be creative and find new applications/routings/mappings. For example, the initial ideas from Gary were related to common problems in a mixing situation, problems that one can already fix (with de-essers or similar).

* Trond: It is unfamiliar in a post production setting to hear the room size change, as one is used to static effects in the mix.

* It would be convenient if we could modulate the filtering of a control signal depending on analyzed features too. For example changing the rise time for pitch depending on amplitude.

* It would also be convenient to have the filter times as sync’ed values (e.g. 16th) relative to the master tempo

FIX:

– Add multiband rms analysis.

– check the roundtrip latency of the analyzer-modulator, that is, the time it takes from when an audio signal is sent until the modulator signal comes back.

– add modulation targets (e.g. rise time). This most probably just works, but we need to open the midi feed back into Reaper.

– add sync to the filter times. Cabbage reads bpm from host, so this should also be relatively straightforward.
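The conversion itself is straightforward; a small sketch (hypothetical helper, assuming the host bpm is available to the plugin):

def note_value_to_ms(bpm, fraction=1.0/16.0):
    # Length of a note value in milliseconds at a given tempo.
    # fraction is relative to a whole note (1/16 = sixteenth note).
    whole_note_ms = 4 * 60000.0 / bpm         # four quarter notes per whole note
    return whole_note_ms * fraction

print(note_value_to_ms(120))                  # sixteenth note at 120 bpm -> 125.0 ms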

Mixing example, simplified interaction demo

When working further with some of the examples produced in an earlier session, I wanted to see if I could demonstrate the influence of one instrument on another instrument's sound more clearly. Here I've made an example where the guitar controls the effects processing of the vocal. For simplicity, I've looped a small segment of the vocal take, to create a track that is relatively static, so the changes in the effects processing should be easy to spot. For the same reason, the vocal does not control anything on the guitar in this example.

The reaper session for the following examples can be found here.

2016_5_CA_sidechain_pitchverb_git_to_voc_static

Example track: The guitar track is split into two control signals, one EQ’ed to contain only low frequencies, the other with only high frequencies. The control signals are then gated, and used as sidechain control signals for two different effects tracks processing the vocal signal. The vocal signal is just a short loop of quite static content, to make it easier to identify the changes in the effects processing.
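For illustration, the same kind of control signals could be derived outside the DAW roughly like this (a sketch in Python with scipy, using assumed filter and gate settings; in the Reaper session this is done with stock EQ and gate plugins on duplicated tracks):

import numpy as np
from scipy.signal import butter, sosfilt

def band_control_signals(guitar, sr, split_hz=400.0, gate_db=-35.0):
    # Split the guitar into low and high bands, follow their envelopes,
    # and gate them into two sidechain-style control signals.
    low = sosfilt(butter(4, split_hz, "low", fs=sr, output="sos"), guitar)
    high = sosfilt(butter(4, split_hz, "high", fs=sr, output="sos"), guitar)

    def gated_envelope(band):
        env = np.abs(band)                              # crude envelope follower
        env_db = 20 * np.log10(np.maximum(env, 1e-6))
        return np.where(env_db > gate_db, env, 0.0)     # zero below the gate threshold

    return gated_envelope(low), gated_envelope(high)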

2016_5_CA_sidechain_pitchverb_git_to_voc_take

Example track: As above, but here the original vocal track is used as input to the effects, giving a more dynamic and flexible musical expression.