June 9th and 10th we visited QMUL, met Joshua Reiss and his eminent colleagues there. We were very well taken care of and had a pleasant and interesting stay. June 9th we had a seminar presenting the project, and discussing related issues with a group of researchers and students. The seminar was recorded on video, to be uploaded on QMUL youtube. The day after we had a meeting with Joshua, going in more detail. We also got to meet several PhD students and got insight into their research.
Here’s some issues that were touched upon in the seminar discussion:
* analyze gestures inherent in the signal, e.g. crescendo, use this as trigger, to turn some process on or off, flip a preset etc. We could also analyze for very specific patterns, like a melodic fragment, but probably better to try to find gestures that can be performed in several different ways, so that the musician can have freedom of expression while providing a very clear interface for controlling the processes.
* Analyze features related to specific instruments. Easier to find analysis methods to extract very specific features. Rather than asking for “how can we analyze this to extract something interesting… This is perhaps a lesson for us to be a bit more specific in what we want to extract. This is somewhat opposite to our current exploratory effort in just trying to learn how the currently implemented analysis signals works and what we can get from them.
* Look for deviations from a quantized value. For example pitch deviations within a semitone, and rhythmic deviations from a time grid.
* Semantic spaces. Extract semantic features from the signal, could be timbral descriptors, mood, directions etc. Which semantics? Where to take the terminology from? We should try to develop examples of useful semantics, useful things to extract.
* Semantic descriptors are not necessarily a single point in a multidimensional space, it is more like a blob, and area. Interpolation between these blobs may not be linear in all cases. We don’t have to use all implied dimensions, so we can actually just select features/semantics/descriptors that will give us the possibility of linear interpolation. At least in those situations where we need to interpolate…
* Look at old speech codexes. Open source. Exitation/resonator model, LPC. This is time domain, so will be really fast/low-latency.
* Cepstral techniques can also be used to separate resonator and exciter. The smoothed cepstrum being the resonator. Take the smoothed cepstrum and subtract it from the full cepstrum to get the excitation.
* The difficulty of control. The challenge to the performer, limiting the musical performance, inhibiting the natural ways of interaction. This is a recurring issue, and something we might want to take care to handle carefully. This is also really what the project is about: creating *new* ways of interaction.
* Performer will adapt to imperfections of the analysis. Normally, MIR signals are not “aware” that they are being analyzed. They are static and prerecorded. In our case, the performer, being aware of how the analysis method works and what it responds to, can adapt the playing to trigger the analysis method in highly controllable manners. This way the cross-adaptivity is not only technical related to the control of parameters, but adaptive in relation to how the performer shapes her phrases and in turn also what she selects to play.
* Measuring collective features. Features of the mix. Each instrument contributes equally to the modulator signal. One signal can push others down or single them out. Relates to game theory. What is the most favorable behavior over time: suppress others or negotiate and adapt.
Meeting with Joshua
* Josh mentioned a few researchers that we might be interested in. Brech de Man: Intelligen audio switcher. Emannuel Chourdakis: Feature based reverb. Dave Ronan: Groups, stems, automatic mixing. Vincent Verfaille: effect classification and A-DAFX. Brian Pardo: interfaces for music performance/production, visualization, semantics, machine learning. Pedro Pestana: sound engineer, best practices (phD), automatic mixing. Ryan Stables: SAFE plugins.
* Issues relating to publishing our work and getting an audience for it. As we could in some respect claim to create a new field, creating a community for it might be essential for further use of our research. Increase visibility. Promote also via QMUL press and NTNU info. Among connected fields are Human Computer Interaction, New Instruments for Musical Expression, Audio Engineering Society.
* QMUL has considerable experience in evaluation studies, user experience tests, listening tests etc. Some of this may be beneficial as a perspective on our otherwise experiental approach.
* Collective features (meaning individual signals in relation to each other and to the ensemble mix): Masking (spectral overlap). Onset times in relation to other instruments, lagging. Note durations, percussive/sustained etc.
* We currently use spectral crest, but crest also being useful in the time domain to do rhythmic analysis (rhythmic density, percussiveness, dynamic range). Will work better with loudness matching curve (dB)
* Time domain filterbank faster than FFT. Logarithmically spaced bands
* FFT of the time domain amp envelope
* Separate silence from noise. Automatic gain control. Automatic calibration of noise floor (use peak to average measure to estimate what is background noise and what is actual signal)
* Look for patterns: playing the same note, also pitch classes (octaves), also collectively (between instruments).
* Use log freq spectrum, and amps in dB, then do centroid/skew/flux etc
* How to collaborate with others under QMUL, adapting their plugins. Port the techniques to Csound or re-implement ours in C++? For the prototype and experimentation stage maybe modify their plugins to output just control signals? Describe clearly our framework so their code can be plugged in.