Rhythm analysis, part 1

Many of the currently used methods for rhythmic analysis, e.g. in MIR, make assumptions about musical style, and for this reason do particularly well in analysing music of certain geographical and/or cultural origins. For our purposes we would rather find analysis methods of a more generic nature, methods that can be adapted to many different musical situations. Also, since we assume the performance can be of an improvised nature, we do not know what is going to be played before it is actually performed. Finally, the audio stream to be analyzed is a realtime stream, and this poses certain restrictions: we cannot, for example, do multiple passes iterating over the whole song.

A goal at this point has been to find methods of rhythmical analysis that work without assumptions about pulse and meter. One such method could be based just on the immediate rhythmic ratios between adjacent time intervals. Our basic assumption is then that rhythm consists of time intervals and the relationships between these intervals. As such we can describe rhythm as time ratio constellations, patterns of time ratios. Moreover, simple rhythms are created from simple ratios, more complex rhythms from more irrational and irregular combinations of time intervals. This could give rise to a measurement of some kind of rhythmic complexity, although rhythmic complexity may be made up of complexity in many different dimensions, so we will need to come back to what term we actually use for the output of our analyses. Sticking with complexity for the time being: how do we measure the complexity of the time ratios?
One way of looking at the time ratios (and also of grouping them into categories) is to represent each of them as the closest rational approximation. Using the Farey sequence, we can find the closest rational approximation with a given highest denominator. For example, a ratio of 0.6/1 will be approximated by 1/2 (= 0.5), 2/3 (= 0.667), or 3/5 (= 0.6 exactly), depending on how high we allow the denominator to go. This way, we can decide how finely spaced we want our rhythm analysis grid to be. In the previous example, if we decided not to go higher than 3 for the denominator, we would only roughly approximate the actual observed time ratio, but in return we would always get relatively simple fractions to work with. Deciding on an appropriate grid can be difficult, since the deviation allowed by human musical perception will often be larger than the difference between relatively simple fractions (for example in the case of an extreme ritardando or other expressive timing), and the perceived rhythm is also dictated by musical context. As any definition (or assertion) of musical context will make assumptions about musical style, our current rhythmic analysis method does not take a larger musical context into account.

We do, however, build a small local rhythmic context out of groupings of consecutive time ratios. The current implementation includes 3 time ratios, so it effectively considers rhythmic patterns of up to 4 events as a rhythmic motif to be analyzed. This also allows it to respond quickly to changes in the performed rhythm, as it will only take 2 events of a new rhythmic pattern to generate a significant change in the output. This relative simplicity may also help us narrow down the available choices when making up a rhythmic grid for the approximations. If we regard only the separate time ratios between neighbouring events, some ratios will be more likely than others. Halving and doubling of tempo, tripling, etc. obviously happen a lot, and more so, they happen a lot more than what you'd expect. Take for example a rhythmic pattern of steady quarter notes followed by some steady 8th notes, then back to quarter notes:

[Figure: rhythmn_484 – quarter notes, then 8th notes, then back to quarter notes]

Here, we will observe the ratio 1/1 between the quarter notes, then the ratio 1/2 when we change into 8th notes, then 1/1 for as long as we play 8th notes, and the ratio 2/1 when we move back to quarter notes. Similarly for triplets, except we'd get 1/1, 1/3, 1/1, 3/1, 1/1.
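To make this concrete, here is a minimal Python sketch (not the actual plugin implementation, and with hypothetical onset times) that computes the sequence of interval ratios for the quarter note / 8th note example above:

```python
from fractions import Fraction

# Hypothetical onset times (in beats): quarter notes, then 8th notes, then quarter notes
onsets = [0, 1, 2, 3, 4, 4.5, 5, 5.5, 6, 7, 8]

# Inter-onset intervals
intervals = [t2 - t1 for t1, t2 in zip(onsets, onsets[1:])]

# Each ratio relates the most recent interval to the one just before it
ratios = [Fraction(curr / prev).limit_denominator(16)
          for prev, curr in zip(intervals, intervals[1:])]

print([str(r) for r in ratios])  # ['1', '1', '1', '1/2', '1', '1', '1', '2', '1']
```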
Certain common rhythmic patterns (like a dotted 8th followed by a 16th) may create 3/4 and 1/3 ratios.

[Figure: rhythmn_4_dot8_16_4 – quarter note, dotted 8th, 16th, quarter note]

or with triplets, 2/3 and 1/2 ratios:

[Figure: rhythmn_4_tripl4_tripl8_4 – quarter note, triplet quarter, triplet 8th, quarter note]

Note that all these ratios express the most recent time interval as a ratio to the next most recent one. More complex relationships are of course also observed quite commonly. We have tried to enumerate the ratios we assume are most frequently used. The list of ratios at the time of writing is [1/16, 1/8, 1/7, 1/6, 1/5, 1/4, 1/3, 1/2, 3/5, 2/3, 3/4, 1/1] for ratios up to and including 1/1, and then further [5/4, 4/3, 3/2, 5/3, 7/4, 2/1, 3/1, 4/1, 5/1, 6/1, 7/1, 8/1, 12/1, 16/1]. The selection of time ratios to be included could obviously be the focus of a more in-depth study. Such a study could analyse a large body of notated/composed/performed patterns in a wide variety of styles to find common ratios not included in our initial assumptions (and as an exercise to the reader: find rhythms not covered by these ratios and send them to us). Analysis of folk music and other oral musical traditions, radical interpretations and contemporary music will probably reveal time ratios currently not taken into account by our analysis. However, an occasional missed ratio will probably not be a disaster for the output of the algorithm. The initial template of ratios lets us get started, and we expect that the practical use of the algorithm in the context of our cross-adaptive purposes will be the best test of its utility and applicability. The current implementation uses a lookup table for the allowed ratios. This was done to make it easier to selectively allow some time ratios while excluding others with the same denominator, for example allowing 1/16 but not 15/16. Again, it might be a false assumption that the music will not contain the 15/16 ratio, but as of now we assume it is then more likely that the intended ratio was 16/16, which boils down to 1/1.
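As an illustration of how such a lookup table can be used, here is a small Python sketch (a simplified stand-in for the plugin code, not the implementation itself) that snaps an observed time ratio to the nearest entry in the list quoted above:

```python
from fractions import Fraction

# The ratio template quoted above
RATIOS = [Fraction(*r) for r in [
    (1, 16), (1, 8), (1, 7), (1, 6), (1, 5), (1, 4), (1, 3), (1, 2),
    (3, 5), (2, 3), (3, 4), (1, 1),
    (5, 4), (4, 3), (3, 2), (5, 3), (7, 4), (2, 1), (3, 1), (4, 1),
    (5, 1), (6, 1), (7, 1), (8, 1), (12, 1), (16, 1)]]

def quantize_ratio(observed):
    """Snap an observed time ratio to the closest allowed ratio in the lookup table."""
    return min(RATIOS, key=lambda f: abs(float(f) - observed))

print(quantize_ratio(0.6))   # 3/5
print(quantize_ratio(0.96))  # 1 (a slightly rushed 1/1)
```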

Rhythmic consonance

So what can we get out of analysing the time ratios expressed as simple fractions? A basic idea is that simpler ratios represent simpler relationships. But how do we rank them? Is 3/4 simpler than 2/3? Is 5/3 simpler than 2/7? If we assume rhythmic ratios to be in some way related to harmonic ratios, we can borrow from theories within the field of just intonation and microtonality, where the Benedetti height (the product of the numerator and denominator) or the Tenney height (the log of the Benedetti height) is used as a measure of inharmonicity. For our purpose we multiply the normalized Tenney heights (with a small offset, since the Tenney height of a 1/1 ratio is zero) of the three latest time ratios, as this also helps short term repetitions show up as low inharmonicity. If we then take the inverse of inharmonicity as a measure of consonance, we can, in the context of this algorithm, invent the term “rhythmic consonance” and use it as an (inverse) descriptor of the complexity of the time ratios.
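As a rough sketch of how such a consonance measure could be computed, the following Python snippet multiplies the normalized Tenney heights of the three latest ratios and inverts the product. The offset and normalization constants here are placeholders, not the values used in the plugin:

```python
import math
from fractions import Fraction

def tenney_height(ratio, offset=0.1, norm=5.0):
    """Tenney height (log2 of numerator * denominator), normalized.
    The offset keeps a 1/1 ratio (Tenney height 0) from collapsing the product to zero.
    Both offset and norm are placeholder values."""
    return (math.log2(ratio.numerator * ratio.denominator) + offset) / (norm + offset)

def rhythmic_consonance(last_three_ratios):
    """Inverse of the product of the normalized Tenney heights of the three latest ratios."""
    inharmonicity = math.prod(tenney_height(r) for r in last_three_ratios)
    return 1.0 / inharmonicity

# Steady quarter notes (all 1/1) come out far more consonant than an irregular group
print(rhythmic_consonance([Fraction(1, 1)] * 3))
print(rhythmic_consonance([Fraction(3, 5), Fraction(7, 4), Fraction(1, 6)]))
```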

Rhythmic irregularity

Another measure of rhythmic complexity might be the plain irregularity of the time intervals. For example, if all intervals have a 1/1 ratio, the irregularity is low because the rhythm is completely regular. Gradual deviations from the steady pulse give gradually higher irregularity. For example, a whole note followed by a 16th note (a 1/16 ratio) is quite far from the 1/1 ratio, so this yields a high irregularity measure. As a means of filtering out noise (for example due to single misplaced events), we take the highest two out of the last three irregularity measurements and multiply them together. In addition to acting like a filter, this also provides a little more context than just measuring single events. Finally the resulting value is lowpass filtered.
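A minimal sketch of this irregularity tracking might look as follows; the per-event irregularity measure, the smoothing coefficient and the names are assumptions for illustration, not the plugin's actual values:

```python
import math
from fractions import Fraction

def event_irregularity(ratio):
    """Distance of a single interval ratio from 1/1, here on a log scale (placeholder choice)."""
    return abs(math.log2(float(ratio)))

class IrregularityTracker:
    """Keeps the last three per-event irregularity values, multiplies the two highest
    (so a single misplaced event is filtered out) and lowpass filters the result."""
    def __init__(self, smoothing=0.2):
        self.history = [0.0, 0.0, 0.0]
        self.smoothing = smoothing
        self.value = 0.0

    def update(self, ratio):
        self.history = self.history[1:] + [event_irregularity(ratio)]
        raw = math.prod(sorted(self.history)[-2:])          # product of the two highest
        self.value += self.smoothing * (raw - self.value)   # simple one-pole lowpass
        return self.value

tracker = IrregularityTracker()
for r in [Fraction(1, 1), Fraction(1, 1), Fraction(16, 1), Fraction(1, 1), Fraction(16, 1)]:
    # A single 16/1 outlier barely registers; repeated large deviations push the value up
    print(tracker.update(r))
```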

Deviation

In our quantization of time intervals into fractions we may sometimes have misinterpreted the actual intended rhythm, as described above. Although we have no measure of how far the analyzed fraction is from the musically intended rhythm, we can measure how far the observed time interval is from the quantized one. We can call this rhythm ratio deviation, and it is expressed as a fraction of the interval between possible quantizations. For example, if our quantization grid consists of 1/4, 1/3, 1/2, 2/3, 3/4 and 1/1 (just for the sake of the example), and we observe a time ratio of 0.3, this will be expressed as 1/3 in the rhythmic ratio analysis, since that is the ratio it is closest to. To express the deviation between the observed value and the quantized value in a practically usable manner, we need to scale the numeric deviation by the interval within which it can deviate before being interpreted as some other fraction. Let's number the fractions in our grid f_1, f_2, ..., f_N, that is, the first fraction is f_1, the second is f_2, and so on. We call the actual observed time ratio r and the quantized time ratio f_i. The deviation can then be expressed as

deviation = |r - f_i| / (f_i - f_(i-1))

which gives (1/3 - 0.3) / (1/3 - 1/4) = 0.4 in our case. If the observed ratio was rounded down when quantizing, the formula would use f_(i+1) in place of f_(i-1):

deviation = |r - f_i| / (f_(i+1) - f_i)
The deviation measure is probably not directly useful as a musical control signal, but it can give an indication of the quality of our analysis.
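A small Python sketch of this deviation computation, using the toy grid from the example above (the handling of the grid edges here is an arbitrary choice):

```python
# Toy quantization grid from the example above
GRID = [1/4, 1/3, 1/2, 2/3, 3/4, 1/1]

def ratio_deviation(observed):
    """Deviation of an observed ratio from its quantized value, scaled by the
    distance to the neighbouring grid point on the same side."""
    i = min(range(len(GRID)), key=lambda k: abs(GRID[k] - observed))
    quantized = GRID[i]
    if observed < quantized:                                   # observed was rounded up
        lower = GRID[i - 1] if i > 0 else quantized - 1.0      # arbitrary edge handling
        return (quantized - observed) / (quantized - lower)
    else:                                                      # rounded down (or exact)
        upper = GRID[i + 1] if i + 1 < len(GRID) else quantized + 1.0
        return (observed - quantized) / (upper - quantized)

print(ratio_deviation(0.3))  # ≈ 0.4: 0.3 sits 40% of the way from 1/3 down towards 1/4
```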

The GUI

[Figure: rhythm_ratios1 – GUI display of the rhythm ratio analysis]

For the display of the rhythm ratio analysis, we write each value to a cyclic graph. This way one can get an impression of the recent history of values. A marker (green) shows the current write point, wrapping around when it reaches the end. Rhythm consonance values are plotted in light blue/grey, rhythm irregularity is plotted in red. The deviation and the most recent rhythm ratios are not plotted but just shown as number boxes, updated on each rhythmic event.

Next up

The next post on rhythmic analysis will look at patterns on a somewhat longer time scale (a few seconds). For that, we'll use autocorrelation to find periodicities in the envelope.

Mixing with Gary

During our week in London we had some sessions with Gary Bromham, first at the Academy of Contemporary Music in Guildford on June 7th, then at QMUL later in the week. We wanted to experiment with cross-adaptive techniques in a traditional mixing session, using our tools/plugins within a Logic session to work similarly to traditional sidechaining, but with the expanded palette of analysis and modulator mappings enabled by the tools developed in the project. Initially we tried to set this up with Logic as the DAW. It kind of works, but seems utterly unreliable. Logic would not respond to learned controller mappings after we closed the session and reopened it. It does receive the MIDI controller signal (and can re-learn), but in all cases refuses to respond to the received automation control. In the end we abandoned Logic altogether and went for our safe, always-does-the-job Reaper.

As the test session for our experiments we used Sheryl Crow's "Soak Up the Sun", using stems for the backing tracks and the vocals.


Example 1: Vocal pitch to reverb send and reverb decay time.


Example 2: Vocal pitch as above. Adding vocal flux to hi-cut frequency for the rest of the band. Rhythmic analysis (transient density) of the backing track controls a peaking EQ sweep on the vocals, creating a sweeping effect somewhat like a phaser. This is all somewhat odd taken together, but useful as a controlled experiment in polyphonic crossadaptive modulation.

* The first thing Gary asked for was to process one track according to the energy in a specific frequency band of another. For example: "if I remove 150Hz on the bass drum, I want it to be added to the bass guitar". Now, it is not so easy to analyze what is missing, but easier to analyze what is there. So we thought of another thing to try: sibilants (e.g. S's) on the vocals can be problematic when sent to reverb or delay effects. Since we don't have a multiband envelope follower (yet), we tried to analyze for spectral flux or crest, then use that control signal to duck the reverb send for the vocals.

* We had some latency problems related to the pitch tracking of the vocals: the modulator signal arrived a bit late to precisely control the reverb size for the vocals. The tracking is OK, but the effect responds *after* the high pitch. This was solved by delaying the vocal signal *after* the point where it is sent to the analyzer, then also delaying the direct vocal signal and the rest of the mix accordingly.

* Trond's idea for later: Use vocal amplitude to control the bitcrush mix on drums (and other programmed tracks)

* Trond's idea for later: Use vocal transient density to control a delay setting (delay time, … or delay mix)

* Bouncing the mix: Bouncing does not work, as we need the external modulation processing (analyzer and MIDIator) to also be active. Logic seems to disable the "external effects" (like Reaper here, running via Jack as a kind of outboard effect in a traditional setting) when bouncing.

* Something good: Pitch-controlled reverb send works quite well musically, and is something one would not be able to do without the crossadaptive modulation techniques. Well, it is actually just adaptive here (vocals controlling vocals, not vocals controlling something else).

* Notable: do not try to fix (old) problems, but try to be creative and find new applications/routings/mappings. For example, the initial ideas from Gary were related to common problems in a mixing situation, problems that one can already fix (with de-essers or similar).

* Trond: It is unfamiliar in a post-production setting to hear the room size change, as one is used to static effects in the mix.

* It would be convenient if we could modulate the filtering of a control signal depending on analyzed features too, for example changing the rise time for pitch depending on amplitude.

* It would also be convenient to have the filter times as sync'ed values (e.g. 16th notes) relative to the master tempo.

FIX:

– Add multiband rms analysis.

– check the roundtrip latency of the analyzer-modulator chain, i.e. the time it takes from when an audio signal is sent until the modulator signal comes back.

– add modulation targets (e.g. rise time). This most probably just works, but we need to open the MIDI feed back into Reaper.

– add sync to the filter times. Cabbage reads bpm from host, so this should also be relatively straightforward.