Hi there!
So I started using Framelib a bit more seriously, and I think I now grasp the basics. I am really into sampling and working with large corpora of sounds with granular synths. I have always used poly~ for these things, and enjoyed the @parallel 1 mode with multithreading, which to me seems to always give way more headroom than what’s normally possible in a single patch.
So when I started diving into Framelib and realized I could easily launch hundreds or even thousands of grains per second, reliably, accurately and without too much boilerplate, I caught the scent of blood.
But I quickly realized that - on Windows at least - the multithread option of fl.contextcontrol~ was not as rewarding as on a Mac, where it seems to almost always help. (On Windows it very often just adds a 1-5% CPU bump, with scary spikes every now and then.) Without the additional headroom from multithreading I am still ultimately better off with poly~, but the scent of blood had already clouded my judgement, so I wanted to see if I could somehow squeeze more multithreading out of Framelib on Windows.
I then discovered that the multithreading starts to really work its magic when I am using multistream networks. I was still not fully convinced to abandon my poly~ workflow though.
So I set out to make some simple tests (not really benchmarks, but something like that) to see what gives me the best “yield”; this post shares the results, along with the patches.
The task
What I am ultimately after is being able to play a large corpus of sounds that resides in a polybuffer~. Since @a.harker added the wonderful fl.makestring~, this became possible in Frameland. So the test will be to load a folder of 4895 sound files into a polybuffer~ and see in which configuration I can squeeze out the highest number of concurrent grains without clogging my CPU (or reaching its ceiling). My screen recording will distort the readings a bit, but in each case I’ll run the patch for a while before recording, so you can see the real CPU load beforehand.
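For context: polybuffer~ exposes its internal buffers under names like <name>.1, <name>.2, and so on, so picking a random sound boils down to picking a random index and building the matching buffer name per frame (which is what fl.makestring~ makes possible). Here is that naming logic as a tiny Python sketch; the polybuffer~ name "corpus" is just an illustrative placeholder, not taken from my patch:

```python
import random

# polybuffer~ names its internal buffers "<polybuffer name>.<index>", 1-based.
# Picking a random sound is just picking a random index and building that name.
POLYBUFFER_NAME = "corpus"   # illustrative placeholder
NUM_FILES = 4895             # number of files loaded into the polybuffer~

def random_buffer_name() -> str:
    index = random.randint(1, NUM_FILES)
    return f"{POLYBUFFER_NAME}.{index}"

print(random_buffer_name())  # e.g. "corpus.1234"
```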
Environment
As I mentioned, I am on Windows 10, “rocking” a 6-core/12-thread Intel Core i7-10750H CPU @ 2.60 GHz.
Attempt #1 - single stream, single patch, no tricks
Here is the first “baseline” attempt:
…and the patch: fl_multicore_single.maxpat (22.7 KB)
It basically picks a random buffer at each trigger frame and plays it through its full length. The sound files range from around 500 ms to 3-4 s, with the occasional 7-8 s file.
I can push it to around 500 Hz, where I start to lick my CPU ceiling from below (the screen recording makes it look a bit worse, so just look at the graphs before the huge spike on the right side). I get regular 100% spikes, and the 1-second mean CPU hovers around 59-60% (pre-screencast).
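To put that density in perspective, here is a quick back-of-the-envelope estimate of how many grains are sounding at once; the ~2 s average file length is my own rough assumption, not something I measured from the corpus:

```python
# Rough estimate of concurrently sounding grains:
# concurrent grains ~= trigger rate * average grain length.
trigger_rate_hz = 500       # grains triggered per second
avg_grain_len_s = 2.0       # assumed average file length (files are ~0.5-8 s)

concurrent_grains = trigger_rate_hz * avg_grain_len_s
print(concurrent_grains)    # ~1000 overlapping grains at any moment
```

So even this baseline patch is juggling something on the order of a thousand simultaneous voices, which goes some way towards explaining the CPU readings.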
I forgot to record that, but in this scenario (and again, on Windows) multithreading makes no perceptible difference.
Attempt #2 - multithreading with poly~
So now I am curious whether I can improve on this by simply wrapping it into a poly~ and driving it with a set of phasor~s phased at an equal “distance” from each other. Since my CPU has 12 threads, I give the poly~ 12 voices and create a 12-channel mc.phasor~ to provide the ticks.
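The way I read this setup, each of the 12 phasor channels runs at a twelfth of the overall grain rate, with channel i offset by i/12 of a cycle, so the per-voice triggers interleave into one evenly spaced stream at the full rate. A small sketch of that timing (my reading of the idea, not a literal translation of the patch):

```python
# 12 phase-offset phasors interleaving into one evenly spaced trigger stream.
# Assumption: each voice's phasor runs at total_rate / n_voices, and channel i
# is offset by i / n_voices of a cycle.
total_rate_hz = 500
n_voices = 12
voice_period_s = n_voices / total_rate_hz   # 24 ms between triggers per voice

# First few trigger times (seconds) across all voices:
trigger_times = sorted(
    i / n_voices * voice_period_s + k * voice_period_s
    for i in range(n_voices)
    for k in range(3)
)
print(trigger_times[:6])   # ~[0.0, 0.002, 0.004, 0.006, 0.008, 0.01] -> 500 Hz overall
```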
Inside the poly~ patch:
main patch: fl_multicore_w_poly.maxpat (26.7 KB)
poly~ patch: p.fl_polybuf_player.maxpat (9.1 KB)
This time, with the same 500 Hz, I seem to get lower CPU readings (the mean hovers around 42-43%, pre-screencast), and almost no 100% spikes at all.
That’s a significant improvement. Again, enabling multithreading in Frameland does not seem to change much, but I guess the idea doesn’t make much sense at this point anyway(?).
Unfortunately, it does not scale too well, and I can’t really get over 700 Hz safely. I also noticed that the load is not evenly distributed over the different threads, or maybe hyperthreading is ignored by poly~’s threading, no clue.
Attempt #3 - multithreading with multistream networks
This attempt follows my hunch that multithreading à la Framelib - on Windows at least - favors multistream networks. So what if I created a 12-stream network and distributed the ticks with fl.chain~? Something like this:
the patch: fl_multicore_w_streams.maxpat (35.5 KB)
The idea here is that I have an fl.interval~ running at 1/12th of the grain rate I want at a given time (in other words, 12 times the grain interval), and I subdivide that interval into 12 equal offsets. Each offset goes into its own stream, and at the sink we separate them again.
In this attempt we let Framelib do the multithreading for us.
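In numbers, the scheduling works out like this (a sketch with made-up variable names, just to show the derivation, not code from the patch):

```python
# Per-stream timing for an N-stream interleaved grain clock.
# For a target grain rate f (Hz) and n_streams streams, each stream ticks at
# f / n_streams, and stream k is delayed by k grain periods.
target_rate_hz = 500
n_streams = 12

grain_period_s = 1.0 / target_rate_hz           # 2 ms between grains overall
stream_period_s = n_streams * grain_period_s    # 24 ms between ticks per stream
stream_offsets_s = [k * grain_period_s for k in range(n_streams)]

print(stream_period_s)     # 0.024
print(stream_offsets_s)    # [0.0, 0.002, 0.004, ..., 0.022]
```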
The results are impressive. My (pre-screencast) mean CPU is around 30-31% at a 500 Hz grain density. And look at the beautifully even distribution over all cores.
I can verify that the multithreading does magic here: if I switch it off, the CPU (in the patch) just runs to 100% permanently, and audio glitches and sweats.
With this version I can even go safely up to around 1300-1400 Hz without hitting the ceiling, even with the occasional spikes (the mean CPU is still around 60% there). That rate is nowhere near possible with the previous two attempts.
Wrapping it into an abstraction
So now I can verify that, at least on Windows, multistreaming is the key to squeezing the most out of a Framelib network (of this kind, anyway; I guess there are many scenarios I don’t consider here). So why not make it into an abstraction? I call it fl.multiinterval~, and at the moment it doesn’t have many options: it always expects intervals in Hz, it ignores any /option, and it only takes one positional argument, which is the number of streams to generate:
patch: fl.multiinterval~.maxpat (17.0 KB)
Sadly, Max does not recognize the #1 after the = sign, hence the “LOL” subpatch:
Nevertheless, it seems to work just fine:
patch: fl_multicore_w_streams_w_abs.maxpat (26.6 KB)
everything in a zip: fl_multicore.zip (18.0 KB)
Happy framing!