WebAudio DSP in WebAssembly from Rust
This article details an experiment running audio DSP in WASM, compiled from
Rust. The result of the exploration is a template repo for writing Rust crates
that perform DSP, compiling them to WASM, plus some JS glue + utilities for
calling these WASM modules in Web Audio AudioWorkletProcessors. These
processors are packaged in a self-contained npm package.
On the Rust side, the goal was roughly to be able to define the main
AudioWorkletProcessor process function, so the JS side can just delegate this
to WASM (we roughly get there, with some tweaks that handle memory allocation
and pointers to buffers on the WASM linear memory).
The culmination of the experiment is this template repo, which the sections below detail.
Rust -> WASM
At the time of writing (~Nov. 2024), wasm-pack's JS bindings do not work in an
AudioWorklet context, because some JS APIs that wasm-pack depends on
(fetch, URL, etc…) are not available there, and there is no compatible
build target.
So instead we use cargo with --target wasm32-unknown-unknown and need to
handle the JS bindings ourselves (darn!).
Loading WASM in an AudioWorkletNode
To load the wasm module, then, we can’t rely on the init or initSync utils
we would otherwise get from wasm-pack. Instead, we fetch the wasm module
in our main JS thread, decode it as an array buffer (response.arrayBuffer()),
and then pass it over to the AudioWorklet context via
AudioWorkletNode.port.postMessage (eg. here).
const response = await fetch(pathToWasm);
const wasmBytes = await response.arrayBuffer();
// ...
myAudioWorkletNode.port.postMessage({type: 'init-wasm', wasmBytes});
On the receiving side, our AudioWorkletProcessor needs to handle this post
message and await WebAssembly.instantiate(data.wasmBytes) to load the wasm
module (e.g. here).
The postmessage handler might look something like:
async initWasm(data) {
  this._wasm = (await WebAssembly.instantiate(
    data.wasmBytes,
  )).instance.exports;
  // We'll get to this part later...
  this.alloc_memory();
  // Tell the main JS thread the wasm module loaded
  this.port.postMessage({ type: "init-wasm-complete" });
}
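The handler above still needs to be hooked up to the processor's port. A minimal sketch of that wiring (`wireInitHandler` is a hypothetical helper, not part of the repo; in the worklet it would be called from the AudioWorkletProcessor constructor):

```javascript
// Hypothetical helper: route 'init-wasm' messages from the node to the
// handler. In the worklet: wireInitHandler(this.port, (d) => this.initWasm(d)).
function wireInitHandler(port, initWasm) {
  port.onmessage = (event) => {
    if (event.data.type === "init-wasm") {
      initWasm(event.data);
    }
  };
}
```

Factoring the routing out of the class keeps it easy to exercise outside a real AudioWorklet context.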
Handling Memory
Our WASM memory needs to be sized appropriately for the number of audio channels and parameters our AudioWorklet supports.
E.g. if we’re creating a mono effects plugin, we likely need (roughly):
+ 128 * 4-byte floats (= 1 audio block) allocated for the input buffer to our `AudioWorkletProcessor`'s `process` function
+ 128 * 4-byte floats allocated for the output buffer
+ 128 * 4-byte floats * number of audio-rate params
+ 4 bytes * number of k-rate params
(The WebAudio block size is currently fixed at 128 samples, though this may change, or be left to clients to determine based on available hardware, in the future.)
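As a sanity check, that budget can be written down directly (a sketch; `f32SlotsNeeded` is a hypothetical helper, not part of the repo):

```javascript
// Hypothetical helper: count the f32 slots a mono effect needs, per the
// breakdown above.
const BLOCK_SIZE = 128; // WebAudio render quantum, currently fixed

function f32SlotsNeeded({ aRateParams = 0, kRateParams = 0 } = {}) {
  const input = BLOCK_SIZE; // input buffer: 1 audio block
  const output = BLOCK_SIZE; // output buffer: 1 audio block
  const aRate = BLOCK_SIZE * aRateParams; // one block per audio-rate param
  const kRate = kRateParams; // one value per k-rate param
  return input + output + aRate + kRate;
}

// A mono effect with 1 audio-rate and 2 k-rate params:
// 128 + 128 + 128 + 2 = 386 floats, i.e. 386 * 4 = 1544 bytes
```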
How I approached this:
Allocating Memory From Rust
I created a publicly exposed alloc function in Rust to allocate memory (with
thanks/credit to the-drunk-coder/wasm-loop-player):
#[no_mangle]
pub extern "C" fn alloc(size: usize) -> *mut f32 {
    // initialize a Vec<f32>
    let vec: Vec<f32> = vec![0.0; size];
    // convert the heap-allocated array to just the pointer to the beginning
    // of that array on the heap
    Box::into_raw(
        // convert the Vec<f32> to a heap-allocated slice of f32 values
        vec.into_boxed_slice(),
    ) as *mut f32
}
Coupling a WASM Memory Pointer w/ a Float32Array
I created a tiny JS class that runs in the AudioWorklet context for allocating
buffers (Float32Arrays) that both JS and Rust can read/write to.
This is basically just a class to (intentionally) tightly couple 2 things together:
- this.ptr, which represents a pointer to the location in the WASM module’s linear memory where the array buffer starts
- the Float32Array, which contains the values (and also the length of the buffer)

class WasmBuffer {
  constructor(size, wasm) {
    this.ptr = wasm.alloc(size);
    this.buffer = new Float32Array(
      wasm.memory.buffer,
      this.ptr,
      size,
    );
  }
}
Growing WASM Memory Using a Collection Helper Class
I created one more helper class called WasmMemory to behave as a collection of
WasmBuffers that handles growing/shrinking the wasm memory’s size as needed.
In brief, it exposes one main public method, alloc, which allocates a
new buffer on the wasm module’s linear memory and re-grows the
memory as needed (see below).
Why is this needed? If we allocate a new buffer, and the size of that buffer exceeds the current size of the WebAssembly.Memory for our WASM module, the memory grows and its old ArrayBuffer is detached: all previously created Float32Array views collapse to zero length. This will of course be an issue when we need to read from or write to these buffers.
Thus, we need to detect when memory has grown as a result of allocating, and re-allocate all previously allocated buffers.
(I know, yuck, but this should be pretty infrequent, and we can probably make educated bets about how to right-size memory to begin with based on our DSP patch.)
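This behavior is easy to see in isolation: growing a WebAssembly.Memory detaches its old ArrayBuffer, and any prior views on it collapse to zero length (a runnable sketch, using only standard APIs):

```javascript
// Growing a WebAssembly.Memory detaches the old ArrayBuffer, so prior
// typed-array views become unusable and must be re-created.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64KiB
const view = new Float32Array(memory.buffer, 0, 128);
view[0] = 1.0;

memory.grow(1); // memory is now 2 pages; the old buffer is detached

console.log(view.length); // 0 — the old view is unusable
const fresh = new Float32Array(memory.buffer, 0, 128); // re-created view
console.log(fresh[0]); // 1 — the memory's contents survive the grow
```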
The full class can be found here, but in brief:
class WasmMemory {
  constructor(wasm) {
    this.wasm = wasm;
    this.buffers = {}; // {[k: string]: WasmBuffer}
  }
  // ...
  alloc(name, size) {
    const beforeSize = this.getWasmBufferLength();
    // If wasm memory grows, we need to re-create old views on the memory.
    // We need to store the sizes before the grow happens because, if a
    // grow does occur, these buffers will have 0 length.
    const bufferLengths = this.getBufferLengths();
    this.buffers[name] = new WasmBuffer(size, this.wasm);
    const afterSize = this.getWasmBufferLength();
    if (beforeSize !== afterSize) {
      // When we grow, all prior views are detached and report 0 length,
      // so we need to re-create them on the regrown wasm memory.
      // (grow sparingly!)
      const rebuildBufferNames = Object
        .keys(this.buffers)
        .filter(bufferName => name !== bufferName);
      rebuildBufferNames.forEach(bufferName => {
        this.buffers[bufferName] = new WasmBuffer(
          bufferLengths[bufferName],
          this.wasm,
        );
      });
    }
  }
}
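To make the mechanics concrete, here is a runnable distillation of the same grow-and-rebuild idea against a stubbed wasm module. The stub's bump allocator and page growth are stand-ins for the real module's alloc export, and the sketch assumes the rebuild itself doesn't trigger another grow:

```javascript
// Stub of the wasm exports: a bump allocator that grows the memory when a
// page boundary is crossed (which is when old views detach).
function makeFakeWasm() {
  const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64KiB
  let next = 0;
  return {
    memory,
    alloc(size) {
      const bytes = size * 4;
      while (next + bytes > memory.buffer.byteLength) memory.grow(1);
      const ptr = next;
      next += bytes;
      return ptr;
    },
  };
}

class WasmBuffer {
  constructor(size, wasm) {
    this.ptr = wasm.alloc(size);
    this.buffer = new Float32Array(wasm.memory.buffer, this.ptr, size);
  }
}

class WasmMemory {
  constructor(wasm) {
    this.wasm = wasm;
    this.buffers = {};
  }
  alloc(name, size) {
    const before = this.wasm.memory.buffer.byteLength;
    // Record lengths now; detached views report 0 length after a grow.
    const lengths = {};
    for (const [n, b] of Object.entries(this.buffers)) {
      lengths[n] = b.buffer.length;
    }
    this.buffers[name] = new WasmBuffer(size, this.wasm);
    if (this.wasm.memory.buffer.byteLength !== before) {
      // A grow detached every prior view; re-allocate each old buffer.
      for (const n of Object.keys(lengths)) {
        this.buffers[n] = new WasmBuffer(lengths[n], this.wasm);
      }
    }
  }
}

const mem = new WasmMemory(makeFakeWasm());
mem.alloc("input", 1024);
mem.alloc("output", 1024);
mem.alloc("sample", 16384); // crosses the page boundary, forcing a grow
// mem.buffers.input / mem.buffers.output are fresh, attached views again
```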
Invoking Rust/WASM DSP from AudioWorkletProcessor.process(inputs, outputs, params)
This distills to 3 things in the simplest case:
- Set the values of the input buffer’s Float32Array (assuming mono) to the values of the inputs parameter (we should also have a reference to the pointer that alloc (from Rust) gave us, referencing the start location of this Float32Array)
- Call the wasm module’s DSP function, passing the pointer to the input Float32Array and the pointer to our output Float32Array (this should probably set some values in the output array)
- Set the values of the outputs parameter to those of the output Float32Array
In pseudo code (or concrete implementation here):
process(inputs, outputs, params) {
  // (assuming this.inBuffer and this.outBuffer are WasmBuffer instances
  // from above)
  this.inBuffer.buffer.set(inputs[0][0]);
  this.wasm.process(this.inBuffer.ptr, this.outBuffer.ptr);
  outputs[0][0].set(this.outBuffer.buffer);
  return true;
}
(This speaks to the motivation for tightly coupling a Float32Array with the
wasm pointer in the WasmBuffer class above: anywhere we’re passing a buffer
of values to wasm, we need both the Float32Array and a pointer to where it
starts in the WASM linear memory.)
Structuring the Repo
npm package
I wanted the root of this repo to be an installable npm package that exports
an AudioWorkletNode and handles the messy bits of fetching public .wasm
files and passing them to the AudioWorklet context.
There are at least 2 static assets we need to include as part of this npm
package:
- all .wasm files
- our worklets.js file containing the AudioWorkletProcessor sub-class(es)
I keep both of these in a folder called public and then configure vite to
ensure these are copied over. vite.config.js:
export default defineConfig({
  // ...
  build: {
    // ...
    copyPublicDir: true,
    assetsDir: "public",
  },
});
The details of the public exports of this npm package can be seen in full here,
but the crux of it is that a new AudioWorkletNode that runs DSP in
wasm should only require specifying a few things, which I’ve abstracted into a
(terribly Java-named) class called WorkletModuleFactory.
For example:
const BitCrusher = new WorkletModuleFactory(
  // The path to `public/worklets.js` containing the AudioWorkletProcessor
  // JS subclass
  new URL('./worklets.js', import.meta.url).href,
  // The name of the processor (ie. the same name you pass to
  // registerProcessor('some name', ...) in your worklets.js file)
  "BitCrush",
  // The path to the wasm module (compiled from Rust)
  new URL('./bit_crusher.wasm', import.meta.url).href,
  // A timeout: how long to wait for the wasm module to load before erroring
  // out
  5000,
  // ...
)
The Rust Crate(s)
I chose to structure this as 1 crate (and thus 1 wasm module) per
AudioWorkletProcessor.
./crates/Cargo.toml aggregates a few sub-crates, eg:
[workspace]
members = [
  "bit_crusher", "wave_shaper", "common",
]
Each sub-member imports common utilities from the common crate (eg. the
alloc function, which is always needed from JS).
eg. ./crates/bit_crusher/Cargo.toml:
[package]
name = "bit_crusher"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
common = { path = "../common" }
Building & installing these .wasm files then just involves:
# build
cd crates && cargo build --target wasm32-unknown-unknown --release
# install
cp crates/target/wasm32-unknown-unknown/release/*.wasm public/
Attempts at Profiling
Part of the motivation for moving the bulk of the work done in
AudioWorkletProcessor.process to wasm was to see if we’d achieve
performance gains by doing so.
Profiling this was a bit tricky; a few things were considered/attempted (although, to be honest, this wasn’t terribly well conceived).
Precise Profiling
Ideally we could author the equivalent DSP algorithm in JS and Rust/wasm and
use the Performance API to determine which implementation was faster in an
AudioWorkletProcessor context.
Unfortunately the Performance API is not available in the
AudioWorkletProcessor.
new Date() APIs are also not fine-grained enough for comparing the performance
of processing one audio block. If our audio sample rate is 44.1kHz, and our DSP
is just keeping up with it (without audio dropouts), we’re processing
44100/128 ≈ 344 audio blocks per second, so each block takes at most
1/344s ≈ 2.9ms. In practice, blocks are often processed in well under 1ms, but
the Date API only resolves to the millisecond (at the very least, profiling in
ms would be pretty inaccurate).
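One coarse workaround (a sketch; not something the repo does) is to amortize the timer's millisecond resolution by timing a large batch of block-sized calls and averaging. Here `processBlock` is a stand-in for the per-block DSP work:

```javascript
// Hypothetical helper: time `blocks` invocations of processBlock and return
// the mean cost per block in milliseconds. With enough blocks, the ms
// resolution of Date.now() averages out.
function averageBlockMs(processBlock, blocks = 10000) {
  const start = Date.now();
  for (let i = 0; i < blocks; i++) {
    processBlock();
  }
  return (Date.now() - start) / blocks;
}
```

Comparing a JS and a wasm `processBlock` this way at least smooths over timer granularity, though it still ignores real-time scheduling effects on the audio thread.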
Stress-test Profiling
Instead of precisely profiling the speed of processing 1 audio block, we can
instead see how the browser performs if we try to run a ton of
AudioWorkletProcessors at once and compare whether JS or WASM implementations
differ.
I attempted this with a DSP/process method that simply plays back a loaded
sound buffer (e.g. a drum loop .wav).
After attempting to run about 1000 instances of the same wasm processor
concurrently I started seeing errors:
WebAssembly.Memory failed to reserve a large virtual memory region. This may be due to low configured virtual memory limits on this system.
This isn’t all that surprising: for each of the 1000 nodes, I was allocating:
- 2 buffers the size of an audio block for input/output
- the full audio buffer we’re playing back (~4MB in my case)
Empirically, it seems the JS implementation started to perform better than my wasm implementation for simple audio-file playback at ~900 concurrent nodes (the wasm version started to have audio dropouts; the JS implementation seemed more stable).
This isn’t all that interesting or surprising; in the wasm implementation we
have extra overhead copying inputs to a Float32Array that is accessible to
our wasm instance, and then copying the wasm’s outputs back over to
JS-land.
Alas, this wasn’t a terribly well-conceived test, and my interest in going deeper dropped off.
Startup Time
Fetching the .wasm module, initializing memory, and instantiating
a WebAssembly module clearly incurs some overhead that we wouldn’t have if just
working in JS.
I didn’t profile this because it’s pretty minimal and manageable. In my case,
anyway, my .wasm files were pretty small, so startup had no perceivable delay.
We should take care to cache .wasm bytes (the ArrayBuffers we decode after
fetching a .wasm file) so we don’t re-fetch and re-decode each time we
instantiate a new Node.
This is trivial to do by keeping a reference to each Promise we create for
each .wasm we try to load and decode (see here).
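A sketch of such a cache (names hypothetical, not the repo's actual code): key by URL and store the decode Promise itself, so concurrent callers share a single fetch:

```javascript
// Cache of .wasm ArrayBuffer promises, keyed by URL. Storing the Promise
// (rather than the resolved bytes) means a second caller that arrives while
// the first fetch is in flight still reuses it. `fetcher` is injectable so
// the cache can be exercised without a network.
const wasmBytesCache = new Map();

function loadWasmBytes(url, fetcher = fetch) {
  if (!wasmBytesCache.has(url)) {
    wasmBytesCache.set(
      url,
      fetcher(url).then((response) => response.arrayBuffer()),
    );
  }
  return wasmBytesCache.get(url);
}
```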
