Penny: Correcting my misconceptions after learning more about the execution model
context
This post documents a design correction I made while building Penny, a browser-based work-stealing runtime, after better understanding JavaScript’s execution model.
landscape judgement
The scheduler.postTask() API allows tasks to be scheduled on the event loop with a priority, and the event loop is a time-sliced scheduler. This helps reduce jitter and blocking on the main thread, but a task’s time to completion might go up. Penny’s objective is to utilize hardware threads to compute faster, reliably and smoothly. The same constraints apply to scheduler.yield(). These APIs address main-thread responsiveness; they’re not built for worker CPU scheduling.
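As a quick illustration (a sketch, not Penny code — scheduler.postTask is Chromium-only today, so it’s feature-detected, and the heavy() workload is a made-up stand-in):

```javascript
// Priorities reorder tasks on the main thread's event loop -- they do not
// move work off-thread, so a long loop still blocks the main thread while
// it runs. Feature-detect: scheduler.postTask is not universally available.
function heavy() {
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  return sum;
}

const hasPostTask =
  typeof scheduler !== "undefined" && typeof scheduler.postTask === "function";

const result = hasPostTask
  ? scheduler.postTask(heavy, { priority: "background" }) // Promise<number>
  : Promise.resolve(heavy()); // environments without the API
```

Even at "background" priority, heavy() monopolizes the main thread once it starts — which is exactly why these APIs don’t solve Penny’s problem.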
issues with Penny
Based on my previous memory model, I thought having a task registry that holds the entire closure, and just passing IDs around, was enough. It got me to a minor state where work was being executed across workers with the API, but it was in no way work stealing. Work stealing, in my head, only works well if units of work are bounded, because we don’t want one long-running task completely monopolizing a thread — and based on my newfound knowledge of the JS execution model, bounded execution is not supported by any primitive, at least not in the multithreaded sense. We do have a primitive in the form of a generator function:
```js
function* gen() {
  console.log("first point");
  yield;
  console.log("second point");
}
const g = gen();
g.next(); // "first point"
g.next(); // "second point"
```
It does indeed give me controllable execution, and it holds execution state within the generator object — but only within one agent’s context. Agents hold their own stack, heap and event loop (and realms, though the relation is one agent = many realms, not vice versa), which means I can’t move a generator across workers.
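A quick way to see that boundary: the structured clone algorithm (the same machinery postMessage uses) refuses generator objects outright.

```javascript
// A suspended generator's state lives in the current agent's stack/heap;
// structured clone (what postMessage would perform) cannot serialize it.
function* gen() {
  yield 1;
  yield 2;
}
const g = gen();
g.next(); // advance to the first yield: suspended state now lives inside g

let movable = true;
try {
  structuredClone(g); // throws DataCloneError
} catch (e) {
  movable = false;
}
// movable === false: the generator cannot cross a worker boundary
```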
So, is that it? No more preemption? Do I have to wait for the WASM stack-switching (fiber) proposal to implement it? Well, not exactly. The generator function gave me an idea which, upon research, turns out to be a pretty popular one in the state machine / compiler world.
Most compute-heavy workloads are iteration-based, and iterations are essentially state machines. If we mix the concept of yield with expressing iterations as state — i.e. exampleTask { i: number; maxIter: number; state1: somePODType; state2: AnotherPODType } — we can store this in a SharedArrayBuffer (SAB) and pass that around instead. For embarrassingly parallelizable tasks this helps quite a bit, since the iteration can be:
```js
for (let k = 0; k < ChunkSize; k++) {
  someStateTransformerOrOutput(i);
  i++;
}
if (i < N) return "yielded"; // back to scheduler
return markAsDone();
```
It gets a bit more complicated with tasks where iteration[x] depends on the result of iteration[x-1]; in that case I just have to guarantee that no two instances of a task run at the same time. Still, this approach seems promising, since it gives me (albeit explicit) preemption points to manage priority inversion with already established patterns. It’s not “preemption” in the OS sense, but it should be enough to build a usable work-stealing runtime inside browsers.
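To make the chunked-loop idea concrete, here’s a minimal self-contained sketch — the names (makeSumTask, step, CHUNK_SIZE) are mine, not Penny’s API, and the state is plain data precisely so it could later live in a SAB slot instead of a closure:

```javascript
const CHUNK_SIZE = 64; // the explicit preemption granularity

// All task state is POD: resumable by whichever worker picks it up.
function makeSumTask(n) {
  return { i: 0, n, acc: 0, done: false };
}

// One time slice: at most CHUNK_SIZE iterations, then yield to the scheduler.
function step(task) {
  for (let k = 0; k < CHUNK_SIZE && task.i < task.n; k++) {
    task.acc += task.i; // the per-iteration state transformer
    task.i++;
  }
  if (task.i < task.n) return "yielded"; // back to scheduler
  task.done = true;
  return "done";
}

// A cooperative scheduler interleaves tasks at the chunk boundaries:
const t = makeSumTask(1000);
while (step(t) === "yielded") {
  // the scheduler could run, steal, or reprioritize other tasks here
}
// t.acc === 499500 (sum of 0..999), t.done === true
```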
The thing I have to be a bit careful about now is how I manage my memory model. My current iteration holds:
//
//
// ┌────────────┼────────────────────────────────────────────────────────────────────┐
// │ │ │
// │ Thread 1 │ │
// │┌───┬───┬──┐│ Per Thread Deque │
// ││Tsk│Tsk│ ││ │
// │└───┴───┴──┘│ │
// │ │ │
// ├────────────┼───────────┬──────────┬───────────┬─────────┬───────────────────────┤
// │ │ Task 2 │ │ │ │ │
// │ │ Args │ │ │ │ │
// │ Task │ Iter │ ────────┼───────────┼─────────┼─► Task N │
// │ 1 │ Max │ │ │ │ │
// │ │ Owner? │ │ │ │ │
// │ │MarkForKill? │ │ │ │
// │ │ │ │ │ │ │
// └────────────┴───────────┴──────────┴───────────┴─────────┴───────────────────────┘
//
//
which I have to clean up a bit.
The main thing is that the registry of tasks (by ID) is stored in the SAB along with the per-thread deques. This has its tradeoffs:
PROS
- Fixed layout
- Easy to visualize and debug
CONS
- Fixed layout (ironic?)
- Memory grows O(n) in active tasks. Functions are not pointers anymore: if someone wants to execute the same task 100 times, there will be 100 copies of the same task in 100 slots.
- Args will be limited, due to the fixed-size layout
But fixing it to an upper limit (200 for now) and applying backpressure via a main-thread queue, combined with the benefit of having preemption, is a win in my opinion.
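A sketch of what the fixed-slot pool plus single-owner claim could look like (the field layout, names and worker IDs are my assumptions, not Penny’s actual format; the CAS on the owner field is what guarantees no two threads run the same task):

```javascript
const MAX_TASKS = 200; // the upper limit mentioned above
const FIELDS = 4;      // per-slot Int32 fields: [owner, i, maxIter, acc]
const OWNER = 0;
const FREE = -1;

const sab = new SharedArrayBuffer(MAX_TASKS * FIELDS * Int32Array.BYTES_PER_ELEMENT);
const pool = new Int32Array(sab); // shareable with workers via postMessage

for (let s = 0; s < MAX_TASKS; s++) pool[s * FIELDS + OWNER] = FREE;

// Claim a slot with a compare-exchange: exactly one worker wins, so a task
// can never be executed by two threads at the same time.
function tryClaim(slot, workerId) {
  return Atomics.compareExchange(pool, slot * FIELDS + OWNER, FREE, workerId) === FREE;
}

function release(slot) {
  Atomics.store(pool, slot * FIELDS + OWNER, FREE);
}

// Two hypothetical workers race for slot 0:
const w1 = tryClaim(0, 1); // true: worker 1 now owns the slot
const w2 = tryClaim(0, 2); // false: already owned, worker 2 must steal elsewhere
```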
Also, the yield points need to be inserted exactly at the boundary of each iteration. Given the nature of this approach and the lack of stackful coroutines, yielding in the middle of an iteration is just not possible; there would be no guarantee of the accuracy of the computation at that point. I don’t know yet how to encode those rules, or even what the rules would be — that’s why I’m manually writing the tasks for the MVP. But I imagine syntactic sugar that lowers into the same (or a similar) task layout would be doable once I learn more about it.
why not just use Async/Microtasks?
I did consider this approach, and after reading the excellent post tasks-microtasks-queues-and-schedules, it doesn’t really help compute-heavy workloads. Microtasks are optimized for promise callbacks and don’t give me preemption: continuations only run after the current call stack completes, i.e. the run-to-completion policy (the JS execution model has this by design). So this won’t help me deal with priority inversion.
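The run-to-completion behavior is easy to demonstrate: a microtask queued before heavy synchronous work still cannot run until the whole stack unwinds.

```javascript
const order = [];

queueMicrotask(() => order.push("microtask"));

// "Heavy" synchronous work standing in for a compute loop: the queued
// microtask cannot preempt any of these iterations.
for (let i = 0; i < 3; i++) order.push("iter " + i);
order.push("stack done");

// Only now, after the stack unwinds, does the microtask run:
// order becomes ["iter 0", "iter 1", "iter 2", "stack done", "microtask"]
```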
next steps
- Implement fixed-size global task pool in SAB
- Represent tasks as explicit state machines
- Integrate single-owner execution + yielding
- Validate work stealing with a simple loop workload