Rake

A vector-first language for SIMD and SPIR-V with divergent control flow.

Rake makes SIMD programming explicit. Instead of hoping the compiler auto-vectorizes your loops, you write code where every value is inherently vectorized. The result: clean, readable code that compiles to efficient SIMD instructions.

Racks: Native Vector Types

In Rake, a rack is a vector of values across SIMD lanes. When you write float rack, you get 8 floats (AVX) or 16 (AVX-512) that operate in parallel.

~~ A rack holds one value per SIMD lane
let positions : float rack = load_positions()
let velocities : float rack = load_velocities()

~~ Arithmetic operates on all lanes simultaneously
let new_positions = positions + velocities * dt
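The per-lane semantics can be sketched outside Rake. This Python sketch emulates an 8-lane float rack with plain lists; the lane count and input values are illustrative assumptions, and on real SIMD hardware each lane-wise operation is a single vector instruction rather than a loop:

```python
LANES = 8  # assumed AVX width for 32-bit floats

def rack_add(a, b):
    # One addition per lane; a single vaddps on AVX hardware
    return [x + y for x, y in zip(a, b)]

def rack_mul(a, b):
    # One multiplication per lane
    return [x * y for x, y in zip(a, b)]

positions  = [float(i) for i in range(LANES)]
velocities = [1.0] * LANES
dt         = [0.5] * LANES  # dt already broadcast to every lane

# new_positions = positions + velocities * dt
new_positions = rack_add(positions, rack_mul(velocities, dt))
print(new_positions)  # [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
```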

Scalars: Broadcast Values

Scalars are uniform values broadcast to all lanes. The angle bracket syntax <dt> makes it visually clear which values are scalars and which are racks.

~~ Scalars use angle brackets
let result = positions + <gravity> * <dt>

~~ Field access works too
let radius = <sphere.r>

~~ Negative literals need a space: < -1.0>
let miss_t = < -1.0>
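In SIMD terms, the angle brackets denote a broadcast (splat): one uniform value is copied into every lane before the lane-wise operation runs. A hypothetical Python sketch of that semantics, with illustrative lane count and values:

```python
LANES = 8

def broadcast(s):
    # <gravity>: one uniform value replicated across all lanes (a splat)
    return [s] * LANES

positions = [float(i) for i in range(LANES)]
gravity = broadcast(-9.8)
dt = broadcast(0.1)

# result = positions + <gravity> * <dt>
result = [p + g * d for p, g, d in zip(positions, gravity, dt)]
print(result[0])  # approximately -0.98
```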

Stacks: Structure of Arrays

A stack is a struct laid out in SoA (Structure of Arrays) format. Each field is a rack, giving cache-friendly memory access patterns.

~~ Define a stack type
stack Particle {
  x : float rack,
  y : float rack,
  z : float rack,
  vx : float rack,
  vy : float rack,
  vz : float rack
}

~~ Access fields naturally
let speed = sqrt(p.vx*p.vx + p.vy*p.vy + p.vz*p.vz)
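The payoff of SoA is that each field is contiguous in memory: loading vx for a whole rack of particles is one vector load instead of a strided gather. A rough Python comparison of the two layouts; field names follow the Particle stack above, and the values are illustrative:

```python
import math

N = 8

# AoS: one record per particle; fields are interleaved in memory
aos = [{"vx": 1.0, "vy": 2.0, "vz": 2.0} for _ in range(N)]

# SoA: one contiguous array per field; each field maps onto a rack
soa = {
    "vx": [p["vx"] for p in aos],
    "vy": [p["vy"] for p in aos],
    "vz": [p["vz"] for p in aos],
}

# speed = sqrt(p.vx*p.vx + p.vy*p.vy + p.vz*p.vz), computed lane-wise
speed = [math.sqrt(x*x + y*y + z*z)
         for x, y, z in zip(soa["vx"], soa["vy"], soa["vz"])]
print(speed[0])  # 3.0
```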

Crunch: Pure SIMD Computation

A crunch is a pure function that operates on racks. Crunches are always inlined, producing zero function call overhead in the generated code.

~~ A crunch computes dot product across lanes
crunch dot ax ay az bx by bz -> d:
  d <- ax*bx + ay*by + az*bz

~~ Zero overhead when called
let dist_sq = dot(dx, dy, dz, dx, dy, dz)

Tines: Named Lane Masks

When different SIMD lanes need different behavior, you define tines. A tine is a named boolean mask that partitions lanes based on a predicate.

~~ Tines use # prefix (like grid lines / SIMD lanes)
| #miss  := (disc < <0.0>)
| #hit   := (!#miss)

~~ Tines can compose
| #close := (#hit && t < <max_dist>)
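A tine is just a per-lane boolean vector, and composing tines is lane-wise boolean algebra. A small Python sketch of the mask semantics; the disc, t, and max_dist values are illustrative assumptions:

```python
# disc for 8 lanes; a negative discriminant means the ray missed
disc = [4.0, -1.0, 0.0, 9.0, -2.5, 16.0, -0.5, 1.0]

# | #miss := (disc < <0.0>)
miss = [d < 0.0 for d in disc]
# | #hit := (!#miss)
hit = [not m for m in miss]

# | #close := (#hit && t < <max_dist>)
t = [1.0, 0.0, 5.0, 2.0, 0.0, 9.0, 0.0, 3.0]
max_dist = 4.0
close = [h and (ti < max_dist) for h, ti in zip(hit, t)]
print(miss.count(True), close.count(True))  # 3 3
```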

Through: Masked Computation

A through block executes computation only for lanes where the tine is true (i.e., where the rack passes through the tine). Other lanes receive an identity value. There are no branches, only masked SIMD operations.

~~ Compute sqrt only for lanes where disc >= 0
through #hit:
  let sqrt_disc = sqrt(disc)
  let t = (- b - sqrt_disc) / (<2.0> * a)
  t
-> t_value

~~ Miss lanes get -1
through #miss:
  < -1.0>
-> miss_value
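Lane-wise, the semantics look like this: active lanes run the body, inactive lanes keep an identity value. The through helper, the identity of 0.0, and the lane values below are illustrative assumptions; real codegen evaluates lanes under a mask and blends, without branching:

```python
import math

disc = [4.0, -1.0, 9.0, -2.5]
hit  = [d >= 0.0 for d in disc]
miss = [not h for h in hit]

def through(mask, body, identity=0.0):
    # Lanes where the tine is true run the body; others keep the identity.
    # (Scalar emulation: real codegen uses masked SIMD ops, not a branch.)
    return [body(i) if m else identity for i, m in enumerate(mask)]

b = [-4.0] * 4
a = [1.0] * 4
t_value = through(hit, lambda i: (-b[i] - math.sqrt(disc[i])) / (2.0 * a[i]))
miss_value = through(miss, lambda i: -1.0)
print(t_value)     # [1.0, 0.0, 0.5, 0.0]
print(miss_value)  # [0.0, -1.0, 0.0, -1.0]
```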

Sweep: Collect Results

Sweep combines results from different tines using masked selection. Each lane picks the result from whichever tine matched.

sweep:
  | #miss -> miss_value
  | #hit  -> t_value
-> final_t
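Sweep is a per-lane select: each lane takes the arm whose tine matched (on AVX2 this maps to a blend such as vblendvps). A Python sketch of the selection, with illustrative lane values:

```python
disc = [4.0, -1.0, 9.0, -2.5]
miss = [d < 0.0 for d in disc]

t_value    = [1.0, 0.0, 0.5, 0.0]    # through #hit results (illustrative)
miss_value = [0.0, -1.0, 0.0, -1.0]  # through #miss results (illustrative)

# Per-lane select between the two sweep arms
final_t = [mv if m else tv for m, tv, mv in zip(miss, t_value, miss_value)]
print(final_t)  # [1.0, -1.0, 0.5, -1.0]
```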

Rake: Putting It Together

A rake function combines tines, through blocks, and sweep for complete divergent control flow. Here's ray-sphere intersection:

rake intersect ray_ox ray_oy ray_oz ray_dx ray_dy ray_dz
  <sphere_cx> <sphere_cy> <sphere_cz> <sphere_r>
  -> t_result:

  ~~ Quadratic coefficients from the ray and sphere
  let ocx = ray_ox - <sphere_cx>
  let ocy = ray_oy - <sphere_cy>
  let ocz = ray_oz - <sphere_cz>
  let a = dot(ray_dx, ray_dy, ray_dz, ray_dx, ray_dy, ray_dz)
  let b = <2.0> * dot(ocx, ocy, ocz, ray_dx, ray_dy, ray_dz)
  let c = dot(ocx, ocy, ocz, ocx, ocy, ocz) - <sphere_r> * <sphere_r>

  let disc = b * b - <4.0> * a * c

  | #miss := (disc < <0.0>)
  | #hit  := (!#miss)

  through #hit:
    let sqrt_disc = sqrt(disc)
    (- b - sqrt_disc) / (<2.0> * a)
  -> t_value

  through #miss:
    < -1.0>
  -> miss_value

  sweep:
    | #miss -> miss_value
    | #hit  -> t_value
  -> t_result
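To check the logic end to end, the rake can be emulated lane-by-lane in plain Python. The rays, sphere, and lane count below are illustrative assumptions; the scalar if stands in for the tine/through/sweep machinery, which real codegen expresses as masked operations and a blend:

```python
import math

# Illustrative 4-lane input: rays from the origin; sphere at (0, 0, 5), r = 1
ox = oy = oz = [0.0] * 4
dx = [0.0, 1.0, 0.0, 1.0]
dy = [0.0] * 4
dz = [1.0, 0.0, 1.0, 0.0]
cx, cy, cz, r = 0.0, 0.0, 5.0, 1.0

def intersect_lane(i):
    # Quadratic coefficients for |o + t*d - c|^2 = r^2
    ocx, ocy, ocz = ox[i] - cx, oy[i] - cy, oz[i] - cz
    a = dx[i]*dx[i] + dy[i]*dy[i] + dz[i]*dz[i]
    b = 2.0 * (ocx*dx[i] + ocy*dy[i] + ocz*dz[i])
    c = ocx*ocx + ocy*ocy + ocz*ocz - r*r
    disc = b*b - 4.0*a*c
    if disc < 0.0:                           # the #miss tine
        return -1.0                          # through #miss
    return (-b - math.sqrt(disc)) / (2.0*a)  # through #hit

# sweep: one result per lane
final_t = [intersect_lane(i) for i in range(4)]
print(final_t)  # [4.0, -1.0, 4.0, -1.0]
```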

Performance

Ray-sphere intersection benchmark, 8 million rays with 50% hit/miss divergence:

C (auto-vectorized, -O3 -march=native) 207.89 M rays/sec
Rake 677.89 M rays/sec

Anti-hype disclaimer: this is not magic. It comes from exclusively targeting the vector and GPU dialects of MLIR from day one. The backend engineers and architecture designers are the experts; Rake just provides an ergonomic frontend, which is much easier to build when you're not trying to embed vectorization inside the grammar of an entirely different paradigm. Rake generates clean AVX2 assembly with inlined functions, efficient mask handling, and no unnecessary memory traffic. And things will only get better as OCaml MLIR support improves.

Install

Prerequisites: OCaml 5.0+ with dune, LLVM 17+ (mlir-opt, mlir-translate, llc)

$ git clone https://github.com/KaiStarkk/rake-lang
$ cd rake-lang && dune build
$ ./scripts/compile.sh examples/intersect_flat.rk

VS Code extension for syntax highlighting:

$ code --install-extension rake-lang-0.2.0.vsix

Status

SAFE HARBOUR: Rake is still very much in research and development. What you see here is essentially the entire surface area of the lexer and parser. Testing is currently done through an FFI harness while I work on finalizing the specification and completing the feature set that will let Rake support application programs, as opposed to just CPU kernels. Rake 0.2.0 is an alpha release. The compiler pipeline works end to end: .rk → Parser → Type Checker → MLIR → LLVM IR → Native Code.

Working

Parsing, type inference, MLIR codegen, AVX2 output

Planned

AVX-512, standard library, multi-file projects