A vector-first language for SIMD and SPIR-V with divergent control flow.
Rake makes SIMD programming explicit. Instead of hoping the compiler auto-vectorizes your loops, you write code where every value is inherently vectorized. The result: clean, readable code that compiles to optimal SIMD instructions.
In Rake, a rack is a vector of values across SIMD lanes. When you write
float rack, you get 8 floats (AVX) or 16 (AVX-512) that operate in parallel.
~~ A rack holds one value per SIMD lane
let positions  : float rack = load_positions()
let velocities : float rack = load_velocities()

~~ Arithmetic operates on all lanes simultaneously
let new_positions = positions + velocities * dt
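For intuition only (hand-written C, not Rake's actual output), a float rack on an AVX2 target behaves like a single __m256: 8 floats processed in lockstep. The function name below is made up for illustration.

#include <immintrin.h>

// Rough AVX2 analogue of the rack arithmetic above: one __m256 holds
// 8 float lanes, and each intrinsic applies to all lanes at once.
static inline __m256 step_positions(__m256 positions, __m256 velocities, __m256 dt) {
    return _mm256_add_ps(positions, _mm256_mul_ps(velocities, dt));
}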
Scalars are uniform values broadcast to all lanes. The angle bracket
syntax <dt> makes it visually clear which values are scalars vs racks.
~~ Scalars use angle brackets
let result = positions + <gravity> * <dt>

~~ Field access works too
let radius = <sphere.r>

~~ Negative literals need a space: < -1.0>
let miss_t = < -1.0>
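A hedged sketch of what scalar broadcast amounts to at the intrinsics level (again hand-written C, not Rake's codegen; names are illustrative): a <scalar> is splatted into every lane once, then participates in ordinary vector math.

#include <immintrin.h>

// Broadcast uniform scalars into all 8 lanes, then use them like any rack.
static inline __m256 apply_gravity(__m256 positions, float gravity, float dt) {
    __m256 g   = _mm256_set1_ps(gravity);   // <gravity> broadcast
    __m256 dtv = _mm256_set1_ps(dt);        // <dt> broadcast
    return _mm256_add_ps(positions, _mm256_mul_ps(g, dtv));
}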
A stack is a struct laid out in SoA (Structure of Arrays) format. Each field is a rack, giving cache-friendly memory access patterns.
~~ Define a stack type
stack Particle {
    x  : float rack,
    y  : float rack,
    z  : float rack,
    vx : float rack,
    vy : float rack,
    vz : float rack
}

~~ Access fields naturally
let speed = sqrt(p.vx*p.vx + p.vy*p.vy + p.vz*p.vz)
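To make the SoA point concrete, here is an illustrative C comparison of the two layouts (struct names are made up; this is not what the compiler emits):

// AoS: one struct per particle; loading 8 consecutive x-values needs a gather.
struct ParticleAoS { float x, y, z, vx, vy, vz; };

// SoA, the layout a stack uses: each field is its own contiguous array,
// so 8 consecutive x-values load straight into one vector register.
struct ParticleSoA {
    float *x, *y, *z;
    float *vx, *vy, *vz;
};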
A crunch is a pure function that operates on racks. Crunches are always inlined, producing zero function call overhead in the generated code.
~~ A crunch computes dot product across lanes
crunch dot ax ay az bx by bz -> d:
    d <- ax*bx + ay*by + az*bz

~~ Zero overhead when called
let dist_sq = dot(dx, dy, dz, dx, dy, dz)
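As a rough mental model (hand-written C, not the compiler's output; dot3 is an illustrative name, and the FMA intrinsics assume -mavx2 -mfma), a crunch behaves like a small pure helper that is always inlined, so only the per-lane arithmetic survives:

#include <immintrin.h>

// Always-inlined per-lane dot product of two 3D vectors stored across racks.
static inline __attribute__((always_inline)) __m256
dot3(__m256 ax, __m256 ay, __m256 az, __m256 bx, __m256 by, __m256 bz) {
    __m256 d = _mm256_mul_ps(ax, bx);
    d = _mm256_fmadd_ps(ay, by, d);   // d += ay*by
    d = _mm256_fmadd_ps(az, bz, d);   // d += az*bz
    return d;
}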
When different SIMD lanes need different behavior, you define tines. A tine is a named boolean mask that partitions lanes based on a predicate.
~~ Tines use # prefix (like grid lines / SIMD lanes)
| #miss := (disc < <0.0>)
| #hit  := (!#miss)

~~ Tines can compose
| #close := (#hit && t < <max_dist>)
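In intrinsics terms (an illustrative C sketch, not Rake's codegen; the function name and signature are made up), a tine is a per-lane mask produced by a vector compare, and composition is bitwise mask arithmetic rather than branching:

#include <immintrin.h>

// Build the #miss / #hit / #close masks from vector compares.
static inline void make_tines(__m256 disc, __m256 t, float max_dist,
                              __m256 *miss, __m256 *hit, __m256 *close) {
    __m256 zero = _mm256_setzero_ps();
    __m256 ones = _mm256_castsi256_ps(_mm256_set1_epi32(-1)); // all-ones mask
    *miss  = _mm256_cmp_ps(disc, zero, _CMP_LT_OQ);           // #miss := disc < 0
    *hit   = _mm256_andnot_ps(*miss, ones);                   // #hit  := !#miss
    *close = _mm256_and_ps(*hit,                              // #close := #hit && t < max_dist
                 _mm256_cmp_ps(t, _mm256_set1_ps(max_dist), _CMP_LT_OQ));
}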
A through block executes computation only for lanes where the tine is true (i.e., where the rack passes through the tine). Other lanes receive an identity value. No branches—just masked SIMD operations.
~~ Compute sqrt only for lanes where disc >= 0
through #hit:
    let sqrt_disc = sqrt(disc)
    let t = (- b - sqrt_disc) / (<2.0> * a)
    t -> t_value

~~ Miss lanes get -1
through #miss:
    < -1.0> -> miss_value
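One way to picture a through block (hand-written C sketch, not the actual lowering; using 0.0 as the identity value is my assumption here): compute for every lane, then blend so lanes outside the tine keep the identity:

#include <immintrin.h>

// sqrt only "through" #hit: all lanes compute, masked-off lanes keep 0.0.
static inline __m256 through_hit_sqrt(__m256 disc, __m256 hit_mask) {
    __m256 zero      = _mm256_setzero_ps();
    __m256 sqrt_disc = _mm256_sqrt_ps(_mm256_max_ps(disc, zero)); // clamp avoids NaN on miss lanes
    return _mm256_blendv_ps(zero, sqrt_disc, hit_mask);           // identity where #hit is false
}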
Sweep combines results from different tines using masked selection. Each lane picks the result from whichever tine matched.
sweep:
    | #miss -> miss_value
    | #hit  -> t_value
    -> final_t
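At the intrinsics level this amounts to a single per-lane select (illustrative C, not the compiler's output):

#include <immintrin.h>

// Each lane takes t_value where #hit matched, miss_value otherwise.
static inline __m256 sweep_final_t(__m256 miss_value, __m256 t_value, __m256 hit_mask) {
    return _mm256_blendv_ps(miss_value, t_value, hit_mask);
}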
A rake function combines tines, through blocks, and sweep for complete divergent control flow. Here's ray-sphere intersection:
rake intersect ray_ox ray_oy ray_oz ray_dx ray_dy ray_dz <sphere_cx> <sphere_cy> <sphere_cz> <sphere_r> -> t_result:

    ~~ a, b, c are the usual ray-sphere quadratic coefficients (setup elided)
    let disc = b * b - <4.0> * a * c

    | #miss := (disc < <0.0>)
    | #hit  := (!#miss)

    through #hit:
        let sqrt_disc = sqrt(disc)
        (- b - sqrt_disc) / (<2.0> * a) -> t_value

    through #miss:
        < -1.0> -> miss_value

    sweep:
        | #miss -> miss_value
        | #hit  -> t_value
        -> t_result
Ray-sphere intersection benchmark, 8 million rays with 50% hit/miss divergence:
Anti-hype disclaimer: this is not magic. It's the power of exclusively targeting vector and GPU dialects of MLIR from day one. The backend engineers and architecture designers are the experts; Rake just provides an ergonomic frontend, which is much easier to build when you're not trying to embed it inside the grammar of an entirely different paradigm. Rake generates clean AVX2 assembly with inlined functions, optimal mask handling, and no unnecessary memory traffic. And things will only get better as OCaml MLIR support improves.
Prerequisites: OCaml 5.0+ with dune, LLVM 17+ (mlir-opt, mlir-translate, llc)
VS Code extension for syntax highlighting:
SAFE HARBOUR: Rake is still very much in research and development. What you see here is essentially the entire content of the lexer / parser. Testing is currently done through an FFI harness while I work on finalizing the specification and completing feature sets so that Rake can support application programs, as opposed to just CPU kernels.
Rake 0.2.0 is an alpha release. The compiler pipeline works:
.rk → Parser → Type Checker → MLIR → LLVM IR → Native Code.
Working: parsing, type inference, MLIR codegen, AVX2 output.
Planned: AVX-512, standard library, multi-file projects.