Rake

A vector-first language for SIMD and SPIR-V with divergent control flow.

Rake makes SIMD programming explicit. Instead of hoping the compiler auto-vectorizes your loops, you write code where every value is inherently vectorized. The result: clean, readable code that compiles to efficient SIMD instructions.

Racks: Native Vector Types

In Rake, a rack is a vector of values across SIMD lanes. When you write float rack, you get 8 floats (AVX) or 16 (AVX-512) that operate in parallel.

~~ A rack holds one value per SIMD lane
let positions : float rack = load_positions()
let velocities : float rack = load_velocities()

~~ Arithmetic operates on all lanes simultaneously
let new_positions = positions + velocities * dt
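The per-lane semantics can be sketched outside Rake. This Python sketch emulates an 8-lane float rack with plain lists; the lane count and input values are illustrative assumptions, and on real SIMD hardware each lane-wise operation is a single vector instruction rather than a loop:

```python
LANES = 8  # assumed AVX width for 32-bit floats

def rack_add(a, b):
    # One addition per lane; a single vaddps on AVX hardware
    return [x + y for x, y in zip(a, b)]

def rack_mul(a, b):
    # One multiplication per lane
    return [x * y for x, y in zip(a, b)]

positions  = [float(i) for i in range(LANES)]
velocities = [1.0] * LANES
dt         = [0.5] * LANES  # dt already broadcast to every lane

# new_positions = positions + velocities * dt
new_positions = rack_add(positions, rack_mul(velocities, dt))
print(new_positions)  # [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
```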

Scalars: Broadcast Values

Scalars are uniform values broadcast to all lanes. The angle bracket syntax <dt> makes it visually clear which values are scalars and which are racks.

~~ Scalars use angle brackets
let result = positions + <gravity> * <dt>

~~ Field access works too
let radius = <sphere.r>

~~ Negative literals need a space: < -1.0>
let miss_t = < -1.0>
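In SIMD terms, the angle brackets denote a broadcast (splat): one uniform value is copied into every lane before the lane-wise operation runs. A hypothetical Python sketch of that semantics, with illustrative lane count and values:

```python
LANES = 8

def broadcast(s):
    # <gravity>: one uniform value replicated across all lanes (a splat)
    return [s] * LANES

positions = [float(i) for i in range(LANES)]
gravity = broadcast(-9.8)
dt = broadcast(0.1)

# result = positions + <gravity> * <dt>
result = [p + g * d for p, g, d in zip(positions, gravity, dt)]
print(result[0])  # approximately -0.98
```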

Stacks: Structure of Arrays

A stack is a struct laid out in SoA (Structure of Arrays) format. Each field is a rack, giving cache-friendly memory access patterns.

~~ Define a stack type
stack Particle {
  x : float rack,
  y : float rack,
  z : float rack,
  vx : float rack,
  vy : float rack,
  vz : float rack
}

~~ Access fields naturally
let speed = sqrt(p.vx*p.vx + p.vy*p.vy + p.vz*p.vz)
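The payoff of SoA is that each field is contiguous in memory: loading vx for a whole rack of particles is one vector load instead of a strided gather. A rough Python comparison of the two layouts; field names follow the Particle stack above, and the values are illustrative:

```python
import math

N = 8

# AoS: one record per particle; fields are interleaved in memory
aos = [{"vx": 1.0, "vy": 2.0, "vz": 2.0} for _ in range(N)]

# SoA: one contiguous array per field; each field maps onto a rack
soa = {
    "vx": [p["vx"] for p in aos],
    "vy": [p["vy"] for p in aos],
    "vz": [p["vz"] for p in aos],
}

# speed = sqrt(p.vx*p.vx + p.vy*p.vy + p.vz*p.vz), computed lane-wise
speed = [math.sqrt(x*x + y*y + z*z)
         for x, y, z in zip(soa["vx"], soa["vy"], soa["vz"])]
print(speed[0])  # 3.0
```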

Crunch: Pure SIMD Computation

A crunch is a pure function that operates on racks. Crunches are always inlined, producing zero function call overhead in the generated code.

~~ A crunch computes dot product across lanes
crunch dot ax ay az bx by bz -> d:
  d <- ax*bx + ay*by + az*bz

~~ Zero overhead when called
let dist_sq = dot(dx, dy, dz, dx, dy, dz)

Tines: Named Lane Masks

When different SIMD lanes need different behavior, you define tines. A tine is a named boolean mask that partitions lanes based on a predicate.

~~ Tines use # prefix (like grid lines / SIMD lanes)
| #miss  := (disc < <0.0>)
| #hit   := (!#miss)

~~ Tines can compose
| #close := (#hit && t < <max_dist>)
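A tine is just a per-lane boolean vector, and composing tines is lane-wise boolean algebra. A small Python sketch of the mask semantics; the disc, t, and max_dist values are illustrative assumptions:

```python
# disc for 8 lanes; a negative discriminant means the ray missed
disc = [4.0, -1.0, 0.0, 9.0, -2.5, 16.0, -0.5, 1.0]

# | #miss := (disc < <0.0>)
miss = [d < 0.0 for d in disc]
# | #hit := (!#miss)
hit = [not m for m in miss]

# | #close := (#hit && t < <max_dist>)
t = [1.0, 0.0, 5.0, 2.0, 0.0, 9.0, 0.0, 3.0]
max_dist = 4.0
close = [h and (ti < max_dist) for h, ti in zip(hit, t)]
print(miss.count(True), close.count(True))  # 3 3
```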

Through: Masked Computation

A through block executes computation only for lanes where the tine is true (i.e., where the rack passes through the tine). Other lanes receive an identity value. There are no branches, only masked SIMD operations.

~~ Compute sqrt only for lanes where disc >= 0
through #hit:
  let sqrt_disc = sqrt(disc)
  let t = (- b - sqrt_disc) / (<2.0> * a)
  t
-> t_value

~~ Miss lanes get -1
through #miss:
  < -1.0>
-> miss_value
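Lane-wise, the semantics look like this: active lanes run the body, inactive lanes keep an identity value. The through helper, the identity of 0.0, and the lane values below are illustrative assumptions; real codegen evaluates lanes under a mask and blends, without branching:

```python
import math

disc = [4.0, -1.0, 9.0, -2.5]
hit  = [d >= 0.0 for d in disc]
miss = [not h for h in hit]

def through(mask, body, identity=0.0):
    # Lanes where the tine is true run the body; others keep the identity.
    # (Scalar emulation: real codegen uses masked SIMD ops, not a branch.)
    return [body(i) if m else identity for i, m in enumerate(mask)]

b = [-4.0] * 4
a = [1.0] * 4
t_value = through(hit, lambda i: (-b[i] - math.sqrt(disc[i])) / (2.0 * a[i]))
miss_value = through(miss, lambda i: -1.0)
print(t_value)     # [1.0, 0.0, 0.5, 0.0]
print(miss_value)  # [0.0, -1.0, 0.0, -1.0]
```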

Sweep: Collect Results

Sweep combines results from different tines using masked selection. Each lane picks the result from whichever tine matched.

sweep:
  | #miss -> miss_value
  | #hit  -> t_value
-> final_t
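Sweep is a per-lane select: each lane takes the arm whose tine matched (on AVX2 this maps to a blend such as vblendvps). A Python sketch of the selection, with illustrative lane values:

```python
disc = [4.0, -1.0, 9.0, -2.5]
miss = [d < 0.0 for d in disc]

t_value    = [1.0, 0.0, 0.5, 0.0]    # through #hit results (illustrative)
miss_value = [0.0, -1.0, 0.0, -1.0]  # through #miss results (illustrative)

# Per-lane select between the two sweep arms
final_t = [mv if m else tv for m, tv, mv in zip(miss, t_value, miss_value)]
print(final_t)  # [1.0, -1.0, 0.5, -1.0]
```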

Rake: Putting It Together

A rake function combines tines, through blocks, and sweep for complete divergent control flow. Here's ray-sphere intersection:

rake intersect ray_ox ray_oy ray_oz ray_dx ray_dy ray_dz
  <sphere_cx> <sphere_cy> <sphere_cz> <sphere_r>
  -> t_result:

  ~~ Quadratic coefficients from the ray and sphere
  let ocx = ray_ox - <sphere_cx>
  let ocy = ray_oy - <sphere_cy>
  let ocz = ray_oz - <sphere_cz>
  let a = dot(ray_dx, ray_dy, ray_dz, ray_dx, ray_dy, ray_dz)
  let b = <2.0> * dot(ocx, ocy, ocz, ray_dx, ray_dy, ray_dz)
  let c = dot(ocx, ocy, ocz, ocx, ocy, ocz) - <sphere_r> * <sphere_r>

  let disc = b * b - <4.0> * a * c

  | #miss := (disc < <0.0>)
  | #hit  := (!#miss)

  through #hit:
    let sqrt_disc = sqrt(disc)
    (- b - sqrt_disc) / (<2.0> * a)
  -> t_value

  through #miss:
    < -1.0>
  -> miss_value

  sweep:
    | #miss -> miss_value
    | #hit  -> t_value
  -> t_result
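To check the logic end to end, the rake can be emulated lane-by-lane in plain Python. The rays, sphere, and lane count below are illustrative assumptions; the scalar if stands in for the tine/through/sweep machinery, which real codegen expresses as masked operations and a blend:

```python
import math

# Illustrative 4-lane input: rays from the origin; sphere at (0, 0, 5), r = 1
ox = oy = oz = [0.0] * 4
dx = [0.0, 1.0, 0.0, 1.0]
dy = [0.0] * 4
dz = [1.0, 0.0, 1.0, 0.0]
cx, cy, cz, r = 0.0, 0.0, 5.0, 1.0

def intersect_lane(i):
    # Quadratic coefficients for |o + t*d - c|^2 = r^2
    ocx, ocy, ocz = ox[i] - cx, oy[i] - cy, oz[i] - cz
    a = dx[i]*dx[i] + dy[i]*dy[i] + dz[i]*dz[i]
    b = 2.0 * (ocx*dx[i] + ocy*dy[i] + ocz*dz[i])
    c = ocx*ocx + ocy*ocy + ocz*ocz - r*r
    disc = b*b - 4.0*a*c
    if disc < 0.0:                           # the #miss tine
        return -1.0                          # through #miss
    return (-b - math.sqrt(disc)) / (2.0*a)  # through #hit

# sweep: one result per lane
final_t = [intersect_lane(i) for i in range(4)]
print(final_t)  # [4.0, -1.0, 4.0, -1.0]
```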

Performance

Ray-sphere intersection benchmark, 8 million rays with 50% hit/miss divergence:

C (auto-vectorized, -O3 -march=native) 207.89 M rays/sec
Rake 677.89 M rays/sec

Anti-hype disclaimer: this is not magic. It comes from exclusively targeting the vector and GPU dialects of MLIR from day one. The backend engineers and architecture designers are the experts; Rake just provides an ergonomic frontend, which is much easier to build when you're not trying to embed vectorization inside the grammar of an entirely different paradigm. Rake generates clean AVX2 assembly with inlined functions, efficient mask handling, and no unnecessary memory traffic. And things will only get better as OCaml MLIR support improves.

Install

Prerequisites: OCaml 5.0+ with dune, LLVM 17+ (mlir-opt, mlir-translate, llc)

$ git clone https://github.com/KaiStarkk/rake-lang
$ cd rake-lang && dune build
$ ./scripts/compile.sh examples/intersect_flat.rk

VS Code extension for syntax highlighting:

$ code --install-extension rake-lang-0.2.0.vsix

Status

SAFE HARBOUR: Rake is still very much in research and development. What you see here is essentially the entire surface area of the lexer and parser. Testing is currently done through an FFI harness while I work on finalizing the specification and completing the feature set that will let Rake support application programs, as opposed to just CPU kernels. Rake 0.2.0 is an alpha release. The compiler pipeline works end to end: .rk → Parser → Type Checker → MLIR → LLVM IR → Native Code.

Working

Parsing, type inference, MLIR codegen, AVX2 output

Planned

AVX-512, standard library, multi-file projects