Picking a language for Sheaf's Compiler
Feb 10th 2026
Last week I outlined the vision for Sheaf V2. The next question was immediate: now that I am on my own without XLA, which language should I write the compiler in?
My favorite languages for lower-level code are C and Go, with Zig as another contender. C++ and Rust are usually not on my radar unless I have to use them.
C is clean, small, and dependable despite its notorious undefined behavior. I have used it for countless projects already, but writing an MLIR compiler in C promises some entertaining debugging... Besides, most modern ML infrastructure uses "modern" C++11, pretty far removed from ANSI C. If I want to call MLIR APIs directly, manipulate IR programmatically, or debug the compilation pipeline, C++ is the obvious choice, even though I find it messy.
Rust, on the other hand, I don't really know. I've read about it and played with the rustlings exercises. I somewhat understand the borrow checker, but I've never shipped any Rust code. Choosing Rust would mean relying on the dreaded "vibe coding" approach, which is intellectually dishonest and downright irresponsible for a compiler. Besides, I dislike the trend of blindly rewriting large portions of C code in Rust (or even having Claude do it), so I'd rather not give it more momentum.
Considering C++
I am currently working on the CUDA C++ certification from Nvidia and diving deeper into IREE, which means a lot of exposure to "modern" C++, with its functional expressions and iterators.
I like the functional approach that modern C++ offers, reminiscent of Lisp.
map, reduce, transform, or tabulate are nice. But C++ still fights me on
the fundamentals. Memory management isn't just about new and delete anymore.
It's unique_ptr vs shared_ptr vs weak_ptr, move semantics, and trying to
remember whether an API takes ownership or just borrows. The type system doesn't
help: the code compiles, runs, and then segfaults three weeks later because some
object got freed unnoticed. I don't want to go through that.
I also dislike CMake: it is its own language, and CMakeLists files can be buggy and fail 90% of the way through the build (I just submitted a PR to IREE for one such case). You vendor LLVM (hello IREE)? Great, now your build takes 40 minutes. You link dynamically? Good luck with RPATH on macOS.
I can work in C++, albeit reluctantly. But maintaining an MLIR compiler in C++ over years is something different, and every contributor will have to navigate the same landmines. Every refactor will carry the fear of breaking something subtle, and something subtle will break anyway.
This is not a general argument against C++, but a local optimum for Sheaf’s constraints.
Golang
I mentioned I like Go a lot. It has Thompson and Pike behind it, both of whom I
have a lot of respect for, and it is simple at heart. The tooling is
excellent: go build "just works", there are no lifetime annotations, no borrow
checker fights...
For many compilers, Go would be a great choice, but Sheaf's compiler has a specific constraint: exhaustive pattern matching matters more than simplicity.
A compiler is a series of transformations on an AST. At each step, every
possible node type must be handled properly. Forget one case, and you get a subtle
bug that only surfaces when someone writes [1 2 3] instead of '[1 2 3].
In Go, type switches are not exhaustive:
switch v := expr.(type) {
case Integer: ...
case Symbol: ...
// Forget something -> compiles, crashes at runtime
}
In Rust, on the other hand, the compiler enforces this:
match expr {
    SheafValue::Integer(n) => ...,
    SheafValue::Symbol(s) => ...,
    // Forget something -> compilation error
}
For a project that will evolve over years, with contributions from both humans and LLMs, I need the compiler to catch these errors. That's the trade-off: Go's simplicity vs Rust's safety guarantees. For Sheaf, safety won, even though Go would be easier. So, Rust it is.
Rust
C sucks: too low-level, with too many UBs for such a project. C++ sucks: too noisy, complex, and bloated. And Rust sucks as well: too "hype-y" and large. Yet I will choose Rust. Not because it's trendy, but because it's the only language in which I felt glimpses of elegance when prototyping the Sheaf parser.
For instance, Rust's algebraic data types match how compilers think. This is what an AST node in Sheaf currently looks like:
enum SheafValue {
    Integer(i64),
    Float(f64),
    Symbol(String),
    List(Vec<SheafValue>),
    Vector(Vec<SheafValue>),
}
With a simple match on expr, the compiler forces me to handle every case. It will
not let me forget Vector and crash at runtime. Rust actually has a
beautiful type system; it makes the data structure explicit.
Of course, C++ has ways to achieve the same, using std::variant or inheritance with
dynamic_cast, but it's clunkier.
Rust also famously checks ownership at compile time, which removes the fear
during refactors. If the borrow checker accepts the code, the data flow is
sound. I still hit panics, but there's no need for valgrind like in my past
projects, and no need to trace through pointer lifetimes.
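A sketch of the kind of mistake this turns into a compile-time error instead of a use-after-free (lower is a hypothetical pass name, not Sheaf's actual API):

```rust
// `lower` takes ownership of its argument: after the call, the
// caller can no longer touch `ast`. In C++ this contract would
// live in documentation; here the compiler enforces it.
fn lower(ast: Vec<String>) -> String {
    ast.join(" ")
}

fn main() {
    let ast = vec!["add".to_string(), "1".to_string(), "2".to_string()];
    let mlir = lower(ast);
    // println!("{:?}", ast); // error[E0382]: borrow of moved value: `ast`
    assert_eq!(mlir, "add 1 2");
}
```

Uncommenting the println! line makes the build fail with E0382, which is the "fear removal": the dangling access simply never ships.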
Finally, like go build, cargo and its dependency management just work. No
CMakeLists.txt that breaks on a different version of LLVM. No hunting for
where libmlir.so got installed.
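For comparison, the entire build configuration for a crate like this can stay this small (a hypothetical sketch; the crate name is illustrative):

```toml
[package]
name = "sheaf"
version = "0.1.0"
edition = "2021"

# No build scripts, no LLVM discovery, no RPATH games:
# `cargo build` resolves and compiles everything declared below.
[dependencies]
```

One file, declarative, and identical on every platform.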
The Real Cost
Choosing Rust also has real costs, which I am paying for right now.
The main one, obviously: I don't know Rust well. I tried asking an LLM to write me some Rust code, and it could have embedded an entire rootkit without me noticing. I'm picking up steam, slowly. My main grief is, ironically, the borrow checker I was praising above: it likes to reject things that feel like they should work. I hit lifetime errors I don't fully understand, and while Claude helps, relying on it would undermine the foundation of Sheaf: reducing bloat and entropy.
Side note: writing code is trivial for an LLM, so code piles up and quickly turns into technical debt. This is easy to miss with vibe coding because of the massive amount of code generated in a short time. It is the antithesis of what I want with Sheaf: suckless ML, that is, the least amount of code necessary for elegance and performance. Every byte should be justified.
Choosing Rust could make MLIR interop a problem later. Right now, Sheaf emits
text MLIR, so there's no FFI, but this will have to change for performance
reasons. If I ever need to manipulate MLIR IR programmatically, I'd need
Rust bindings to MLIR's C++ APIs. That's manual work. bindgen would help, but
MLIR's template-heavy headers seem like a nightmare to wrap.
The ML systems community speaks C++. Should I ever need contributions from the IREE, XLA, or TensorFlow teams, they will know C++ better than Rust.
But here's the thing: these costs are upfront. I only pay them now, during initial development, to have an easier life later. It's the opposite for C++.
The Entropy Principle
One of the goals of Sheaf is to reduce entropy in ML code. That's one of the design filters: does this choice add noise or remove it?
Using C++ would add entropy:
- Build complexity (CMake, dependencies, platform quirks)
- Memory management (manual reasoning about ownership)
- Language cruft (40 years of legacy, multiple ways to do everything)
Using Rust removes entropy:
- One build tool (Cargo)
- Ownership checked by compiler
- ADTs that match AST structure
- Exhaustive pattern matching
If I'm building a language about clarity, the compiler should embody that. Not "do as I say, not as I do."
What Exists So Far
Over the past week, I built a small proof-of-concept to generate MLIR from S-expressions. What I have so far:
- A parser that handles S-expressions, vectors, symbols, literals
- A compiler stub that tracks function definitions and resolves symbols
- A code generator that emits StableHLO text for (+ 1 2) and (* x y)
- Error handling with source locations and meaningful messages
Everything builds and tests pass. The foundation is there. Now comes the real work: function calls, control flow, and tensors to make it an actual compiler.
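To give a flavor of that pipeline, here is a toy sketch of the (+ 1 2) path, not Sheaf's actual code; the function names and the exact emitted text are illustrative:

```rust
// Toy sketch: recognize exactly "(+ <int> <int>)" and emit
// StableHLO-flavored text for it. Sheaf's real parser and code
// generator are more general; this only shows the shape of the idea.
fn parse_add(src: &str) -> Option<(i64, i64)> {
    let inner = src.strip_prefix("(+ ")?.strip_suffix(')')?;
    let mut parts = inner.split_whitespace();
    let lhs: i64 = parts.next()?.parse().ok()?;
    let rhs: i64 = parts.next()?.parse().ok()?;
    Some((lhs, rhs))
}

fn emit_add(lhs: i64, rhs: i64) -> String {
    format!(
        "%0 = stablehlo.constant dense<{lhs}> : tensor<i64>\n\
         %1 = stablehlo.constant dense<{rhs}> : tensor<i64>\n\
         %2 = stablehlo.add %0, %1 : tensor<i64>"
    )
}

fn main() {
    let (l, r) = parse_add("(+ 1 2)").expect("parse failed");
    let text = emit_add(l, r);
    assert!(text.contains("stablehlo.add"));
    println!("{text}");
}
```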
I'll document the implementation as it unfolds.