
On the Future of Sheaf

Feb 2nd 2026

When I started Sheaf in 2025, it came from a frustration with how we express machine learning. I summarized it in the rationale but will expand on it here.

PyTorch has become the default way to write machine learning code, but over time has also become a dense stack of abstractions, conventions, and implicit machinery. To me, JAX is a clear improvement. It makes the mathematical structure more explicit and pushes important concepts such as pure functions and transformations closer to the surface. Still, at the end of the day, the center of gravity remained the same: Python as the control plane, requiring a growing amount of machinery to compensate for what the language itself does not express.
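
To make that contrast concrete, here is a minimal sketch of my own (not from the Sheaf codebase): in JAX, differentiation and compilation are explicit, composable transformations applied to a pure function, rather than behavior hidden inside an object hierarchy.

```python
# Illustration only: JAX transformations are explicit and composable.
import jax
import jax.numpy as jnp

def loss(w, x):
    # A pure function: the output depends only on the inputs.
    return jnp.sum((w * x) ** 2)

grad_loss = jax.grad(loss)      # transformation: differentiation
fast_grad = jax.jit(grad_loss)  # transformation: XLA compilation

print(fast_grad(2.0, jnp.array([1.0, 2.0])))  # d/dw of 5*w^2 at w=2 -> 20.0
```

The mathematical structure is visible in the code itself; nothing about the model lives in mutable framework state.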

The issue is not performance per se, but semantic noise and lack of elegance. Too much of what matters in a model ends up encoded indirectly, through imperative control flow, decorators, and framework-specific plumbing.

A systems perspective

I come from systems engineering and I tend to think in fairly old-fashioned terms. Unix ideas still resonate with me: explicit structure, minimal runtime assumptions, and components that do one thing and do it well.

Modern ML frameworks tend to violate these principles by necessity. They grow large because they must compensate for the host language. They accumulate features because there is no single place where the structure of computation is represented directly.

At some point, I started wondering what it would look like to approach differentiable programming from the opposite direction: start with a language that represents computation explicitly, and build the numerical machinery underneath it, not around it. The first mental model was simple: Scheme with tensors; Suckless meets machine learning.

At least two projects have attempted to bring Lisp to modern "AI" (connectionist rather than symbolic):

  • Uncomplicate, from Dragan Djuric: a very nice set of linear algebra and machine learning libraries for Clojure. One of them, Neanderthal, brings a complete BLAS-like matrix/vector/linear-algebra layer to Clojure, somewhat like NumPy does for Python.
  • AIscm, which aims to bring accelerated numerical processing to Guile through TensorFlow and LLVM.

I love Scheme for its mathematical elegance: it builds an entire language from very few primitives. But its concision, ironically, can become noisy, and the lack of native types also makes it a burden to run a model. Besides, AIscm cannot be called a framework or a library; it is more like a Guile wrapper for TensorFlow. This is something I want to avoid in the design of Sheaf: a nice dress for a clunky framework.

Clojure has a lot of elegance: it is a Lisp-1 (common namespace for variables and functions), with a clean, terse syntax relying on metacharacters, and data immutability. This is clearly what I want for Sheaf... But the JVM's weight stands in the way. I would rather have a lean, small runtime than a JVM. In an age of agentic development pouring oceans of unreviewed code every day, the promise of removing code has already become a value proposition in itself.

Sure, Clojure can be compiled natively with GraalVM, and Jank looks promising, but none of those options feels solid yet.

Instead, Sheaf V1 has focused on providing something that works, and for that purpose, JAX was the better option. That also means I still have Python in the way, but at least JAX is a sane framework. In fact, JAX is brilliant, and I will certainly use it as a reference implementation as Sheaf matures.

Sheaf V1

Currently, Sheaf is essentially a functional layer for model description, inspired by Clojure, that delegates numerical execution to XLA via JAX.

While this is still a proof of concept for the Sheaf language, it already does some things well:

  • It offers a more mathematical and readable syntax for describing forward passes than most JAX and PyTorch code.
  • It provides a high density of context for agent-like systems, which is a novelty.
  • It makes observability a first-class concern, through the trace and guard modes.
  • Being implemented in Python, it integrates directly with the ML ecosystem (NumPy, data loaders, visualization tools).

These are real advantages, but they are also narrow. Much of Sheaf v1’s value proposition is framed around Python interoperability: low boilerplate, easy integration, seamless use alongside existing JAX code.

But this is also the main issue...

If Python remains responsible for initialization, data loading, training loops, checkpointing, and orchestration, then Sheaf is just "yet another layer" to hide the underlying noise instead of removing it. It adds bloat instead of addressing it at the root.

Envisioning Sheaf v2

The real value of Sheaf lies in what happens when Python can be removed from the loop. Sheaf 1.2, releasing this week, is a step in this direction. It adds forms for the boilerplate and imperative code that used to live in Python, even functionally impure forms I was initially reluctant to implement, such as do and while. All examples are now standalone and can handle weight loading, checkpointing, and training.

But Sheaf remains a DSL on top of Python.

The main goal for the next major release is to break free from Python. To achieve this, the plan is to emit StableHLO and let IREE lower it to Linalg, where Enzyme-MLIR can apply compile-time differentiation. This avoids reinventing the most difficult part: autodiff. Existing MLIR infrastructure will handle it.
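
JAX already exposes a way to see the StableHLO this pipeline would consume. The following sketch of mine, using JAX's public ahead-of-time lowering API, prints the MLIR module for a tiny layer; this is roughly the kind of artifact Sheaf v2 would emit directly, without Python in the loop:

```python
# Peek at the StableHLO that JAX emits today; Sheaf v2 aims to generate
# this kind of IR itself and hand it to IREE.
import jax
import jax.numpy as jnp

def layer(w, x):
    return jnp.tanh(w @ x)

lowered = jax.jit(layer).lower(jnp.ones((4, 4)), jnp.ones((4,)))
print(lowered.as_text())  # an MLIR module containing stablehlo.* ops
```

Because StableHLO is a versioned, stable interchange format, a Sheaf compiler can target it once and inherit the whole IREE/Enzyme-MLIR toolchain beneath it.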


Current architecture: [diagram: Sheaf V1 architecture]

Planned architecture for Sheaf v2: [diagram: Sheaf V2 architecture]


Why IREE and not XLA? XLA is too deeply embedded in JAX's ecosystem, and thus in Python, which I want to get rid of. IREE is more modular, and while it depends on LLVM for compilation, LLVM is not required at runtime for inference or training. I am fine with LLVM being part of the SDK, as long as it is not part of the runtime.

Moving to StableHLO and IREE will not just be an optimization; it is potentially the game-changing aspect of Sheaf v2: on-device training, if I manage to keep the framework in "TinyML" territory, like TinyGrad, but free of Python.

With v2, a model will ship as both .shf files and optional .vmfb files. Sheaf files will contain orchestration code (dataset paths, checkpointing configuration, I/O, logging) as the source of truth. The optional .vmfb files will contain compiled training computation (forward pass, backward pass, optimizer updates).

The (use ...) directive will become smarter: if a compiled file exists for the import, it runs it on the IREE runtime; otherwise, it runs the interpreted Sheaf file. I will definitely want a clean syntax with zero decorators, the opposite of what I have to suffer with Python.
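
As a sketch of how that dispatch could work (all names here, such as run_vmfb and interpret_sheaf, are hypothetical placeholders, not actual Sheaf APIs), the resolution logic amounts to a simple file-based fallback:

```python
# Hypothetical sketch of (use ...) resolution: prefer a compiled .vmfb,
# fall back to interpreting the .shf source. Names are placeholders.
from pathlib import Path

def use(module_name: str, search_dir: str = ".") -> str:
    base = Path(search_dir) / module_name
    compiled = base.with_suffix(".vmfb")  # compiled training computation
    source = base.with_suffix(".shf")     # orchestration source of truth
    if compiled.exists():
        return f"run_vmfb({compiled})"        # execute on the IREE runtime
    if source.exists():
        return f"interpret_sheaf({source})"   # fall back to the interpreter
    raise FileNotFoundError(module_name)
```

The important property is that callers never need to know which path was taken; the source file remains the source of truth, and the compiled artifact is a transparent acceleration.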

The runtime, as envisioned, will contain:

  • sheaf-parser.so: parser (~2MB)
  • sheaf-interpreter.so: orchestration (NCCL/Gloo for distributed training, I/O for datasets and checkpoints, debug & trace)
  • iree-runtime.a: VMFB execution (~1MB)

Unlike PyTorch Mobile, which only supports inference, the VMFB will also support training. A robot, a satellite, or an edge device can then train locally without Python, TorchInductor, or XLA.

Sheaf v2 will be the completion of the original vision. The syntax stays, existing Sheaf code keeps working, but the Python bloat goes. Of course, removing Python for the sake of cleanliness and elegance will drive some users away, but I suspect they are not the target audience for a Lisp. Sheaf is aimed at those who use and love Emacs, Guile, or Clojure, and who want for Lisp what JAX is for Python.

What's next

Sheaf 1.2 is coming this week, but I am already thinking heavily about Sheaf v2, notably about how to implement the parser and compiler. Should I write them in C++ or Rust? Should Sheaf remain interoperable with Python for data preprocessing? If so, how will I design the FFI?

Should Sheaf emit pure StableHLO, or inject higher-level structured ops for better optimization? How will I handle differentiation for the interpreter? How deeply should the compiler integrate with IREE's lowering pipeline?

More on this in the coming weeks.