Planned evolutions for Sheaf
Parallelism and distribution
Sheaf does not currently allow for explicit workload distribution. Instead, it relies on JAX to decide how tensors are partitioned across hardware, and it does not expose JAX's low-level sharding primitives such as pmap.
This works well for single-machine training but limits multi-node scaling. A promising approach would be to leverage the "macros as architecture" design to rewrite functions during compilation, potentially using Equinox as a backend. This could give us parallelization while keeping the high-level mathematical definitions free of hardware concerns.
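For context, here is a minimal sketch of the kind of low-level primitive in question. This is plain JAX, not Sheaf: jax.pmap maps a function over the leading axis of its input, one slice per local device, which is exactly the level of hardware control Sheaf currently keeps hidden.

```python
import jax
import jax.numpy as jnp

# Sketch only: jax.pmap replicates a function across local devices
# (on a CPU-only machine, that is a single device).
n = jax.local_device_count()
xs = jnp.arange(n * 4.0).reshape(n, 4)  # leading axis must equal device count

@jax.pmap
def scale(x):
    return 2.0 * x

out = scale(xs)  # computed one shard per device, shape (n, 4)
```

Exposing something like this directly would tie Sheaf programs to device counts and data layouts, which is precisely what the macro-rewriting approach aims to avoid.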
Language modularity and namespaces
Keeping Sheaf a "pure", mathematical language has created a paradox: it relies on imperative Python code for even the most basic stateful tasks. The mid-term goal is to restrict Python's role to non-trivial plumbing and orchestration. Reducing this dependency, however, would require adding stateful functions that would bloat the core language.
To solve this, future versions will introduce namespaces, which will allow the standard library to grow without cluttering the core language. Tools like file I/O or SafeTensors support could then be added without polluting the global symbol table.
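To make the idea concrete, here is a minimal Python sketch of how namespaced lookup keeps the global table small. The names (io, read-file) and the split-on-/ convention are hypothetical illustrations, not Sheaf's actual design.

```python
# Hypothetical sketch, not Sheaf's implementation: each namespace owns its
# own symbol table, so standard-library additions never touch the core table.
core = {"+": lambda *xs: sum(xs)}
io_ns = {"read-file": lambda path: open(path).read()}
namespaces = {"core": core, "io": io_ns}

def resolve(symbol, current="core"):
    """Resolve 'ns/name' against its namespace, bare names against `current`."""
    if "/" in symbol:
        ns, name = symbol.split("/", 1)
        return namespaces[ns][name]
    return namespaces[current][symbol]

add = resolve("+")               # found in the core table
read = resolve("io/read-file")   # found without polluting core
```

The point of the design is visible in the last line: io/read-file is reachable, yet the core table never grows.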
Long-term reflections
Whether Sheaf should remain interoperable with Python (as Clojure is with Java) or eventually break from it entirely is still uncertain.
Keeping Sheaf interoperable avoids reinventing the wheel, but a "sovereign" version of Sheaf could eliminate a lot of runtime overhead and compile directly to XLA's StableHLO (dealing with the full MLIR would be a nightmare...), without the need for Python. This would turn it into a standalone, high-performance, and homoiconic differentiable framework.
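For a sense of the target, here is a hand-written illustration of StableHLO's textual form for a toy squaring function; it is not output from any existing Sheaf pipeline, just an example of the dialect a sovereign compiler would emit:

```mlir
func.func @square(%x: tensor<4xf32>) -> tensor<4xf32> {
  %0 = stablehlo.multiply %x, %x : tensor<4xf32>
  return %0 : tensor<4xf32>
}
```

Emitting only this dialect, rather than arbitrary MLIR, keeps the compiler surface small while remaining consumable by XLA and IREE.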
This would be a lot of work, however, as it would require implementing an automatic-differentiation engine, something I'd much rather leave to someone else for now.
However, targeting XLA directly while breaking free from Python and JAX could open new opportunities. With a Sheaf compiler written in Rust and AOT compilation through IREE, Sheaf could run with a 5-50 MB runtime footprint and 50-100 MB at compile time, versus 500-600 MB with Python+JAX. This would align the entire stack with Sheaf's philosophy of elegance and concision, from the language down to the runtime.
For now, Sheaf remains within the Python ecosystem: the benefits of reusing existing tooling far outweigh the overhead.