Planned evolutions for Sheaf
Parallelism and distribution
Sheaf does not currently allow for explicit workload distribution. Instead, it relies on JAX to decide how tensors are partitioned across hardware, and it does not expose JAX's low-level sharding primitives such as pmap.
This works well for single-machine training but limits multi-node scaling. A promising approach would be to leverage the "macros as architecture" design to rewrite functions during compilation, potentially using Equinox as a backend. This could give us parallelization while keeping the high-level mathematical definitions free of hardware concerns.
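For context, here is a minimal sketch of the kind of low-level primitive in question. This is plain JAX, not Sheaf: jax.pmap maps a function over the leading axis of its input, one slice per local device, which is exactly the level of hardware control Sheaf currently keeps hidden.

```python
import jax
import jax.numpy as jnp

# Sketch only: jax.pmap replicates a function across local devices
# (on a CPU-only machine, that is a single device).
n = jax.local_device_count()
xs = jnp.arange(n * 4.0).reshape(n, 4)  # leading axis must equal device count

@jax.pmap
def scale(x):
    return 2.0 * x

out = scale(xs)  # computed one shard per device, shape (n, 4)
```

Exposing something like this directly would tie Sheaf programs to device counts and data layouts, which is precisely what the macro-rewriting approach aims to avoid.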
Language modularity and namespaces
Keeping Sheaf a "pure", mathematical language has created a paradox: it relies on imperative Python code for even the most basic stateful tasks. The mid-term goal is to restrict Python's role to non-trivial plumbing and orchestration. Reducing this dependency, however, would require adding stateful functions that would bloat the core language.
To solve this, future versions will introduce namespaces, which will allow the standard library to grow without cluttering the core language. Tools like file I/O or SafeTensors support could then be added without polluting the global symbol table.
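To make the idea concrete, here is a minimal Python sketch of how namespaced lookup keeps the global table small. The names (io, read-file) and the split-on-/ convention are hypothetical illustrations, not Sheaf's actual design.

```python
# Hypothetical sketch, not Sheaf's implementation: each namespace owns its
# own symbol table, so standard-library additions never touch the core table.
core = {"+": lambda *xs: sum(xs)}
io_ns = {"read-file": lambda path: open(path).read()}
namespaces = {"core": core, "io": io_ns}

def resolve(symbol, current="core"):
    """Resolve 'ns/name' against its namespace, bare names against `current`."""
    if "/" in symbol:
        ns, name = symbol.split("/", 1)
        return namespaces[ns][name]
    return namespaces[current][symbol]

add = resolve("+")               # found in the core table
read = resolve("io/read-file")   # found without polluting core
```

The point of the design is visible in the last line: io/read-file is reachable, yet the core table never grows.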
Long-term reflections
Whether Sheaf should remain interoperable with Python (as Clojure is with Java) or eventually break from it entirely is still uncertain.
Keeping Sheaf interoperable avoids reinventing the wheel, but a "sovereign" version of Sheaf could eliminate a lot of runtime overhead and compile directly to XLA's StableHLO (dealing with the full MLIR would be a nightmare...), without the need for Python. This would turn it into a standalone, high-performance, and homoiconic differentiable framework.
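For a sense of the target, here is a hand-written illustration of StableHLO's textual form for a toy squaring function; it is not output from any existing Sheaf pipeline, just an example of the dialect a sovereign compiler would emit:

```mlir
func.func @square(%x: tensor<4xf32>) -> tensor<4xf32> {
  %0 = stablehlo.multiply %x, %x : tensor<4xf32>
  return %0 : tensor<4xf32>
}
```

Emitting only this dialect, rather than arbitrary MLIR, keeps the compiler surface small while remaining consumable by XLA and IREE.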
This would be a lot of work, however, as it would require implementing an automatic-differentiation engine, something I'd much rather leave to someone else for now.
However, targeting XLA directly while breaking free from Python and JAX could open new opportunities. With a Sheaf compiler written in Rust and AOT compilation through IREE, Sheaf could run with a 5-50 MB runtime footprint and 50-100 MB at compile time, versus 500-600 MB with Python+JAX. This would align the entire stack with Sheaf's philosophy of elegance and concision, from the language down to the runtime.
For now, Sheaf remains within the Python ecosystem: the benefits of reusing existing tooling far outweigh the overhead.