Open-source edge AI
The first open-source ExecuTorch backend for Nordic's AXON NPU — running on Zephyr RTOS.
Write the model in PyTorch, deploy it to Nordic silicon in minutes. The gap between data scientist and embedded developer just got smaller; you can even be both.
Standard PyTorch workflow — no custom framework. The same tools ML engineers already use.
A 192k-parameter Mamba1 language model optimized from 1,550 ms to 27.6 ms per token on the nRF54LM20B — a 56× speedup through progressive optimizations. A case study of what the backend enables on production workloads.
| Stage | Latency | Speedup | What changed |
|---|---|---|---|
| fp32 portable baseline | 1,550 ms | 1× | ExecuTorch portable runtime |
| AXON NPU delegation | 1,000 ms | 1.5× | Linear layers offloaded to the NPU |
| Fused custom operators | 55 ms | 28× | 22 op-fusion patterns (PTSiLU, PTSoftplus, RMSNorm) |
| q11.12 fixed-point | 35 ms | 44× | Power-of-two bit-shifts replace FPU ops |
| Persistent scan state | 27.6 ms | 56× | Zero-copy q11.12 state between tokens |
Example applications:
- Industrial IoT predictive maintenance
- Visual inspection
- Always-on wake word

Each one is a complete PyTorch model: training notebook, quantization recipe, and deployment to the nRF54LM20 DK in a single flow. The kind of thing a data scientist can run end-to-end without needing a separate embedded team.
Delegate pattern. ExecuTorch partitions a model graph and hands supported subgraphs to backend delegates. Our AXON delegate claims the ops the NPU can accelerate, and ExecuTorch falls back to the portable CPU runtime for everything else. No model surgery required.
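The partitioning idea can be sketched in a few lines of plain Python (a toy illustration, not the ExecuTorch API): walk the graph's ops in order and group maximal runs of NPU-supported ops into delegate subgraphs, leaving the rest to the CPU fallback.

```python
from itertools import groupby

# Ops a hypothetical AXON delegate claims; everything else falls back
# to the portable CPU runtime. Names here are illustrative.
NPU_OPS = {"linear", "conv1d", "conv2d", "depthwise_conv",
           "avg_pool", "max_pool", "add", "mul", "relu"}

def partition(ops):
    """Split an op sequence into maximal runs tagged 'npu' or 'cpu'."""
    return [("npu" if on_npu else "cpu", list(run))
            for on_npu, run in groupby(ops, key=lambda op: op in NPU_OPS)]

if __name__ == "__main__":
    graph = ["conv1d", "relu", "softplus", "linear", "add"]
    for target, subgraph in partition(graph):
        print(target, subgraph)
    # npu ['conv1d', 'relu']
    # cpu ['softplus']
    # npu ['linear', 'add']
```

The real partitioner works on an FX graph with dataflow edges rather than a flat op list, but the contract is the same: unsupported ops never block delegation of the ops around them.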
TOSA composition. PyTorch ops are lowered to the Tensor Operator Set Architecture (TOSA), the same stable IR used by Arm's Ethos-U toolchain. The backend composes with ExecuTorch's shared TOSABackend infrastructure, reusing roughly 80% of the Ethos-U code path. New PyTorch ops that decompose to TOSA primitives AXON already supports get coverage automatically; expanding beyond that is a matter of hardware op support plus a short converter, not a new frontend.
Delegated op set. Fully Connected, Conv1D / Conv2D, Depthwise Conv, Average / Max Pool, Add, Multiply, and the ReLU family run directly on the NPU. Op extensions cover Sigmoid, Tanh, and Softmax. Anything outside this set falls back to the portable CPU runtime.
Op-fusion passes. The biggest wins come from pattern-matching fused operators (PTSiLU, PTSoftplus, RMSNorm, and 19 others) that collapse multiple TOSA ops into single NPU kernels. Combined with q11.12 fixed-point arithmetic — where scaling becomes a bit-shift — the compiler can emit hand-tuned kernels from a standard PyTorch model.
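The bit-shift trick is easy to see in plain Python: in q11.12, 1.0 is the integer 4096, and the 2^12 scale factor left over after an integer multiply is removed with a right shift instead of a floating-point divide (a conceptual sketch, not the backend's kernel code):

```python
FRAC_BITS = 12
ONE = 1 << FRAC_BITS  # 4096 represents 1.0 in q11.12

def to_q(x: float) -> int:
    """Convert a float to q11.12 fixed point."""
    return int(round(x * ONE))

def from_q(q: int) -> float:
    """Convert a q11.12 value back to a float."""
    return q / ONE

def q_mul(a: int, b: int) -> int:
    """Fixed-point multiply: a*b carries scale 2^24, so one arithmetic
    right shift by 12 restores the q11.12 scale. No FPU involved."""
    return (a * b) >> FRAC_BITS

if __name__ == "__main__":
    a, b = to_q(1.5), to_q(-2.25)
    print(from_q(q_mul(a, b)))  # -3.375
```

Because every rescale is a shift, fused kernels stay in integer registers end to end, which is what makes the 55 ms to 35 ms step in the table possible.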
Zephyr RTOS. Inference runs inside a Zephyr application: ExecuTorch and Nordic's sdk-edge-ai are both Zephyr modules pulled in via west. The result coexists with the rest of the firmware (BLE stack, sensors, peripherals) using Zephyr's cooperative and preemptive scheduling; no bare-metal gymnastics required.
One Docker command. Everything pre-installed: NCS toolchain, ExecuTorch, PyTorch, Jupyter.
```shell
docker build -t axon-ai:latest docker/
docker run --rm -it \
  -v $(pwd):/workspace/axon-ai \
  -v ~/sdk-edge-ai:/opt/sdk-edge-ai:ro \
  -p 8888:8888 axon-ai:latest
```