The problem
Real-time MPC on the Crazyflie 2.1 is constrained by two things at once: the Cortex-M4 has roughly 0.5 ms (500 µs) to solve a QP per loop, and the platform's tiny power budget rules out anything that pulls in BLAS. Off-the-shelf solvers like OSQP take around 850 µs on this class of board for the same problem, which overruns that budget before state estimation and PWM mixing get any time at all.
Approach
I wrote a from-scratch ADMM solver in plain C that exploits MPC's banded KKT structure. The factorization behind (A^T A + ρ I)^-1 is computed once via a Riccati recursion along the horizon and cached, so each iteration's inner loop is just a sequence of triangular solves and projections, with no heap allocation and no library dependencies.
For benchmarking I implemented three baselines on the same problem: OSQP (warm-started), a hand-rolled interior-point solver, and ReLU-QP (the unrolled-network approximation). All four were measured on the same Cortex-M4 with the same warm-start seed.
Once the ADMM solver was validated, I distilled its policy via DAgger into a small two-layer feedforward net. The student is exported as a single header — zero dependencies, fixed-point friendly — and runs in 2.1 µs on the same chip.
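For a sense of what the exported header amounts to, here is a minimal sketch of a two-layer ReLU forward pass in plain C. It is an illustration under assumptions, not the generated header itself: the function name `mlp2_forward`, the row-major weight layout, and the caller-provided scratch buffer are all hypothetical, and it uses floats rather than fixed point.

```c
#include <stddef.h>

/*
 * y = W2 * relu(W1 * x + b1) + b2
 * W1 is n_hid x n_in and W2 is n_out x n_hid, both row-major.
 * hid is caller-provided scratch of length n_hid, so nothing is allocated here.
 */
static void mlp2_forward(const float *W1, const float *b1,
                         const float *W2, const float *b2,
                         size_t n_in, size_t n_hid, size_t n_out,
                         const float *x, float *hid, float *y)
{
    /* Hidden layer: affine map followed by ReLU. */
    for (size_t j = 0; j < n_hid; ++j) {
        float acc = b1[j];
        for (size_t i = 0; i < n_in; ++i) acc += W1[j * n_in + i] * x[i];
        hid[j] = acc > 0.0f ? acc : 0.0f;
    }
    /* Output layer: affine map only. */
    for (size_t k = 0; k < n_out; ++k) {
        float acc = b2[k];
        for (size_t j = 0; j < n_hid; ++j) acc += W2[k * n_hid + j] * hid[j];
        y[k] = acc;
    }
}
```

In a real export the weights would be `static const` arrays in the same header, and the dimensions would be compile-time constants so the loops can be unrolled.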

Result
- 63 µs median solve over the 20-step horizon, about 13× faster than OSQP (~850 µs) on the same problem
- 3.7 mm RMS position tracking on a figure-8 reference at 2 Hz, MuJoCo round-trip validated against the real airframe
- The DAgger-distilled neural controller runs in 2.1 µs with no measurable degradation in tracking, small enough to fit inside the existing 1 kHz attitude loop without a dedicated solver thread
Tech notes
- Solver: custom C ADMM, banded KKT, cached Riccati factor, ~120 SLoC of inner loop
- Distillation: DAgger over 8 hours of MPC rollouts; 2-layer ReLU MLP, 64 hidden units
- Sim: MuJoCo for differential dynamics + actuator delays; matched against bench logs
- Compute: STM32F405 Cortex-M4 @ 168 MHz, FPU on
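The DAgger loop named in the distillation note can be sketched on a toy problem. Everything here is an assumption for illustration: a scalar state with dynamics x' = x + dt*u, a fixed linear gain standing in for the MPC expert, and a one-parameter student fitted by least squares instead of a trained MLP. The names (`dagger_dataset_t`, `expert_action`, `dagger_train`) are hypothetical. What it does preserve is the DAgger pattern: roll out the current student, label the states it visits with the expert's action, aggregate, and refit.

```c
#include <stddef.h>

#define DAGGER_MAX_SAMPLES 1024

/* Aggregated dataset: every state visited so far, paired with the expert's action. */
typedef struct {
    float x[DAGGER_MAX_SAMPLES];
    float u[DAGGER_MAX_SAMPLES];
    size_t n;
} dagger_dataset_t;

/* Hypothetical stand-in for the MPC expert: a fixed stabilizing gain. */
static float expert_action(float x) { return -1.5f * x; }

/* Student policy: a single scalar gain, u = k * x. */
static float student_action(float k, float x) { return k * x; }

/* Refit the student on the whole aggregated dataset (closed-form least squares). */
static float fit_student(const dagger_dataset_t *d)
{
    float sxu = 0.0f, sxx = 0.0f;
    for (size_t i = 0; i < d->n; ++i) {
        sxu += d->x[i] * d->u[i];
        sxx += d->x[i] * d->x[i];
    }
    return sxx > 0.0f ? sxu / sxx : 0.0f;
}

/* Run `rounds` DAgger iterations of `steps` steps each, starting from x0. */
static float dagger_train(dagger_dataset_t *d, int rounds, int steps, float x0)
{
    const float dt = 0.1f;  /* toy integration step */
    float k = 0.0f;         /* untrained student */
    d->n = 0;
    for (int r = 0; r < rounds; ++r) {
        float x = x0;
        for (int t = 0; t < steps && d->n < DAGGER_MAX_SAMPLES; ++t) {
            /* Always record the expert's label for the state actually visited. */
            d->x[d->n] = x;
            d->u[d->n] = expert_action(x);
            d->n++;
            /* Round 0 follows the expert; later rounds follow the student. */
            float u_roll = (r == 0) ? expert_action(x) : student_action(k, x);
            x += dt * u_roll;
        }
        k = fit_student(d);  /* train on everything aggregated so far */
    }
    return k;
}
```

Because the toy expert is exactly linear, the student recovers its gain after the first round. With an MPC expert and an MLP student, the later rounds matter because they add states the student drifts into on its own.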
