The problem
Hybrid aerial-ground robots like DoubleBee conventionally need three controllers and a finite-state machine to glue them together: one for stand-up, one for wheeled traversal, one for bicopter flight. The mode-switching logic is brittle, the gains never quite generalize across modes, and the seams between modes are where the platform fails.

Approach
A single PPO policy in Isaac Lab is trained across all three modes simultaneously: the observation includes the configuration of both wheels and rotors, the action space is the union of motor torques and thrust commands, and the reward shapes mode emergence rather than legislating it. Beta-distributed actor, RSL-RL as the trainer, Isaac Sim for the digital twin.
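The beta-distributed actor is what keeps the union action space bounded without clipping. A minimal sketch of the sampling step using Python's stdlib `random.betavariate`; the actuator limits and the (alpha, beta) values are made-up placeholders, not DoubleBee's real parameters:

```python
import random

# Hypothetical actuator limits -- illustration only, not DoubleBee's real values.
TORQUE_LIMIT = 2.0   # N*m, wheel motors
THRUST_LIMIT = 15.0  # N, rotor thrust

def sample_action(alpha, beta, low, high, rng=random):
    """Sample one bounded action from a Beta(alpha, beta) head.

    The actor network outputs (alpha, beta) per action dimension; the
    sample in (0, 1) is affinely rescaled to the actuator range, so no
    clipping is ever needed -- the point of a beta-distributed actor.
    """
    u = rng.betavariate(alpha, beta)  # u in (0, 1)
    return low + u * (high - low)

# Union action space: wheel torques and rotor thrusts in one vector.
action = [
    sample_action(2.0, 2.0, -TORQUE_LIMIT, TORQUE_LIMIT),  # left wheel torque
    sample_action(2.0, 2.0, -TORQUE_LIMIT, TORQUE_LIMIT),  # right wheel torque
    sample_action(2.0, 5.0, 0.0, THRUST_LIMIT),            # rotor 1 thrust
    sample_action(2.0, 5.0, 0.0, THRUST_LIMIT),            # rotor 2 thrust
]
```

Every sampled action lands strictly inside its actuator range by construction, which avoids the boundary-clipping bias a Gaussian actor would need post-hoc clamping to handle.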
For waypoint navigation in clutter, the trained tracker is frozen and a hierarchical raycaster planner sits on top: coarse occupancy from a 2-D scan, then feasibility checks refined against the tracker's known dynamics. The whole stack is deployed through ROS 2 to the real airframe.
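The coarse pass of such a planner reduces to raycasts over the 2-D occupancy grid. A minimal sketch of that grid traversal (integer Bresenham); the grid, function names, and coordinates are illustrative assumptions, not the deployed planner:

```python
def ray_clear(grid, x0, y0, x1, y1):
    """Walk a ray through a 2-D occupancy grid (1 = occupied).

    Integer Bresenham traversal; returns False at the first occupied
    cell, True if the ray reaches (x1, y1). This is only the coarse
    pass; the real planner would follow up with feasibility checks
    against the tracker's dynamics.
    """
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 > x0 else -1
    sy = 1 if y1 > y0 else -1
    err = dx - dy
    x, y = x0, y0
    while True:
        if grid[y][x]:
            return False
        if (x, y) == (x1, y1):
            return True
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy

# Toy map: a short wall in the middle row.
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(ray_clear(grid, 0, 0, 3, 0))  # top row is free
print(ray_clear(grid, 0, 1, 3, 1))  # blocked by the wall
```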

Results
- 3× tighter tracking than a careful PID baseline tuned per-mode, measured on the same reference trajectories
- 90% goal-reach rate at 4% collision rate across 200 randomized cluttered scenes
- Mode transitions happen smoothly inside the policy; no observable seam between, e.g., stand-up and wheel commands

Tech notes
- Trainer: PPO via RSL-RL, beta-distribution actor for bounded actions
- Sim: Isaac Lab over Isaac Sim; physics tuned against bench identification
- Planner: hierarchical raycaster operating on a 2-D occupancy refresh
- Deploy: ROS 2 nodes, sim-to-real transfer via domain randomization on motor delays + thrust gain
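The delay-and-gain randomization above can be sketched as per-episode parameter sampling plus a FIFO command buffer; the ranges and class names here are assumptions for illustration, not the values used on the real airframe:

```python
import random
from collections import deque

def randomize_episode(rng=random):
    """Per-episode sim-to-real randomization (assumed ranges, not the
    values actually used on DoubleBee)."""
    return {
        "motor_delay_steps": rng.randint(0, 3),  # actuation latency in sim steps
        "thrust_gain": rng.uniform(0.9, 1.1),    # multiplicative thrust error
    }

class DelayedMotor:
    """Applies a fixed command delay plus a thrust-gain perturbation."""

    def __init__(self, delay_steps, gain):
        # Pre-fill with zeros so the first `delay_steps` reads return 0.
        self.buf = deque([0.0] * delay_steps)
        self.gain = gain

    def step(self, command):
        self.buf.append(command)
        return self.gain * self.buf.popleft()

params = randomize_episode()
motor = DelayedMotor(params["motor_delay_steps"], params["thrust_gain"])
```

Training the policy against freshly sampled `params` every episode forces it to be robust to the latency and gain mismatch it will meet on hardware.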
