The problem
Hybrid aerial-ground robots like DoubleBee conventionally need three controllers and a finite-state machine to glue them together: one for stand-up, one for wheeled traversal, one for bicopter flight. The mode-switching logic is brittle, the gains never quite generalize across modes, and the seams between modes are where the platform fails.

Approach
A single PPO policy in Isaac Lab is trained across all three modes simultaneously: the observation includes the configuration of both wheels and rotors, the action space is the union of motor torques and thrust commands, and the reward shapes mode emergence rather than legislating it. Beta-distributed actor, RSL-RL as the trainer, Isaac Sim for the digital twin.
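The beta-distributed actor is what keeps the union action space bounded without clipping. A minimal sketch of the sampling step using Python's stdlib `random.betavariate`; the actuator limits and the (alpha, beta) values are made-up placeholders, not DoubleBee's real parameters:

```python
import random

# Hypothetical actuator limits -- illustration only, not DoubleBee's real values.
TORQUE_LIMIT = 2.0   # N*m, wheel motors
THRUST_LIMIT = 15.0  # N, rotor thrust

def sample_action(alpha, beta, low, high, rng=random):
    """Sample one bounded action from a Beta(alpha, beta) head.

    The actor network outputs (alpha, beta) per action dimension; the
    sample in (0, 1) is affinely rescaled to the actuator range, so no
    clipping is ever needed -- the point of a beta-distributed actor.
    """
    u = rng.betavariate(alpha, beta)  # u in (0, 1)
    return low + u * (high - low)

# Union action space: wheel torques and rotor thrusts in one vector.
action = [
    sample_action(2.0, 2.0, -TORQUE_LIMIT, TORQUE_LIMIT),  # left wheel torque
    sample_action(2.0, 2.0, -TORQUE_LIMIT, TORQUE_LIMIT),  # right wheel torque
    sample_action(2.0, 5.0, 0.0, THRUST_LIMIT),            # rotor 1 thrust
    sample_action(2.0, 5.0, 0.0, THRUST_LIMIT),            # rotor 2 thrust
]
```

Every sampled action lands strictly inside its actuator range by construction, which avoids the boundary-clipping bias a Gaussian actor would need post-hoc clamping to handle.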
For waypoint navigation in clutter, the trained tracker is frozen and a hierarchical raycaster planner sits on top: coarse occupancy from a 2-D scan, then feasibility checks refined against the tracker's known dynamics. The whole stack is deployed through ROS 2 to the real airframe.
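The coarse pass of such a planner reduces to raycasts over the 2-D occupancy grid. A minimal sketch of that grid traversal (integer Bresenham); the grid, function names, and coordinates are illustrative assumptions, not the deployed planner:

```python
def ray_clear(grid, x0, y0, x1, y1):
    """Walk a ray through a 2-D occupancy grid (1 = occupied).

    Integer Bresenham traversal; returns False at the first occupied
    cell, True if the ray reaches (x1, y1). This is only the coarse
    pass; the real planner would follow up with feasibility checks
    against the tracker's dynamics.
    """
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 > x0 else -1
    sy = 1 if y1 > y0 else -1
    err = dx - dy
    x, y = x0, y0
    while True:
        if grid[y][x]:
            return False
        if (x, y) == (x1, y1):
            return True
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy

# Toy map: a short wall in the middle row.
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(ray_clear(grid, 0, 0, 3, 0))  # top row is free
print(ray_clear(grid, 0, 1, 3, 1))  # blocked by the wall
```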

Results
- 3× tighter tracking than a careful PID baseline tuned per-mode, measured on the same reference trajectories
- 90% goal-reach rate at 4% collision rate across 200 randomized cluttered scenes
- Mode transitions happen smoothly inside the policy; no observable seam between, e.g., stand-up and wheel commands

Tech notes
- Trainer: PPO via RSL-RL, beta-distribution actor for bounded actions
- Sim: Isaac Lab over Isaac Sim; physics tuned against bench identification
- Planner: hierarchical raycaster operating on a 2-D occupancy refresh
- Deploy: ROS 2 nodes, sim-to-real transfer via domain randomization on motor delays + thrust gain
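The delay-and-gain randomization above can be sketched as per-episode parameter sampling plus a FIFO command buffer; the ranges and class names here are assumptions for illustration, not the values used on the real airframe:

```python
import random
from collections import deque

def randomize_episode(rng=random):
    """Per-episode sim-to-real randomization (assumed ranges, not the
    values actually used on DoubleBee)."""
    return {
        "motor_delay_steps": rng.randint(0, 3),  # actuation latency in sim steps
        "thrust_gain": rng.uniform(0.9, 1.1),    # multiplicative thrust error
    }

class DelayedMotor:
    """Applies a fixed command delay plus a thrust-gain perturbation."""

    def __init__(self, delay_steps, gain):
        # Pre-fill with zeros so the first `delay_steps` reads return 0.
        self.buf = deque([0.0] * delay_steps)
        self.gain = gain

    def step(self, command):
        self.buf.append(command)
        return self.gain * self.buf.popleft()

params = randomize_episode()
motor = DelayedMotor(params["motor_delay_steps"], params["thrust_gain"])
```

Training the policy against freshly sampled `params` every episode forces it to be robust to the latency and gain mismatch it will meet on hardware.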
