This repository trains and evaluates a 3D quadcopter that flies through gates using Stable-Baselines3 and sb3-contrib.
- drone_ppo_sb3.py: main script for training, continuing from checkpoints, evaluating, and rendering.
- quadcopter_envs.py: primary gate environment implementation (Quadcopter3DGates) with physics, reward, observations, and actions.
- quadcopter_hover_envs.py: alternate environment for a hover task.
- quadcopter_animation/animation.py: OpenCV-based animation viewer for simulation playback.
- quadcopter_animation/graphics.py: drawing utilities for the drone, gates, camera, and scene.
- requirements.txt: Python dependencies.
- rl_ckpt/: saved checkpoints from training runs.
- Quadcopter3DGatesGym-v0/: pretrained checkpoint folders.
- runs/: TensorBoard logs.
Create and activate a Python virtual environment, then install dependencies:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- Activate the virtual environment:
source .venv/bin/activate
- Evaluate the provided checkpoint (this exact command worked in this repo):
python drone_ppo_sb3.py --render --policy-type recurrent_ppo --cont Quadcopter3DGatesGym-v0/recurrent_ppo_cfc_no_vel/recurrent_ppo_cfc_no_vel.zip
Do NOT add --no-vel to the command above unless the checkpoint was trained with that flag. Using incompatible observation flags causes an "Observation spaces do not match" error.
- Quick train test (small run):
python drone_ppo_sb3.py --total-timesteps 10000 --num-envs 4
- Continue training from a checkpoint:
python drone_ppo_sb3.py --policy-type recurrent_ppo --cont rl_ckpt/Quadcopter3DGatesGym-v0/recurrent_ppo_mlp_test/recurrent_ppo_mlp_test.zip --total-timesteps 1000000
Keep the same observation flags used in the original training run (e.g. --no-vel, --no-ang-vel, --low-obs).
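One way to avoid mismatched observation flags is to record them next to the checkpoint at training time and compare them before evaluating or continuing. The sketch below is not part of this repository: the sidecar file name and the helper names (`save_obs_flags`, `check_obs_flags`) are hypothetical, and the flag list covers only the flags named in this README.

```python
import json
from pathlib import Path

# Observation flags named in this README; extend if the script has more.
OBS_FLAGS = ("no_vel", "no_ang_vel", "low_obs")


def sidecar_path(checkpoint):
    """Hypothetical convention: model.zip -> model.flags.json next to it."""
    path = Path(checkpoint)
    return path.with_name(path.stem + ".flags.json")


def save_obs_flags(checkpoint, flags):
    """Record which observation flags were active during training."""
    record = {name: bool(flags.get(name, False)) for name in OBS_FLAGS}
    sidecar_path(checkpoint).write_text(json.dumps(record, indent=2))
    return record


def check_obs_flags(checkpoint, flags):
    """Return the names of flags that differ from the recorded training run."""
    recorded = json.loads(sidecar_path(checkpoint).read_text())
    return [
        name
        for name in OBS_FLAGS
        if bool(flags.get(name, False)) != recorded.get(name, False)
    ]
```

An empty list from `check_obs_flags` means the current flags match the recorded ones; any returned name points at a flag that would likely change the observation space.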
The script supports an OpenCV animation viewer, but it is disabled by default. To show the animation during evaluation:
- Open drone_ppo_sb3.py and find the render_policy function.
- Uncomment the line:
# animate_policy(model, env)
so it becomes:
animate_policy(model, env)
- Run the evaluation command from above. The OpenCV window requires a desktop environment (a DISPLAY); it will not appear on headless servers unless you forward X11 or use a virtual display (e.g. Xvfb).
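The display requirement can be checked before trying to open a window. This is a minimal sketch, assuming that on Linux the presence of a non-empty DISPLAY variable is a good enough signal; the function name `can_open_window` is hypothetical and the script does not necessarily perform such a check.

```python
import os
import sys


def can_open_window(environ=None, platform=None):
    """Best-effort guess at whether an OpenCV window can be shown.

    On Linux an X11 session is identified by a non-empty DISPLAY variable.
    Other platforms are assumed (not verified) to have a desktop session.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return True
    return bool(environ.get("DISPLAY"))
```

A caller could skip the animation and fall back to printing diagnostics when this returns False, instead of failing when the window is created.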
- Observation spaces do not match: this happens when you pass flags (--no-vel, --low-obs, etc.) that were not used when the model was saved. Solution: run evaluation without those flags or match the original training flags.
- Could not deserialize object policy_class / missing policy_class keys: the loader may need the local custom policy available. The script was updated to pass the custom policy class when loading; ensure you run the script from this repository so the custom policy definitions in drone_ppo_sb3.py are importable.
- Animation not appearing: ensure you are running on a machine with a display or use X forwarding / Xvfb.
- drone_ppo_sb3.py parses command-line options and either trains or evaluates a model.
- Training uses PPO or RecurrentPPO from Stable-Baselines3 / sb3-contrib, with custom recurrent CfC policies available in the script.
- The environment is a vectorized gym-style simulation of a quadcopter flying through gates. Observations contain drone state, gate info, and optional parameter inputs.
- Actions are motor commands; rewards encourage passing gates and discourage collisions and flying out of bounds.
- Evaluation runs a trained model in a test environment, collects diagnostics, and prints averages. Optionally the animation can be enabled as described above.
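The "collect diagnostics and print averages" step can be pictured with a small helper. This is only an illustration: the function name and the diagnostic keys used in the example (`gates_passed`, `collided`) are hypothetical and may not match what the script actually records.

```python
from collections import defaultdict


def average_diagnostics(episodes):
    """Average each numeric key over the episodes that report it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        for key, value in episode.items():
            totals[key] += float(value)
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    # Hypothetical per-episode diagnostics from two evaluation episodes.
    runs = [
        {"gates_passed": 3, "collided": 1},
        {"gates_passed": 5, "collided": 0},
    ]
    for name, mean in average_diagnostics(runs).items():
        print(f"{name}: {mean:.2f}")
```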
- --policy-type: recurrent_ppo or ppo
- --total-timesteps: total number of training steps
- --num-envs: number of parallel environments
- --cont: checkpoint file path (zip)
- --render: evaluation mode
- --low-obs, --no-vel, --no-ang-vel: observation ablations (must match training)
- --param-input, --param-input-noise: include physical parameter encoding in the observation
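For orientation, the options above could be declared with argparse roughly as follows. This is a sketch, not a copy of the parser in drone_ppo_sb3.py: the flag names come from this README, but every default, the boolean treatment of the ablation flags, and the float type of --param-input-noise are assumptions.

```python
import argparse


def build_parser():
    """Sketch of a parser for the flags listed in this README."""
    parser = argparse.ArgumentParser(
        description="Train or evaluate a gate-flying quadcopter policy"
    )
    parser.add_argument(
        "--policy-type", choices=["recurrent_ppo", "ppo"], default="ppo"
    )  # default is assumed
    parser.add_argument("--total-timesteps", type=int, default=10000)  # assumed
    parser.add_argument("--num-envs", type=int, default=4)  # assumed
    parser.add_argument("--cont", metavar="ZIP", default=None)
    parser.add_argument("--render", action="store_true")
    # Observation options, assumed to be simple on/off switches.
    for flag in ("--low-obs", "--no-vel", "--no-ang-vel", "--param-input"):
        parser.add_argument(flag, action="store_true")
    # Assumed to be a float noise level; the real type may differ.
    parser.add_argument("--param-input-noise", type=float, default=0.0)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```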
After generating the C exports under C_codes/, run the simulator with a C controller:
python simulate_rl_c.py --model-dir C_codes/recurrent_ppo_cfc_no_vel --episodes 20 --verbose
To open the animation viewer instead of batch evaluation:
python simulate_rl_c.py --model-dir C_codes/recurrent_ppo_cfc_no_vel --animate
The simulator compiles nn_operations.c and nn_parameters.c into libcontroller.so, calls nn_control() through ctypes, and resets the C recurrent state with nn_reset() whenever an episode ends.
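The ctypes call pattern described above can be sketched as a thin wrapper around an already-loaded library. The prototypes are assumptions, not taken from the exported C code: the sketch assumes `void nn_control(const float *obs, float *act)` and `void nn_reset(void)`. The class name `CController` and the dimension arguments are hypothetical; check the generated headers for the real signatures.

```python
import ctypes


class CController:
    """Wrap a loaded controller library exposing nn_control() and nn_reset().

    ASSUMED prototypes (verify against the generated C code):
        void nn_reset(void);
        void nn_control(const float *obs, float *act);
    """

    def __init__(self, lib, obs_dim, act_dim):
        self.lib = lib
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        float_p = ctypes.POINTER(ctypes.c_float)
        # Declare argument and return types so ctypes converts buffers correctly.
        lib.nn_control.argtypes = [float_p, float_p]
        lib.nn_control.restype = None
        lib.nn_reset.argtypes = []
        lib.nn_reset.restype = None

    def reset(self):
        """Clear the recurrent state held inside the C code."""
        self.lib.nn_reset()

    def act(self, obs):
        """Run one inference step and return the action as a list of floats."""
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim} values, got {len(obs)}")
        obs_buf = (ctypes.c_float * self.obs_dim)(*obs)
        act_buf = (ctypes.c_float * self.act_dim)()
        self.lib.nn_control(obs_buf, act_buf)
        return list(act_buf)
```

With a compiled library this would be used as `CController(ctypes.CDLL("./libcontroller.so"), obs_dim, act_dim)`, calling `reset()` at every episode boundary so recurrent state does not leak between episodes. The library path and the dimensions here are illustrative.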
To compare the Python reference actor against a native C executable that runs nn_control() directly, use:
python benchmark_rl_c_pure.py --model-dir C_codes/recurrent_ppo_cfc_no_vel
For all exported controllers:
python benchmark_rl_c_pure.py --all
The benchmark compiles a temporary benchmark_controller executable with gcc -O3, reads observations from a binary file once, and measures only the repeated C inference loop. For LTC models, the default benchmark size is smaller because the Python scalar ODE reference is much slower.
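The measurement pattern (load observations once, then time only the inference loop) can be mirrored in Python for the reference side. This sketch is not the benchmark's actual code: it assumes the binary file is a flat array of native float32 values, and the function names `load_observations` and `time_inference` are hypothetical.

```python
import array
import time


def load_observations(path, obs_dim):
    """Read a flat float32 binary file once and split it into rows."""
    data = array.array("f")
    with open(path, "rb") as fh:
        data.frombytes(fh.read())
    if len(data) % obs_dim != 0:
        raise ValueError("file length is not a multiple of obs_dim")
    return [data[i:i + obs_dim] for i in range(0, len(data), obs_dim)]


def time_inference(step, observations, repeats=10):
    """Return mean seconds per step(obs) call; file I/O is not timed."""
    if not observations or repeats < 1:
        raise ValueError("need at least one observation and one repeat")
    calls = 0
    start = time.perf_counter()
    for _ in range(repeats):
        for obs in observations:
            step(obs)
            calls += 1
    elapsed = time.perf_counter() - start
    return elapsed / calls
```

Keeping the file read outside the timed region is what makes the per-call figure reflect inference cost rather than disk access.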