gustavokpc/RL_DroneControl

Quadcopter RL

This repository trains and evaluates a 3D quadcopter that flies through gates using Stable-Baselines3 and sb3-contrib.

Files

  • drone_ppo_sb3.py: main script for training, continuing from checkpoints, evaluating, and rendering.
  • quadcopter_envs.py: primary gate environment implementation (Quadcopter3DGates) with physics, reward, observations, and actions.
  • quadcopter_hover_envs.py: alternate environment for a hover task.
  • quadcopter_animation/animation.py: OpenCV-based animation viewer for simulation playback.
  • quadcopter_animation/graphics.py: drawing utilities for the drone, gates, camera, and scene.
  • requirements.txt: Python dependencies.
  • rl_ckpt/: saved checkpoints from training runs.
  • Quadcopter3DGatesGym-v0/: pretrained checkpoint folders.
  • runs/: TensorBoard logs.

Install

Create and activate a Python virtual environment, then install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Exact commands to run (use these)

  1. Activate virtualenv:
source .venv/bin/activate
  2. Evaluate the provided checkpoint (this exact command worked in this repo):
python drone_ppo_sb3.py --render --policy-type recurrent_ppo --cont Quadcopter3DGatesGym-v0/recurrent_ppo_cfc_no_vel/recurrent_ppo_cfc_no_vel.zip

Do NOT add --no-vel to the command above unless the checkpoint was trained with that flag. Using incompatible observation flags causes an "Observation spaces do not match" error.

  3. Quick train test (small run):
python drone_ppo_sb3.py --total-timesteps 10000 --num-envs 4

Continue training from a checkpoint

python drone_ppo_sb3.py --policy-type recurrent_ppo --cont rl_ckpt/Quadcopter3DGatesGym-v0/recurrent_ppo_mlp_test/recurrent_ppo_mlp_test.zip --total-timesteps 1000000

Keep the same observation flags used in the original training run (e.g. --no-vel, --no-ang-vel, --low-obs).
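
One way to make the flags easier to keep consistent is to record them next to the checkpoint and compare them before resuming. The sketch below is only an illustration: the helper names (save_obs_flags, mismatched_flags) and the .flags.json sidecar file are hypothetical and are not part of drone_ppo_sb3.py.

```python
import json
from pathlib import Path

# Hypothetical helpers; drone_ppo_sb3.py is not known to write such a file.
OBS_FLAGS = ("no_vel", "no_ang_vel", "low_obs")


def sidecar_path(checkpoint):
    """Return the JSON file stored next to a checkpoint zip (model.zip -> model.flags.json)."""
    return Path(checkpoint).with_suffix(".flags.json")


def save_obs_flags(checkpoint, flags):
    """Record which observation flags were active for the training run."""
    record = {name: bool(flags.get(name, False)) for name in OBS_FLAGS}
    sidecar_path(checkpoint).write_text(json.dumps(record, indent=2))


def mismatched_flags(checkpoint, flags):
    """List the flags whose current value differs from the recorded one."""
    recorded = json.loads(sidecar_path(checkpoint).read_text())
    return [
        name
        for name in OBS_FLAGS
        if bool(flags.get(name, False)) != recorded[name]
    ]
```

Calling mismatched_flags before loading a checkpoint would let a wrapper script stop early with a readable message instead of hitting the observation-space error during loading.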

Enable animation (visual playback)

The script supports an OpenCV animation viewer, but it is not enabled by default. To show the animation during evaluation:

  1. Open drone_ppo_sb3.py and find the render_policy function.
  2. Uncomment the line:
# animate_policy(model, env)

so it becomes:

animate_policy(model, env)
  3. Run the evaluation command from above. The OpenCV window requires a desktop environment (a DISPLAY); it will not appear on headless servers unless you forward X11 or use a virtual display (e.g. Xvfb).
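
A quick way to tell whether the viewer has a chance of opening on Linux is to check the DISPLAY variable before starting the evaluation. This is a minimal sketch, assuming an X11 setup; has_display is a hypothetical helper, not a function from this repository.

```python
import os


def has_display(env=None):
    """Return True if the DISPLAY variable is set and non-empty (X11 on Linux)."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY"))


if __name__ == "__main__":
    if has_display():
        print("DISPLAY is set; the OpenCV window should be able to open.")
    else:
        print("No DISPLAY found; forward X11 or start a virtual display such as Xvfb.")
```

The env argument exists only so the check can be exercised with a plain dictionary instead of the real environment.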

Troubleshooting: common errors and fixes

  • Observation spaces do not match: this happens when you pass flags (--no-vel, --low-obs, etc.) that were not used when the model was saved. Solution: run evaluation without those flags or match the original training flags.
  • Could not deserialize object policy_class / missing policy_class keys: the loader needs the custom policy class to be importable. The script passes the custom policy class when loading, so run it from this repository to make sure the custom policy definitions in drone_ppo_sb3.py can be imported.
  • Animation not appearing: ensure you are running on a machine with a display or use X forwarding / Xvfb.

How the code works (brief)

  • drone_ppo_sb3.py parses command-line options and either trains or evaluates a model.
  • Training uses PPO or RecurrentPPO from Stable-Baselines3 / sb3-contrib, with custom recurrent CfC policies available in the script.
  • The environment is a vectorized gym-style simulation of a quadcopter flying through gates. Observations contain drone state, gate info, and optional parameter inputs.
  • Actions are motor commands; rewards encourage passing through gates and discourage collisions and out-of-bounds flight.
  • Evaluation runs a trained model in a test environment, collects diagnostics, and prints averages. Optionally the animation can be enabled as described above.
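
The "collects diagnostics and prints averages" step can be pictured as averaging per-episode dictionaries. The sketch below assumes each episode yields a dictionary of numeric values; the function name and the keys used in the usage note are hypothetical and may not match what the script actually reports.

```python
from collections import defaultdict


def average_diagnostics(episodes):
    """Average each numeric diagnostic over a list of per-episode dicts."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        for key, value in episode.items():
            totals[key] += value
            counts[key] += 1
    # Divide by the per-key count so a key missing from some episodes
    # is averaged only over the episodes that reported it.
    return {key: totals[key] / counts[key] for key in totals}
```

For example, two episodes reporting {"gates_passed": 3, "collided": 0} and {"gates_passed": 5, "collided": 1} would average to 4.0 gates passed and a collision rate of 0.5.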

Key CLI options

  • --policy-type: recurrent_ppo or ppo
  • --total-timesteps: total number of training steps
  • --num-envs: number of parallel environments
  • --cont: checkpoint file path (zip)
  • --render: evaluation mode
  • --low-obs, --no-vel, --no-ang-vel: observation ablations (must match training)
  • --param-input, --param-input-noise: include physical parameter encoding in the observation
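
For orientation, the options above could be declared with argparse roughly as follows. This is not the script's actual parser: the defaults are assumptions, and --param-input and --param-input-noise are left out because this README does not say whether they take a value.

```python
import argparse


def build_parser():
    """Build an illustrative parser for the options listed in this README."""
    parser = argparse.ArgumentParser(description="Illustrative option sketch")
    # Defaults below are assumed for the sketch, not taken from the script.
    parser.add_argument("--policy-type", choices=["recurrent_ppo", "ppo"], default="ppo")
    parser.add_argument("--total-timesteps", type=int, default=1_000_000)
    parser.add_argument("--num-envs", type=int, default=4)
    parser.add_argument("--cont", default=None, help="checkpoint file path (zip)")
    parser.add_argument("--render", action="store_true", help="evaluation mode")
    # Observation ablations are plain on/off switches in the commands above.
    for flag in ("--low-obs", "--no-vel", "--no-ang-vel"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Declaring the ablations as store_true switches means they default to off, which matches the advice to pass them only when the checkpoint was trained with them.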


Simulate with the exported C RL controller

After generating the C exports under C_codes/, run the simulator with a C controller:

python simulate_rl_c.py --model-dir C_codes/recurrent_ppo_cfc_no_vel --episodes 20 --verbose

To open the animation viewer instead of batch evaluation:

python simulate_rl_c.py --model-dir C_codes/recurrent_ppo_cfc_no_vel --animate

The simulator compiles nn_operations.c and nn_parameters.c into libcontroller.so, calls nn_control() through ctypes, and resets the C recurrent state with nn_reset() whenever an episode ends.
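
The compile-and-load step described above can be sketched as follows. The C signatures, the gcc flags, and the assumption that both source files sit directly in the model directory are guesses made for illustration; check the generated headers under C_codes/ for the real prototypes before relying on them.

```python
import ctypes
from pathlib import Path


def build_command(model_dir, output="libcontroller.so"):
    """Return a gcc command that builds the two C sources into a shared library."""
    src = Path(model_dir)
    return [
        "gcc", "-O2", "-shared", "-fPIC",
        str(src / "nn_operations.c"),
        str(src / "nn_parameters.c"),
        "-o", str(src / output),
        "-lm",
    ]


def load_controller(library_path):
    """Load the shared library and declare the ASSUMED C signatures."""
    lib = ctypes.CDLL(str(library_path))
    float_ptr = ctypes.POINTER(ctypes.c_float)
    # Assumption: void nn_control(const float *obs, float *action);
    lib.nn_control.argtypes = [float_ptr, float_ptr]
    lib.nn_control.restype = None
    # Assumption: void nn_reset(void);
    lib.nn_reset.argtypes = []
    lib.nn_reset.restype = None
    return lib
```

The command list can be passed to subprocess.run(..., check=True); after loading, nn_reset() would be called at each episode boundary so the recurrent state does not carry over between episodes.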

Benchmark Python vs pure C inference

To compare the Python reference actor against a native C executable that runs nn_control() directly, use:

python benchmark_rl_c_pure.py --model-dir C_codes/recurrent_ppo_cfc_no_vel

For all exported controllers:

python benchmark_rl_c_pure.py --all

The benchmark compiles a temporary benchmark_controller executable with gcc -O3, reads observations from a binary file once, and measures only the repeated C inference loop. For LTC models, the default benchmark size is smaller because the Python scalar ODE reference is much slower.
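
The same measurement idea, preparing the inputs once and timing only the repeated inference loop, can be applied on the Python side. This is a generic sketch, not code from benchmark_rl_c_pure.py; time_inference and its parameters are hypothetical.

```python
import time


def time_inference(step, observations, repeats=100):
    """Return the mean seconds per call of step(obs), timing only the loop."""
    if not observations or repeats <= 0:
        raise ValueError("need at least one observation and one repeat")
    # Inputs are prepared by the caller, so loading cost stays outside the timer.
    start = time.perf_counter()
    for _ in range(repeats):
        for obs in observations:
            step(obs)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(observations))
```

Passing the Python reference actor as step gives a per-call latency that can be compared against the figure reported for the C executable, provided both use the same observations.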
