fix: resolve training startup bugs for walking_clf_sym and lip_clf_ec tasks by fanyahao1 · Pull Request #23 · Zolkin1/robot_rl

fanyahao1 · 2026-02-26T07:34:02Z

Summary

This PR fixes a series of cascading runtime errors that prevented the walking_clf_sym and lip_clf_ec training tasks from starting up. The bugs were discovered sequentially as training initialization progressed further with each fix.

Changes

1. `trajectory_manager.py` — Skip `ori_w` when building `output_names`

Problem: The new YAML trajectory files (e.g., hf/trajectories/walking/*.yaml) store orientation as a full quaternion (ori_w, ori_x, ori_y, ori_z) inside bezier_coeffs.frames, but frame_vels only contains the 3-component angular velocity (ori_x, ori_y, ori_z). The previous code naively iterated over all axes in frames, causing a KeyError: 'ori_w' when looking up the velocity.

Fix: In _verify_consistent_outputs_and_get_info, filter output_names to only include axes that exist in both frames and frame_vels for each frame.
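The filtering step can be sketched as follows. This is a minimal illustration, not the code in trajectory_manager.py: it assumes frames and frame_vels are nested dicts keyed by frame name and then axis name, and both the helper build_output_names and the "frame:axis" naming scheme are hypothetical.

```python
def build_output_names(frames, frame_vels):
    """Collect '<frame>:<axis>' names for axes present in BOTH mappings.

    Assumed layout (hypothetical): {frame_name: {axis_name: coefficients}}.
    """
    output_names = []
    for frame_name, axes in frames.items():
        vel_axes = frame_vels.get(frame_name, {})
        for axis in axes:
            # ori_w exists only in frames (quaternion), so it is skipped here
            # instead of raising KeyError on the velocity lookup.
            if axis in vel_axes:
                output_names.append(f"{frame_name}:{axis}")
    return output_names


# Toy data: full quaternion in frames, 3-component angular velocity in frame_vels.
frames = {"pelvis": {"ori_w": [1.0], "ori_x": [0.0], "ori_y": [0.0], "ori_z": [0.0]}}
frame_vels = {"pelvis": {"ori_x": [0.0], "ori_y": [0.0], "ori_z": [0.0]}}
names = build_output_names(frames, frame_vels)
```

With the toy data above, ori_w is dropped and the three remaining axes keep their original order.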

2. `clf.py` — Make `ordered_output_names` optional; support legacy numpy array weights

Problem: CLF.__init__ was refactored to require ordered_output_names as a positional argument and to expect dict-keyed Q/R weights. However, HLIPCommandTerm still instantiates CLF with positional numpy arrays for Q_weights/R_weights and no ordered_output_names, causing a TypeError.

Fix: Made ordered_output_names optional (default None). Added isinstance(..., np.ndarray) branches so that legacy numpy arrays of diagonal weights are accepted directly, without named keys.
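A minimal sketch of the two accepted weight formats, under stated assumptions: the real CLF.__init__ signature is not shown in this PR description, so the class name CLFSketch, the n_outputs parameter, and the helper _build_diag are all hypothetical.

```python
import numpy as np


class CLFSketch:
    """Hypothetical stand-in for CLF; only the weight handling is illustrated."""

    def __init__(self, n_outputs, Q_weights, R_weights, ordered_output_names=None):
        self.ordered_output_names = ordered_output_names
        self.Q = self._build_diag(Q_weights, n_outputs)
        self.R = self._build_diag(R_weights, n_outputs)

    def _build_diag(self, weights, n_outputs):
        if isinstance(weights, np.ndarray):
            # Legacy path: diagonal entries are given positionally.
            if weights.shape != (n_outputs,):
                raise ValueError(f"expected shape ({n_outputs},), got {weights.shape}")
            return np.diag(weights)
        # Dict path: names are needed to fix the order of the diagonal.
        if self.ordered_output_names is None:
            raise ValueError("dict weights require ordered_output_names")
        return np.diag([weights[name] for name in self.ordered_output_names])


# Legacy call style: numpy arrays, no output names.
legacy = CLFSketch(2, np.array([1.0, 2.0]), np.array([0.1, 0.2]))
# Named call style: dict weights ordered by ordered_output_names.
named = CLFSketch(2, {"a": 1.0, "b": 2.0}, {"a": 0.1, "b": 0.2},
                  ordered_output_names=["a", "b"])
```

Both call styles produce the same diagonal matrices here, which is the backward-compatible behavior the fix describes.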

3. `hlip_cmd.py` — Add missing properties required by rewards and observations

Problem: Several reward functions and observation terms accessed attributes on HLIPCommandTerm that did not exist:

- y_des / dy_des: expected by observations; the internal fields were named y_out / dy_out
- desired_contact_poses: used by the holonomic_constraint reward
- current_contact_poses: used by the holonomic_constraint reward
- current_contact_vels: used by the holonomic_constraint_vel reward

Fix: Added the following properties to HLIPCommandTerm:

- y_des / dy_des: simple aliases for y_out / dy_out
- desired_contact_poses: [B, 6] tensor concatenating stance_foot_pos_0 and stance_foot_ori_0; returns zeros before initialization
- current_contact_poses: [B, 6] real-time stance foot pose (position + Euler angles) from robot body data; returns zeros when stance_idx is None
- current_contact_vels: [B, 4] concatenation of stance foot linear velocity and yaw rate; returns zeros when stance_idx is None
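The alias and zero-fallback pattern could look roughly like the sketch below. It uses NumPy arrays as a stand-in for the batched tensors, and the class CommandTermSketch, its constructor, and the output width of 3 are assumptions made for illustration; only the property names and the [B, 6] shape come from the description above.

```python
import numpy as np


class CommandTermSketch:
    """Hypothetical stand-in for HLIPCommandTerm showing two of the properties."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.y_out = np.zeros((num_envs, 3))   # output width 3 is arbitrary here
        self.dy_out = np.zeros((num_envs, 3))
        self.stance_foot_pos_0 = None          # [B, 3] once initialized
        self.stance_foot_ori_0 = None          # [B, 3] once initialized

    @property
    def y_des(self):
        # Alias: observations read y_des, the internal field is y_out.
        return self.y_out

    @property
    def dy_des(self):
        return self.dy_out

    @property
    def desired_contact_poses(self):
        # Return zeros until the stance foot reference has been set.
        if self.stance_foot_pos_0 is None or self.stance_foot_ori_0 is None:
            return np.zeros((self.num_envs, 6))
        return np.concatenate([self.stance_foot_pos_0, self.stance_foot_ori_0], axis=-1)


term = CommandTermSketch(num_envs=4)
before_init = term.desired_contact_poses       # zeros, shape (4, 6)
term.stance_foot_pos_0 = np.ones((4, 3))
term.stance_foot_ori_0 = np.full((4, 3), 2.0)
after_init = term.desired_contact_poses        # position then orientation per row
```

Returning zeros of the final shape before initialization keeps downstream reward and observation code shape-stable during the first steps.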

4. `rewards.py` — Fix tensor shape bugs in holonomic constraint rewards

Problem: Two reward functions were incorrectly collapsing their output to a scalar instead of returning a [B] per-environment tensor:

- holonomic_constraint: pose_err was overwritten with wrap_to_pi(pose_err[:, -1]) (shape [B]), then .sum(dim=-1) collapsed it to a scalar.
- holonomic_constraint_vel: v.sum(dim=-1).sum(dim=-1) ** 2 double-reduced to a scalar.

Fix:

- holonomic_constraint: Clone pose_err, update only the last column in-place with wrap_to_pi, then compute (pose_err**2).sum(dim=-1) to retain the [B] shape.
- holonomic_constraint_vel: Replace with (v**2).sum(dim=-1) to correctly return [B].
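The shape behavior of the corrected reductions can be checked with a small sketch. NumPy is used here as a stand-in for PyTorch so the example runs without torch (axis= plays the role of dim=, .copy() the role of .clone()); wrap_to_pi is a local stand-in, and the function names and inputs are hypothetical. Only the squared-error term is shown, not any scaling the actual rewards may apply.

```python
import numpy as np


def wrap_to_pi(angles):
    """Local stand-in: wrap angles into [-pi, pi)."""
    return (angles + np.pi) % (2.0 * np.pi) - np.pi


def holonomic_constraint_err(desired, current):
    """Per-environment squared pose error, shape [B]."""
    # The copy mirrors the clone step described above; the subtraction
    # already allocates a new array in this NumPy version.
    pose_err = (desired - current).copy()
    # Wrap only the last column (yaw), leaving the [B, 6] shape intact.
    pose_err[:, -1] = wrap_to_pi(pose_err[:, -1])
    return (pose_err ** 2).sum(axis=-1)


def holonomic_constraint_vel_err(v):
    """Per-environment squared velocity norm, shape [B]."""
    return (v ** 2).sum(axis=-1)


B = 5
desired = np.zeros((B, 6))
desired[:, -1] = 2.0 * np.pi + 0.1   # yaw error that should wrap to 0.1
current = np.zeros((B, 6))
pose_cost = holonomic_constraint_err(desired, current)
vel_cost = holonomic_constraint_vel_err(np.ones((B, 4)))
```

Squaring before the single reduction over the last axis is what keeps one value per environment; reducing twice, or summing before squaring, loses the batch dimension or changes the quantity being penalized.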

5. `g1_lip_clf_env_cfg.py` / `g1_vanilla_walking_env_cfg.py` — Minor config corrections

Small fixes to env config files for lip_clf_ec and vanilla walking tasks.

Testing

These fixes were validated by running:

python scripts/rsl_rl/train_policy.py --env_type=walking_clf_sym --headless
CUDA_VISIBLE_DEVICES=1 python scripts/rsl_rl/train_policy.py --env_type=lip_clf_ec --headless

Both tasks now progress past the initialization phase without errors.

- trajectory_manager: skip ori_w axis when building output_names since new YAML files store full quaternion in frames but only 3D angular velocity in frame_vels
- clf: make ordered_output_names optional and support legacy numpy array format for Q/R weights (used by HLIPCommandTerm)
- hlip_cmd: add y_des/dy_des aliases and current_contact_vels, desired_contact_poses, current_contact_poses properties required by reward functions and observations
- rewards: fix holonomic_constraint shape bug (was collapsing to scalar instead of returning [B] tensor); fix holonomic_constraint_vel shape bug (same issue)
- g1_lip_clf_env_cfg, g1_vanilla_walking_env_cfg: minor config fixes

fanyahao1 added 2 commits February 25, 2026 12:13

fix: correct tensor shape bugs in holonomic constraint reward functions

7d4fcf3

fanyahao1 wants to merge 2 commits into Zolkin1:main from fanyahao1:fix/training-startup-bugs
