Skip to content

Fix: cartpole training#155

Merged
yangchen73 merged 1 commit intomainfrom
yc/fix_cartpole
Mar 2, 2026
Merged

Fix: cartpole training#155
yangchen73 merged 1 commit intomainfrom
yc/fix_cartpole

Conversation

@yangchen73
Copy link
Collaborator

Description

This PR fix the reward computation and success condition of cartpole rl task.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

XOh7U2wqmB

Checklist

  • I have run the black . command to format the code base.
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • Dependencies have been updated, if applicable.

Copilot AI review requested due to automatic review settings March 2, 2026 03:58
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix CartPole RL training by correcting (1) how success is determined for the CartPole task and (2) how reward-manager terms are combined with the task’s base reward.

Changes:

  • Update CartPole success logic to only report success when balance conditions are met at the end of an episode.
  • Fix reward aggregation so RewardManager terms are added on top of the task’s get_reward() output (instead of replacing it).
  • Adjust CartPole PPO training config naming and evaluation frequency.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
embodichain/lab/gym/envs/tasks/rl/basic/cart_pole.py Changes CartPole success condition to depend on final-step + balance criteria.
embodichain/lab/gym/envs/embodied_env.py Fixes reward extension to preserve base task reward and add reward-manager contributions.
configs/agents/rl/basic/cart_pole/train_config.json Updates experiment name and evaluation frequency for CartPole PPO training.
Comments suppressed due to low confidence (1)

embodichain/lab/gym/envs/embodied_env.py:356

  • This reward change is a behavioral fix but isn’t covered by a regression test. Since BaseEnv.step() routes all rewards through _extend_reward(), it would be good to add a unit test that configures a simple reward_manager term and asserts the final reward equals get_reward() + manager reward (and that info['rewards'] contains the term breakdown).
            extra_rewards, reward_info = self.reward_manager.compute(
                obs=obs, action=action, info=info
            )
            info["rewards"] = reward_info
            # Add manager terms to base reward from get_reward() so task reward is kept
            rewards = rewards + extra_rewards

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

upward_distance = torch.abs(qpos)
is_success = torch.logical_and(upward_distance < 0.02, torch.abs(qvel) < 0.05)
balance = torch.logical_and(upward_distance < 0.02, torch.abs(qvel) < 0.05)
at_final_step = self._elapsed_steps >= self.episode_length - 1
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at_final_step is checked against self.episode_length - 1, but check_truncated() times out at self.episode_length and BaseEnv.step() terminates immediately on info['success']. This makes the episode end one step early (and can also make success impossible when using gymnasium.make + TimeLimitWrapper if max_episode_steps != episode_length, e.g. CartPole is registered with max_episode_steps=50 while the cart_pole gym config sets episode_length=500). Consider deriving the final-step condition from the actual timeout condition (same threshold as truncation) and/or aligning episode_length with the env time limit so success can be reached in all supported entrypoints.

Suggested change
at_final_step = self._elapsed_steps >= self.episode_length - 1
at_final_step = self._elapsed_steps >= self.episode_length

Copilot uses AI. Check for mistakes.
@yangchen73 yangchen73 merged commit e83a646 into main Mar 2, 2026
5 of 6 checks passed
@yangchen73 yangchen73 deleted the yc/fix_cartpole branch March 2, 2026 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants