Conversation
Pull request overview
This PR aims to fix CartPole RL training by correcting (1) how success is determined for the CartPole task and (2) how reward-manager terms are combined with the task’s base reward.
Changes:
- Update CartPole success logic to only report success when balance conditions are met at the end of an episode.
- Fix reward aggregation so `RewardManager` terms are added on top of the task's `get_reward()` output (instead of replacing it).
- Adjust CartPole PPO training config naming and evaluation frequency.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `embodichain/lab/gym/envs/tasks/rl/basic/cart_pole.py` | Changes CartPole success condition to depend on final-step + balance criteria. |
| `embodichain/lab/gym/envs/embodied_env.py` | Fixes reward extension to preserve base task reward and add reward-manager contributions. |
| `configs/agents/rl/basic/cart_pole/train_config.json` | Updates experiment name and evaluation frequency for CartPole PPO training. |
Comments suppressed due to low confidence (1)
embodichain/lab/gym/envs/embodied_env.py:356
- This reward change is a behavioral fix but isn’t covered by a regression test. Since `BaseEnv.step()` routes all rewards through `_extend_reward()`, it would be good to add a unit test that configures a simple `reward_manager` term and asserts the final reward equals `get_reward()` + the manager reward (and that `info['rewards']` contains the term breakdown).
```python
extra_rewards, reward_info = self.reward_manager.compute(
    obs=obs, action=action, info=info
)
info["rewards"] = reward_info
# Add manager terms to base reward from get_reward() so task reward is kept
rewards = rewards + extra_rewards
```
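A regression test along the lines the comment asks for could look like the sketch below. The `DummyRewardManager` and `extend_reward` names are hypothetical stand-ins for the project's actual classes, and plain Python lists are used in place of torch tensors so the sketch stays dependency-free:

```python
class DummyRewardManager:
    """Hypothetical stand-in for the project's RewardManager:
    returns one constant bonus term per environment."""
    def compute(self, obs=None, action=None, info=None):
        extra = [0.5, 0.5]                    # summed manager terms, 2 envs
        return extra, {"bonus": list(extra)}  # (totals, per-term breakdown)

def extend_reward(base_rewards, reward_manager, obs, action, info):
    """Mirrors the fixed behavior: manager terms are ADDED to the
    base task reward instead of replacing it."""
    extra, breakdown = reward_manager.compute(obs=obs, action=action, info=info)
    info["rewards"] = breakdown
    return [b + e for b, e in zip(base_rewards, extra)]

base = [1.0, 2.0]                  # pretend get_reward() output
info = {}
total = extend_reward(base, DummyRewardManager(), None, None, info)
assert total == [1.5, 2.5]         # base reward preserved, bonus added
assert "bonus" in info["rewards"]  # breakdown exposed via info['rewards']
```

Such a test would have caught the original bug, where the manager output overwrote `rewards` and the first assertion would see `[0.5, 0.5]` instead.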
```diff
  upward_distance = torch.abs(qpos)
- is_success = torch.logical_and(upward_distance < 0.02, torch.abs(qvel) < 0.05)
+ balance = torch.logical_and(upward_distance < 0.02, torch.abs(qvel) < 0.05)
  at_final_step = self._elapsed_steps >= self.episode_length - 1
```
at_final_step is checked against self.episode_length - 1, but check_truncated() times out at self.episode_length and BaseEnv.step() terminates immediately on info['success']. This makes the episode end one step early (and can also make success impossible when using gymnasium.make + TimeLimitWrapper if max_episode_steps != episode_length, e.g. CartPole is registered with max_episode_steps=50 while the cart_pole gym config sets episode_length=500). Consider deriving the final-step condition from the actual timeout condition (same threshold as truncation) and/or aligning episode_length with the env time limit so success can be reached in all supported entrypoints.
```diff
- at_final_step = self._elapsed_steps >= self.episode_length - 1
+ at_final_step = self._elapsed_steps >= self.episode_length
```
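The off-by-one can be illustrated with a toy rollout loop (hypothetical helper names; the real truncation logic lives in `check_truncated()`):

```python
def run_episode(episode_length, final_step_threshold):
    """Count how many steps actually execute before the episode ends,
    given that success (at_final_step) terminates the episode immediately."""
    elapsed = 0
    while True:
        elapsed += 1                              # one env step
        at_final_step = elapsed >= final_step_threshold
        truncated = elapsed >= episode_length     # check_truncated() behavior
        if at_final_step or truncated:
            return elapsed

# With `episode_length - 1`, the episode terminates one step early:
assert run_episode(500, 500 - 1) == 499
# Using the truncation threshold itself, the episode runs the full length:
assert run_episode(500, 500) == 500
```

This is why deriving `at_final_step` from the same threshold as truncation (as the suggestion above does) keeps the episode at its full length; aligning `episode_length` with the registered `max_episode_steps` is still needed for the `gymnasium.make` entrypoint.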
Description
This PR fixes the reward computation and the success condition of the CartPole RL task.
Type of change
Screenshots
Checklist
I have run the `black .` command to format the code base.