Skip to content

Debug navigation tests#420

Merged
geurto merged 66 commits intomainfrom
hotfix-lidar-navigation-test
Apr 10, 2026
Merged

Debug navigation tests#420
geurto merged 66 commits intomainfrom
hotfix-lidar-navigation-test

Conversation

@rosalievanark
Copy link
Copy Markdown
Collaborator

@rosalievanark rosalievanark commented Mar 30, 2026

Bug fix of the navigation tests

Some hidden bugs had entered our code, and this PR has focused on solving these issues.

Original error: constant timeouts during the navigation tests. Examples of these tests:

An important change here was to use assert instead of raise, both in terms of how a raise seems to crash all tests afterwards, as well as that a raise in a pytest does not guarantee that no more code of the test itself will be executed (for example, the assert after the raise was still triggered, which was not the desired behaviour). These raise statements are still all over the tests, it is better to start replacing these with assert statements (then the test workflow will still continue with testing instead of stopping immediately).

Additionally found error (see diff_drive_controller for origin error) in the gazebo container: Command message contains NaNs. Not updating reference interfaces.
This was solved using the twist_mux package as a filter between the drive_controller of the Husarion package, and the behavior_server, controller_server, and collision_monitor of the nav2 package. Apparently during startup in the Github CI, nav2 would send invalid cmd_vel messages to the drive_controller, getting stuck in the NaN error. Adding the twist_mux filter fixed it.

Incorrect time usage by gazebo: https://github.com/alliander-opensource/alliander-robotics/actions/runs/24189542218/job/70603291944?pr=420
Fixed by adding use_sim_time to the Husarion launch file.

Additional improvements:

  • Add a login to our Docker account to the workflows for "unlimited" image pulling.
  • Run each test on it's own ROS domain (and as a safety precaution, also wait before starting all containers for no nodes to be active to avoid "ghost nodes").
  • Fix the pub_topic remapping in the nav2.launch.py file.

Testing

Explain how you tested your changes.

Documentation

  • I have updated the documentation (if necessary)

Additional Notes

Any relevant screenshots, logs, or context.

@rosalievanark rosalievanark force-pushed the hotfix-lidar-navigation-test branch from a2eb1f4 to 9bd9779 Compare March 30, 2026 08:50
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
…o back to pytest.fail and trigger more often

Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
… flow

Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
@rosalievanark rosalievanark force-pushed the hotfix-lidar-navigation-test branch from 30b8eeb to 8e4ff76 Compare April 1, 2026 10:32
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
…cmd_vel_final to cause no confusion

Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Comment thread alliander_core/src/alliander_description/husarion/urdf/original/gazebo.urdf.xacro Outdated
Comment thread alliander_husarion/src/alliander_husarion/launch/controllers.launch.py Outdated
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
…pute_path_to_pose would otherwise time out

Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
default_nav_to_pose_bt_xml: substitute_me
robot_base_frame: substitute_me
odom_topic: substitute_me
default_server_timeout: 10000
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, these are the timeouts in ms:

default_server_timeout: 20
default_cancel_timeout: 50
wait_for_service_timeout: 1000

On github, this irregularly could cause the following error:

[panther.bt_navigator_navigate_to_pose_rclcpp_node]: Timed out while waiting for action server to acknowledge goal request for compute_path_to_pose

Which would then result in failing to navigate to the goal during the test.

Not sure whether the increase to 10s each is too extreme, but maybe we can also find a neat way to only use these increased values for the github CI?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a direct problem of having a 10s timeout. If this appears to be a problem when using hardware, we can make a change to use different timeouts for hardware and simulation.

Comment thread conftest.py Outdated
…meters

Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
executable="twist_mux",
name="twist_mux",
namespace=vehicle_config.namespace,
parameters=[
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want these parameters to be moved to its own yaml file? And would that then be placed in alliander_description?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is fine to keep it like this for now. If we need to use the parameters add more places or change them often, we can start using a yaml file.

Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
…_vel by default

Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
sleep_infinity = ExecuteProcess(cmd=["sleep", "infinity"])

return [
SetParameter(name="use_sim_time", value=vehicle_config.simulation),
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to add this to the controllers.launch.py file?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SetParameter sets the parameter for all nodes, also nodes defined in the sub launch files included. So it should not be necessary to add this to the controllers.launch.py.

…cstrings

Signed-off-by: Rosalie <rosalie.van.ark@alliander.com>
@rosalievanark rosalievanark marked this pull request as ready for review April 9, 2026 14:38
@rosalievanark rosalievanark requested review from Jelmerdw and geurto April 9, 2026 14:39
Copy link
Copy Markdown
Collaborator

@Jelmerdw Jelmerdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

executable="twist_mux",
name="twist_mux",
namespace=vehicle_config.namespace,
parameters=[
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is fine to keep it like this for now. If we need to use the parameters add more places or change them often, we can start using a yaml file.

sleep_infinity = ExecuteProcess(cmd=["sleep", "infinity"])

return [
SetParameter(name="use_sim_time", value=vehicle_config.simulation),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SetParameter sets the parameter for all nodes, also nodes defined in the sub launch files included. So it should not be necessary to add this to the controllers.launch.py.

default_nav_to_pose_bt_xml: substitute_me
robot_base_frame: substitute_me
odom_topic: substitute_me
default_server_timeout: 10000
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a direct problem of having a 10s timeout. If this appears to be a problem when using hardware, we can make a change to use different timeouts for hardware and simulation.

f"/{namespace_vehicle}/cmd_vel"
f"/{namespace_vehicle}/cmd_vel_nav"
if not nav2.collision_monitor
else f"/{namespace_vehicle}/cmd_vel_raw"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the naming of cmd_vel_nav and 'cmd_vel_joy. Only cmd_vel_raw` feels odd now. But I am not sure if another name would clarify it...

cmd_vel_raw is actually the input for the collision monitor, so we could call it something like cmd_vel_cm_in. But again, this name can also be vague, especially due to the abbreviation of cm for collision monitor. Any other thoughts?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_raw is used more often in ROS context to indicate something that is fed into a different node for processing, so it feels okay to me. Could also use cmd_vel_in to indicate it will be used by another node

vehicle_platform.nav2_config.navigation = True
test_class = type(
f"Test{vehicle.capitalize()}{lidar.capitalize()}Navigation",
f"Test{vehicle.capitalize()}{lidar.capitalize()}LidarNavigation",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this rename also for GPS? So Navgiation -> GPSNavigation

Comment thread conftest.py
LAUNCH_TIMEOUT = 90 # seconds
COMPOSE_FILE = "/alliander_robotics/compose_pytest.yml"
HOST_COMPOSE_FILE = "/alliander_robotics/compose.yml"
MAX_ROS_DOMAIN_ID = (
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these additions still needed with the twist_mux fix? I guess it is also cleaner this way as we separate the environments more between tests. Are there any downsides to adding this, e.g. increased test time?

@geurto geurto merged commit 2cb34a2 into main Apr 10, 2026
43 checks passed
@geurto geurto deleted the hotfix-lidar-navigation-test branch April 10, 2026 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants