# Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction

Sizhe Yang, Linning Xu, Hao Li, Juncheng Mu, Jia Zeng, Dahua Lin, Jiangmiao Pang
Shanghai AI Laboratory, The Chinese University of Hong Kong, University of Science and Technology of China, Tsinghua University

RSS 2026

arXiv Homepage

## 🔥 Highlight

Robo3R enables manipulation-ready 3D reconstruction from RGB frames in real time.

By recovering accurate metric-scale 3D geometry in the canonical robot frame, Robo3R eliminates the need for depth sensors and camera calibration, while improving accuracy and robustness in challenging manipulation scenarios.

These features lead to notable improvements in downstream applications such as imitation learning, sim-to-real transfer, grasp synthesis, and collision-free motion planning.

*Figure: Robo3R framework overview.*

## 📁 Dataset

Our curated large-scale dataset is available at Robo3R-4M Dataset - Huggingface.

The dataset is generated with the Franka FR3 robot and contains two subsets:

- `100kScenes_dtc-objaverse_not-in-gripper`: 100k scenes where objects are randomly placed on the tabletop.
- `20kScenes_dtc-objaverse_in-gripper`: 20k scenes where one object is grasped by the gripper and the remaining objects are randomly placed on the tabletop.

The dataset is split into multiple `.tar.gz.part*` files for upload. After downloading, concatenate the parts and extract them with the following commands:

```bash
# 100kScenes_dtc-objaverse_not-in-gripper
cd 100kScenes_dtc-objaverse_not-in-gripper
cat 100kScenes_dtc-objaverse_not-in-gripper.tar.gz.part* > 100kScenes_dtc-objaverse_not-in-gripper.tar.gz
tar -xzvf 100kScenes_dtc-objaverse_not-in-gripper.tar.gz
cd ..

# 20kScenes_dtc-objaverse_in-gripper
cd 20kScenes_dtc-objaverse_in-gripper
cat 20kScenes_dtc-objaverse_in-gripper.tar.gz.part* > 20kScenes_dtc-objaverse_in-gripper.tar.gz
tar -xzvf 20kScenes_dtc-objaverse_in-gripper.tar.gz
cd ..
```

The structure of each scene directory is detailed below:

```
scene_{str(scene_idx).zfill(8)}
├── rgb
│   ├── {str(frame_idx).zfill(4)}_{str(camera_idx).zfill(2)}.jpg
│   └── ...
├── depth
│   ├── {str(frame_idx).zfill(4)}_{str(camera_idx).zfill(2)}.png
│   └── ...
├── mask
│   ├── {str(frame_idx).zfill(4)}_{str(camera_idx).zfill(2)}.png
│   └── ...
├── qpos
│   ├── {str(frame_idx).zfill(4)}.npy
│   └── ...
├── ee_pose
│   ├── {str(frame_idx).zfill(4)}.npy
│   └── ...
├── keypoint_3d
│   ├── {str(frame_idx).zfill(4)}.npy
│   └── ...
├── keypoint_2d
│   ├── {str(frame_idx).zfill(4)}_{str(camera_idx).zfill(2)}.npy
│   └── ...
└── cam_param.npy
```
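Given this layout, per-frame file paths can be assembled as in the sketch below. The `frame_paths` helper is hypothetical (not part of any released code); note that per-camera files carry both a zero-padded frame index and camera index, while robot-state files carry only the frame index:

```python
from pathlib import Path

def frame_paths(scene_root: Path, frame_idx: int, camera_idx: int) -> dict:
    """Build the file paths for one frame, following the scene layout above."""
    fc = f"{frame_idx:04d}_{camera_idx:02d}"  # per-camera files, e.g. "0003_01"
    f = f"{frame_idx:04d}"                    # per-frame files, no camera index
    return {
        "rgb": scene_root / "rgb" / f"{fc}.jpg",
        "depth": scene_root / "depth" / f"{fc}.png",
        "mask": scene_root / "mask" / f"{fc}.png",
        "qpos": scene_root / "qpos" / f"{f}.npy",
        "ee_pose": scene_root / "ee_pose" / f"{f}.npy",
        "keypoint_3d": scene_root / "keypoint_3d" / f"{f}.npy",
        "keypoint_2d": scene_root / "keypoint_2d" / f"{fc}.npy",
        "cam_param": scene_root / "cam_param.npy",  # one file per scene
    }

paths = frame_paths(Path("scene_00000000"), frame_idx=3, camera_idx=1)
```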

Notes:

- `rgb/`: RGB images captured from each camera.
- `depth/`: Depth maps in metric units (meters).
  - Background pixels have a depth value of 0.
  - When saved as PNG, metric depth is divided by 10.0 (the maximum depth in meters) and scaled to the uint16 range:

    ```python
    import numpy as np
    from PIL import Image

    depth_png = (depth / 10.0 * 2**16).astype(np.uint16)
    Image.fromarray(depth_png).save('depth.png')
    ```

- `mask/`: Segmentation masks. Values for table, robot, and object are 50, 100, and 150, respectively.
- `qpos/`: Joint positions of the robot.
- `ee_pose/`: End-effector pose of the robot.
- `keypoint_3d/`: Coordinates of keypoints in the robot frame.
- `keypoint_2d/`: Projections of `keypoint_3d` onto each camera's image plane.
- `cam_param.npy`: Camera intrinsics and extrinsics for all cameras.
  - Shape: `(2, num_cameras, 4, 4)`.
  - The first dimension indexes intrinsics (`[0]`) and extrinsics (`[1]`).
  - The original `(3, 3)` intrinsics matrix is padded with an extra row and column so it shares the same shape as the extrinsics, allowing both to be stored in a single array.
- Camera axes: +Z up, +X forward.
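Inverting the depth encoding above, and unpacking `cam_param.npy`, can be sketched as follows (assuming the depth PNG has already been read back into a uint16 array, e.g. via `np.array(Image.open(...))`; the helper names are illustrative, not a released API):

```python
import numpy as np

MAX_DEPTH_M = 10.0  # the 10.0 scale factor used when the depth PNGs were written

def decode_depth(depth_png: np.ndarray) -> np.ndarray:
    """Recover metric depth in meters from a uint16 depth PNG array."""
    return depth_png.astype(np.float32) / 2**16 * MAX_DEPTH_M

def split_cam_param(cam_param: np.ndarray):
    """Split the (2, num_cameras, 4, 4) array into per-camera parameters.

    Intrinsics are cropped from their padded (4, 4) form back to (3, 3).
    """
    intrinsics = cam_param[0, :, :3, :3]  # (num_cameras, 3, 3)
    extrinsics = cam_param[1]             # (num_cameras, 4, 4)
    return intrinsics, extrinsics
```

The uint16 quantization limits depth precision to 10.0 / 2**16 ≈ 0.15 mm, which is negligible for manipulation-scale scenes.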

📢 The code will be released soon. Stay tuned!

## 🔗 Citation

If you find our work helpful, please cite:

```bibtex
@article{yang2026robo3r,
  title={Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction},
  author={Yang, Sizhe and Xu, Linning and Li, Hao and Mu, Juncheng and Zeng, Jia and Lin, Dahua and Pang, Jiangmiao},
  journal={arXiv preprint arXiv:2602.10101},
  year={2026}
}
```

## 📄 License

This repository is released under the Apache 2.0 license.

## 👏 Acknowledgements

Our code is built upon Pi3 and VGGT. We thank the authors for open-sourcing their code and for their significant contributions to the community.
