HccePose (BF)

arXiv | Hugging Face

English | 中文

🧩 Introduction

HccePose(BF) introduces a Hierarchical Continuous Coordinate Encoding (HCCE) mechanism that encodes the three coordinate components of object surface points into hierarchical continuous codes. Through this hierarchical encoding scheme, the neural network can effectively learn the correspondence between 2D image features and 3D surface coordinates of the object, while significantly enhancing its capability to learn accurate object masks. Unlike traditional methods that only learn the visible front surface of objects, HccePose(BF) additionally learns the 3D coordinates of the back surface, thereby establishing denser 2D–3D correspondences and substantially improving pose estimation accuracy.

HccePose(BF) overview figure
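For intuition only, below is a minimal sketch of what a hierarchical continuous encoding of normalized coordinates can look like. This is an illustrative assumption, not the exact HCCE definition: the levels, subdivision factors, and decoding used by HccePose(BF) are specified in the paper.

import numpy as np

def toy_hierarchical_encode(xyz, levels=3):
    # Illustrative only: normalize each coordinate component to [0, 1],
    # then emit one continuous code per hierarchy level, where level k
    # stores the fractional position inside a 2**k subdivision of the axis.
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    u = (xyz - lo) / (hi - lo + 1e-9)             # (N, 3), each component in [0, 1]
    codes = [np.mod(u * (2 ** k), 1.0) for k in range(levels)]
    return np.stack(codes, axis=-1)               # (N, 3, levels)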

✨ Update


  • ⚠️ Note: All paths must be absolute paths to avoid runtime errors.
  • 2025.10.27: We've released cc0textures-512, a lightweight alternative to the original 44GB CC0Textures library, now only 600MB! 👉 Download here
  • 2025.10.28: s4_p1_gen_bf_labels.py has been updated. If the dataset does not contain a camera.json, the script will automatically create a default one.
  • 2026.04.04: RGB-D refinement with FoundationPose / MegaPose (Refinement/, example s4_p3_test_mi10_bin_picking_RGBD_*.py); optional ONNX / TensorRT acceleration for HccePose and FoundationPose (hccepose_acceleration, foundationpose_acceleration); per-stage timings via results_dict['time_dict'] and print_stage_time_breakdown. Sample RGB-D frames 000000–000003 are on Hugging Face under test_imgs_RGBD (each stem: {stem}_rgb.png, {stem}_depth.png, {stem}_camK.json). A git checkout may still ship only 000003_* under test_imgs_RGBD/ to keep the repo small; download that folder for all four stems (see scripts/download_hf_assets.py, hf-dataset-card/README.md).
  • 2026.04.04 (docs): Quick Start and RGB-D sections now state the OpenCV BGR convention (cv2.imread / VideoCapture → pass unchanged to Tester.predict); removed an erroneous COLOR_RGB2BGR after imread. Code comments are aligned with the BGR training norms, the FoundationPose BGR→RGB conversion at the refiner entry, and the MegaPose BGR debug panels.
  • 2026.04.06 (docs): Minimal inference checklist, first-run time expectations, troubleshooting, optional requirements-inference.txt, and a note to use a dedicated venv/conda env when ONNX scripts may auto-install packages.

🔧 Environment Setup

Configuration Details

Download the HccePose(BF) Project and Unzip BOP-related Toolkits

# Clone the project
git clone https://github.com/WangYuLin-SEU/HCCEPose.git
cd HCCEPose

# Unzip toolkits
unzip bop_toolkit.zip
unzip blenderproc.zip

Configure Ubuntu System Environment (Python 3.10)

⚠️ A GPU driver with EGL support must be pre-installed.

Version pins below match a reference Conda py310 interpreter (Python 3.10.19) after pip install (PyTorch wheels report suffixes such as +cu128 in pip list).

apt-get update && apt-get install -y wget software-properties-common gnupg2 python3-pip

apt-get update && apt-get install -y libegl1-mesa-dev libgles2-mesa-dev libx11-dev libxext-dev libxrender-dev

pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

apt-get update && apt-get install -y pkg-config libglvnd0 libgl1 libglx0 libegl1 libgles2 libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev cmake curl ninja-build

pip install ultralytics==8.3.233 fvcore==0.1.5.post20221221 pybind11==2.12.0 trimesh==4.2.2 ninja==1.11.1.1 kornia==0.6.9 open3d==0.19.0 transformations==2024.6.1 numpy==1.26.4 opencv-python==4.9.0.80 opencv-contrib-python==4.9.0.80

pip install scipy==1.15.3 kiwisolver==1.4.9 matplotlib==3.10.7 imageio==2.37.2 pypng Cython==3.2.1 PyOpenGL triangle glumpy Pillow==11.3.0 vispy imgaug mathutils pyrender pytz tqdm tensorboard kasal-6d==0.1.3 rich==14.2.0 h5py==3.15.1 diffrp-nvdiffrast

pip install bpy==3.6.0 --extra-index-url https://download.blender.org/pypi/

apt-get install -y libsm6 libxrender1 libxext-dev

python -c "import imageio; imageio.plugins.freeimage.download()"

pip install -U "huggingface_hub[hf_transfer]==0.36.0"

One-file shortcut (inference stack, excludes bpy): after installing torch / torchvision / torchaudio from the PyTorch CUDA index above, run pip install -r requirements-inference.txt (see comments in that file). Full BlenderProc / training users should still follow the bpy line when needed.

Optional: RGB-D refinement & acceleration
  • BOP toolkit path: keep bop_toolkit/ at the project root (unzip bop_toolkit.zip here) so imports match the training and test scripts.
  • FoundationPose (RGB-D refinement): nvdiffrast (import nvdiffrast.torch, Refinement/foundationpose.py) is provided by diffrp-nvdiffrast, already listed in the main pip install commands above. To build from source instead, see NVlabs/nvdiffrast. Weights are not included in this repo. Download the official bundle from NVlabs/FoundationPose (Data prepare), hosted in a Google Drive folder, and place the refiner and scorer under the project root as 2023-10-28-18-33-37/ and 2024-01-11-20-02-45/ (each with config.yml and model_best.pth), matching this repository's scripts. Optional mirror (not NVIDIA-hosted): gpue/foundationpose-weights. License: use of FoundationPose weights is subject to the official FoundationPose license; do not assume unrestricted commercial use.
  • ONNX Runtime GPU / TensorRT: HccePose and FoundationPose can use HccePose.hccepose_acceleration and Refinement.foundationpose_acceleration for ONNX or TensorRT backends when enabled from the test scripts (see s4_p3_test_mi10_bin_picking_onnx.py, s4_p3_test_mi10_bin_picking_tensorrt.py, and the RGB-D examples). On first use, HccePose.tester.Tester calls ensure_acceleration_backend_environment to install/check onnx and onnxruntime-gpu when needed; TensorRT additionally requires a matching libnvinfer (e.g. pip install tensorrt or NVIDIA's tarball on LD_LIBRARY_PATH). Example pins from the same reference py310 env: onnx==1.21.0, onnxruntime-gpu==1.23.2, tensorrt 10.16.x (the variant depends on your CUDA stack; align it with your ORT build).
  • MegaPose: on the first register_megapose() call / first MegaPose refinement path, the code can automatically clone megapose6d into third_party_megapose6d/, create a dedicated Python 3.9 prefix under .envs/megapose/ (via conda create -p … python=3.9), install MegaPose's own PyTorch/torchvision stack there, and download models (e.g. under local_data/megapose-models). Inference runs in subprocesses using .envs/megapose/bin/python, not your main HccePose interpreter (e.g. Python 3.10): do not install MegaPose's torch stack into the py310 env; keep the two environments separate. Requires conda on PATH, network access, and a writable project directory. License: MegaPose code and models follow the megapose6d upstream license.

📥 Bulk download from Hugging Face (optional)

Hosts such as AutoDL often need the platform VPN / academic accelerator before talking to Hugging Face (official site or API). If your image provides it:

source /etc/network_turbo

Then run the download commands in the same shell session. After that, prefer --endpoint hf (official huggingface.co) for both scripts; third-party mirrors may return HTTP 403 to the tree API from some regions.

The helper scripts/download_hf_assets.py lives only in this GitHub repo, not in the Hugging Face dataset file list. Run it from the repository root: it downloads into the same relative paths as the Quick Start / RGB-D examples (test_imgs/, test_videos/, test_imgs_RGBD/, demo-bin-picking/, demo-tex-objs/ beside HccePose/, etc.), i.e. the same layout as copying those folders from the dataset browser or using a fully prepared dev tree.

source /etc/network_turbo   # when available
python scripts/download_hf_assets.py --preset test --endpoint hf

Default --dest is the repo root. --endpoint auto tries the official hub, then https://hf-mirror.com. See python scripts/download_hf_assets.py --help for presets and the optional --foundationpose flag (community mirror; check the license). Dataset card text: hf-dataset-card/README.md → publish it as README.md on the HF dataset repo (it links back to this script and explains the paths).

Fallback: per-file wget (when snapshot_download hangs or drops). scripts/wget_hf_demo_assets.py lists files via the Hub tree API, downloads each with resumable wget -c, and checks byte sizes against the API. It skips generated *_show_* artifacts under test_imgs/ and test_videos/. It also pulls the four FoundationPose weight files into 2023-10-28-18-33-37/ and 2024-01-11-20-02-45/ at the repo root (same layout as the manual / Drive setup). On constrained networks (e.g. AutoDL), run source /etc/network_turbo first if available; point pip and temp dirs at a large disk to avoid filling the root overlay, e.g. export PIP_CACHE_DIR=/path/to/big/pip-cache TMPDIR=/path/to/big/tmp.

source /etc/network_turbo   # when available
cd HCCEPose
python scripts/wget_hf_demo_assets.py --endpoint hf              # parallel wget (default ~8 workers)
python scripts/wget_hf_demo_assets.py --endpoint hf -j 4        # cap concurrency if the hub rate-limits
python scripts/wget_hf_demo_assets.py --endpoint hf -j 1          # strictly sequential
python scripts/wget_hf_demo_assets.py --endpoint hf --verify-only # size check only

If the official hub is unreachable without the accelerator, enable network_turbo (or your own VPN) first, then retry with --endpoint hf. Use --endpoint mirror only if your mirror exposes the Hub tree API without 403.


🎯 Minimal setup (inference demo)

Use this if you only want to run the Bin-Picking RGB example (s4_p3_test_mi10_bin_picking.py) without the full BlenderProc training stack.

Must have at the repository root:

  • HccePose/: core package (from git).
  • bop_toolkit/: HccePose.bop_loader imports bop_toolkit; unzip bop_toolkit.zip here (or keep an equivalent tree).
  • demo-bin-picking/: models/, yolo11/, HccePose/ weights (Hugging Face layout).
  • test_imgs/: sample images for the script loop.

Python environment: follow the pip pins in Environment Setup (Python 3.10 recommended). For a one-shot install of the non-PyTorch pins, see requirements-inference.txt after installing torch / torchvision / torchaudio from the CUDA wheel index.

Not needed for that script alone

  • blenderproc.zip and bpy: only needed for BlenderProc synthesis / training-related workflows.
  • FoundationPose weight folders, test_imgs_RGBD/, MegaPose / ONNX / TensorRT: only needed when you run the matching s4_p3_test_*.py scripts.

Recommended: use a dedicated conda env or venv for HccePose. The ONNX test path may call ensure_acceleration_backend_environment and run pip install for onnx / onnxruntime-gpu; an isolated env avoids surprising your system Python.

⏱ First-run time (what to expect)

  • MegaPose (s4_p3_test_mi10_bin_picking_RGBD_megapose.py, etc.): the first run can take tens of minutes (cloning megapose6d, creating the conda env under .envs/megapose/, installing its PyTorch stack, downloading models). Later runs of the same script on the same machine are usually on the order of one minute, if caches remain.
  • download_hf_assets.py / wget_hf_demo_assets.py: depends on bandwidth; use source /etc/network_turbo on hosts like AutoDL when available.
  • ONNX: the first Tester construction with ONNX may trigger a one-time pip install of ONNX Runtime GPU wheels.

🩹 Troubleshooting (common)

  1. ValueError: All ufuncs must have type numpy.ufunc (often under scipy / imgaug): your NumPy / SciPy pair does not match. Reinstall the versions from Environment Setup (e.g. numpy==1.26.4, scipy==1.15.3) on Python 3.10. Avoid mixing Python 3.12 with the reference pin set unless you re-validate the stack.
  2. ImportError: numpy.core.multiarray / AttributeError: _ARRAY_API on import cv2: OpenCV's wheel was built against a different NumPy major version than the one installed. Align opencv and numpy with the pinned versions in the environment section (reinstall both in the same env).
  3. TensorRT script fails (s4_p3_test_mi10_bin_picking_tensorrt.py): TensorRT needs a matching libnvinfer / driver stack. To verify the core pipeline first, run hccepose_acceleration='pytorch' or 'onnx' only; treat TensorRT as optional.
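Before chasing the errors above, a quick version check against the pins in Environment Setup often settles the question. A minimal sketch (the expected values in the comments come from the reference py310 environment):

import sys
import numpy, scipy, cv2, torch

# Compare each line against the reference pins from Environment Setup.
print('python ', sys.version.split()[0])   # expect 3.10.x
print('numpy  ', numpy.__version__)        # expect 1.26.4
print('scipy  ', scipy.__version__)        # expect 1.15.3
print('opencv ', cv2.__version__)          # expect 4.9.0
print('torch  ', torch.__version__)        # expect 2.8.0+cu128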

🧱 Custom Dataset and Training

🎨 Object Preprocessing

Click to expand

Using the demo-bin-picking dataset as an example, we first designed the object in SolidWorks and exported it as an STL mesh file.
STL file link: 🔗 https://huggingface.co/datasets/SEU-WYL/HccePose/blob/main/raw-demo-models/multi-objs/board.STL

Then, the STL file was imported into MeshLab, and surface colors were filled using the Vertex Color Filling tool.

After coloring, the object was exported as a non-binary PLY file containing vertex colors and normals.

The exported model center might not coincide with the coordinate origin, as shown below:

To align the model center with the origin, use the script s1_p1_obj_rename_center.py.
This script loads the PLY file, aligns the model center, and renames it following BOP conventions.
The obj_id must be set manually as a unique non-negative integer for each object.
Example:

  • board.ply with obj_id = 1 → obj_000001.ply
  • board.ply with obj_id = 2 → obj_000002.ply
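The core of that step looks roughly like the following hedged sketch; whether s1_p1_obj_rename_center.py centers on the bounding-box center or the centroid is an implementation detail of the script, so treat this as illustrative:

import trimesh

def center_and_rename(input_ply, obj_id, out_dir='models'):
    # Shift the model so its bounding-box center sits at the origin,
    # then save it under the BOP name obj_XXXXXX.ply.
    mesh = trimesh.load(input_ply)
    bbox_center = (mesh.vertices.min(axis=0) + mesh.vertices.max(axis=0)) / 2.0
    mesh.apply_translation(-bbox_center)
    out_path = '%s/obj_%06d.ply' % (out_dir, obj_id)
    mesh.export(out_path)
    return out_path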

After centering and renaming all objects, place them into a folder named models with the following structure:

Dataset_Name
|--- models
      |--- obj_000001.ply
      ...
      |--- obj_000015.ply

🌀 Rotational Symmetry Analysis

Click to expand

In 6D pose estimation tasks, many objects exhibit various types of rotational symmetry, such as cylindrical, conical, or polyhedral symmetry. For such objects, the KASAL tool is used to analyze and export symmetry priors in BOP format.

KASAL project: 🔗 https://github.com/WangYuLin-SEU/KASAL

Installation:

pip install kasal-6d

Launch the KASAL GUI with:

from kasal.app.polyscope_app import app
mesh_path = 'demo-bin-picking'
app(mesh_path)

KASAL automatically scans all PLY or OBJ files under mesh_path (excluding generated _sym.ply files).

In the interface:

  • Use Symmetry Type to select the symmetry category
  • For n-fold pyramidal or prismatic symmetry, set N (n-fold)
  • Enable ADI-C for texture-symmetric objects
  • If the result is inaccurate, use axis xyz for manual fitting

KASAL defines 8 symmetry types. Selecting the wrong one will result in visual anomalies, helping verify your choice.

Click Cal Current Obj to compute the object's symmetry axis. Symmetry priors will be saved as:

  • Symmetry prior file: obj_000001_sym_type.json
  • Visualization file: obj_000001_sym.ply

🧾 Generating BOP-Format Model Information

Click to expand

Run s1_p3_obj_infos.py to traverse all ply files and their symmetry priors in the models folder. This script generates a standard models_info.json file in BOP format.

Example structure:

Dataset_Name
|--- models
      |--- models_info.json
      |--- obj_000001.ply
      ...
      |--- obj_000015.ply

This file serves as the foundation for PBR rendering, YOLOv11 training, and HccePose(BF) model training.
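For reference, the per-object fields of a BOP models_info.json can be computed roughly as below. This is a hedged sketch; s1_p3_obj_infos.py is the authoritative generator and also merges the KASAL symmetry priors into the file:

import numpy as np
import trimesh

def model_info(ply_path):
    # diameter = largest vertex-to-vertex distance; computing it on the
    # convex hull keeps the pairwise pass cheap for dense meshes.
    mesh = trimesh.load(ply_path)
    hull = np.asarray(mesh.convex_hull.vertices)
    diameter = np.linalg.norm(hull[None, :] - hull[:, None], axis=-1).max()
    mins = mesh.vertices.min(axis=0)
    sizes = mesh.vertices.max(axis=0) - mins
    return {
        'diameter': float(diameter),
        'min_x': float(mins[0]), 'min_y': float(mins[1]), 'min_z': float(mins[2]),
        'size_x': float(sizes[0]), 'size_y': float(sizes[1]), 'size_z': float(sizes[2]),
    }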


🔥 Rendering the PBR Dataset

Click to expand

Based on BlenderProc, we modified a rendering script, s2_p1_gen_pbr_data.py, for generating new datasets. Running this script directly in Python may cause a memory leak that accumulates over time and gradually degrades rendering performance. To address this, we provide a shell script, s2_p1_gen_pbr_data.sh, that repeatedly invokes s2_p1_gen_pbr_data.py, preventing memory accumulation and improving efficiency. Several adjustments were also made to BlenderProc to better support PBR dataset generation for new object sets.


Preparation Before Rendering

Before rendering, use s2_p0_download_cc0textures.py to download the CC0Textures material library.
After downloading, the directory structure should look like this:

HCCEPose
|--- s2_p0_download_cc0textures.py
|--- cc0textures

The cc0textures library occupies about 44GB of disk space, which is quite large. To reduce storage requirements, we provide a lightweight alternative called cc0textures-512, with a size of approximately 600MB. Download link: 👉 https://huggingface.co/datasets/SEU-WYL/HccePose/blob/main/cc0textures-512.zip

When running the rendering script, simply replace the cc0textures path with cc0textures-512 to use the lightweight material library. (It is sufficient to download only cc0textures-512; the original cc0textures is not required.)


Running the Renderer

The s2_p1_gen_pbr_data.py script is responsible for PBR data generation, and it is adapted from BlenderProc2.

Run the following commands:

cd HCCEPose
chmod +x s2_p1_gen_pbr_data.sh
nohup ./s2_p1_gen_pbr_data.sh 0 42 xxx/xxx/cc0textures xxx/xxx/demo-bin-picking xxx/xxx/s2_p1_gen_pbr_data.py > s2_p1_gen_pbr_data.log 2>&1 &

Folder Structure

After executing the above process, the program will:

  • Use materials from xxx/xxx/cc0textures;
  • Load 3D object models from xxx/xxx/demo-bin-picking/models;
  • Generate 42 folders, each containing 1000 PBR-rendered frames, under xxx/xxx/demo-bin-picking.

The resulting structure will be:

demo-bin-picking
|--- models
|--- train_pbr
      |--- 000000
      |--- 000001
      ...
      |--- 000041

🚀 Training the 2D Detector

Click to expand

In 6D pose estimation tasks, a 2D detector is typically used to locate the object's bounding box, from which cropped image regions are used for 6D pose estimation. Compared with directly regressing 6D poses from the entire image, the two-stage approach (2D detection → 6D pose estimation) offers better accuracy and stability. Therefore, HccePose(BF) is equipped with a 2D detector based on YOLOv11.

The following sections describe how to convert BOP-format PBR training data into YOLO-compatible data and how to train YOLOv11.


Converting BOP PBR Data to YOLO Format

To automate the conversion from BOP-style PBR data to YOLO training data, we provide the s3_p1_prepare_yolo_label.py script. After specifying the dataset path xxx/xxx/demo-bin-picking and running the script, the program will create a new folder named yolo11 inside demo-bin-picking.

The generated directory structure is as follows:

demo-bin-picking
|--- models
|--- train_pbr
|--- yolo11
      |--- train_obj_s
            |--- images
            |--- labels
            |--- yolo_configs
                |--- data_objs.yaml
            |--- autosplit_train.txt
            |--- autosplit_val.txt

Explanation:

  • images → folder containing 2D training images
  • labels → folder containing 2D bounding box (BBox) annotations
  • data_objs.yaml → YOLO configuration file
  • autosplit_train.txt → list of training samples
  • autosplit_val.txt → list of validation samples
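One BOP bounding box maps to one YOLO label line roughly as follows. This is a hedged sketch: the class-index mapping obj_id - 1 is an assumption for illustration, and s3_p1_prepare_yolo_label.py is authoritative.

def bop_bbox_to_yolo_line(obj_id, bbox_xywh, img_w, img_h):
    # BOP stores (x, y, w, h) as top-left corner plus size, in pixels;
    # YOLO wants "class cx cy w h" with a normalized box center and size.
    x, y, w, h = bbox_xywh
    cx = (x + w / 2.0) / img_w
    cy = (y + h / 2.0) / img_h
    return '%d %.6f %.6f %.6f %.6f' % (obj_id - 1, cx, cy, w / img_w, h / img_h)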

Training the YOLOv11 Detector

To train the YOLOv11 detector, use the s3_p2_train_yolo.py script. After specifying the dataset path xxx/xxx/demo-bin-picking, run the script to train YOLOv11 and save the best model weights as yolo11-detection-obj_s.pt.
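At its core the training call is the standard Ultralytics API. A hedged sketch follows; the checkpoint variant and hyperparameters here are illustrative, and s3_p2_train_yolo.py additionally handles resume-from-checkpoint (see the Important Note below):

from ultralytics import YOLO

# Pretrained checkpoint; the "s" variant is an illustrative choice.
model = YOLO('yolo11s.pt')
model.train(
    data='demo-bin-picking/yolo11/train_obj_s/yolo_configs/data_objs.yaml',
    epochs=100,   # illustrative value
    imgsz=640,
)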

The final directory structure after training is shown below:

demo-bin-picking
|--- models
|--- train_pbr
|--- yolo11
      |--- train_obj_s
            |--- detection
                |--- obj_s
                    |--- yolo11-detection-obj_s.pt
            |--- images
            |--- labels
            |--- yolo_configs
                |--- data_objs.yaml
            |--- autosplit_train.txt
            |--- autosplit_val.txt

⚠️ Important Note

s3_p2_train_yolo.py continuously scans the detection folder for the file yolo11-detection-obj_s.pt. This mechanism lets training resume automatically after unexpected interruptions, which is particularly useful on cloud servers or in environments where training progress cannot be easily monitored, helping to prevent idle GPU time and reduce unnecessary costs. However, if you intend to restart training from scratch, delete yolo11-detection-obj_s.pt first; otherwise, the program will resume from the previous checkpoint instead of reinitializing.


🧩 Preparation of Front–Back Surface Labels

Click to expand

In HccePose(BF), the network simultaneously learns the front-surface and back-surface 3D coordinates of each object. To generate these labels, separate depth maps are rendered for the front and back surfaces.

During front-surface rendering, gl.glDepthFunc(gl.GL_LESS) is applied to preserve the smallest depth values, corresponding to the points closest to the camera. These are defined as the front surfaces, following the "front-face" concept used in traditional back-face culling. Similarly, for back-surface rendering, gl.glDepthFunc(gl.GL_GREATER) is used to retain the largest depth values, corresponding to the farthest visible surfaces. Finally, the 3D coordinate label maps are generated based on these depth maps and the ground-truth 6D poses.
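The two passes differ only in the depth test and the value the depth buffer is cleared to. A minimal PyOpenGL sketch of the idea (the repository's renderer additionally handles framebuffers, shaders, and depth readback):

import OpenGL.GL as gl

def configure_depth_pass(front):
    gl.glEnable(gl.GL_DEPTH_TEST)
    if front:
        gl.glClearDepth(1.0)            # clear to the far plane
        gl.glDepthFunc(gl.GL_LESS)      # keep the smallest depth: front surface
    else:
        gl.glClearDepth(0.0)            # clear to the near plane
        gl.glDepthFunc(gl.GL_GREATER)   # keep the largest depth: back surface
    gl.glClear(gl.GL_COLOR_BUFFER_BIT | gl.GL_DEPTH_BUFFER_BIT)
    # ... draw the mesh under the ground-truth pose, then read the depth buffer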


Symmetry Handling and Pose Correction

For symmetric objects, both discrete and continuous rotational symmetries are represented as a unified set of symmetry matrices.
Using these matrices and the ground-truth pose, a new set of valid ground-truth poses is computed.
To ensure pose label uniqueness, the pose with the minimum L2 distance from the identity matrix is selected as the final label.
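In code, that uniqueness rule reads roughly as follows (hedged sketch; whether the symmetry matrix multiplies on the left or the right follows the dataset's symmetry convention):

import numpy as np

def select_unique_rotation(R_gt, sym_rotations):
    # Among all symmetry-equivalent rotations, keep the one closest to
    # the identity matrix in the L2 (Frobenius) sense.
    candidates = [R_gt @ S for S in sym_rotations]
    dists = [np.linalg.norm(R - np.eye(3)) for R in candidates]
    return candidates[int(np.argmin(dists))]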

Moreover, due to perspective imaging, when an object undergoes translation without rotation, an apparent rotation can be observed from a fixed viewpoint. For symmetric objects, this apparent rotation can cause erroneous 3D label maps. To correct this effect, we reconstruct 3D coordinates from the rendered depth maps and apply RANSAC PnP to refine the rotation.


Batch Label Generation

Based on the above procedure, we implement s4_p1_gen_bf_labels.py, a multi-process rendering script for generating front and back 3D coordinate label maps in batches. After specifying the dataset path /root/xxxxxx/demo-bin-picking and the subfolder train_pbr, running the script produces two new folders:

  • train_pbr_xyz_GT_front: front-surface 3D label maps
  • train_pbr_xyz_GT_back: back-surface 3D label maps

Directory structure:

demo-bin-picking
|--- models
|--- train_pbr
|--- train_pbr_xyz_GT_back
|--- train_pbr_xyz_GT_front

The following example shows three corresponding images:
the original rendering, the front-surface label map, and the back-surface label map.


🚀 Training HccePose(BF)

Click to expand

When training HccePose(BF), a separate weight model must be trained for each object.
The s4_p2_train_bf_pbr.py script supports multi-GPU batch training across multiple objects.

Taking the demo-tex-objs dataset as an example, the directory structure after training is as follows:

demo-tex-objs
|--- HccePose
    |--- obj_01
    ...
    |--- obj_10
|--- models
|--- train_pbr
|--- train_pbr_xyz_GT_back
|--- train_pbr_xyz_GT_front

The ide_debug flag controls whether the script runs in single-GPU or multi-GPU (DDP) mode:

  • ide_debug=True → single-GPU mode, ideal for debugging in IDEs.
  • ide_debug=False → enables DDP (Distributed Data Parallel) training.

Note that directly running DDP training within IDEs such as VSCode may cause communication issues.
Hence, we recommend launching multi-GPU training in a detached session:

screen -S train_ddp
nohup python -u -m torch.distributed.launch --nproc_per_node 6 /root/xxxxxx/s4_p2_train_bf_pbr.py > log4.file 2>&1 &

For single-GPU execution or debugging, use:

nohup python -u /root/xxxxxx/s4_p2_train_bf_pbr.py > log4.file 2>&1 &

Setting Training Ranges

To train multiple objects, specify the range of object IDs using start_obj_id and end_obj_id. For example, setting start_obj_id=1 and end_obj_id=5 trains objects obj_000001.ply through obj_000005.ply. To train a single object, set both values to the same number.

You may also adjust total_iteration according to training needs (default: 50000). For DDP training, the total number of training samples is computed as:

total_samples = total_iteration × batch_size × GPU_count

For example, with total_iteration = 50000, a batch size of 32 (illustrative), and 6 GPUs, about 9.6 million samples are drawn over the course of training.


✏️ Quick Start

Smallest path to a first result: see Minimal setup (inference demo) above, which covers the Bin-Picking RGB demo only, without BlenderProc / bpy.

This project provides a simple HccePose(BF)-based application example for the Bin-Picking task.
To reduce reproduction difficulty, both the objects (3D printed with standard white PLA material) and the camera (Xiaomi smartphone) are easily accessible devices.

You can:

  • Print the sample object multiple times
  • Randomly place the printed objects
  • Capture photos freely using your phone
  • Directly perform 2D detection, 2D segmentation, and 6D pose estimation using the pretrained weights provided in this project

Please keep the folder hierarchy unchanged.

  • 🎨 Object 3D models: demo-bin-picking/models
  • 📁 YOLOv11 weights: demo-bin-picking/yolo11
  • 📂 HccePose weights: demo-bin-picking/HccePose
  • 🖼️ Test images: test_imgs
  • 🎥 Test videos: test_videos
  • 📷 RGB-D (Hugging Face): test_imgs_RGBD, stems 000000–000003 ({stem}_rgb.png, {stem}_depth.png, {stem}_camK.json). Example: hf download SEU-WYL/HccePose --repo-type dataset --include "test_imgs_RGBD/*" or python scripts/download_hf_assets.py --preset test
  • 📷 RGB-D (minimal git tree): test_imgs_RGBD/ may ship 000003_* only in git for a minimal clone; copy or download the Hugging Face folder into test_imgs_RGBD/ for the multi-frame RGB-D scripts

⚠️ Note:
Files beginning with train_ are only required for training.
For this Quick Start section, only the above test files are needed.


⏳ Model and Loader

During testing, import the following modules:

  • HccePose.tester → integrated testing module covering 2D detection, segmentation, and 6D pose estimation.
  • HccePose.bop_loader → BOP-format dataset loader for loading object models and training data.

📸 Example Test

The following image shows the experimental setup:

Click to expand

Several white 3D-printed objects are placed inside a bowl on a white table, then photographed with a mobile phone.

Example input image 👇

Source image: Example Link

You can directly use the following script for 6D pose estimation and visualization:

Color layout: cv2.imread / VideoCapture.read yield BGR uint8. Pass them unchanged to Tester.predict (same as s4_p3_test_mi10_bin_picking.py). HccePose normalizes with IMAGENET_MEAN_BGR / IMAGENET_STD_BGR. FoundationPose converts BGR→RGB inside Refinement_FP.inference_batch. MegaPose feeds RGB to the upstream estimator; MegaPose debug panels are BGR for cv2.imwrite.

Click to expand code
import cv2, os, sys
import numpy as np
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown, save_visual_artifacts
from HccePose.tester import Tester

if __name__ == '__main__':
    sys.path.insert(0, os.getcwd())
    current_dir = os.path.dirname(sys.argv[0])
    dataset_path = os.path.join(current_dir, 'demo-bin-picking')
    test_img_path = os.path.join(current_dir, 'test_imgs')
    bop_dataset_item = bop_dataset(dataset_path)
    obj_id = 1
    CUDA_DEVICE = '0'
    hccepose_vis = True
    save_visualizations = hccepose_vis
    print_stage_timing = False
    hccepose_acceleration = 'pytorch'

    Tester_item = Tester(
        bop_dataset_item,
        hccepose_vis=hccepose_vis,
        CUDA_DEVICE=CUDA_DEVICE,
        hccepose_acceleration=hccepose_acceleration,
    )
    for name in ['IMG_20251007_165718']:
        file_name = os.path.join(test_img_path, '%s.jpg' % name)
        image = cv2.imread(file_name)
        cam_K = np.array([
            [2.83925618e+03, 0.00000000e+00, 2.02288638e+03],
            [0.00000000e+00, 2.84037288e+03, 1.53940473e+03],
            [0.00000000e+00, 0.00000000e+00, 1.00000000e+00],
        ])
        results_dict = Tester_item.predict(
            cam_K, image, [obj_id], conf=0.85, confidence_threshold=0.85,
        )
        print_stage_time_breakdown(results_dict, enabled=print_stage_timing, prefix=name)
        save_visual_artifacts([
            (file_name.replace('.jpg', '_show_2d.jpg'), results_dict.get('show_2D_results')),
            (file_name.replace('.jpg', '_show_6d_vis0.jpg'), results_dict.get('show_6D_vis0')),
            (file_name.replace('.jpg', '_show_6d_vis1.jpg'), results_dict.get('show_6D_vis1')),
            (file_name.replace('.jpg', '_show_6d_vis2.jpg'), results_dict.get('show_6D_vis2')),
        ], enabled=save_visualizations)

Full options (acceleration, missing-image skips, output prefixes) match the repository script s4_p3_test_mi10_bin_picking.py.

Video demos: s4_p3_test_mi10_bin_picking_video.py and s4_p3_test_mi10_tex_objs_video.py iterate whole clips under test_videos/ and can take a long time. For routine validation, prefer the single-image scripts above; use the *_video.py files when you need full-sequence outputs.


🎯 Visualization Results

2D Detection Result (_show_2d.jpg):

Network Outputs:

  • HCCE-based front and back surface coordinate encodings

  • Object mask

  • Decoded 3D coordinate visualizations


💫 If you find this tutorial helpful

Please consider giving it a ⭐️ on GitHub! Your support motivates us to keep improving and updating the project 🙌


🎥 6D Pose Estimation in Videos

Detailed Content

The single-frame pose estimation pipeline can be easily extended to video sequences, enabling continuous-frame 6D pose estimation, as shown in the following example:

Click to expand code
import cv2, os, sys
import numpy as np
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown
from HccePose.tester import Tester

if __name__ == '__main__':
    sys.path.insert(0, os.getcwd())
    current_dir = os.path.dirname(sys.argv[0])
    dataset_path = os.path.join(current_dir, 'demo-bin-picking')
    test_video_path = os.path.join(current_dir, 'test_videos')
    bop_dataset_item = bop_dataset(dataset_path)
    obj_id = 1
    CUDA_DEVICE = '0'
    hccepose_vis = True
    hccepose_acceleration = 'pytorch'
    save_visualizations = hccepose_vis
    print_stage_timing = False

    Tester_item = Tester(
        bop_dataset_item,
        hccepose_vis=hccepose_vis,
        CUDA_DEVICE=CUDA_DEVICE,
        hccepose_acceleration=hccepose_acceleration,
    )
    for name in ['VID_20251009_141247']:
        file_name = os.path.join(test_video_path, '%s.mp4' % name)
        cap = cv2.VideoCapture(file_name)
        fps = cap.get(cv2.CAP_PROP_FPS)
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out_1 = None
        out_2 = None
        cam_K = np.array([
            [1.63235512e+03, 0.00000000e+00, 9.74032712e+02],
            [0.00000000e+00, 1.64159967e+03, 5.14229781e+02],
            [0.00000000e+00, 0.00000000e+00, 1.00000000e+00],
        ])
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            # Frames are already BGR (same convention as imread).
            results_dict = Tester_item.predict(
                cam_K, frame, [obj_id], conf=0.85, confidence_threshold=0.85,
            )
            print_stage_time_breakdown(results_dict, enabled=print_stage_timing, prefix='%s frame' % name)
            fps_hccepose = 1 / results_dict['time']
            if not save_visualizations:
                continue
            show_6D_vis1 = results_dict['show_6D_vis1']
            show_6D_vis1[show_6D_vis1 < 0] = 0
            show_6D_vis1[show_6D_vis1 > 255] = 255
            if out_1 is None:
                out_1 = cv2.VideoWriter(
                    file_name.replace('.mp4', '_show_1.mp4'),
                    fourcc,
                    fps,
                    (show_6D_vis1.shape[1], show_6D_vis1.shape[0]),
                )
            out_1.write(show_6D_vis1.astype(np.uint8))
            show_6D_vis2 = results_dict['show_6D_vis2']
            show_6D_vis2[show_6D_vis2 < 0] = 0
            show_6D_vis2[show_6D_vis2 > 255] = 255
            if out_2 is None:
                out_2 = cv2.VideoWriter(
                    file_name.replace('.mp4', '_show_2.mp4'),
                    fourcc,
                    fps,
                    (show_6D_vis2.shape[1], show_6D_vis2.shape[0]),
                )
            cv2.putText(
                show_6D_vis2,
                'FPS: {0:.2f}'.format(fps_hccepose),
                (20, 60),
                cv2.FONT_HERSHEY_SIMPLEX,
                2,
                (0, 255, 0),
                4,
                cv2.LINE_AA,
            )
            out_2.write(show_6D_vis2.astype(np.uint8))
        cap.release()
        if out_1 is not None:
            out_1.release()
        if out_2 is not None:
            out_2.release()

See s4_p3_test_mi10_bin_picking_video.py for multi-video loops and the same flags.


🎯 Visualization Results

Original Video:

Detection Results:


In addition, by passing a list of multiple object IDs to HccePose.tester, multi-object 6D pose estimation can also be achieved.

Please keep the folder hierarchy unchanged.

  • 🎨 Object 3D models: demo-tex-objs/models
  • 📁 YOLOv11 weights: demo-tex-objs/yolo11
  • 📂 HccePose weights: demo-tex-objs/HccePose
  • 🖼️ Test images: test_imgs
  • 🎥 Test videos: test_videos
  • 📷 RGB-D (Hugging Face): test_imgs_RGBD, stems 000000–000003 (scripts/download_hf_assets.py --preset test)
  • 📷 RGB-D (minimal git tree): test_imgs_RGBD/ may ship 000003_* only; download the Hugging Face folder for all sample stems

⚠️ Note:
Files beginning with train_ are only required for training.
For this demo, only the above test files are needed.

Original Video:

Detection Results:


📷 RGB-D refinement (FoundationPose / MegaPose)

This section mirrors the single-image and video tutorials above: the RGB-D capture figure (RGB + colorized depth) is shown directly, then FoundationPose, MegaPose (RGB-D), and a three-way comparison with collapsible code blocks and show_vis/ figures.

Place each frame under test_imgs_RGBD/ as {stem}_rgb.png, {stem}_depth.png, {stem}_camK.json. The JSON stores fx, fy, cx, cy; depth scaling follows convert_depth_to_meter in Refinement/refinement_test_utils.py. Download the sample pack from Hugging Face (test_imgs_RGBD, 000000–000003) so it matches that logic. A git snapshot may include only 000003; add 000000–000002 from Hugging Face for the multi-frame tables in s4_p3_test_mi10_bin_picking_RGBD_FP_vs_MP.py (e.g. python scripts/download_hf_assets.py --preset test --endpoint auto).
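Assembling the intrinsic matrix from such a JSON is straightforward. A hedged sketch; load_capture_frame in Refinement/refinement_test_utils.py is the authoritative loader and also handles the depth scaling:

import json
import numpy as np

def load_cam_K(json_path):
    # {stem}_camK.json stores fx, fy, cx, cy (pinhole intrinsics).
    with open(json_path) as f:
        d = json.load(f)
    return np.array([
        [d['fx'], 0.0,     d['cx']],
        [0.0,     d['fy'], d['cy']],
        [0.0,     0.0,     1.0],
    ])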

FoundationPose needs 2023-10-28-18-33-37/ and 2024-01-11-20-02-45/ at the repo root (see Optional: RGB-D refinement & acceleration above). MegaPose may auto-setup on first use.

Color: load_capture_frame reads *_rgb.png with cv2.imread → BGR uint8, same as the single-image tutorial. save_visual_artifacts / cv2.imwrite expect BGR for color JPEG/PNG. The FoundationPose and MegaPose refinement paths apply the internal RGB/BGR conversions described in the Quick Start note above.


⏳ Modules (RGB-D path)

  • HccePose.tester.Tester: pass meter-scale depth as depth=depth_m, and set use_foundationpose=True or use_megapose=True together with the corresponding foundationpose_* / megapose_* kwargs.
  • Refinement.refinement_test_utils: list_capture_frame_names, load_capture_frame, and (for the comparison script) build_depth_comparison_visual.
  • HccePose.test_script_utils: save_visual_artifacts, print_stage_time_breakdown (driven by print_stage_timing in the scripts).
  • results_dict['time_dict']: optional per-stage timings (YOLO, HccePose, FoundationPose, MegaPose, visualization, ...).

ONNX / TensorRT: HccePose uses s4_p3_test_mi10_bin_picking_onnx.py / s4_p3_test_mi10_bin_picking_tensorrt.py; FoundationPose refinement can set foundationpose_acceleration in the RGB-D scripts. The comparison demo sets foundationpose_acceleration='onnx' by default in the repository file.


📸 Example: RGB-D input (frame 000003)

Example RGB-D view: left = RGB (000003_rgb.png), right = pseudo-color depth (000003_depth.png) with a compact TURBO color bar embedded on the right edge of the depth panel; numeric ticks (min / mid / max, meters) sit immediately to the left of that bar. Depth uses convert_depth_to_meter from Refinement/refinement_test_utils.py (same as load_capture_frame), then linear mapping on valid pixels; invalid/zero depth stays black. RGB and depth panels are the same size and concatenated side by side.

Example raw inputs (also mirrored on Hugging Face under test_imgs_RGBD for 000000–000003): test_imgs_RGBD/000003_rgb.png, 000003_depth.png, 000003_camK.json.
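The depth panel can be reproduced roughly as below. A hedged sketch of the described mapping; the repository's figure code also embeds the color bar and tick labels:

import cv2
import numpy as np

def colorize_depth(depth_m):
    # Linear mapping on valid pixels, TURBO colormap, invalid/zero depth black.
    valid = depth_m > 0
    norm = np.zeros_like(depth_m, dtype=np.float32)
    if valid.any():
        lo, hi = depth_m[valid].min(), depth_m[valid].max()
        norm[valid] = (depth_m[valid] - lo) / max(float(hi - lo), 1e-6)
    color = cv2.applyColorMap((norm * 255).astype(np.uint8), cv2.COLORMAP_TURBO)
    color[~valid] = 0
    return color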


📸 Example: FoundationPose RGB-D refinement

Run s4_p3_test_mi10_bin_picking_RGBD_foundationpose.py after placing FoundationPose weights. Toggle hccepose_vis / foundationpose_vis when you need HccePose or FoundationPose debug renders (files are written next to the captures unless you change the paths).

Click to expand code
import cv2, os, sys
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown, save_visual_artifacts
from HccePose.tester import Tester
from Refinement.refinement_test_utils import load_capture_frame, list_capture_frame_names


if __name__ == '__main__':

    sys.path.insert(0, os.getcwd())
    current_dir = os.path.dirname(sys.argv[0])
    dataset_path = os.path.join(current_dir, 'demo-bin-picking')
    capture_dir = os.path.join(current_dir, 'test_imgs_RGBD')
    foundationpose_refine_dir = os.path.join(current_dir, '2023-10-28-18-33-37')
    foundationpose_score_dir = os.path.join(current_dir, '2024-01-11-20-02-45')
    bop_dataset_item = bop_dataset(dataset_path)
    obj_id = 1
    CUDA_DEVICE = '0'
    hccepose_vis = False
    foundationpose_vis = False
    foundationpose_vis_stages = [1, 2, 3, 4, 5, 'score']
    hccepose_acceleration = 'pytorch'
    foundationpose_acceleration = 'pytorch'
    save_visualizations = hccepose_vis or foundationpose_vis
    print_stage_timing = False

    tester_item = Tester(
        bop_dataset_item,
        hccepose_vis=hccepose_vis,
        CUDA_DEVICE=CUDA_DEVICE,
        foundationpose_refine_dir=foundationpose_refine_dir,
        foundationpose_score_dir=foundationpose_score_dir,
        hccepose_acceleration=hccepose_acceleration,
        foundationpose_acceleration=foundationpose_acceleration,
    )
    frame_names = list_capture_frame_names(capture_dir)
    for name in frame_names:
        image, depth, depth_m, cam_K = load_capture_frame(capture_dir, name)

        results_dict = tester_item.predict(
            cam_K,
            image,
            [obj_id],
            conf=0.85,
            confidence_threshold=0.85,
            depth=depth_m,
            use_foundationpose=True,
            foundationpose_vis=foundationpose_vis,
            foundationpose_vis_stages=foundationpose_vis_stages,
        )
        print_stage_time_breakdown(results_dict, enabled=print_stage_timing, prefix=name)
        save_visual_artifacts([
            (os.path.join(capture_dir, '%s_show_2d.jpg' % name), results_dict.get('show_2D_results')),
            (os.path.join(capture_dir, '%s_show_6d_vis0.jpg' % name), results_dict.get('show_6D_vis0')),
            (os.path.join(capture_dir, '%s_show_6d_vis1.jpg' % name), results_dict.get('show_6D_vis1')),
            (os.path.join(capture_dir, '%s_show_6d_vis2.jpg' % name), results_dict.get('show_6D_vis2')),
            (os.path.join(capture_dir, '%s_show_foundationpose.jpg' % name), results_dict.get('show_foundationpose')),
        ], enabled=save_visualizations)

This block matches the checked-in s4_p3_test_mi10_bin_picking_RGBD_foundationpose.py.


🎯 Visualization Results (FoundationPose)

FoundationPose fusion view and decoded HccePose-style maps after refinement (exported from frame 000003; figures live under show_vis/ for the documentation):


📸 Example: MegaPose refinement (RGB-D branch)

s4_p3_test_mi10_bin_picking_RGBD_megapose.py sets megapose_use_depth=True so megapose_variant_name='rgbd'. Switch to RGB-only by setting megapose_use_depth=False ('rgb' suffix in filenames).

Click to expand code
import os, sys
import cv2
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown, save_visual_artifacts
from HccePose.tester import Tester
from Refinement.refinement_test_utils import load_capture_frame, list_capture_frame_names


if __name__ == '__main__':

    sys.path.insert(0, os.getcwd())
    current_dir = os.path.dirname(sys.argv[0])
    dataset_path = os.path.join(current_dir, 'demo-bin-picking')
    capture_dir = os.path.join(current_dir, 'test_imgs_RGBD')
    bop_dataset_item = bop_dataset(dataset_path)
    obj_id = 1
    CUDA_DEVICE = '0'
    hccepose_vis = True
    hccepose_acceleration = 'pytorch'
    megapose_use_depth = True
    megapose_vis = True
    megapose_vis_stages = [1, 2, 3, 4, 5]
    megapose_variant_name = 'rgbd' if megapose_use_depth else 'rgb'
    save_visualizations = hccepose_vis or megapose_vis
    print_stage_timing = False

    tester_item = Tester(
        bop_dataset_item,
        hccepose_vis=hccepose_vis,
        CUDA_DEVICE=CUDA_DEVICE,
        hccepose_acceleration=hccepose_acceleration,
    )

    frame_names = list_capture_frame_names(capture_dir)
    for name in frame_names:
        image, depth, depth_m, cam_K = load_capture_frame(capture_dir, name)
        megapose_depth = depth_m if megapose_use_depth else None

        results_mp = tester_item.predict(
            cam_K,
            image,
            [obj_id],
            conf=0.85,
            confidence_threshold=0.85,
            depth=megapose_depth,
            use_megapose=True,
            megapose_vis=megapose_vis,
            megapose_vis_stages=megapose_vis_stages,
        )
        print_stage_time_breakdown(results_mp, enabled=print_stage_timing, prefix='%s | MegaPose %s' % (name, megapose_variant_name.upper()))

        save_visual_artifacts([
            (os.path.join(capture_dir, '%s_show_megapose.jpg' % name), results_mp.get('show_megapose')),
            (os.path.join(capture_dir, '%s_megapose_%s_show_2d.jpg' % (name, megapose_variant_name)), results_mp.get('show_2D_results')),
            (os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis0.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis0')),
            (os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis1.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis1')),
            (os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis2.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis2')),
        ], enabled=save_visualizations)

This block matches s4_p3_test_mi10_bin_picking_RGBD_megapose.py.


🎯 Visualization Results (MegaPose RGB-D)

MegaPose debug canvas and decoded maps for the rgbd variant on 000003:

RGB-only MegaPose (megapose_use_depth=False) for the same stem:


📸 Example: HccePose vs FoundationPose vs MegaPose (comparison & depth fusion)

s4_p3_test_mi10_bin_picking_RGBD_FP_vs_MP.py runs, for every frame, (1) HccePose with depth, (2) FoundationPose refinement, and (3) the MegaPose rgb and rgbd variants, saves side-by-side overlays, and builds build_depth_comparison_visual panels. The file also defines print_foundationpose_benchmark to sweep FoundationPose backends on the first frame; that helper is omitted below, so open the script for the full listing. It defaults to foundationpose_acceleration='onnx'; this README's PyTorch 2.8.0+cu128 line includes torch.backends.mha, which the ONNX export expects. On much older PyTorch (e.g. 2.2) you may see AttributeError: module 'torch.backends' has no attribute 'mha'; upgrade PyTorch or switch that flag to a non-ONNX mode if appropriate.

Click to expand code (per-frame pipeline)
import cv2, os, sys
import numpy as np
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown, save_visual_artifacts
from HccePose.tester import Tester
from Refinement.refinement_test_utils import build_depth_comparison_visual, load_capture_frame, list_capture_frame_names

# The same repository file defines print_foundationpose_benchmark (and related helpers)
# above __main__. Copy that file verbatim to run the optional first-frame backend sweep.

if __name__ == '__main__':

    sys.path.insert(0, os.getcwd())
    current_dir = os.path.dirname(sys.argv[0])
    dataset_path = os.path.join(current_dir, 'demo-bin-picking')
    capture_dir = os.path.join(current_dir, 'test_imgs_RGBD')
    foundationpose_refine_dir = os.path.join(current_dir, '2023-10-28-18-33-37')
    foundationpose_score_dir = os.path.join(current_dir, '2024-01-11-20-02-45')
    bop_dataset_item = bop_dataset(dataset_path)
    obj_id = 1
    obj_index = list(bop_dataset_item.obj_id_list).index(obj_id)
    obj_model_path = bop_dataset_item.obj_model_list[obj_index]
    CUDA_DEVICE = '0'
    hccepose_vis = True
    hccepose_acceleration = 'pytorch'
    foundationpose_vis = False
    foundationpose_vis_stages = [1, 2, 3, 4, 5, 'score']
    foundationpose_acceleration = 'onnx'
    megapose_vis = True
    megapose_vis_stages = [1, 2, 3, 4, 5]
    megapose_variants = [('rgbd', True), ('rgb', False)]
    save_visualizations = hccepose_vis or foundationpose_vis or megapose_vis
    print_stage_timing = False

    foundationpose_runner = Tester(
        bop_dataset_item,
        hccepose_vis=hccepose_vis,
        CUDA_DEVICE=CUDA_DEVICE,
        foundationpose_refine_dir=foundationpose_refine_dir,
        foundationpose_score_dir=foundationpose_score_dir,
        hccepose_acceleration=hccepose_acceleration,
        foundationpose_acceleration=foundationpose_acceleration,
    )
    megapose_runner = Tester(
        bop_dataset_item,
        hccepose_vis=hccepose_vis,
        CUDA_DEVICE=CUDA_DEVICE,
        hccepose_acceleration=hccepose_acceleration,
    )

    frame_names = list_capture_frame_names(capture_dir)
    # print_foundationpose_benchmark(...) optional warm-up on frame_names[0]; see repository file.

    for name in frame_names:
        image, depth, depth_m, cam_K = load_capture_frame(capture_dir, name)

        results_hccepose = foundationpose_runner.predict(
            cam_K,
            image,
            [obj_id],
            conf=0.85,
            confidence_threshold=0.85,
            depth=depth_m,
        )
        results_fp = foundationpose_runner.predict(
            cam_K,
            image,
            [obj_id],
            conf=0.85,
            confidence_threshold=0.85,
            depth=depth_m,
            use_foundationpose=True,
            foundationpose_vis=foundationpose_vis,
            foundationpose_vis_stages=foundationpose_vis_stages,
        )

        print_stage_time_breakdown(results_hccepose, enabled=print_stage_timing, prefix='%s | HccePose' % name)
        print_stage_time_breakdown(results_fp, enabled=print_stage_timing, prefix='%s | FoundationPose' % name)

        save_visual_artifacts([
            (os.path.join(capture_dir, '%s_hccepose_show_2d.jpg' % name), results_hccepose.get('show_2D_results')),
            (os.path.join(capture_dir, '%s_hccepose_show_6d_vis0.jpg' % name), results_hccepose.get('show_6D_vis0')),
            (os.path.join(capture_dir, '%s_hccepose_show_6d_vis1.jpg' % name), results_hccepose.get('show_6D_vis1')),
            (os.path.join(capture_dir, '%s_hccepose_show_6d_vis2.jpg' % name), results_hccepose.get('show_6D_vis2')),
            (os.path.join(capture_dir, '%s_foundationpose_show_2d.jpg' % name), results_fp.get('show_2D_results')),
            (os.path.join(capture_dir, '%s_foundationpose_show_6d_vis0.jpg' % name), results_fp.get('show_6D_vis0')),
            (os.path.join(capture_dir, '%s_foundationpose_show_6d_vis1.jpg' % name), results_fp.get('show_6D_vis1')),
            (os.path.join(capture_dir, '%s_foundationpose_show_6d_vis2.jpg' % name), results_fp.get('show_6D_vis2')),
            (os.path.join(capture_dir, '%s_show_foundationpose.jpg' % name), results_fp.get('show_foundationpose')),
        ], enabled=save_visualizations)

        for megapose_variant_name, megapose_use_depth in megapose_variants:
            megapose_depth = depth_m if megapose_use_depth else None
            results_mp = megapose_runner.predict(
                cam_K,
                image,
                [obj_id],
                conf=0.85,
                confidence_threshold=0.85,
                depth=megapose_depth,
                use_megapose=True,
                megapose_vis=megapose_vis,
                megapose_vis_stages=megapose_vis_stages,
            )

            print_stage_time_breakdown(results_mp, enabled=print_stage_timing, prefix='%s | MegaPose %s' % (name, megapose_variant_name.upper()))

            save_visual_artifacts([
                (os.path.join(capture_dir, '%s_show_megapose_%s.jpg' % (name, megapose_variant_name)), results_mp.get('show_megapose')),
                (os.path.join(capture_dir, '%s_megapose_%s_show_2d.jpg' % (name, megapose_variant_name)), results_mp.get('show_2D_results')),
                (os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis0.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis0')),
                (os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis1.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis1')),
                (os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis2.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis2')),
            ], enabled=save_visualizations)

            pose_sets_mm = {}
            if obj_id in results_hccepose and 'Rts' in results_hccepose[obj_id]:
                pose_sets_mm['HccePose'] = results_hccepose[obj_id]['Rts']
            if obj_id in results_fp and 'Rts' in results_fp[obj_id]:
                pose_sets_mm['FoundationPose'] = results_fp[obj_id]['Rts']
            if obj_id in results_mp and 'Rts' in results_mp[obj_id]:
                pose_sets_mm['MegaPose'] = results_mp[obj_id]['Rts']
            if save_visualizations:
                depth_compare_vis, depth_compare_summary = build_depth_comparison_visual(
                    depth,
                    cam_K,
                    obj_model_path,
                    pose_sets_mm,
                    device=str(foundationpose_runner.device),
                    max_items=4,
                )
                save_visual_artifacts([
                    (os.path.join(capture_dir, '%s_compare_depth_hccepose_foundationpose_megapose_%s.jpg' % (name, megapose_variant_name)), depth_compare_vis),
                ], enabled=True)

The authoritative version (including imports for print_foundationpose_benchmark) is s4_p3_test_mi10_bin_picking_RGBD_FP_vs_MP.py.


🎯 Visualization Results (depth comparison)

Depth-aligned comparison panels for MegaPose RGB vs MegaPose RGB-D branches (000003):


🧪 BOP Challenge Testing

You can use the script s4_p2_test_bf_pbr_bop_challenge.py to evaluate HccePose(BF) across the seven core BOP datasets.

Pretrained Weights

  • LM-O: Hugging Face - LM-O
  • YCB-V: Hugging Face - YCB-V
  • T-LESS: Hugging Face - T-LESS
  • TUD-L: Hugging Face - TUD-L
  • HB: Hugging Face - HB
  • ITODD: Hugging Face - ITODD
  • IC-BIN: Hugging Face - IC-BIN

Example: LM-O Dataset

As an example, we evaluated HccePose(BF) on the widely used LM-O dataset from the BOP benchmark. We adopted the default 2D detector (GDRNPP) from the BOP 2023 Challenge and obtained the following output files:

These two files were submitted on October 20, 2025; the results are shown below. The 6D localization score remains consistent with the 2024 submission, while the 2D segmentation score improved by 0.002, thanks to the correction of minor implementation bugs.

Click to expand
BOP LM-O results

⚙️ Notes

  • If some pretrained weights show an iteration count of 0, this is not an error. All HccePose(BF) weights are fine-tuned from the standard HccePose model trained using only the front surface. In some cases, the initial weights already achieve optimal performance.

📅 Update Plan

We are currently organizing and updating the following modules:

  • 📁 HccePose(BF) weights for the seven core BOP datasets

  • 🧪 BOP Challenge testing pipeline

  • 🔁 6D pose inference via inter-frame tracking

  • 🏷️ Real-world 6D pose dataset preparation based on HccePose(BF)

  • ⚙️ PBR + Real training workflow

  • 📘 Tutorials on object preprocessing, data rendering, YOLOv11 label preparation and training, as well as HccePose(BF) label preparation and training

All components are expected to be completed by the end of 2025, with continuous daily updates whenever possible.


πŸ† BOP LeaderBoards

BOP 6D localization leaderboard | BOP 2D segmentation leaderboard

Acknowledgments

This project builds on public datasets, benchmarks, and methods that the code and tutorials reference directly, including: the BOP benchmark and bop_toolkit; BlenderProc for rendering; Ultralytics YOLO for detection; FoundationPose and MegaPose for RGB-D refinement integrations; and KASAL where used in the dependency stack.


If you find our work useful, please cite it as follows:

@InProceedings{HccePose_BF,
    author    = {Wang, Yulin and Hu, Mengting and Li, Hongli and Luo, Chen},
    title     = {HccePose(BF): Predicting Front \& Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {7166-7175}
}
