HccePose(BF) introduces a Hierarchical Continuous Coordinate Encoding (HCCE) mechanism that encodes the three coordinate components of object surface points into hierarchical continuous codes. Through this hierarchical encoding scheme, the neural network can effectively learn the correspondence between 2D image features and 3D surface coordinates of the object, while significantly enhancing its capability to learn accurate object masks. Unlike traditional methods that only learn the visible front surface of objects, HccePose(BF) additionally learns the 3D coordinates of the back surface, thereby establishing denser 2D-3D correspondences and substantially improving pose estimation accuracy.
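For intuition only, a "hierarchical continuous code" can be pictured as follows: a coordinate normalized to [0, 1] yields one continuous code per hierarchy level, the code being the fractional position of the point at that level's scale. This is a toy sketch, not the paper's exact HCCE definition:

```python
import numpy as np

def hcce_toy(coord, levels=3):
    # Toy hierarchical continuous encoding of a single coordinate in
    # [0, 1] -- illustration only, NOT the repository's HCCE. Level k
    # views the coordinate at scale 2**k and keeps its continuous
    # position within the current cell (no hard binning).
    x = float(np.clip(coord, 0.0, 1.0))
    return [(x * 2.0 ** k) % 1.0 for k in range(levels)]

print(hcce_toy(0.3))  # -> [0.3, 0.6, ~0.2]: coarse-to-fine codes
```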
⚠️ Note: All paths must be absolute paths to avoid runtime errors.

- 2025.10.27: We've released cc0textures-512, a lightweight alternative to the original 44GB CC0Textures library, now only 600MB! 🔗 Download here
- 2025.10.28: `s4_p1_gen_bf_labels.py` has been updated. If the dataset does not contain a `camera.json`, the script will automatically create a default one.
- 2026.04.04: RGB-D refinement with FoundationPose / MegaPose (`Refinement/`, example scripts `s4_p3_test_mi10_bin_picking_RGBD_*.py`); optional ONNX / TensorRT acceleration for HccePose and FoundationPose (`hccepose_acceleration`, `foundationpose_acceleration`); per-stage timings via `results_dict['time_dict']` and `print_stage_time_breakdown`. Sample RGB-D frames `000000`–`000003` are on Hugging Face → test_imgs_RGBD (each stem: `{stem}_rgb.png`, `{stem}_depth.png`, `{stem}_camK.json`). A git checkout may still ship only `000003_*` under `test_imgs_RGBD/` to keep the repo small; download that folder for all four stems (see `scripts/download_hf_assets.py`, `hf-dataset-card/README.md`).
- 2026.04.04 (docs): Quick Start and RGB-D sections now state the OpenCV BGR convention (`cv2.imread` / `VideoCapture` → pass unchanged to `Tester.predict`); removed an erroneous `COLOR_RGB2BGR` after `imread`. Code comments aligned with BGR training norms, FoundationPose BGR→RGB at the refiner entry, and MegaPose BGR debug panels.
- 2026.04.06 (docs): Minimal inference checklist, first-run time expectations, troubleshooting, an optional `requirements-inference.txt`, and a note to use a dedicated venv/conda env when ONNX scripts may auto-install packages.
Configuration Details
Download the HccePose(BF) Project and Unzip BOP-related Toolkits
# Clone the project
git clone https://github.com/WangYuLin-SEU/HCCEPose.git
cd HCCEPose
# Unzip toolkits
unzip bop_toolkit.zip
unzip blenderproc.zip

Configure Ubuntu System Environment (Python 3.10)
⚠️ A GPU driver with EGL support must be pre-installed. Version pins below match a reference Conda `py310` interpreter (Python 3.10.19) after `pip install` (PyTorch wheels report suffixes such as `+cu128` in `pip list`).
apt-get update && apt-get install -y wget software-properties-common gnupg2 python3-pip
apt-get update && apt-get install -y libegl1-mesa-dev libgles2-mesa-dev libx11-dev libxext-dev libxrender-dev
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
apt-get update && apt-get install pkg-config libglvnd0 libgl1 libglx0 libegl1 libgles2 libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev cmake curl ninja-build
pip install ultralytics==8.3.233 fvcore==0.1.5.post20221221 pybind11==2.12.0 trimesh==4.2.2 ninja==1.11.1.1 kornia==0.6.9 open3d==0.19.0 transformations==2024.6.1 numpy==1.26.4 opencv-python==4.9.0.80 opencv-contrib-python==4.9.0.80
pip install scipy==1.15.3 kiwisolver==1.4.9 matplotlib==3.10.7 imageio==2.37.2 pypng Cython==3.2.1 PyOpenGL triangle glumpy Pillow==11.3.0 vispy imgaug mathutils pyrender pytz tqdm tensorboard kasal-6d==0.1.3 rich==14.2.0 h5py==3.15.1 diffrp-nvdiffrast
pip install bpy==3.6.0 --extra-index-url https://download.blender.org/pypi/
apt-get install libsm6 libxrender1 libxext-dev
python -c "import imageio; imageio.plugins.freeimage.download()"
pip install -U "huggingface_hub[hf_transfer]==0.36.0"
One-file shortcut (inference stack, excludes `bpy`): after installing torch / torchvision / torchaudio from the PyTorch CUDA index above, run `pip install -r requirements-inference.txt` (see the comments in that file). Full BlenderProc / training users should still follow the `bpy` line when needed.
Optional: RGB-D refinement & acceleration
- BOP toolkit path: keep `bop_toolkit/` at the project root (unzip `bop_toolkit.zip` here) so imports match the training and test scripts.
- FoundationPose (RGB-D refinement): nvdiffrast (`import nvdiffrast.torch`, `Refinement/foundationpose.py`) is provided by `diffrp-nvdiffrast`, already listed in the main `pip install` commands above. To build from source instead, see NVlabs/nvdiffrast. Weights are not included in this repo. Download the official bundle from NVlabs/FoundationPose (Data prepare) → Google Drive folder, and place the refiner and scorer under the project root as `2023-10-28-18-33-37/` and `2024-01-11-20-02-45/` (each with `config.yml`, `model_best.pth`), matching this repository's scripts. Optional mirror (not NVIDIA-hosted): gpue/foundationpose-weights. License: use of FoundationPose weights is subject to the official FoundationPose license; do not assume unrestricted commercial use.
- ONNX Runtime GPU / TensorRT: HccePose and FoundationPose can use `HccePose.hccepose_acceleration` and `Refinement.foundationpose_acceleration` for ONNX or TensorRT backends when enabled from the test scripts (see `s4_p3_test_mi10_bin_picking_onnx.py`, `s4_p3_test_mi10_bin_picking_tensorrt.py`, and the RGB-D examples). On first use, `HccePose.tester.Tester` calls `ensure_acceleration_backend_environment` to install/check onnx and onnxruntime-gpu when needed; TensorRT additionally requires a matching libnvinfer (e.g. `pip install tensorrt` or NVIDIA's tarball on `LD_LIBRARY_PATH`). Example pins from the same reference py310 env: onnx==1.21.0, onnxruntime-gpu==1.23.2, tensorrt 10.16.x (the variant depends on your CUDA stack; align with your ORT build).
- MegaPose: on the first `register_megapose()` / first MegaPose refinement path, the code can automatically clone megapose6d into `third_party_megapose6d/`, create a dedicated Python 3.9 prefix under `.envs/megapose/` (via `conda create -p … python=3.9`), install MegaPose's own PyTorch/torchvision stack there, and download models (e.g. under `local_data/megapose-models`). Inference runs in subprocesses using `.envs/megapose/bin/python`, not your main HccePose interpreter (e.g. Python 3.10): do not install MegaPose's torch stack into the py310 env; keep the two environments separate. Requires conda on `PATH`, network access, and a writable project directory. License: MegaPose code and models follow the megapose6d upstream license.
Hosts such as AutoDL often need the platform VPN / academic accelerator before talking to Hugging Face (official site or API). If your image provides it:
source /etc/network_turbo

Then run the download commands in the same shell session. After that, prefer `--endpoint hf` (official huggingface.co) for both scripts; third-party mirrors may return HTTP 403 to the tree API from some regions.
The helper `scripts/download_hf_assets.py` lives only in this GitHub repo, not in the Hugging Face dataset file list. Run it from the repository root: it downloads into the same relative paths as the Quick Start / RGB-D examples (test_imgs/, test_videos/, test_imgs_RGBD/, demo-bin-picking/, demo-tex-objs/ beside HccePose/, etc.), the same layout as copying those folders from the dataset browser or using a fully prepared dev tree.
source /etc/network_turbo # when available
python scripts/download_hf_assets.py --preset test --endpoint hf

Default `--dest` is the repo root. `--endpoint auto` tries the official hub, then https://hf-mirror.com. See `python scripts/download_hf_assets.py --help` for presets and the optional `--foundationpose` (community mirror; check license). Dataset card text: `hf-dataset-card/README.md`, published as README.md on the HF dataset repo (it links back to this script and explains paths).
Fallback: per-file wget (when snapshot_download hangs or drops). `scripts/wget_hf_demo_assets.py` lists files via the Hub tree API, downloads each with resumable `wget -c`, and checks byte sizes against the API. It skips generated `*_show_*` artifacts under `test_imgs/` / `test_videos/`. It also pulls the four FoundationPose weight files into `2023-10-28-18-33-37/` and `2024-01-11-20-02-45/` at the repo root (same layout as the manual / Drive setup). On constrained networks (e.g. AutoDL), run `source /etc/network_turbo` first if available; point pip and temp dirs at a large disk to avoid filling the root overlay, e.g. `export PIP_CACHE_DIR=/path/to/big/pip-cache TMPDIR=/path/to/big/tmp`.
source /etc/network_turbo # when available
cd HCCEPose
python scripts/wget_hf_demo_assets.py --endpoint hf # parallel wget (default ~8 workers)
python scripts/wget_hf_demo_assets.py --endpoint hf -j 4 # cap concurrency if the hub rate-limits
python scripts/wget_hf_demo_assets.py --endpoint hf -j 1 # strictly sequential
python scripts/wget_hf_demo_assets.py --endpoint hf --verify-only # size check only

If the official hub is unreachable without the accelerator, enable network_turbo (or your own VPN) first, then retry with `--endpoint hf`. Use `--endpoint mirror` only if your mirror exposes the Hub tree API without 403.
Use this if you only want to run the Bin-Picking RGB example (s4_p3_test_mi10_bin_picking.py) without the full BlenderProc training stack.
Must have at the repository root
| Item | Why |
|---|---|
| `HccePose/` | Core package (from git). |
| `bop_toolkit/` | `HccePose.bop_loader` imports `bop_toolkit`; unzip `bop_toolkit.zip` here (or keep an equivalent tree). |
| `demo-bin-picking/` | `models/`, `yolo11/`, `HccePose/` weights (Hugging Face layout). |
| `test_imgs/` | Sample images for the script loop. |
Python environment: follow the pip pins in Environment Setup (Python 3.10 recommended). For a one-shot install of the non-PyTorch pins, see requirements-inference.txt after installing torch / torchvision / torchaudio from the CUDA wheel index.
Not needed for that script alone:

- `blenderproc.zip` and `bpy`: only for BlenderProc synthesis / training-related workflows.
- FoundationPose weight folders, `test_imgs_RGBD/`, MegaPose / ONNX / TensorRT: only when you run the matching `s4_p3_test_*.py` scripts.
Recommended: use a dedicated conda env or venv for HccePose. The ONNX test path may call ensure_acceleration_backend_environment and run pip install for onnx / onnxruntime-gpu; an isolated env avoids surprising your system Python.
| Step | Typical note |
|---|---|
| MegaPose (`s4_p3_test_mi10_bin_picking_RGBD_megapose.py`, etc.) | First run can take tens of minutes (clone megapose6d, conda env under `.envs/megapose/`, PyTorch stack, model download). Later runs on the same machine are usually on the order of one minute for the same script, if caches remain. |
| `download_hf_assets.py` / `wget_hf_demo_assets.py` | Depends on bandwidth; use `source /etc/network_turbo` on hosts like AutoDL when available. |
| ONNX | First `Tester` construction with ONNX may trigger a one-time pip install of ONNX Runtime GPU wheels. |
- `ValueError: All ufuncs must have type numpy.ufunc` (often under `scipy` / `imgaug`): your NumPy / SciPy pair does not match. Reinstall the versions from Environment Setup (e.g. `numpy==1.26.4`, `scipy==1.15.3`) on Python 3.10. Avoid mixing Python 3.12 with the reference pin set unless you re-validate the stack.
- `ImportError: numpy.core.multiarray` / `AttributeError: _ARRAY_API` on `import cv2`: OpenCV's wheel was built against a different NumPy major version than the one installed. Align opencv and numpy with the pinned versions in the environment section (reinstall both in the same env).
- TensorRT script fails (`s4_p3_test_mi10_bin_picking_tensorrt.py`): TensorRT needs a matching `libnvinfer` / driver stack. To verify the core pipeline first, run with `hccepose_acceleration='pytorch'` or `'onnx'` only; treat TensorRT as optional.
Click to expand
Using the demo-bin-picking dataset as an example, we first designed the object in SolidWorks and exported it as an STL mesh file.
STL file link: 🔗 https://huggingface.co/datasets/SEU-WYL/HccePose/blob/main/raw-demo-models/multi-objs/board.STL
Then, the STL file was imported into MeshLab, and surface colors were filled using the Vertex Color Filling tool.
After coloring, the object was exported as a non-binary PLY file containing vertex colors and normals.
The exported model center might not coincide with the coordinate origin, as shown below:
To align the model center with the origin, use the script s1_p1_obj_rename_center.py.
This script loads the PLY file, aligns the model center, and renames it following BOP conventions.
The obj_id must be set manually as a unique non-negative integer for each object.
Example:
| input_ply | obj_id | output_ply |
|---|---|---|
| board.ply | 1 | obj_000001.ply |
| board.ply | 2 | obj_000002.ply |
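For intuition, the centering step can be sketched with trimesh; this is illustrative only, and `s1_p1_obj_rename_center.py` remains the authoritative implementation:

```python
import trimesh

# Load the colored PLY, move its bounding-box center to the origin,
# and save under the BOP-style name for obj_id = 1 (sketch only).
mesh = trimesh.load('board.ply')
mesh.apply_translation(-mesh.bounds.mean(axis=0))  # bounds: (min, max) rows
mesh.export('obj_000001.ply', encoding='ascii')    # non-binary PLY
```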
After centering and renaming all objects, place them into a folder named models with the following structure:
Dataset_Name
|--- models
|--- obj_000001.ply
...
|--- obj_000015.ply

Click to expand
In 6D pose estimation tasks, many objects exhibit various types of rotational symmetry, such as cylindrical, conical, or polyhedral symmetry. For such objects, the KASAL tool is used to analyze and export symmetry priors in BOP format.
KASAL project: 🔗 https://github.com/WangYuLin-SEU/KASAL
Installation:
pip install kasal-6d

Launch the KASAL GUI with:
from kasal.app.polyscope_app import app
mesh_path = 'demo-bin-picking'
app(mesh_path)

KASAL automatically scans all PLY or OBJ files under `mesh_path` (excluding generated `_sym.ply` files).
In the interface:
- Use `Symmetry Type` to select the symmetry category
- For n-fold pyramidal or prismatic symmetry, set `N (n-fold)`
- Enable `ADI-C` for texture-symmetric objects
- If the result is inaccurate, use `axis xyz` for manual fitting
KASAL defines 8 symmetry types. Selecting the wrong one produces obvious visual anomalies in the preview, which helps you verify the choice.
Click Cal Current Obj to compute the objectβs symmetry axis.
Symmetry priors will be saved as:
- Symmetry prior file: `obj_000001_sym_type.json`
- Visualization file: `obj_000001_sym.ply`
Click to expand
Run s1_p3_obj_infos.py to traverse all ply files and their symmetry priors in the models folder.
This script generates a standard models_info.json file in BOP format.
Example structure:
Dataset_Name
|--- models
|--- models_info.json
|--- obj_000001.ply
...
|--- obj_000015.ply

This file serves as the foundation for PBR rendering, YOLOv11 training, and HccePose(BF) model training.
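For reference, entries in a BOP-format models_info.json look roughly like this. Field names follow the BOP convention; the numeric values below are illustrative, and the symmetry fields appear only for symmetric objects:

```json
{
  "1": {
    "diameter": 102.09,
    "min_x": -37.93, "min_y": -38.79, "min_z": -45.88,
    "size_x": 75.87, "size_y": 77.58, "size_z": 91.77,
    "symmetries_continuous": [{"axis": [0, 0, 1], "offset": [0, 0, 0]}]
  }
}
```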
Click to expand
Based on BlenderProc, we modified a rendering script, s2_p1_gen_pbr_data.py, for generating new datasets. Running this script directly in Python may cause a memory leak, which accumulates over time and gradually degrades rendering performance. To address this issue, we provide a Shell script, s2_p1_gen_pbr_data.sh, that repeatedly invokes s2_p1_gen_pbr_data.py, effectively preventing memory accumulation and improving efficiency. Additionally, several adjustments were made to BlenderProc to better support PBR dataset generation for new object sets.
Before rendering, use s2_p0_download_cc0textures.py to download the CC0Textures material library.
After downloading, the directory structure should look like this:
HCCEPose
|--- s2_p0_download_cc0textures.py
|--- cc0textures
The cc0textures library occupies about 44GB of disk space, which is quite large. To reduce storage requirements, we provide a lightweight alternative called cc0textures-512, with a size of approximately 600MB. Download link: 🔗 https://huggingface.co/datasets/SEU-WYL/HccePose/blob/main/cc0textures-512.zip
When running the rendering script, simply replace the cc0textures path with cc0textures-512 to use the lightweight material library.
(It is sufficient to download only cc0textures-512; the original cc0textures is not required.)
The s2_p1_gen_pbr_data.py script is responsible for PBR data generation, and it is adapted from BlenderProc2.
Run the following commands:
cd HCCEPose
chmod +x s2_p1_gen_pbr_data.sh
nohup ./s2_p1_gen_pbr_data.sh 0 42 xxx/xxx/cc0textures xxx/xxx/demo-bin-picking xxx/xxx/s2_p1_gen_pbr_data.py > s2_p1_gen_pbr_data.log 2>&1 &

Folder Structure
After executing the above process, the program will:
- Use materials from `xxx/xxx/cc0textures`;
- Load 3D object models from `xxx/xxx/demo-bin-picking/models`;
- Generate `42` folders, each containing `1000` PBR-rendered frames, under `xxx/xxx/demo-bin-picking`.
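In total, that is 42 × 1000 = 42,000 PBR-rendered frames for the demo dataset.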
The resulting structure will be:
demo-bin-picking
|--- models
|--- train_pbr
|--- 000000
|--- 000001
...
|--- 000041
Click to expand
In 6D pose estimation tasks, a 2D detector is typically used to locate the object's bounding box, from which cropped image regions are used for 6D pose estimation. Compared with directly regressing 6D poses from the entire image, the two-stage approach (2D detection → 6D pose estimation) offers better accuracy and stability. Therefore, HccePose(BF) is equipped with a 2D detector based on YOLOv11.
The following sections describe how to convert BOP-format PBR training data into YOLO-compatible data and how to train YOLOv11.
To automate the conversion from BOP-style PBR data to YOLO training data, we provide the s3_p1_prepare_yolo_label.py script. After specifying the dataset path xxx/xxx/demo-bin-picking and running the script, the program will create a new folder named yolo11 inside demo-bin-picking.
The generated directory structure is as follows:
demo-bin-picking
|--- models
|--- train_pbr
|--- yolo11
|--- train_obj_s
|--- images
|--- labels
|--- yolo_configs
|--- data_objs.yaml
|--- autosplit_train.txt
|--- autosplit_val.txt
Explanation:
- `images` – folder containing 2D training images
- `labels` – folder containing 2D bounding box (BBox) annotations
- `data_objs.yaml` – YOLO configuration file
- `autosplit_train.txt` – list of training samples
- `autosplit_val.txt` – list of validation samples
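Each file under labels holds one line per bounding box in the standard YOLO format: class id followed by the normalized box center and size. A hedged sketch of that conversion (hypothetical helper; `s3_p1_prepare_yolo_label.py` is the actual implementation):

```python
def bop_bbox_to_yolo_line(x, y, w, h, img_w, img_h, class_id):
    # BOP annotations store pixel boxes as (top-left x, y, width, height);
    # YOLO wants "class cx cy w h" with every value normalized to [0, 1].
    cx = (x + w / 2.0) / img_w
    cy = (y + h / 2.0) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# e.g. a 100x50 box at (200, 300) in a 640x480 image for class 0:
print(bop_bbox_to_yolo_line(200, 300, 100, 50, 640, 480, 0))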
To train the YOLOv11 detector, use the s3_p2_train_yolo.py script. After specifying the dataset path xxx/xxx/demo-bin-picking, run the script to train YOLOv11 and save the best model weights as yolo11-detection-obj_s.pt.
The final directory structure after training is shown below:
demo-bin-picking
|--- models
|--- train_pbr
|--- yolo11
|--- train_obj_s
|--- detection
|--- obj_s
|--- yolo11-detection-obj_s.pt
|--- images
|--- labels
|--- yolo_configs
|--- data_objs.yaml
|--- autosplit_train.txt
|--- autosplit_val.txt
s3_p2_train_yolo.py continuously scans the detection folder for the file yolo11-detection-obj_s.pt.
This mechanism allows the training process to automatically resume after unexpected interruptions, which is particularly useful for cloud servers or environments where training progress cannot be easily monitored β helping to prevent idle GPU time and reduce unnecessary costs.
However, if you intend to restart training from scratch, you must delete the file yolo11-detection-obj_s.pt first;
otherwise, the program will resume from the previous checkpoint instead of reinitializing.
Click to expand
In HccePose(BF), the network simultaneously learns the front-surface and back-surface 3D coordinates of each object. To generate these labels, separate depth maps are rendered for the front and back surfaces.
During front-surface rendering, gl.glDepthFunc(gl.GL_LESS) is applied to preserve the smallest depth values, corresponding to the points closest to the camera. These are defined as the front surfaces, following the βfront-faceβ concept used in traditional back-face culling. Similarly, for back-surface rendering, gl.glDepthFunc(gl.GL_GREATER) is used to retain the largest depth values, corresponding to the farthest visible surfaces. Finally, the 3D coordinate label maps are generated based on these depth maps and the ground-truth 6D poses.
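One practical detail of the GL_GREATER pass: the depth buffer must be cleared to 0 instead of the usual 1, otherwise no fragment can ever pass the test. A minimal PyOpenGL sketch of the two pass configurations (illustrative names; the repository's renderer wires this into its own framebuffer setup):

```python
from OpenGL import GL as gl

def configure_depth_pass(front: bool) -> None:
    # Sketch only: set up the depth test for a front- or back-surface pass.
    gl.glEnable(gl.GL_DEPTH_TEST)
    if front:
        # Front surfaces: keep the SMALLEST depth per pixel,
        # so clear to the far plane (1.0) and pass on "less".
        gl.glClearDepth(1.0)
        gl.glDepthFunc(gl.GL_LESS)
    else:
        # Back surfaces: keep the LARGEST depth per pixel. The buffer
        # must be cleared to 0.0, otherwise GL_GREATER never passes.
        gl.glClearDepth(0.0)
        gl.glDepthFunc(gl.GL_GREATER)
    gl.glClear(gl.GL_COLOR_BUFFER_BIT | gl.GL_DEPTH_BUFFER_BIT)
```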
For symmetric objects, both discrete and continuous rotational symmetries are represented as a unified set of symmetry matrices.
Using these matrices and the ground-truth pose, a new set of valid ground-truth poses is computed.
To ensure pose label uniqueness, the pose with the minimum L2 distance from the identity matrix is selected as the final label.
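A minimal sketch of that selection rule (hypothetical helper names; the repository's implementation may differ in details):

```python
import numpy as np

def canonicalize_rotation(R_gt, sym_rotations):
    """Among the equivalent rotations R_gt @ S, return the one whose
    L2 (Frobenius) distance to the identity matrix is smallest."""
    candidates = [R_gt @ S for S in sym_rotations]
    dists = [np.linalg.norm(R - np.eye(3)) for R in candidates]
    return candidates[int(np.argmin(dists))]
```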
Moreover, due to the imaging principle, when an object undergoes translation without rotation, a visual rotation can occur from a fixed viewpoint. For symmetric objects, this apparent rotation can cause erroneous 3D label maps. To correct this effect, we reconstruct 3D coordinates from the rendered depth maps and apply RANSAC PnP to refine the rotation.
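A hedged sketch of that refinement step (hypothetical helper): 3D points reconstructed from the rendered depth map, paired with their pixel locations, feed OpenCV's RANSAC PnP, which re-estimates a rotation consistent with the actual view:

```python
import cv2
import numpy as np

def refine_rotation(obj_pts_3d, img_pts_2d, cam_K):
    # obj_pts_3d: Nx3 model-space points reconstructed from depth;
    # img_pts_2d: Nx2 corresponding pixel coordinates.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts_3d.astype(np.float32),
        img_pts_2d.astype(np.float32),
        cam_K.astype(np.float64),
        distCoeffs=None,
    )
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    return ok, R, tvec
```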
Based on the above procedure, we implement s4_p1_gen_bf_labels.py, a multi-process rendering script for generating front and back 3D coordinate label maps in batches. After specifying the dataset path /root/xxxxxx/demo-bin-picking and the subfolder train_pbr, running the script produces two new folders:
- `train_pbr_xyz_GT_front` – front-surface 3D label maps
- `train_pbr_xyz_GT_back` – back-surface 3D label maps
Directory structure:
demo-bin-picking
|--- models
|--- train_pbr
|--- train_pbr_xyz_GT_back
|--- train_pbr_xyz_GT_front
The following example shows three corresponding images:
the original rendering, the front-surface label map, and the back-surface label map.
Click to expand
When training HccePose(BF), a separate weight model must be trained for each object.
The s4_p2_train_bf_pbr.py script supports multi-GPU batch training across multiple objects.
Taking the demo-tex-objs dataset as an example, the directory structure after training is as follows:
demo-tex-objs
|--- HccePose
|--- obj_01
...
|--- obj_10
|--- models
|--- train_pbr
|--- train_pbr_xyz_GT_back
|--- train_pbr_xyz_GT_front
The ide_debug flag controls whether the script runs in single-GPU or multi-GPU (DDP) mode:
- `ide_debug=True` – single-GPU mode, ideal for debugging in IDEs.
- `ide_debug=False` – enables DDP (Distributed Data Parallel) training.
Note that directly running DDP training within IDEs such as VSCode may cause communication issues.
Hence, we recommend launching multi-GPU training in a detached session:
screen -S train_ddp
nohup python -u -m torch.distributed.launch --nproc_per_node 6 /root/xxxxxx/s4_p2_train_bf_pbr.py > log4.file 2>&1 &
For single-GPU execution or debugging, use:
nohup python -u /root/xxxxxx/s4_p2_train_bf_pbr.py > log4.file 2>&1 &
To train multiple objects, specify the range of object IDs using start_obj_id and end_obj_id. For example, setting start_obj_id=1 and end_obj_id=5 trains objects obj_000001.ply through obj_000005.ply. To train a single object, set both values to the same number.
You may also adjust total_iteration according to training needs (default: 50000). For DDP training, the total number of training samples is computed as:
total samples = total iteration × batch size × GPU number
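For example, the default `total_iteration=50000` with a batch size of 32 on 6 GPUs corresponds to 50000 × 32 × 6 = 9,600,000 training samples (the batch size here is illustrative; use your configured value).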
Smallest path to a first result: see Minimal setup (inference demo), which covers the Bin-Picking RGB demo only, without BlenderProc / `bpy`.
This project provides a simple HccePose(BF)-based application example for the Bin-Picking task.
To reduce reproduction difficulty, both the objects (3D printed with standard white PLA material) and the camera (Xiaomi smartphone) are easily accessible devices.
You can:
- Print the sample object multiple times
- Randomly place the printed objects
- Capture photos freely using your phone
- Directly perform 2D detection, 2D segmentation, and 6D pose estimation using the pretrained weights provided in this project
Please keep the folder hierarchy unchanged.
| Type | Resource Link |
|---|---|
| 🎨 Object 3D Models | demo-bin-picking/models |
| 🚀 YOLOv11 Weights | demo-bin-picking/yolo11 |
| 🚀 HccePose Weights | demo-bin-picking/HccePose |
| 🖼️ Test Images | test_imgs |
| 🎥 Test Videos | test_videos |
| 📷 RGB-D (Hugging Face) | test_imgs_RGBD – 000000–000003 ({stem}_rgb.png, {stem}_depth.png, {stem}_camK.json). Example: `hf download SEU-WYL/HccePose --repo-type dataset --include "test_imgs_RGBD/*"` or `scripts/download_hf_assets.py --preset test` |
| 📷 RGB-D (minimal git tree) | test_imgs_RGBD/ may ship 000003_* only in git for a minimal clone; copy or download the Hugging Face folder into test_imgs_RGBD/ for multi-frame RGB-D scripts |
⚠️ Note:
Files beginning with `train_` are only required for training.
For this Quick Start section, only the above test files are needed.
During testing, import the following modules:
- `HccePose.tester` – integrated testing module covering 2D detection, segmentation, and 6D pose estimation.
- `HccePose.bop_loader` – BOP-format dataset loader for loading object models and training data.
The following image shows the experimental setup:
Click to expand
Several white 3D-printed objects are placed inside a bowl on a white table, then photographed with a mobile phone.
Example input image 👇
Source image: Example Link
You can directly use the following script for 6D pose estimation and visualization:
Color layout:
`cv2.imread` / `VideoCapture.read` yield BGR `uint8`. Pass them unchanged to `Tester.predict` (same as `s4_p3_test_mi10_bin_picking.py`). HccePose normalizes with `IMAGENET_MEAN_BGR` / `IMAGENET_STD_BGR`. FoundationPose converts BGR→RGB inside `Refinement_FP.inference_batch`. MegaPose feeds RGB to the upstream estimator; MegaPose debug panels are BGR for `cv2.imwrite`.
Click to expand code
import cv2, os, sys
import numpy as np
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown, save_visual_artifacts
from HccePose.tester import Tester
if __name__ == '__main__':
sys.path.insert(0, os.getcwd())
current_dir = os.path.dirname(sys.argv[0])
dataset_path = os.path.join(current_dir, 'demo-bin-picking')
test_img_path = os.path.join(current_dir, 'test_imgs')
bop_dataset_item = bop_dataset(dataset_path)
obj_id = 1
CUDA_DEVICE = '0'
hccepose_vis = True
save_visualizations = hccepose_vis
print_stage_timing = False
hccepose_acceleration = 'pytorch'
Tester_item = Tester(
bop_dataset_item,
hccepose_vis=hccepose_vis,
CUDA_DEVICE=CUDA_DEVICE,
hccepose_acceleration=hccepose_acceleration,
)
for name in ['IMG_20251007_165718']:
file_name = os.path.join(test_img_path, '%s.jpg' % name)
image = cv2.imread(file_name)
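# Camera intrinsics of the phone that captured test_imgs: a 3x3 pinhole
# matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] in pixels.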
cam_K = np.array([
[2.83925618e+03, 0.00000000e+00, 2.02288638e+03],
[0.00000000e+00, 2.84037288e+03, 1.53940473e+03],
[0.00000000e+00, 0.00000000e+00, 1.00000000e+00],
])
results_dict = Tester_item.predict(
cam_K, image, [obj_id], conf=0.85, confidence_threshold=0.85,
)
print_stage_time_breakdown(results_dict, enabled=print_stage_timing, prefix=name)
save_visual_artifacts([
(file_name.replace('.jpg', '_show_2d.jpg'), results_dict.get('show_2D_results')),
(file_name.replace('.jpg', '_show_6d_vis0.jpg'), results_dict.get('show_6D_vis0')),
(file_name.replace('.jpg', '_show_6d_vis1.jpg'), results_dict.get('show_6D_vis1')),
(file_name.replace('.jpg', '_show_6d_vis2.jpg'), results_dict.get('show_6D_vis2')),
], enabled=save_visualizations)

Full options (acceleration, missing-image skips, output prefixes) match the repository script s4_p3_test_mi10_bin_picking.py.
Video demos: s4_p3_test_mi10_bin_picking_video.py and s4_p3_test_mi10_tex_objs_video.py iterate whole clips under test_videos/ and can take a long time. For routine validation, prefer the single-image scripts above; use the *_video.py files when you need full-sequence outputs.
2D Detection Result (_show_2d.jpg):
Network Outputs:
- HCCE-based front and back surface coordinate encodings
- Object mask
- Decoded 3D coordinate visualizations
Please consider giving it a ⭐️ on GitHub! Your support motivates us to keep improving and updating the project 😊
Detailed Content
The single-frame pose estimation pipeline can be easily extended to video sequences, enabling continuous-frame 6D pose estimation, as shown in the following example:
Click to expand code
import cv2, os, sys
import numpy as np
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown
from HccePose.tester import Tester
if __name__ == '__main__':
sys.path.insert(0, os.getcwd())
current_dir = os.path.dirname(sys.argv[0])
dataset_path = os.path.join(current_dir, 'demo-bin-picking')
test_video_path = os.path.join(current_dir, 'test_videos')
bop_dataset_item = bop_dataset(dataset_path)
obj_id = 1
CUDA_DEVICE = '0'
hccepose_vis = True
hccepose_acceleration = 'pytorch'
save_visualizations = hccepose_vis
print_stage_timing = False
Tester_item = Tester(
bop_dataset_item,
hccepose_vis=hccepose_vis,
CUDA_DEVICE=CUDA_DEVICE,
hccepose_acceleration=hccepose_acceleration,
)
for name in ['VID_20251009_141247']:
file_name = os.path.join(test_video_path, '%s.mp4' % name)
cap = cv2.VideoCapture(file_name)
fps = cap.get(cv2.CAP_PROP_FPS)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out_1 = None
out_2 = None
cam_K = np.array([
[1.63235512e+03, 0.00000000e+00, 9.74032712e+02],
[0.00000000e+00, 1.64159967e+03, 5.14229781e+02],
[0.00000000e+00, 0.00000000e+00, 1.00000000e+00],
])
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
# Frames are already BGR (same convention as imread).
results_dict = Tester_item.predict(
cam_K, frame, [obj_id], conf=0.85, confidence_threshold=0.85,
)
print_stage_time_breakdown(results_dict, enabled=print_stage_timing, prefix='%s frame' % name)
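# results_dict['time'] is the per-frame wall-clock time in seconds.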
fps_hccepose = 1 / results_dict['time']
if not save_visualizations:
continue
show_6D_vis1 = results_dict['show_6D_vis1']
show_6D_vis1[show_6D_vis1 < 0] = 0
show_6D_vis1[show_6D_vis1 > 255] = 255
if out_1 is None:
out_1 = cv2.VideoWriter(
file_name.replace('.mp4', '_show_1.mp4'),
fourcc,
fps,
(show_6D_vis1.shape[1], show_6D_vis1.shape[0]),
)
out_1.write(show_6D_vis1.astype(np.uint8))
show_6D_vis2 = results_dict['show_6D_vis2']
show_6D_vis2[show_6D_vis2 < 0] = 0
show_6D_vis2[show_6D_vis2 > 255] = 255
if out_2 is None:
out_2 = cv2.VideoWriter(
file_name.replace('.mp4', '_show_2.mp4'),
fourcc,
fps,
(show_6D_vis2.shape[1], show_6D_vis2.shape[0]),
)
cv2.putText(
show_6D_vis2,
'FPS: {0:.2f}'.format(fps_hccepose),
(20, 60),
cv2.FONT_HERSHEY_SIMPLEX,
2,
(0, 255, 0),
4,
cv2.LINE_AA,
)
out_2.write(show_6D_vis2.astype(np.uint8))
cap.release()
if out_1 is not None:
out_1.release()
if out_2 is not None:
out_2.release()

See s4_p3_test_mi10_bin_picking_video.py for multi-video loops and the same flags.
In addition, by passing a list of multiple object IDs to HccePose.tester, multi-object 6D pose estimation can also be achieved.
Please keep the folder hierarchy unchanged.
| Type | Resource Link |
|---|---|
| 🎨 Object 3D Models | demo-tex-objs/models |
| 🚀 YOLOv11 Weights | demo-tex-objs/yolo11 |
| 🚀 HccePose Weights | demo-tex-objs/HccePose |
| 🖼️ Test Images | test_imgs |
| 🎥 Test Videos | test_videos |
| 📷 RGB-D (Hugging Face) | test_imgs_RGBD – 000000–000003 (`scripts/download_hf_assets.py --preset test`) |
| 📷 RGB-D (minimal git tree) | test_imgs_RGBD/ may ship 000003_* only; download the Hugging Face folder for all sample stems |
⚠️ Note:
Files beginning with `train_` are only required for training.
For this Quick Start section, only the above test files are needed.
This section mirrors the single-image and video tutorials above: the RGB-D capture figure (RGB + colorized depth) is shown directly, then FoundationPose, MegaPose (RGB-D), and a three-way comparison with collapsible code blocks and show_vis/ figures.
Place each frame under test_imgs_RGBD/ as {stem}_rgb.png, {stem}_depth.png, {stem}_camK.json. The JSON stores fx, fy, cx, cy; depth scaling follows convert_depth_to_meter in Refinement/refinement_test_utils.py. Download the sample pack from Hugging Face → test_imgs_RGBD (000000–000003) so it matches that logic. A git snapshot may include only 000003; add 000000–000002 from Hugging Face for multi-frame tables in s4_p3_test_mi10_bin_picking_RGBD_FP_vs_MP.py (e.g. python scripts/download_hf_assets.py --preset test --endpoint auto).
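Since each `{stem}_camK.json` stores fx, fy, cx, cy, assembling the 3×3 intrinsic matrix is straightforward. A sketch assuming a flat JSON layout (hypothetical field access; the real loader is in `Refinement/refinement_test_utils.py`):

```python
import json
import numpy as np

def load_camK_json(path):
    # Assumes {"fx": ..., "fy": ..., "cx": ..., "cy": ...}; check the
    # actual files against this layout before relying on it.
    with open(path) as f:
        cam = json.load(f)
    return np.array([
        [cam['fx'], 0.0,       cam['cx']],
        [0.0,       cam['fy'], cam['cy']],
        [0.0,       0.0,       1.0],
    ])
```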
FoundationPose needs 2023-10-28-18-33-37/ and 2024-01-11-20-02-45/ at the repo root (see Optional: RGB-D refinement & acceleration above). MegaPose may auto-setup on first use.
Color: load_capture_frame reads *_rgb.png with cv2.imread → BGR uint8, same as the single-image tutorial. save_visual_artifacts / cv2.imwrite expect BGR for color JPEG/PNG. FoundationPose and MegaPose refinement paths apply the internal RGB/BGR conversions described in the Quick Start note above.
- `HccePose.tester.Tester` – pass meter-scale depth as `depth=depth_m`, and set `use_foundationpose=True` or `use_megapose=True` together with the corresponding `foundationpose_*` / `megapose_*` kwargs.
- `Refinement.refinement_test_utils` – `list_capture_frame_names`, `load_capture_frame`, and (for the comparison script) `build_depth_comparison_visual`.
- `HccePose.test_script_utils` – `save_visual_artifacts`, `print_stage_time_breakdown` (driven by `print_stage_timing` in the scripts).
- `results_dict['time_dict']` – optional per-stage timings (YOLO, HccePose, FoundationPose, MegaPose, visualization, …).
ONNX / TensorRT: HccePose uses s4_p3_test_mi10_bin_picking_onnx.py / s4_p3_test_mi10_bin_picking_tensorrt.py; FoundationPose refinement can set foundationpose_acceleration in the RGB-D scripts. The comparison demo sets foundationpose_acceleration='onnx' by default in the repository file.
Example RGB-D view: left = RGB (000003_rgb.png), right = pseudo-color depth (000003_depth.png) with a compact TURBO color bar embedded on the right edge of the depth panel; numeric ticks (min / mid / max, meters) sit immediately to the left of that bar. Depth uses convert_depth_to_meter from Refinement/refinement_test_utils.py (same as load_capture_frame), then linear mapping on valid pixels; invalid/zero depth stays black. RGB and depth panels are the same size and concatenated side by side.
Example raw inputs (also mirrored on Hugging Face → test_imgs_RGBD for 000000–000003): test_imgs_RGBD/000003_rgb.png, 000003_depth.png, 000003_camK.json.
Run s4_p3_test_mi10_bin_picking_RGBD_foundationpose.py after placing FoundationPose weights. Toggle hccepose_vis / foundationpose_vis when you need HccePose or FoundationPose debug renders (files are written next to the captures unless you change the paths).
Click to expand code
import cv2, os, sys
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown, save_visual_artifacts
from HccePose.tester import Tester
from Refinement.refinement_test_utils import load_capture_frame, list_capture_frame_names
if __name__ == '__main__':
sys.path.insert(0, os.getcwd())
current_dir = os.path.dirname(sys.argv[0])
dataset_path = os.path.join(current_dir, 'demo-bin-picking')
capture_dir = os.path.join(current_dir, 'test_imgs_RGBD')
foundationpose_refine_dir = os.path.join(current_dir, '2023-10-28-18-33-37')
foundationpose_score_dir = os.path.join(current_dir, '2024-01-11-20-02-45')
bop_dataset_item = bop_dataset(dataset_path)
obj_id = 1
CUDA_DEVICE = '0'
hccepose_vis = False
foundationpose_vis = False
foundationpose_vis_stages = [1, 2, 3, 4, 5, 'score']
hccepose_acceleration = 'pytorch'
foundationpose_acceleration = 'pytorch'
save_visualizations = hccepose_vis or foundationpose_vis
print_stage_timing = False
tester_item = Tester(
bop_dataset_item,
hccepose_vis=hccepose_vis,
CUDA_DEVICE=CUDA_DEVICE,
foundationpose_refine_dir=foundationpose_refine_dir,
foundationpose_score_dir=foundationpose_score_dir,
hccepose_acceleration=hccepose_acceleration,
foundationpose_acceleration=foundationpose_acceleration,
)
frame_names = list_capture_frame_names(capture_dir)
for name in frame_names:
image, depth, depth_m, cam_K = load_capture_frame(capture_dir, name)
results_dict = tester_item.predict(
cam_K,
image,
[obj_id],
conf=0.85,
confidence_threshold=0.85,
depth=depth_m,
use_foundationpose=True,
foundationpose_vis=foundationpose_vis,
foundationpose_vis_stages=foundationpose_vis_stages,
)
print_stage_time_breakdown(results_dict, enabled=print_stage_timing, prefix=name)
save_visual_artifacts([
(os.path.join(capture_dir, '%s_show_2d.jpg' % name), results_dict.get('show_2D_results')),
(os.path.join(capture_dir, '%s_show_6d_vis0.jpg' % name), results_dict.get('show_6D_vis0')),
(os.path.join(capture_dir, '%s_show_6d_vis1.jpg' % name), results_dict.get('show_6D_vis1')),
(os.path.join(capture_dir, '%s_show_6d_vis2.jpg' % name), results_dict.get('show_6D_vis2')),
(os.path.join(capture_dir, '%s_show_foundationpose.jpg' % name), results_dict.get('show_foundationpose')),
], enabled=save_visualizations)

This block matches the checked-in s4_p3_test_mi10_bin_picking_RGBD_foundationpose.py.
FoundationPose fusion view and decoded HccePose-style maps after refinement (exported from frame 000003; figures live under show_vis/ for the documentation):
s4_p3_test_mi10_bin_picking_RGBD_megapose.py sets megapose_use_depth=True so megapose_variant_name='rgbd'. Switch to RGB-only by setting megapose_use_depth=False ('rgb' suffix in filenames).
Click to expand code
import os, sys
import cv2
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown, save_visual_artifacts
from HccePose.tester import Tester
from Refinement.refinement_test_utils import load_capture_frame, list_capture_frame_names
if __name__ == '__main__':
sys.path.insert(0, os.getcwd())
current_dir = os.path.dirname(sys.argv[0])
dataset_path = os.path.join(current_dir, 'demo-bin-picking')
capture_dir = os.path.join(current_dir, 'test_imgs_RGBD')
bop_dataset_item = bop_dataset(dataset_path)
obj_id = 1
CUDA_DEVICE = '0'
hccepose_vis = True
hccepose_acceleration = 'pytorch'
megapose_use_depth = True
megapose_vis = True
megapose_vis_stages = [1, 2, 3, 4, 5]
megapose_variant_name = 'rgbd' if megapose_use_depth else 'rgb'
save_visualizations = hccepose_vis or megapose_vis
print_stage_timing = False
tester_item = Tester(
bop_dataset_item,
hccepose_vis=hccepose_vis,
CUDA_DEVICE=CUDA_DEVICE,
hccepose_acceleration=hccepose_acceleration,
)
frame_names = list_capture_frame_names(capture_dir)
for name in frame_names:
image, depth, depth_m, cam_K = load_capture_frame(capture_dir, name)
megapose_depth = depth_m if megapose_use_depth else None
results_mp = tester_item.predict(
cam_K,
image,
[obj_id],
conf=0.85,
confidence_threshold=0.85,
depth=megapose_depth,
use_megapose=True,
megapose_vis=megapose_vis,
megapose_vis_stages=megapose_vis_stages,
)
print_stage_time_breakdown(results_mp, enabled=print_stage_timing, prefix='%s | MegaPose %s' % (name, megapose_variant_name.upper()))
save_visual_artifacts([
(os.path.join(capture_dir, '%s_show_megapose.jpg' % name), results_mp.get('show_megapose')),
(os.path.join(capture_dir, '%s_megapose_%s_show_2d.jpg' % (name, megapose_variant_name)), results_mp.get('show_2D_results')),
(os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis0.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis0')),
(os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis1.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis1')),
(os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis2.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis2')),
], enabled=save_visualizations)

This block matches s4_p3_test_mi10_bin_picking_RGBD_megapose.py.
MegaPose debug canvas and decoded maps for the rgbd variant on 000003:
RGB-only MegaPose (megapose_use_depth=False) for the same stem:
s4_p3_test_mi10_bin_picking_RGBD_FP_vs_MP.py runs, for every frame, (1) HccePose with depth, (2) FoundationPose refinement, and (3) MegaPose rgb and rgbd variants, saves side-by-side overlays, and builds build_depth_comparison_visual panels. The file also defines print_foundationpose_benchmark to sweep FoundationPose backends on the first frame (omitted below; open the script for the full listing). It defaults foundationpose_acceleration='onnx'; this README's PyTorch 2.8.0+cu128 pin provides torch.backends.mha, which the ONNX export path expects. On much older PyTorch (e.g. 2.2) you may see AttributeError: module 'torch.backends' has no attribute 'mha'; upgrade PyTorch or switch that flag to a non-ONNX mode if appropriate.
Click to expand code (per-frame pipeline)
import cv2, os, sys
import numpy as np
from HccePose.bop_loader import bop_dataset
from HccePose.test_script_utils import print_stage_time_breakdown, save_visual_artifacts
from HccePose.tester import Tester
from Refinement.refinement_test_utils import build_depth_comparison_visual, load_capture_frame, list_capture_frame_names
# The same repository file defines print_foundationpose_benchmark (and related helpers)
# above __main__. Copy that file verbatim to run the optional first-frame backend sweep.
if __name__ == '__main__':
sys.path.insert(0, os.getcwd())
current_dir = os.path.dirname(sys.argv[0])
dataset_path = os.path.join(current_dir, 'demo-bin-picking')
capture_dir = os.path.join(current_dir, 'test_imgs_RGBD')
foundationpose_refine_dir = os.path.join(current_dir, '2023-10-28-18-33-37')
foundationpose_score_dir = os.path.join(current_dir, '2024-01-11-20-02-45')
bop_dataset_item = bop_dataset(dataset_path)
obj_id = 1
obj_index = list(bop_dataset_item.obj_id_list).index(obj_id)
obj_model_path = bop_dataset_item.obj_model_list[obj_index]
CUDA_DEVICE = '0'
hccepose_vis = True
hccepose_acceleration = 'pytorch'
foundationpose_vis = False
foundationpose_vis_stages = [1, 2, 3, 4, 5, 'score']
foundationpose_acceleration = 'onnx'
megapose_vis = True
megapose_vis_stages = [1, 2, 3, 4, 5]
megapose_variants = [('rgbd', True), ('rgb', False)]
save_visualizations = hccepose_vis or foundationpose_vis or megapose_vis
print_stage_timing = False
foundationpose_runner = Tester(
bop_dataset_item,
hccepose_vis=hccepose_vis,
CUDA_DEVICE=CUDA_DEVICE,
foundationpose_refine_dir=foundationpose_refine_dir,
foundationpose_score_dir=foundationpose_score_dir,
hccepose_acceleration=hccepose_acceleration,
foundationpose_acceleration=foundationpose_acceleration,
)
megapose_runner = Tester(
bop_dataset_item,
hccepose_vis=hccepose_vis,
CUDA_DEVICE=CUDA_DEVICE,
hccepose_acceleration=hccepose_acceleration,
)
frame_names = list_capture_frame_names(capture_dir)
# print_foundationpose_benchmark(...) optional warm-up on frame_names[0]; see repository file.
for name in frame_names:
image, depth, depth_m, cam_K = load_capture_frame(capture_dir, name)
results_hccepose = foundationpose_runner.predict(
cam_K,
image,
[obj_id],
conf=0.85,
confidence_threshold=0.85,
depth=depth_m,
)
results_fp = foundationpose_runner.predict(
cam_K,
image,
[obj_id],
conf=0.85,
confidence_threshold=0.85,
depth=depth_m,
use_foundationpose=True,
foundationpose_vis=foundationpose_vis,
foundationpose_vis_stages=foundationpose_vis_stages,
)
print_stage_time_breakdown(results_hccepose, enabled=print_stage_timing, prefix='%s | HccePose' % name)
print_stage_time_breakdown(results_fp, enabled=print_stage_timing, prefix='%s | FoundationPose' % name)
save_visual_artifacts([
(os.path.join(capture_dir, '%s_hccepose_show_2d.jpg' % name), results_hccepose.get('show_2D_results')),
(os.path.join(capture_dir, '%s_hccepose_show_6d_vis0.jpg' % name), results_hccepose.get('show_6D_vis0')),
(os.path.join(capture_dir, '%s_hccepose_show_6d_vis1.jpg' % name), results_hccepose.get('show_6D_vis1')),
(os.path.join(capture_dir, '%s_hccepose_show_6d_vis2.jpg' % name), results_hccepose.get('show_6D_vis2')),
(os.path.join(capture_dir, '%s_foundationpose_show_2d.jpg' % name), results_fp.get('show_2D_results')),
(os.path.join(capture_dir, '%s_foundationpose_show_6d_vis0.jpg' % name), results_fp.get('show_6D_vis0')),
(os.path.join(capture_dir, '%s_foundationpose_show_6d_vis1.jpg' % name), results_fp.get('show_6D_vis1')),
(os.path.join(capture_dir, '%s_foundationpose_show_6d_vis2.jpg' % name), results_fp.get('show_6D_vis2')),
(os.path.join(capture_dir, '%s_show_foundationpose.jpg' % name), results_fp.get('show_foundationpose')),
], enabled=save_visualizations)
for megapose_variant_name, megapose_use_depth in megapose_variants:
megapose_depth = depth_m if megapose_use_depth else None
results_mp = megapose_runner.predict(
cam_K,
image,
[obj_id],
conf=0.85,
confidence_threshold=0.85,
depth=megapose_depth,
use_megapose=True,
megapose_vis=megapose_vis,
megapose_vis_stages=megapose_vis_stages,
)
print_stage_time_breakdown(results_mp, enabled=print_stage_timing, prefix='%s | MegaPose %s' % (name, megapose_variant_name.upper()))
save_visual_artifacts([
(os.path.join(capture_dir, '%s_show_megapose_%s.jpg' % (name, megapose_variant_name)), results_mp.get('show_megapose')),
(os.path.join(capture_dir, '%s_megapose_%s_show_2d.jpg' % (name, megapose_variant_name)), results_mp.get('show_2D_results')),
(os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis0.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis0')),
(os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis1.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis1')),
(os.path.join(capture_dir, '%s_megapose_%s_show_6d_vis2.jpg' % (name, megapose_variant_name)), results_mp.get('show_6D_vis2')),
], enabled=save_visualizations)
pose_sets_mm = {}
if obj_id in results_hccepose and 'Rts' in results_hccepose[obj_id]:
pose_sets_mm['HccePose'] = results_hccepose[obj_id]['Rts']
if obj_id in results_fp and 'Rts' in results_fp[obj_id]:
pose_sets_mm['FoundationPose'] = results_fp[obj_id]['Rts']
if obj_id in results_mp and 'Rts' in results_mp[obj_id]:
pose_sets_mm['MegaPose'] = results_mp[obj_id]['Rts']
if save_visualizations:
depth_compare_vis, depth_compare_summary = build_depth_comparison_visual(
depth,
cam_K,
obj_model_path,
pose_sets_mm,
device=str(foundationpose_runner.device),
max_items=4,
)
save_visual_artifacts([
(os.path.join(capture_dir, '%s_compare_depth_hccepose_foundationpose_megapose_%s.jpg' % (name, megapose_variant_name)), depth_compare_vis),
], enabled=True)

The authoritative version (including imports for print_foundationpose_benchmark) is s4_p3_test_mi10_bin_picking_RGBD_FP_vs_MP.py.
Depth-aligned comparison panels for MegaPose RGB vs MegaPose RGB-D branches (000003):
You can use the script s4_p2_test_bf_pbr_bop_challenge.py to evaluate HccePose(BF) across the seven core BOP datasets.
| Dataset | Weights Link |
|---|---|
| LM-O | Hugging Face - LM-O |
| YCB-V | Hugging Face - YCB-V |
| T-LESS | Hugging Face - T-LESS |
| TUD-L | Hugging Face - TUD-L |
| HB | Hugging Face - HB |
| ITODD | Hugging Face - ITODD |
| IC-BIN | Hugging Face - IC-BIN |
As an example, we evaluated HccePose(BF) on the widely used LM-O dataset from the BOP benchmark. We adopted the default 2D detector (GDRNPP) from the BOP 2023 Challenge and obtained the following output files:
- 2D segmentation results: seg2d_lmo.json
- 6D pose results: det6d_lmo.csv
These two files were submitted on October 20, 2025. The results are shown below.
The 6D localization score remains consistent with the 2024 submission,
while the 2D segmentation score improved by 0.002, thanks to the correction of minor implementation bugs.
- If some pretrained weights show an iteration count of `0`, this is not an error. All HccePose(BF) weights are fine-tuned from the standard HccePose model trained using only the front surface. In some cases, the initial weights already achieve optimal performance.
We are currently organizing and updating the following modules:
- 🚀 HccePose(BF) weights for the seven core BOP datasets
- 🧪 BOP Challenge testing pipeline
- 🎞 6D pose inference via inter-frame tracking
- 🏷️ Real-world 6D pose dataset preparation based on HccePose(BF)
- ⚙️ PBR + Real training workflow
- 📘 Tutorials on `object preprocessing`, `data rendering`, `YOLOv11 label preparation and training`, as well as HccePose(BF) `label preparation` and `training`
All components are expected to be completed by the end of 2025, with continuous daily updates whenever possible.
This project builds on public datasets, benchmarks, and methods that the code and tutorials reference directly, including: the BOP benchmark and bop_toolkit; BlenderProc for rendering; Ultralytics YOLO for detection; FoundationPose and MegaPose for RGB-D refinement integrations; and KASAL where used in the dependency stack.
If you find our work useful, please cite it as follows:
@InProceedings{HccePose_BF,
author = {Wang, Yulin and Hu, Mengting and Li, Hongli and Luo, Chen},
title = {HccePose(BF): Predicting Front \& Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {7166-7175}
}























