One unified interface for 5 UOIS datasets.
Load, augment, train, and evaluate — in 3 lines of code.
| Problem | Solution |
|---|---|
| Every UOIS dataset ships its own format and loader | Unified API — one interface across all 5 datasets |
| Evaluation setup eats up research time | Built-in metrics — F1, IoU, Precision, Recall in a single call |
| Mixing synthetic and real data takes custom wiring | Multi-dataset DataModule with balanced sampling out of the box (see the sketch below) |
| Reproducing baselines means writing glue code | Lightning-native — drop into any training loop |
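The built-in DataModule handles the balanced mixing for you, but if you want to see the idea (or wire it yourself), here is a minimal sketch in plain PyTorch. The paths are placeholders, and `TabletopDataset` is an assumption: a synthetic-data class presumed to mirror the `OCIDDataset` constructor shown later in this README.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

from uois_toolkit import cfg
from uois_toolkit.datasets import OCIDDataset
# Assumption: a TabletopDataset class exists with the same constructor signature.
from uois_toolkit.datasets import TabletopDataset

synth = TabletopDataset(image_set="train", data_path="/data/tabletop", config=cfg)
real = OCIDDataset(image_set="train", data_path="/data/OCID", config=cfg)
combined = ConcatDataset([synth, real])

# Inverse-frequency weights: each source contributes roughly equally per
# epoch, even though the synthetic set is far larger than the real one.
weights = torch.cat([
    torch.full((len(synth),), 1.0 / len(synth)),
    torch.full((len(real),), 1.0 / len(real)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
# Note: variable-length per-image annotations usually need a custom collate_fn.
loader = DataLoader(combined, batch_size=4, sampler=sampler)
```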
```bash
pip install uois-toolkit
```

```python
from uois_toolkit import get_datamodule, cfg
dm = get_datamodule("ocid", "/path/to/OCID", batch_size=4, config=cfg)
dm.setup()
batch = next(iter(dm.train_dataloader()))
# batch["image_color"] → [B, 3, H, W]
# batch["depth"] → [B, C, H, W]
# batch["annotations"] → per-image bboxes + RLE masksSame API for all datasets — just swap the name: "tabletop", "osd", "robot_pushing", "iteach_humanplay".
| Dataset | Type | Images | Setting | Source |
|---|---|---|---|---|
| Tabletop (TOD) | Synthetic | ~280K | Rendered household scenes | Xiang et al., CoRL 2020 |
| OCID | Real | 2,390 | Cluttered tabletop | Suchi et al., ICRA 2019 |
| OSD | Real | 111 | Sparse tabletop | Richtsfeld et al., IROS 2012 |
| Robot Pushing | Real | 428 | Robot pushing objects | Lu et al., RSS 2023 |
| iTeach-HumanPlay | Real | 14K+ | Human-object interaction | P et al., arXiv 2024 |
```python
from uois_toolkit.datasets import OCIDDataset
from uois_toolkit import cfg
dataset = OCIDDataset(image_set="test", data_path="/path/to/OCID", config=cfg)
sample = dataset[0]
# Keys: file_name, image_id, height, width, image_color, depth, raw_depth, annotations
```
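Each sample is a plain dict, so it is easy to poke at. The single-sample tensor shapes below are an assumption based on the batched shapes in the table further down, and `annotations` is assumed to be a per-image list as described there:

```python
sample = dataset[0]
for key in ("file_name", "image_id", "height", "width"):
    print(key, "=", sample[key])
print("image_color:", tuple(sample["image_color"].shape))  # e.g. (3, H, W)
print("depth:", tuple(sample["depth"].shape))
print("objects:", len(sample["annotations"]))
```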
```python
import numpy as np
from uois_toolkit.metrics import compute_metrics
gt_mask = ... # [H, W] binary
pred_mask = ... # [H, W] binary
results = compute_metrics(gt_mask, pred_mask, ["f1_score", "iou", "precision", "recall"])
# {'f1_score': 0.89, 'iou': 0.80, 'precision': 0.92, 'recall': 0.86}
```
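As a runnable sanity check with synthetic masks (the prediction below is a strict subset of the ground truth, so precision should come out at 1.0):

```python
import numpy as np
from uois_toolkit.metrics import compute_metrics

gt_mask = np.zeros((480, 640), dtype=np.uint8)
gt_mask[100:300, 100:300] = 1     # 200x200 ground-truth square
pred_mask = np.zeros_like(gt_mask)
pred_mask[150:300, 150:300] = 1   # smaller square fully inside the GT

print(compute_metrics(gt_mask, pred_mask, ["f1_score", "iou", "precision", "recall"]))
```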
dm = get_datamodule("tabletop", "/data/tabletop", batch_size=8, config=cfg)
trainer = pl.Trainer(accelerator="auto", max_epochs=10)
trainer.fit(model, datamodule=dm)
```
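`model` here is any LightningModule. A toy skeleton, just to show the wiring; the 1x1-conv "segmenter" and the dummy all-background target are placeholders, not toolkit components:

```python
import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySegmenter(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.head = nn.Conv2d(3, 1, kernel_size=1)  # placeholder "model"

    def training_step(self, batch, batch_idx):
        logits = self.head(batch["image_color"])
        # A real model would rasterize batch["annotations"] into target masks;
        # an all-background target keeps this skeleton self-contained.
        loss = F.binary_cross_entropy_with_logits(logits, torch.zeros_like(logits))
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

model = ToySegmenter()
```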
Each batch from the DataLoader contains:

| Key | Shape | Description |
|---|---|---|
| `image_color` | `[B, 3, H, W]` | RGB image (float) |
| `depth` | `[B, C, H, W]` | Depth map |
| `annotations` | `List[List[Dict]]` | Per-image object annotations |
Each annotation: {"bbox": [x1,y1,x2,y2], "segmentation": <RLE>, "category_id": 1}
```bash
git clone https://github.com/jishnujayakumar/uois_toolkit.git
cd uois_toolkit
pip install -e .
```

Note: detectron2 is required for the mask utilities. Install a build that matches your PyTorch + CUDA versions.
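A quick way to check that the versions line up (assumes detectron2 is already installed):

```python
import torch
import detectron2

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("detectron2:", detectron2.__version__)
```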
```bash
python -m pytest test/test_datamodule.py -v \
--dataset_path tabletop=/data/tabletop \
--dataset_path ocid=/data/OCID \
    --dataset_path osd=/data/OSD
```

CI runs on every push and PR via GitHub Actions.
```bibtex
@software{uois_toolkit,
  author = {P, Jishnu Jaykumar and Aggarwal, Avaya and Maheshwari, Animesh},
title = {uois_toolkit: A PyTorch Toolkit for Unseen Object Instance Segmentation},
year = {2025},
url = {https://github.com/jishnujayakumar/uois_toolkit}
}
```

Dataset citations:
- Tabletop (TOD) — Xiang et al., "Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation", CoRL 2020
- OCID — Suchi et al., "EasyLabel: A Semi-Automatic Pixel-wise Object Annotation Tool for Creating Robotic RGB-D Datasets", ICRA 2019
- OSD — Richtsfeld et al., "Segmentation of Unknown Objects in Indoor Environments", IROS 2012
- Robot Pushing — Lu et al., "Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot Interaction", RSS 2023
- iTeach-HumanPlay — P et al., "iTeach: In the Wild Interactive Teaching for Failure-Driven Adaptation of Robot Perception", arXiv:2410.09072
Built and polished by:
@OnePunchMonk • @AnimeshMaheshwari22
PRs welcome! See open issues.


