diff --git a/docs/design/components/inferencekit.md b/docs/design/components/inferencekit.md index bd3efdf..4e285fa 100644 --- a/docs/design/components/inferencekit.md +++ b/docs/design/components/inferencekit.md @@ -26,35 +26,14 @@ - [Preprocessors and Postprocessors](#preprocessors-and-postprocessors) - [Manifest Format](#manifest-format) - [Extension \& Plugin System](#extension--plugin-system) - - [Backend Registry](#backend-registry) - - [Building a Custom Domain Layer](#building-a-custom-domain-layer) - - [Publishing to HuggingFace](#publishing-to-huggingface) - [Runners (Domain-Provided)](#runners-domain-provided) - - [Contrib Runners](#contrib-runners) - [Supported Backends](#supported-backends) - - [Domain Layer Examples](#domain-layer-examples) - - [Example 1: Vision (model_api)](#example-1-vision-model_api) - - [Example 2: Physical‑AI Plugins](#example-2-physicalai-plugins) - - [Example 3: Custom Domain](#example-3-custom-domain) + - [Domain Layer Example](#domain-layer-example) - [Usage Examples](#usage-examples) - [Basic usage](#basic-usage) - [With explicit backend](#with-explicit-backend) - [With callbacks](#with-callbacks) - [Context manager for resource cleanup](#context-manager-for-resource-cleanup) - - [API Reference](#api-reference) - - [Main Entry Point](#main-entry-point) - - [Runners](#runners) - - [Adapters](#adapters) - - [Callbacks](#callbacks) - - [Plugins](#plugins) - - [Extension Points](#extension-points) - - [Appendix: Design Rationale](#appendix-design-rationale) - - [Why a separate inference package?](#why-a-separate-inference-package) - - [Why inferencekit is a base layer, not a model_api replacement](#why-inferencekit-is-a-base-layer-not-a-model_api-replacement) - - [Migration path for model_api](#migration-path-for-model_api) - - [Why runners are separate from adapters?](#why-runners-are-separate-from-adapters) - - [Why callbacks instead of inheritance?](#why-callbacks-instead-of-inheritance) - - [Why a plugin 
system?](#why-a-plugin-system) - [Related Documents](#related-documents) --- @@ -438,195 +417,246 @@ class Postprocessor(ABC): ### Manifest Format -All exported models use a unified `manifest.json` format. The manifest uses `class_path` + `init_args` (following `jsonargparse` conventions) for component specification: +All exported models use a unified `manifest.json` format. The manifest uses a nested structure that mirrors the `InferenceModel` class hierarchy, with logical sections for policy identity, model configuration, hardware, and metadata: + +```text +manifest.json +├── format + version (envelope) +├── policy (identity — what policy is this?) +│ ├── name (human-readable name) +│ └── source (provenance: repo_id, class_path) +├── model (exported model — how to run it?) +│ ├── n_obs_steps (observation window) +│ ├── runner (execution pattern + params) +│ ├── artifacts (model files by named role) +│ ├── preprocessors (input transforms: normalize, etc.) +│ └── postprocessors (output transforms: denormalize, etc.) +├── hardware (deployment — what hardware?) +│ ├── robots (robot configurations) +│ └── cameras (camera configurations) +└── metadata (provenance — when/who created this?) 
+``` ```json { "format": "policy_package", "version": "1.0", - "robots": [ - { - "name": "main", - "type": "Koch v1.1", - "state": { "shape": [14], "dtype": "float32" }, - "action": { "shape": [14], "dtype": "float32" } - } - ], - "cameras": [ - { - "name": "top", - "shape": [3, 480, 640], - "dtype": "uint8" - }, - { - "name": "wrist", - "shape": [3, 480, 640], - "dtype": "uint8" - } - ], "policy": { "name": "my_model", - "kind": "single_pass" + "source": { + "repo_id": "user/my_model", + "class_path": "mypackage.policies.MyPolicy" + } }, - "artifacts": { - "onnx": "model.onnx" + "model": { + "n_obs_steps": 1, + "runner": { + "type": "action_chunking", + "chunk_size": 100, + "n_action_steps": 100 + }, + "artifacts": { + "model": "model.onnx" + }, + "preprocessors": [ + { + "type": "normalize", + "mode": "mean_std", + "artifact": "stats.safetensors", + "features": ["observation.state"] + } + ], + "postprocessors": [ + { + "type": "denormalize", + "mode": "mean_std", + "artifact": "stats.safetensors", + "features": ["action"] + } + ] }, - "runner": { - "class_path": "inferencekit.runners.SinglePassRunner", - "init_args": {} + "hardware": { + "robots": [ + { + "name": "main", + "type": "SO-100", + "state": { + "shape": [6], + "dtype": "float32", + "order": ["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll", "gripper"] + }, + "action": { + "shape": [6], + "dtype": "float32", + "order": ["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll", "gripper"] + } + } + ], + "cameras": [ + {"name": "top", "shape": [3, 480, 640], "dtype": "uint8"}, + {"name": "wrist", "shape": [3, 480, 640], "dtype": "uint8"} + ] }, - "adapter": { - "class_path": "inferencekit.adapters.ONNXAdapter", - "init_args": { - "providers": ["CUDAExecutionProvider", "CPUExecutionProvider"] + "metadata": { + "created_at": "2026-03-27T12:00:00Z", + "created_by": "mypackage.export" + } +} +``` + +The `hardware` section declares what the policy **expects** at 
inference time — logical names, tensor shapes, and dtypes. These are the names used during training (e.g., `"top"`, `"wrist"` for cameras; `"main"` for the robot). At deployment, the user maps logical names to physical devices (e.g., `"top"` → `/dev/video0`). The `order` field in robot specs declares joint ordering — critical for multi-arm setups where `[left, right]` vs `[right, left]` concatenation produces valid shapes with wrong semantics. + +> **Note:** For the full manifest schema reference (all runner variants, field descriptions, and design rationale), see [LeRobot Integration Design](../integrations/lerobot.md#2-converged-manifest-format). The format is shared by both PhysicalAI and LeRobot exports. + +**PhysicalAI-native format (`class_path` + `init_args`):** + +PhysicalAI can also write manifests using the full `class_path` + `init_args` format for components. This gives full power over component instantiation (custom classes, nested configs) while remaining loadable by PhysicalAI's `ComponentRegistry`: + +```json +{ + "format": "policy_package", + "version": "1.0", + "policy": { + "name": "act", + "source": { + "repo_id": "lerobot/act_aloha_sim_transfer_cube_human", + "class_path": "physicalai.policies.act.policy.ACT" } }, - "preprocessors": [ - { - "class_path": "mypackage.preprocessors.ImageResize", + "model": { + "n_obs_steps": 1, + "runner": { + "class_path": "physicalai.inference.runners.ActionChunkingRunner", "init_args": { - "target_size": [640, 640] + "chunk_size": 100, + "n_action_steps": 100 } - } - ], - "postprocessors": [ - { - "class_path": "mypackage.postprocessors.NMS", - "init_args": { - "confidence_threshold": 0.5 + }, + "artifacts": { + "model": "model.onnx" + }, + "preprocessors": [ + { + "class_path": "physicalai.inference.preprocessors.StatsNormalizer", + "init_args": { + "mode": "mean_std", + "stats_path": "stats.safetensors", + "features": ["observation.state"] + } } - } - ] + ], + "postprocessors": [ + { + "class_path": 
"physicalai.inference.postprocessors.StatsDenormalizer", + "init_args": { + "mode": "mean_std", + "stats_path": "stats.safetensors", + "features": ["action"] + } + } + ] + }, + "hardware": { + "robots": [ + { + "name": "main", + "type": "SO-100", + "state": { + "shape": [6], + "dtype": "float32", + "order": ["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll", "gripper"] + }, + "action": { + "shape": [6], + "dtype": "float32", + "order": ["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll", "gripper"] + } + } + ], + "cameras": [ + {"name": "top", "shape": [3, 480, 640], "dtype": "uint8"}, + {"name": "wrist", "shape": [3, 480, 640], "dtype": "uint8"} + ] + }, + "metadata": { + "created_at": "2026-03-27T12:00:00Z", + "created_by": "physicalai.export" + } } ``` -**How models are loaded:** +> **Both formats resolve identically.** The `type`-based example above (used by LeRobot) and this `class_path`-based example both resolve to the same runner and processor instances through the `ComponentRegistry`. See [Dual Component Resolution](../integrations/lerobot.md#dual-component-resolution) for the full resolution algorithm. -The framework reads `manifest.json` and resolves the model configuration: +**Naming mapping between the two formats:** -1. **Built‑in models** (physicalai-train, LeRobot): `policy.kind` maps to a built‑in runner. No `class_path` needed for the runner — the `kind` field is sufficient. -2. **Custom/exotic models**: `runner.class_path` points to the user's runner class. The framework instantiates it dynamically. -3. **Hardware validation**: `robots` and `cameras` sections declare expected shapes. The runtime validates observations against these on first contact. 
+| `type` (short name) | `class_path` (full Python path) | Purpose | +| --- | --- | --- | +| `"action_chunking"` | `"physicalai.inference.runners.ActionChunkingRunner"` | Runner: returns a chunk of future actions per step | +| `"single_pass"` | `"physicalai.inference.runners.SinglePassRunner"` | Runner: single forward pass per step | +| `"normalize"` | `"physicalai.inference.preprocessors.StatsNormalizer"` | Preprocessor: normalizes observations using dataset stats | +| `"denormalize"` | `"physicalai.inference.postprocessors.StatsDenormalizer"` | Postprocessor: denormalizes actions using dataset stats | -The `class_path` + `init_args` pattern allows domain layers to specify their own components in the manifest without inferencekit needing to know about them. +- **`type` + flat params**: LeRobot writes this. Both frameworks can read it. Short name resolved via `ComponentRegistry`. +- **`class_path` + `init_args`**: PhysicalAI writes this. Direct Python import — no registry lookup needed. Useful for custom/third-party components that aren't in the built-in registry. ---- +For example, these two `ComponentSpec`s resolve to equivalent runner instances (same class, same constructor arguments): -## Extension & Plugin System +```jsonc +// type format (LeRobot-compatible): +{"type": "action_chunking", "chunk_size": 100, "n_action_steps": 100} -inferencekit only supports **backend adapters** as extensions. All domain plugins live above it (physical‑ai‑framework, model_api, custom layers). +// class_path format (PhysicalAI-native): +{"class_path": "physicalai.inference.runners.ActionChunkingRunner", "init_args": {"chunk_size": 100, "n_action_steps": 100}} -### Backend Registry +// Both → ActionChunkingRunner(chunk_size=100, n_action_steps=100) +``` -inferencekit exposes a backend registry for RuntimeAdapters. Domain plugins are not registered here. 
+**How models are loaded:** -### Building a Custom Domain Layer +The framework reads `manifest.json` and resolves the model configuration using **dual-path component resolution**: -Anyone can create a domain-specific inference layer on top of inferencekit. Here's the pattern: +1. **Manifest parsing**: `manifest.json` is parsed directly into nested Pydantic models --- no flattening or normalization step. +2. **Runner resolution**: Components support two formats that both resolve through the same `ComponentRegistry` + `instantiate_component()` pipeline: + - **`type` + flat params** (interoperable, written by LeRobot): `{"type": "action_chunking", "chunk_size": 100}` → registry lookup → `ComponentSpec` → `instantiate_component()` + - **`class_path` + `init_args`** (full-power, written by PhysicalAI): `{"class_path": "physicalai.inference.runners.ActionChunkingRunner", "init_args": {"chunk_size": 100}}` → `ComponentSpec` → `instantiate_component()` +3. **Backend selection**: `model.artifacts` maps named roles (e.g., `"model"`, `"encoder"`) to filenames. The first available backend is auto-selected, or the user can override at load time. +4. **I/O pipeline**: `model.preprocessors` and `model.postprocessors` declare input/output transforms (normalization, denormalization) resolved via the same dual-path mechanism. +5. **Hardware validation**: `hardware.robots` and `hardware.cameras` sections declare expected shapes. The runtime can validate observations against these. +6. **Custom components**: Domain layers can extend the manifest with custom processor types or runner parameters without modifying inferencekit. Any component with a `class_path` is instantiated directly; any component with a `type` goes through the registry. -**Step 1: Define your domain model** +> **See also**: [LeRobot Integration Design — Runner Resolution](../integrations/lerobot.md#runner-resolution) for the full resolution algorithm and examples. 
-```python -# my_domain_inference/model.py -from inferencekit import InferenceModel +--- -class MyDomainModel(InferenceModel): - """Domain-specific inference model. +## Extension & Plugin System - Extends InferenceModel with domain-specific methods, - preprocessing, and postprocessing. - """ +inferencekit supports **backend adapters** as extensions via a registry. Domain-specific plugins (runners, processors, models) live in their respective domain layers, not in inferencekit. +**Backend registry:** New backends implement `RuntimeAdapter` and register via Python entry points (`inferencekit.backends`). Domain layers register runners and callbacks under the `inferencekit.runners` and `inferencekit.callbacks` entry-point groups; preprocessors and postprocessors are referenced from the manifest instead. + +**Building a custom domain layer:** Subclass `InferenceModel`, implement domain-specific runners and pre/postprocessors, and register via entry points: + +```python +# my_domain/model.py — subclass InferenceModel +from inferencekit import InferenceModel +from my_domain.preprocessors import MyPreprocessor  # your Preprocessor subclass + +class MyDomainModel(InferenceModel): def __init__(self, path, **kwargs): super().__init__(path, **kwargs) - # Attach domain preprocessors/postprocessors - self.preprocessors = self._load_preprocessors(path) - self.postprocessors = self._load_postprocessors(path) + self.preprocessors = [MyPreprocessor()] def domain_predict(self, domain_inputs): - """Domain-specific prediction method.""" - # Preprocess domain inputs -> generic inputs inputs = self._preprocess(domain_inputs) - # Run generic inference - outputs = self(inputs) - # Postprocess generic outputs -> domain outputs - return self._postprocess(outputs) + return self(inputs) ``` -**Step 2: Define domain-specific runners (if needed)** - -```python -# my_domain_inference/runners.py -from inferencekit.runners import InferenceRunner - -class MyDomainRunner(InferenceRunner): - """Runner for domain-specific inference patterns.""" - - def run(self, adapter, inputs): - # Implement domain-specific execution logic - ... 
-``` - -**Step 3: Register via entry points** - ```toml -# my_domain_inference/pyproject.toml +# pyproject.toml — register custom runners [project.entry-points."inferencekit.runners"] -my_domain_runner = "my_domain_inference.runners:MyDomainRunner" -``` - -**Step 4: Package and distribute** - -```bash -# Publish to PyPI -pip install my-domain-inference - -# Or publish to HuggingFace (see below) +my_runner = "my_domain.runners:MyDomainRunner" ``` -### Publishing to HuggingFace - -Domain layers can publish model packages to HuggingFace that include: - -1. **Exported model artifacts** (ONNX, OpenVINO, etc.) -2. **Manifest** (`manifest.json`) specifying the inferencekit runner, preprocessors, etc. -3. **Domain package dependency** declared in the manifest - -```json -{ - "format": "policy_package", - "version": "1.0", - "robots": [...], - "cameras":[...], - "policy": { - "name": "my_model", - "kind": "custom" - }, - "domain_package": "my-domain-inference", - "artifacts": { - "onnx": "model.onnx" - }, - "runner": { - "class_path": "my_domain_inference.runners.MyDomainRunner", - "init_args": { - "param1": "value1" - } - }, - "preprocessors": [ - { - "class_path": "my_domain_inference.preprocessors.MyPreprocessor", - "init_args": {} - } - ] -} -``` - -**Loading from HuggingFace:** +**HuggingFace publishing:** Domain layers can publish model packages to HuggingFace containing exported artifacts + `manifest.json`. Loading is automatic: ```python -from inferencekit import InferenceModel - -# Auto-downloads model + resolves domain package model = InferenceModel("hf://username/my-model") outputs = model(inputs) ``` @@ -635,7 +665,13 @@ outputs = model(inputs) ## Runners (Domain-Provided) -inferencekit defines the `InferenceRunner` interface. Domain layers implement concrete runners. +inferencekit defines the `InferenceRunner` interface. Domain layers implement concrete runners: + +| Runner | Description | Stateful | +| --- | --- | --- | +| **SinglePassRunner** | Default. 
One forward pass per call. Covers 90% of use cases. | No | +| **BatchRunner** | Splits inputs into batches for throughput optimization. | No | +| **StreamingRunner** | Buffers inputs for real-time streaming applications. | Yes | ```python class SinglePassRunner(InferenceRunner): @@ -643,96 +679,16 @@ class SinglePassRunner(InferenceRunner): def run(self, adapter: RuntimeAdapter, inputs: dict) -> dict: return adapter.predict(inputs) - - def reset(self) -> None: - pass # No state - - -class BatchRunner(InferenceRunner): - """Batched inference for throughput optimization.""" - - def __init__(self, batch_size: int = 8): - self.batch_size = batch_size - - def run(self, adapter: RuntimeAdapter, inputs: dict) -> dict: - # Split inputs into batches, run, merge results - ... - - -class StreamingRunner(InferenceRunner): - """Streaming inference for real-time applications.""" - - def __init__(self, buffer_size: int = 1): - self.buffer_size = buffer_size - - def run(self, adapter: RuntimeAdapter, inputs: dict) -> dict: - # Process streaming inputs with buffering - ... ``` -### Contrib Runners - -If desired, inferencekit can host a small `contrib` module for reference implementations, but it does not own domain logic. +**Contrib runners** (`inferencekit.contrib`): Reference implementations for common patterns, shipped as optional extras: -```python -# inferencekit/contrib/iterative.py -class IterativeRunner(InferenceRunner): - """Runner for iterative/flow-matching inference. +| Runner | Description | Use Case | +| --- | --- | --- | +| **IterativeRunner** | Multi-step denoising with configurable scheduler | Diffusion, flow-matching policies | +| **TiledRunner** | Tile-based inference with overlap and merging | High-resolution images, satellite imagery | - Performs multiple forward passes with denoising steps. - Used by diffusion models, flow-matching policies, etc. 
- """ - - def __init__( - self, - num_steps: int = 10, - scheduler: str = "euler", - timestep_spacing: str = "linear", - ): - self.num_steps = num_steps - self.scheduler = scheduler - self.timestep_spacing = timestep_spacing - - def run(self, adapter: RuntimeAdapter, inputs: dict) -> dict: - x_t = np.random.randn(*self._infer_shape(inputs)).astype(np.float32) - timesteps = self._generate_timesteps() - dt = -1.0 / self.num_steps - - for t in timesteps: - step_inputs = {**inputs, "x_t": x_t, "timestep": np.array([t])} - v_t = adapter.predict(step_inputs)["v_t"] - x_t = self._step(x_t, v_t, dt) - - return {"output": x_t} -``` - -```python -# inferencekit/contrib/tiled.py -class TiledRunner(InferenceRunner): - """Runner for tile-based inference on large inputs. - - Splits large inputs into overlapping tiles, runs inference - on each tile, and merges results. Useful for high-resolution - images, satellite imagery, medical imaging, etc. - """ - - def __init__( - self, - tile_size: tuple[int, int] = (640, 640), - overlap: float = 0.25, - merge_strategy: str = "average", - ): - self.tile_size = tile_size - self.overlap = overlap - self.merge_strategy = merge_strategy - - def run(self, adapter: RuntimeAdapter, inputs: dict) -> dict: - tiles = self._split_into_tiles(inputs) - tile_results = [adapter.predict(tile) for tile in tiles] - return self._merge_results(tile_results) -``` - -Domain layers can also contribute runners back to `inferencekit.contrib` via pull request, or ship them in their own packages. +Domain layers can contribute runners back to `inferencekit.contrib` via pull request, or ship them in their own packages. --- @@ -749,103 +705,15 @@ Third-party backends can be added via the backend registry without modifying inf --- -## Domain Layer Examples - -These examples show how domain-specific libraries build on inferencekit's interfaces. Each example demonstrates the pattern; full implementations live in their respective packages. 
- -### Example 1: Vision (model_api) - -[model_api](https://github.com/open-edge-platform/model_api) provides vision-specific inference on top of inferencekit. It adds image preprocessing, task-specific model wrappers, and structured result types. - -```python -# model_api wrapping inferencekit for vision inference -from inferencekit import InferenceModel -from inferencekit.runners import InferenceRunner, SinglePassRunner -from inferencekit.preprocessors import Preprocessor -from inferencekit.postprocessors import Postprocessor - - -# Vision-specific preprocessor -class ImagePreprocessor(Preprocessor): - """Resize, normalize, and layout-transform images.""" - - def __init__(self, target_size, mean, std, layout="NCHW"): - self.target_size = target_size - self.mean = np.array(mean) - self.std = np.array(std) - self.layout = layout - - def __call__(self, inputs: dict) -> dict: - image = inputs["image"] - image = cv2.resize(image, self.target_size) - image = (image.astype(np.float32) / 255.0 - self.mean) / self.std - if self.layout == "NCHW": - image = image.transpose(2, 0, 1) - inputs["image"] = image[np.newaxis, ...] - return inputs - - -# Vision-specific postprocessor (e.g., NMS for detection) -class DetectionPostprocessor(Postprocessor): - """Decode detection outputs and apply NMS.""" - - def __init__(self, confidence_threshold=0.5, nms_threshold=0.45): - self.confidence_threshold = confidence_threshold - self.nms_threshold = nms_threshold - - def __call__(self, outputs: dict) -> dict: - boxes, scores, labels = self._decode(outputs) - keep = self._nms(boxes, scores) - return { - "boxes": boxes[keep], - "scores": scores[keep], - "labels": labels[keep], - } - - -# Vision model built on top of InferenceModel -class DetectionModel(InferenceModel): - """YOLO/SSD/etc. 
detection model.""" - - def __init__(self, path, confidence=0.5, **kwargs): - super().__init__(path, **kwargs) - self.preprocessors = [ - ImagePreprocessor( - target_size=(640, 640), - mean=[0.485, 0.456, 0.406], - std=[0.229, 0.224, 0.225], - ) - ] - self.postprocessors = [ - DetectionPostprocessor(confidence_threshold=confidence) - ] - - def detect(self, image: np.ndarray) -> dict: - """Convenience method for vision users.""" - return self({"image": image}) -``` - -**Usage:** - -```python -from model_api import DetectionModel - -model = DetectionModel("./exports/yolo_v8", backend="openvino") -detections = model.detect(image) -print(detections["boxes"], detections["scores"]) -``` - -### Example 2: Physical‑AI Plugins +## Domain Layer Example -physicalai hosts policy plugins for physicalai-train, LeRobot, and custom frameworks. Each plugin supplies preprocessors, runners, and optional wrappers. +This example shows how physicalai builds on inferencekit's interfaces with a policy-specific runner that manages action-chunk queues. Other policy-specific behavior (`select_action`, episode reset) lives in physical‑ai‑framework's `InferenceModel` wrapper, which subclasses inferencekit's base `InferenceModel`. ```python -# physical‑ai‑framework plugin example (policy-specific) from inferencekit import InferenceModel from inferencekit.runners import InferenceRunner -# Policy-specific runner with action chunking class ActionChunkingRunner(InferenceRunner): """Runner that manages action chunk queues. @@ -869,65 +737,9 @@ class ActionChunkingRunner(InferenceRunner): def reset(self): self._action_queue = [] - - -``` - -Policy‑specific behavior (e.g., `select_action`, episode reset) is implemented in physical‑ai‑framework’s `InferenceModel` wrapper, which subclasses inferencekit’s base `InferenceModel`. - -### Example 3: Custom Domain - -Anyone can build a domain layer. 
Here's a minimal example for audio inference: - -```python -# audio_inference/model.py -from inferencekit import InferenceModel -from inferencekit.preprocessors import Preprocessor - - -class AudioPreprocessor(Preprocessor): - """Convert audio to mel spectrogram.""" - - def __init__(self, sample_rate=16000, n_mels=80): - self.sample_rate = sample_rate - self.n_mels = n_mels - - def __call__(self, inputs): - audio = inputs["audio"] - mel = librosa.feature.melspectrogram( - y=audio, sr=self.sample_rate, n_mels=self.n_mels - ) - inputs["mel_spectrogram"] = mel - return inputs - - -class AudioClassificationModel(InferenceModel): - """Audio classification on top of inferencekit.""" - - def __init__(self, path, **kwargs): - super().__init__(path, **kwargs) - self.preprocessors = [AudioPreprocessor()] - - def classify(self, audio: np.ndarray) -> dict: - return self({"audio": audio}) -``` - -**Package and publish:** - -```toml -# audio_inference/pyproject.toml -[project] -name = "audio-inference-kit" -dependencies = ["inferencekit", "librosa"] - -[project.entry-points."inferencekit.runners"] -audio_streaming = "audio_inference.runners:AudioStreamingRunner" ``` -```bash -pip install audio-inference-kit -# or publish to HuggingFace with model artifacts + metadata -``` +Other domain layers (model_api for vision, custom audio/NLP packages) follow the same pattern: subclass `InferenceModel`, implement domain runners and pre/postprocessors, register via entry points. 
--- @@ -980,136 +792,6 @@ with InferenceModel("./exports/my_model") as model: --- -## API Reference - -### Main Entry Point - -```python -from inferencekit import InferenceModel - -model = InferenceModel("./exports/my_model") -outputs = model(inputs) -``` - -### Runners - -```python -from inferencekit.runners import ( - InferenceRunner, # ABC - subclass for custom runners - SinglePassRunner, # Default - covers 90% of models - BatchRunner, # Throughput-optimized batching - StreamingRunner, # Real-time streaming -) - -# Contrib runners (install with inferencekit[contrib]) -from inferencekit.contrib import ( - IterativeRunner, # Multi-step denoising / flow matching - TiledRunner, # Tile-based for large inputs -) -``` - -### Adapters - -```python -from inferencekit.adapters import ( - RuntimeAdapter, # ABC - OpenVINOAdapter, # Intel devices - ONNXAdapter, # Cross-platform - TorchExportAdapter, # PyTorch - get_adapter, # Factory function -) -``` - -### Callbacks - -```python -from inferencekit.callbacks import ( - Callback, # ABC - TimingCallback, # Performance profiling - LoggingCallback, # Prediction logging -) -``` - -### Plugins - -```python -from inferencekit.plugins import registry - -# List available backends -print(registry.backends.list()) - -# Register custom backend -registry.backends.register("my_backend", MyBackend) - -# Get a backend by name -adapter = registry.backends.get("onnx", device="cuda") -``` - -### Extension Points - -| Extension | How to Extend | Registration | -| ---------------- | --------------------------- | ------------------------------------- | -| New backend | Implement `RuntimeAdapter` | Entry point: `inferencekit.backends` | -| New runner | Implement `InferenceRunner` | Entry point: `inferencekit.runners` | -| New model format | Implement format plugin | Entry point: `inferencekit.formats` | -| New callback | Subclass `Callback` | Entry point: `inferencekit.callbacks` | -| Preprocessing | Implement `Preprocessor` | Via metadata 
`class_path` | -| Postprocessing | Implement `Postprocessor` | Via metadata `class_path` | - ---- - -## Appendix: Design Rationale - -### Why a separate inference package? - -1. **Reusability**: Same core across vision (model_api), robotics (physicalai-train), audio, NLP, and custom domains -2. **Clear boundaries**: Generic concerns (backends, metadata, plugins) separated from domain concerns (images, robots, audio) -3. **Easier testing**: Domain-agnostic package has fewer dependencies -4. **Ecosystem growth**: Anyone can build and publish domain layers without modifying inferencekit - -### Why inferencekit is a base layer, not a model_api replacement - -model_api provides rich vision-specific functionality: image preprocessing embedded in model graphs, task-specific wrappers (YOLO, SSD, SAM), result types, parameter validation, and tiling. These are vision concerns that don't belong in a generic inference framework. - -Instead of replacing model_api, inferencekit provides the **foundation** that model_api can build on: - -| Concern | inferencekit provides | model_api adds | -| ----------------- | -------------------------------------- | ---------------------------------------- | -| Backend execution | RuntimeAdapter (OV, ONNX, TRT) | Wraps RuntimeAdapter in InferenceAdapter | -| Model loading | Manifest-driven `InferenceModel(path)` | Vision-specific `Model.create_model()` | -| Preprocessing | Preprocessor ABC | ImageResize, Normalize, LayoutTransform | -| Postprocessing | Postprocessor ABC | NMS, BoxDecoder, MaskDecoder | -| Runners | SinglePassRunner, BatchRunner | TiledRunner (via contrib or own impl) | -| Result types | `dict[str, Any]` | DetectionResult, ClassificationResult | - -### Migration path for model_api - -1. **Phase 1 (compatibility)**: model_api wraps inferencekit's RuntimeAdapter inside its existing InferenceAdapter. No public API change. -2. **Phase 2 (adoption)**: model_api adopts RuntimeAdapter directly, deprecates its own adapter layer. -3. 
**Phase 3 (simplification)**: model_api becomes a pure domain layer on top of inferencekit. - -### Why runners are separate from adapters? - -- **Adapters** handle backend-specific execution (ONNX vs OpenVINO) -- **Runners** handle algorithm-specific patterns (single-pass vs iterative) -- This separation allows N backends × M inference patterns without N×M implementations - -### Why callbacks instead of inheritance? - -- **Composability**: Mix and match (timing + logging + safety) -- **Reusability**: Same callback works across all models and domains -- **Maintainability**: Add cross-cutting concerns without changing core code -- **Familiarity**: Lightning users already understand this pattern - -### Why a plugin system? - -- **Ecosystem growth**: Third parties can extend without forking -- **Clean dependencies**: inferencekit doesn't depend on domain packages -- **Discoverability**: Entry points make extensions automatically available -- **Publishability**: Domain layers can be packaged and shared independently - ---- - ## Related Documents - **[Strategy](../architecture/strategy.md)** — Big-picture architecture and layering decisions @@ -1118,5 +800,5 @@ Instead of replacing model_api, inferencekit provides the **foundation** that mo --- -_Document Version: 3.0_ -_Last Updated: 2026-02-16_ +_Document Version: 6.0_ +_Last Updated: 2026-03-31_ diff --git a/docs/design/integrations/lerobot.md b/docs/design/integrations/lerobot.md index 84f8a92..f4f6741 100644 --- a/docs/design/integrations/lerobot.md +++ b/docs/design/integrations/lerobot.md @@ -1,595 +1,536 @@ -# physicalai: LeRobot Integration Design +# PhysicalAI: LeRobot Integration Design **Status**: Proposal -**Author**: [Your Name] -**Date**: 2026-01-13 -**Relates to**: [LeRobot Policy Export Design](./policy_export_design.md) - -> **Important: LeRobot export is our proposal, not an agreed standard.** -> The PolicyPackage format (`manifest.json`) described in this document is a design we have proposed to the 
LeRobot team. It has **not yet been reviewed or accepted** upstream. If the LeRobot team adopts a different export format or modifies the proposed schema, this integration design will need to adapt accordingly. The architectural approach (unified manifest format, no lerobot dependency at runtime) remains valid regardless of the final format — only the loader implementation would change. +**Author**: Samet Akcay +**Date**: 2026-03-31 +**Relates to**: [Inference Core Design](../components/inferencekit.md) --- ## Executive Summary -This document describes how **physicalai** integrates with LeRobot's proposed PolicyPackage format. The integration is seamless because both physicalai-train and LeRobot use the **same unified `manifest.json` format**. The runtime reads `manifest.json` (pure JSON, no lerobot import) and maps `policy.kind` to built‑in runners. No LeRobot dependency is needed at deployment time. - -**Key principle:** All packages (physicalai-train, LeRobot, custom) export models using the same `manifest.json` format. physicalai consumes them identically. No special-casing, no separate format loaders, no circular dependencies. +This document describes how **PhysicalAI** integrates with **LeRobot** exported models using a **single converged manifest format**. Both frameworks produce `manifest.json` files with the same schema, eliminating the need for format adapters or translation layers. -**Note on status**: The PolicyPackage export format is our proposal to the LeRobot team (see [LeRobot Export Suggestions](../internal/lerobot-export-suggestions.md)). The format details below reflect our proposed design. The integration approach is sound regardless of the final format the LeRobot team adopts. - ---- +**Key principles:** -## 1. Architecture Overview +1. **One schema, two expressiveness levels** --- The manifest supports two component formats: `type` + flat params (interoperable, used by LeRobot) and `class_path` + `init_args` (full-power, used by PhysicalAI). 
PhysicalAI reads both; LeRobot reads `type` only. +2. **LeRobot is standalone** --- LeRobot's export system works without PhysicalAI installed. No PhysicalAI imports, no PhysicalAI class paths in manifests. +3. **PhysicalAI loads LeRobot exports natively** --- `InferenceModel.load("./lerobot_export")` works out of the box. No adapter class, no special-casing. +4. **Dependency is strictly one-way** --- LeRobot does not depend on PhysicalAI. PhysicalAI reads LeRobot's output (pure JSON) without importing LeRobot. ```text -┌────────────────────────────────────────────────────────────────┐ -│ physicalai │ -│ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ -│ │ Adapters │ │ Built‑in │ │ Callbacks │ │ -│ │ (backends) │ │ Runners │ │ (instrumentation) │ │ -│ └──────────────┘ └──────────────┘ └──────────────────────┘ │ -│ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ Unified Manifest Loader │ │ -│ │ │ │ -│ │ manifest.json (same format for all model sources) │ │ -│ │ physicalai-train, LeRobot, custom — all use the same schema │ -│ └──────────────────────────────────────────────────────────┘ │ -└────────────────────────────────────────────────────────────────┘ - │ - │ reads (pure file I/O) - ▼ - ┌──────────────────────────┐ - │ Exported Model │ - │ (any source) │ - │ │ - │ manifest.json │ - │ model artifacts │ - └──────────────────────────┘ +LeRobot (standalone) PhysicalAI +-------------------- ---------- +policy.export("./out") --produces--> InferenceModel.load("./out") + | + Same manifest.json schema +-- reads manifest.json + Writes: type + flat params +-- resolves via type OR class_path + Own runners (numpy-only) +-- builds preprocessors/postprocessors + Zero physicalai deps +-- runs inference through pipeline ``` --- -## 2. Unified Manifest Format +## Table of Contents + +- [Executive Summary](#executive-summary) +- [1. Architecture Overview](#1-architecture-overview) +- [2. 
Converged Manifest Format](#2-converged-manifest-format) + - [Schema Overview](#schema-overview) + - [Full Example: ACT Policy](#full-example-act-policy) + - [Runner Variants](#runner-variants) + - [Field Reference](#field-reference) + - [Dual Component Resolution](#dual-component-resolution) +- [3. How PhysicalAI Loads the Manifest](#3-how-physicalai-loads-the-manifest) +- [4. How LeRobot Uses the Manifest](#4-how-lerobot-uses-the-manifest) +- [5. Runner Mapping](#5-runner-mapping) +- [6. Normalization Handling](#6-normalization-handling) +- [7. Usage Examples](#7-usage-examples) +- [8. Supported Policies](#8-supported-policies) +- [Related Documents](#related-documents) -All packages (physicalai-train, LeRobot, custom) use the same `manifest.json` schema. This section describes the fields relevant to LeRobot policies specifically. +--- -### Package Detection +## 1. Architecture Overview -A directory is an exported model package if it contains `manifest.json` with `"format": "policy_package"`: +Both frameworks share the same manifest schema. PhysicalAI's `InferenceModel` reads the manifest, resolves components (runner, preprocessors, postprocessors, adapter), and runs inference --- regardless of which framework produced the export. 
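To make the "resolve components" step concrete, here is a minimal sketch of dual-path resolution for a single component spec. A `class_path` + `init_args` entry is imported directly, while a `type` + flat-params entry goes through a short-name registry. `REGISTRY`, `resolve_component`, and the toy `ActionChunkingRunner` dataclass are hypothetical stand-ins, not PhysicalAI's actual `ComponentRegistry` or `instantiate_component()`. For brevity, the registry maps names straight to classes rather than to full class paths.

```python
import importlib
from dataclasses import dataclass
from typing import Any


@dataclass
class ActionChunkingRunner:
    """Hypothetical stand-in for a real runner class."""

    chunk_size: int
    n_action_steps: int


# Hypothetical short-name registry used by the `type` + flat-params form.
REGISTRY: dict[str, type] = {"action_chunking": ActionChunkingRunner}


def resolve_component(spec: dict[str, Any]) -> Any:
    """Instantiate a component from either manifest form (sketch only)."""
    if "class_path" in spec:
        # Full-power form: import "package.module.Class" and pass init_args through.
        module_name, _, class_name = spec["class_path"].rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        return cls(**spec.get("init_args", {}))

    # Interoperable form: look up the short name; every other key is a kwarg.
    try:
        cls = REGISTRY[spec["type"]]
    except KeyError:
        raise ValueError(f"unknown component type: {spec.get('type')!r}") from None
    kwargs = {key: value for key, value in spec.items() if key != "type"}
    return cls(**kwargs)


# Example: the `type` form that LeRobot would write.
runner = resolve_component({"type": "action_chunking", "chunk_size": 100, "n_action_steps": 100})
```

The `class_path` branch accepts any importable class. For example, `resolve_component({"class_path": "fractions.Fraction", "init_args": {"numerator": 1, "denominator": 2}})` returns `Fraction(1, 2)`. Because the split is a single `if`, a new component type needs only a registry entry, not another branch.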
-```python -def is_policy_package(path: Path) -> bool: - manifest_path = path / "manifest.json" - if not manifest_path.exists(): - return False - manifest = json.loads(manifest_path.read_text()) - return manifest.get("format") == "policy_package" +```text ++-----------------------------------------------------------------------+ +| PhysicalAI | +| | +| +----------------+ +-----------------+ +------------------------+ | +| | Adapters | | Built-in | | Callbacks | | +| | (backends) | | Runners | | (instrumentation) | | +| | | | | | | | +| | ONNX, OpenVINO | | SinglePass | | TimingCallback | | +| | TensorRT | | ActionChunking | | LoggingCallback | | +| | TorchExportIR | | Iterative | | ActionSafetyCallback | | +| | | | TwoPhase | | | | +| +----------------+ +-----------------+ +------------------------+ | +| | +| +---------------------------------------------------------------+ | +| | Manifest Loader | | +| | | | +| | manifest.json --> parse --> resolve components --> run | | +| | (same schema for all sources: PhysicalAI, LeRobot, custom) | | +| +---------------------------------------------------------------+ | ++-----------------------------------------------------------------------+ + | + | reads (pure JSON file I/O) + v + +----------------------------+ + | Exported Package | + | (any source) | + | | + | manifest.json | + | model.onnx | + | stats.safetensors | + +----------------------------+ ``` -### Manifest Fields Used +**What PhysicalAI adds over LeRobot's standalone runtime:** -| Field | physicalai Usage | -| --------------- | -------------------------------------------------------------------------------------- | -| `format` | Package type detection | -| `version` | Schema compatibility check | -| `policy.kind` | Runner selection (`single_shot` → `SinglePassRunner`, `iterative` → `IterativeRunner`) | -| `artifacts` | Backend artifact paths | -| `io` | Input/output validation | -| `action` | Action semantics (chunk_size, n_action_steps) | -| `iterative` | Loop 
parameters (num_steps, scheduler) | -| `normalization` | Normalizer configuration | -| `x-physical-ai` | Extension fields (callbacks, adapter options) | +| Feature | LeRobot Standalone | PhysicalAI | +| ----------------------------------- | ------------------ | ---------------------- | +| Load exported policy | Yes | Yes | +| Single-pass / iterative / two-phase | Yes | Yes | +| Action chunking | Yes | Yes | +| Callbacks (timing, logging, safety) | No | Yes | +| Multi-backend with fallback | ONNX + OpenVINO | ONNX + OpenVINO + TRT | +| Preprocessor/postprocessor chains | Fixed pipeline | Extensible chain | +| HuggingFace Hub loading | No | Yes (`hf://user/repo`) | +| `select_action()` / `reset()` API | No | Yes | --- -## 3. Manifest Loader Implementation +## 2. Converged Manifest Format -### How It Works +### Schema Overview -The manifest loader is unified — there is no separate "LeRobot loader" vs "physicalai-train loader". The same code parses `manifest.json` for all model sources. The `policy.kind` field determines which built‑in runner to use. 
+The manifest mirrors PhysicalAI's `InferenceModel` class hierarchy: -```python -# physicalai/manifest_loader.py - -class ManifestLoader: - """Unified manifest loader for all model sources.""" - - @staticmethod - def detect(path: Path) -> bool: - """Check if path contains a valid manifest.""" - manifest_path = path / "manifest.json" - if not manifest_path.exists(): - return False - try: - manifest = json.loads(manifest_path.read_text()) - return manifest.get("format") == "policy_package" - except (json.JSONDecodeError, KeyError): - return False - - @staticmethod - def load( - path: Path, - backend: str | None = None, - device: str = "cpu", - **kwargs - ) -> "InferenceModel": - """Load a model package into an InferenceModel.""" - manifest = json.loads((path / "manifest.json").read_text()) - - # Validate schema version - version = manifest.get("version", "1.0") - if not version.startswith("1."): - raise ValueError(f"Unsupported manifest version: {version}") - - # Select backend - backend = backend or _select_default_backend(manifest) - artifact_path = path / manifest["artifacts"][backend] - - # Create adapter (via inference core) - adapter = get_adapter(backend)(artifact_path, device=device) - - # Select runner based on policy kind - kind = manifest["policy"]["kind"] - runner = _create_runner(kind, manifest, **kwargs) - - # Create normalizer (if specified) - normalizer = _create_normalizer(path, manifest) - - # Load callbacks from extensions - callbacks = _load_callbacks(manifest) - - return InferenceModel( - adapter=adapter, - runner=runner, - normalizer=normalizer, - callbacks=callbacks, - metadata=manifest, - ) - - -def _create_runner(kind: str, manifest: dict, **kwargs) -> InferenceRunner: - """Map policy.kind to a built‑in runner.""" - if kind == "single_pass": - return SinglePassRunner() - - elif kind == "iterative": - iter_config = manifest.get("inference", {}) - return IterativeRunner( - num_steps=kwargs.get("num_steps", iter_config.get("num_steps", 10)), - 
scheduler=kwargs.get("scheduler", iter_config.get("scheduler", "euler")), - timestep_spacing=iter_config.get("timestep_spacing", "linear"), - ) - - elif kind == "two_phase": - iter_config = manifest.get("inference", {}) - return TwoPhaseRunner( - num_steps=kwargs.get("num_steps", iter_config.get("num_steps", 10)), - scheduler=kwargs.get("scheduler", iter_config.get("scheduler", "euler")), - ) - - elif kind == "custom": - # Custom runner specified via class_path - runner_config = manifest.get("runner", {}) - return instantiate(runner_config) - - else: - raise ValueError(f"Unknown policy kind: {kind}") +```text +manifest.json ++-- format + version (envelope --- what is this file?) ++-- policy (identity --- what policy is this?) +| +-- name (human-readable name) +| +-- source (provenance: repo_id, class_path) ++-- model (exported model --- how to run it?) +| +-- n_obs_steps (observation window size) +| +-- runner (execution pattern + parameters) +| +-- artifacts (model files by named role) +| +-- preprocessors (input transforms: normalize, etc.) +| +-- postprocessors (output transforms: denormalize, etc.) ++-- hardware (deployment --- what hardware?) +| +-- robots (robot configurations) +| +-- cameras (camera configurations) ++-- metadata (provenance --- when/who created this?) ``` -### Installation - -The manifest loader is **built‑in** — it ships with physicalai. No extra install needed. 
+### Full Example: ACT Policy -```bash -# This is all you need to run any exported model (physicalai-train, LeRobot, custom) -pip install physicalai +```json +{ + "format": "policy_package", + "version": "1.0", + "policy": { + "name": "act", + "source": { + "repo_id": "lerobot/act_aloha_sim_transfer_cube_human", + "class_path": "physicalai.policies.act.policy.ACT" + } + }, + "model": { + "n_obs_steps": 1, + "runner": { + "type": "action_chunking", + "chunk_size": 100, + "n_action_steps": 100 + }, + "artifacts": { + "model": "model.onnx" + }, + "preprocessors": [ + { + "type": "normalize", + "mode": "mean_std", + "artifact": "stats.safetensors", + "features": ["observation.state"] + } + ], + "postprocessors": [ + { + "type": "denormalize", + "mode": "mean_std", + "artifact": "stats.safetensors", + "features": ["action"] + } + ] + }, + "hardware": { + "robots": [ + { + "name": "main", + "type": "SO-100", + "state": { + "shape": [6], + "dtype": "float32", + "order": ["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll", "gripper"] + }, + "action": { + "shape": [6], + "dtype": "float32", + "order": ["shoulder_pan", "shoulder_lift", "elbow_flex", "wrist_flex", "wrist_roll", "gripper"] + } + } + ], + "cameras": [ + {"name": "top", "shape": [3, 480, 640], "dtype": "uint8"}, + {"name": "wrist", "shape": [3, 480, 640], "dtype": "uint8"} + ] + }, + "metadata": { + "created_at": "2026-03-27T12:00:00Z", + "created_by": "lerobot.export" + } +} ``` -The loader reads `manifest.json` (pure JSON parsing) and maps `policy.kind` to built‑in runners. No `lerobot` import. No `physicalai-train` import. No `physicalai[lerobot]` extra. +> **Note on image inputs:** Image normalization (uint8 to float32, divide by 255) is baked into the ONNX graph during export. Only non-image features that use dataset-level statistics (e.g., `observation.state`) need explicit preprocessor entries. ---- +### Runner Variants -## 4. 
Usage Examples +The `model.runner` section is open-ended --- policy-specific parameters go directly in the runner object alongside `type`. -### Basic Usage (Unified API) +**ACT / VQBeT** (single-pass with action chunking): -```python -from physicalai import InferenceModel +```json +"runner": { + "type": "action_chunking", + "chunk_size": 100, + "n_action_steps": 100 +} +``` -# Load LeRobot package (auto-detected via plugin) -model = InferenceModel("./pi0_exported") +**Diffusion Policy** (iterative denoising): -# Run inference (raw outputs) -observation = { - "observation.images.top": image_array, - "observation.state": state_array, +```json +"runner": { + "type": "iterative", + "horizon": 16, + "n_action_steps": 8, + "num_inference_steps": 100, + "scheduler": "ddpm" } -outputs = model(observation) -action_chunk = outputs["action"] ``` -### With Callbacks +**PI0** (two-phase: encode once + denoise iteratively): -```python -from physicalai import InferenceModel -from physicalai.callbacks import TimingCallback, LoggingCallback +```json +"artifacts": { + "encoder": "encoder.onnx", + "denoise": "denoise.onnx" +}, +"runner": { + "type": "two_phase", + "chunk_size": 50, + "n_action_steps": 50, + "num_inference_steps": 10, + "scheduler": "euler" +} +``` -model = InferenceModel( - "./pi0_exported", - callbacks=[ - TimingCallback(), - LoggingCallback(log_inputs=False, log_outputs=True), - ], -) +### Field Reference -# Callbacks fire automatically on predict -action = model(observation) -# -> logs timing and output summary -``` +#### Top-Level Envelope -### Override Runner Parameters +| Field | Type | Required | Description | +| --------- | ------ | -------- | ------------------------------------------------ | +| `format` | string | Yes | Always `"policy_package"`. Schema identification | +| `version` | string | Yes | Schema version (semver). 
Currently `"1.0"` | -```python -# Override num_steps at load time (no re-export needed) -model = InferenceModel( - "./pi0_exported", - num_steps=20, # Override manifest default of 10 - scheduler="ddim", -) -``` +#### `policy` --- Identity -### Real-Time Control (Policy API) +| Field | Type | Required | Description | +| -------------------------- | ------ | -------- | --------------------------------------------------- | +| `policy.name` | string | Yes | Human-readable policy name (e.g., `"act"`, `"pi0"`) | +| `policy.source` | object | No | Provenance information | +| `policy.source.repo_id` | string | No | HuggingFace repo ID | +| `policy.source.class_path` | string | No | Original Python class path | -```python -from physicalai import InferenceModel +#### `model` --- How to Run -policy = InferenceModel("./pi0_exported") -policy.reset() +| Field | Type | Required | Description | +| -------------------------- | ------ | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | +| `model.n_obs_steps` | int | Yes | Number of observation timesteps needed by the model | +| `model.runner` | object | Yes | Runner configuration (see [Runner Variants](#runner-variants)) | +| `model.runner.type` | string | Yes | Runner type: `action_chunking`, `iterative`, `two_phase` | +| `model.artifacts` | object | Yes | Map of artifact role to filename. Single-model: `{"model": "model.onnx"}`. Two-phase: `{"encoder": "encoder.onnx", "denoise": "denoise.onnx"}` | +| `model.preprocessors` | array | No | Input transforms (normalize, etc.) | +| `model.postprocessors` | array | No | Output transforms (denormalize, etc.) 
| -while not done: - action = policy.select_action(observation) - observation, reward, done, info = env.step(action) -``` +#### `hardware` --- Deployment -### Explicit Backend Selection +| Field | Type | Required | Description | +| ------------------------------- | ------ | -------- | ------------------------------------------------------------------ | +| `hardware.robots` | array | No | Robot configurations | +| `hardware.robots[].name` | string | Yes | Logical name (e.g., `"main"`, `"left_arm"`) | +| `hardware.robots[].type` | string | No | Robot model string (informational, e.g., `"SO-100"`) | +| `hardware.robots[].state` | object | No | Expected state tensor: `shape`, `dtype`, `order` (joint ordering) | +| `hardware.robots[].action` | object | No | Expected action tensor: `shape`, `dtype`, `order` (joint ordering) | +| `hardware.cameras` | array | No | Camera configurations | +| `hardware.cameras[].name` | string | Yes | Logical name matching training data keys (e.g., `"top"`, `"wrist"`) | +| `hardware.cameras[].shape` | array | No | `[C, H, W]` tensor shape (e.g., `[3, 480, 640]`) | +| `hardware.cameras[].dtype` | string | No | Numpy dtype string (default: `"uint8"`) | -```python -# Use specific backend -model = InferenceModel( - "./pi0_exported", - backend="onnx", - device="cuda:0", -) +The `order` field in robot specs declares joint ordering. This is critical for multi-arm setups where `[left, right]` vs `[right, left]` concatenation produces valid shapes with wrong semantics. When present, the runtime can compare declared order against the robot's actual joint order and catch mismatches at startup. Camera and robot `name` fields are **logical names** matching the keys used during training — at deployment, the user maps these to physical devices. 
-# Or with adapter options -model = InferenceModel( - "./pi0_exported", - backend="onnx", - adapter_options={ - "providers": ["TensorrtExecutionProvider", "CUDAExecutionProvider"], - }, -) -``` +#### `metadata` --- Provenance ---- +| Field | Type | Required | Description | +| --------------------- | ------ | -------- | ------------------ | +| `metadata.created_at` | string | No | ISO 8601 timestamp | +| `metadata.created_by` | string | No | Creator identifier | -## 5. Extension Fields +#### Preprocessor / Postprocessor Entry -physicalai-specific configuration can be embedded in the manifest under `x-physical-ai`. These fields are ignored by LeRobot's own runtime: +| Field | Type | Required | Description | +| ------------ | ------ | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | +| `type` | string | Yes | Processor type: `"normalize"`, `"denormalize"`, or custom | +| `class_path` | string | No | Full Python class path (required for custom types; built-in types resolve by convention) | +| `mode` | string | No | Normalization mode: `"mean_std"`, `"min_max"`, `"identity"` | +| `artifact` | string | No | Path to stats file (e.g., `"stats.safetensors"`) | +| `features` | array | No | Feature names to process (e.g., `["observation.state"]`) | -```json -{ - "format": "policy_package", - "version": "1.0", +### Dual Component Resolution - "policy": { ... }, - "artifacts": { ... 
}, +The manifest supports two ways to specify components (runners, preprocessors, postprocessors): -"x-physical-ai": { - "callbacks": [ - "timing", - {"class_path": "myproject.callbacks.SafetyCallback", "init_args": {"max_velocity": 1.0}} - ], - "adapter": { - "providers": ["CUDAExecutionProvider", "CPUExecutionProvider"], - "graph_optimization_level": "all" - }, - "preprocessors": [ -{"class_path": "physicalai.preprocessors.ImageNormalize", "init_args": {"mean": [0.485, 0.456, 0.406]}} - ] - } -} -``` +| Format | Who writes | Who reads | Example | +| ------------------------------ | ---------------------------------- | -------------------- | --------------------------------------------------------------------- | +| **`type` + flat params** | LeRobot, simple PhysicalAI exports | Both (interoperable) | `{"type": "action_chunking", "chunk_size": 100}` | +| **`class_path` + `init_args`** | PhysicalAI (full-power) | PhysicalAI only | `{"class_path": "physicalai.inference.runners.ActionChunkingRunner", "init_args": {"chunk_size": 100}}` | + +Both formats resolve through the same `ComponentRegistry` + `instantiate_component()` pipeline: -### Extension Schema +- **`class_path`** (full Python path) → direct import → instantiate +- **`type`** (short name) → registry lookup → resolve to full path → instantiate -| Field | Type | Description | | -| ---------------- | --------------------------- | ----------------------- | ------------------- | -| `callbacks` | `list[str \ | CallbackConfig]` | Callbacks to attach | -| `adapter` | `dict` | Adapter/backend options | | -| `preprocessors` | `list[PreprocessorConfig]` | Input preprocessors | | -| `postprocessors` | `list[PostprocessorConfig]` | Output postprocessors | | +```json +// LeRobot writes (type + flat params): +{"type": "action_chunking", "chunk_size": 100, "n_action_steps": 100} + +// PhysicalAI writes (class_path + init_args): +{"class_path": "physicalai.inference.runners.ActionChunkingRunner", "init_args": {"chunk_size": 
100, "n_action_steps": 100}} -**Note**: LeRobot ignores `x-physical-ai` fields entirely. They are only read by physicalai. +// Both resolve to the same ActionChunkingRunner(chunk_size=100, n_action_steps=100) +``` --- -## 6. Runner Mapping +## 3. How PhysicalAI Loads the Manifest -### `policy.kind` → Built‑in Runner +The manifest is parsed directly into nested Pydantic models --- no intermediate flattening step: -| `policy.kind` | Runner | Notes | -| ------------- | ------------------ | -------------------------------- | -| `single_pass` | `SinglePassRunner` | Direct forward pass | -| `iterative` | `IterativeRunner` | Configurable loop with scheduler | -| `two_phase` | `TwoPhaseRunner` | Encode once + denoise loop | -| `custom` | via `class_path` | User-provided runner class | +```python +# In InferenceModel.load(): +raw = json.loads((path / "manifest.json").read_text()) +manifest = Manifest.model_validate(raw) + +# Resolve components from typed manifest fields +runner = resolve_runner(manifest.model.runner) +adapter = create_adapter(manifest.model.artifacts, path) +preprocessors = resolve_processors(manifest.model.preprocessors, path) +postprocessors = resolve_processors(manifest.model.postprocessors, path) +``` -### IterativeRunner Configuration +Runner and processor resolution both use **dual-path resolution** --- a single if-check, not an if-chain per type: ```python -class IterativeRunner(InferenceRunner): - """Runner for iterative/flow-matching policies.""" - - def __init__( - self, - num_steps: int = 10, - scheduler: str = "euler", - timestep_spacing: str = "linear", - timestep_range: tuple[float, float] = (1.0, 0.0), - ): - self.num_steps = num_steps - self.scheduler = scheduler - self.timestep_spacing = timestep_spacing - self.timestep_range = timestep_range - - def run(self, adapter: RuntimeAdapter, inputs: dict) -> dict: - # Initialize from noise - action_shape = self._infer_action_shape(inputs) - x_t = np.random.randn(*action_shape).astype(np.float32) - 
- # Generate timesteps - timesteps = self._generate_timesteps() - dt = -1.0 / self.num_steps - - # Iterative denoising - for t in timesteps: - step_inputs = { - **inputs, - "x_t": x_t, - "timestep": np.array([t], dtype=np.float32), - } - v_t = adapter.predict(step_inputs)["v_t"] - x_t = self._step(x_t, v_t, dt) - - return {"action": x_t} - - def _step(self, x: np.ndarray, v: np.ndarray, dt: float) -> np.ndarray: - if self.scheduler == "euler": - return x + dt * v - elif self.scheduler == "ddim": - # DDIM update rule - ... - else: - raise ValueError(f"Unknown scheduler: {self.scheduler}") +def resolve_runner(runner_config: dict) -> InferenceRunner: + if "class_path" in runner_config: + # PhysicalAI-native: class_path + init_args → ComponentSpec → instantiate + spec = ComponentSpec.model_validate(runner_config) + return instantiate_component(spec) + + # Framework-agnostic: type → registry lookup → instantiate + runner_type = runner_config["type"] + init_args = {k: v for k, v in runner_config.items() if k != "type"} + spec = ComponentSpec(class_path=runner_type, init_args=init_args) + return instantiate_component(spec) ``` ---- +Processors follow the same pattern, with one addition: the `artifact` key in `type`-format specs is resolved to an absolute `stats_path` at load time. -## 7. Callbacks for Robotics +> **Legacy `metadata.yaml` files** (pre-manifest era) are handled separately by `from_legacy_metadata()` in `manifest.py`. + +--- -physicalai provides callbacks useful for robotics applications: +## 4. 
How LeRobot Uses the Manifest -### ActionSafetyCallback +LeRobot reads the same `manifest.json` with its own tooling (no PhysicalAI dependency): ```python -# physicalai/callbacks/safety.py - -class ActionSafetyCallback(Callback): - """Clamp actions to safe ranges.""" - - def __init__( - self, - action_min: np.ndarray | float = -1.0, - action_max: np.ndarray | float = 1.0, - velocity_limit: float | None = None, - ): - self.action_min = action_min - self.action_max = action_max - self.velocity_limit = velocity_limit - self._last_action = None - - def on_predict_end(self, outputs: dict) -> dict: - action = outputs["action"] - - # Clamp to range - action = np.clip(action, self.action_min, self.action_max) - - # Velocity limiting - if self.velocity_limit and self._last_action is not None: - delta = action - self._last_action - delta = np.clip(delta, -self.velocity_limit, self.velocity_limit) - action = self._last_action + delta - - self._last_action = action.copy() - outputs["action"] = action - return outputs - - def on_reset(self): - self._last_action = None -``` +import json +from pathlib import Path -### EpisodeLoggingCallback +def load_exported_policy(path: str | Path) -> ExportedPolicy: + path = Path(path) + raw = json.loads((path / "manifest.json").read_text()) -```python -# physicalai/callbacks/logging.py - -class EpisodeLoggingCallback(Callback): - """Log episode data for replay/debugging.""" - - def __init__(self, log_dir: Path, log_observations: bool = True): - self.log_dir = Path(log_dir) - self.log_observations = log_observations - self._episode_data = [] - self._episode_count = 0 - - def on_predict_end(self, outputs: dict, inputs: dict | None = None) -> dict: - step_data = {"action": outputs["action"].tolist()} - if self.log_observations and inputs: - step_data["observation"] = {k: v.tolist() for k, v in inputs.items()} - self._episode_data.append(step_data) - return outputs - - def on_reset(self): - if self._episode_data: - self._save_episode() - 
self._episode_data = [] - self._episode_count += 1 - - def _save_episode(self): - path = self.log_dir / f"episode_{self._episode_count:04d}.json" - path.parent.mkdir(parents=True, exist_ok=True) - path.write_text(json.dumps(self._episode_data)) + # Build LeRobot's own runner (standalone, numpy-only) + runner_config = raw["model"]["runner"] + runner = build_runner(runner_config) + + # Load normalizer from manifest specs + preprocessors = raw["model"].get("preprocessors", []) + postprocessors = raw["model"].get("postprocessors", []) + normalizer = Normalizer.from_specs(preprocessors + postprocessors, path) + + # Load backend adapter + artifacts = raw["model"]["artifacts"] + adapter = ONNXRuntimeAdapter(path / artifacts["model"]) + + return ExportedPolicy(runner=runner, adapter=adapter, normalizer=normalizer) ``` +LeRobot's runners, normalizer, and adapters are its own implementations with zero overlap with PhysicalAI's. The only shared artifact is `manifest.json` on disk. + --- -## 8. Unified Manifest Format +## 5. Runner Mapping -All exported models — regardless of source framework — use the same `manifest.json` format. +### `model.runner.type` to Runner -### Why One Format +| `runner.type` | PhysicalAI Runner | LeRobot Runner | Policies | +| ----------------- | ------------------------------------ | ----------------------------------------- | ---------------- | +| `action_chunking` | `ActionChunkingRunner(SinglePass())` | `ActionChunkingWrapper(SinglePassRunner)` | ACT, VQBeT | +| `iterative` | `IterativeRunner(SinglePass())` | `IterativeRunner` | Diffusion, TDMPC | +| `two_phase` | `TwoPhaseRunner(encoder, Iterative)` | `TwoPhaseRunner` | PI0, SmolVLA | -Previous designs had two formats: `metadata.yaml` for physicalai-train and `manifest.json` for LeRobot. 
This created unnecessary divergence: +### Runner Parameters (All in `model.runner`) -- Two parsers to maintain -- Two sets of schema conventions -- Confusion about which format to use +| Parameter | Used By | Description | +| --------------------- | ------------------------------------- | --------------------------------------- | +| `chunk_size` | action_chunking | Size of predicted action chunk | +| `n_action_steps` | action_chunking, two_phase, iterative | Actions to execute per chunk | +| `num_inference_steps` | iterative, two_phase | Number of denoising steps | +| `scheduler` | iterative, two_phase | Scheduler algorithm (euler, ddpm, ddim) | +| `horizon` | iterative | Planning horizon (Diffusion, TDMPC) | -The unified `manifest.json` format eliminates this. Benefits: +--- -- **One parser** — simpler codebase, fewer bugs -- **One schema** — consistent across all model sources -- **No special-casing** — the loader doesn't need to know where a model came from -- **JSON for data, not code** — `policy.kind` maps to built‑in runners; `class_path` is only for exotic patterns +## 6. Normalization Handling -### Unified Loading +LeRobot policies operate on **normalized** inputs and produce **normalized** outputs. The manifest declares normalization as transforms in `model.preprocessors` and `model.postprocessors`: -```python -# Works with LeRobot packages -model = InferenceModel("./lerobot_package") +```json +"preprocessors": [ + { + "type": "normalize", + "mode": "mean_std", + "artifact": "stats.safetensors", + "features": ["observation.state"] + } +], +"postprocessors": [ + { + "type": "denormalize", + "mode": "mean_std", + "artifact": "stats.safetensors", + "features": ["action"] + } +] +``` -# Works with physicalai-train packages -model = InferenceModel("./physicalai_train_package") +PhysicalAI resolves these to `StatsNormalizer` (preprocessor) and `StatsDenormalizer` (postprocessor), which load stats from `stats.safetensors` and apply per-feature transforms. 
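The per-feature arithmetic behind these two processors is small. `mean_std` computes `(x - mean) / std`. `min_max` maps `[min, max]` onto `[-1, 1]` via `(x - min) / (max - min) * 2 - 1`. Denormalization inverts each one. The NumPy sketch below assumes the stats file has already been read into a flat dict keyed as `{feature}/mean`, `{feature}/std`, and so on (for example with `safetensors.numpy.load_file`). The function names and the `EPS` guard are illustrative assumptions, not the actual `StatsNormalizer` / `StatsDenormalizer` code.

```python
import numpy as np

EPS = 1e-8  # assumed guard against zero std / zero range; not part of the manifest schema


def feature_stats(tensors: dict[str, np.ndarray], feature: str) -> dict[str, np.ndarray]:
    """Pick out "<feature>/<stat>" entries, e.g. "observation.state/mean" -> "mean"."""
    prefix = f"{feature}/"
    return {key[len(prefix):]: value for key, value in tensors.items() if key.startswith(prefix)}


def normalize(x: np.ndarray, stats: dict[str, np.ndarray], mode: str) -> np.ndarray:
    if mode == "mean_std":
        return (x - stats["mean"]) / (stats["std"] + EPS)
    if mode == "min_max":
        # Maps [min, max] onto [-1, 1].
        return (x - stats["min"]) / (stats["max"] - stats["min"] + EPS) * 2.0 - 1.0
    if mode == "identity":
        return x
    raise ValueError(f"unknown normalization mode: {mode!r}")


def denormalize(y: np.ndarray, stats: dict[str, np.ndarray], mode: str) -> np.ndarray:
    # Exact inverse of normalize(), using the same EPS so round trips are lossless.
    if mode == "mean_std":
        return y * (stats["std"] + EPS) + stats["mean"]
    if mode == "min_max":
        return (y + 1.0) / 2.0 * (stats["max"] - stats["min"] + EPS) + stats["min"]
    if mode == "identity":
        return y
    raise ValueError(f"unknown normalization mode: {mode!r}")


# Flat tensor dict shaped like the contents of stats.safetensors (toy values).
tensors = {
    "observation.state/mean": np.array([1.0, 2.0]),
    "observation.state/std": np.array([2.0, 4.0]),
    "action/mean": np.array([0.0, 0.0]),
}
state_stats = feature_stats(tensors, "observation.state")
normalized = normalize(np.array([3.0, 2.0]), state_stats, "mean_std")
```

Adding `EPS` inside both directions keeps them exact inverses, at the cost of a tiny bias relative to the bare formulas.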
-# Works with custom packages -model = InferenceModel("./custom_package") +### Normalization Modes -# All read manifest.json — same code path -``` +| Mode | Normalize | Denormalize | +| ---------- | --------------------------------- | --------------------------------- | +| `mean_std` | `(x - mean) / std` | `x * std + mean` | +| `min_max` | `(x - min) / (max - min) * 2 - 1` | `(x + 1) / 2 * (max - min) + min` | +| `identity` | passthrough | passthrough | + +Statistics are stored in `safetensors` format with `{feature}/mean`, `{feature}/std`, `{feature}/min`, `{feature}/max` tensors. --- -## 9. Testing Compatibility +## 7. Usage Examples -### Conformance Test Suite +### Basic Usage ```python -# tests/format_loaders/test_lerobot_loader.py - -class TestLeRobotFormatLoaderConformance: -"""Verify physicalai correctly loads LeRobot packages.""" - - def test_detect_lerobot_package(self, lerobot_package_path): - """Format loader detects LeRobot packages.""" - assert LeRobotFormatLoader.detect(lerobot_package_path) - - def test_load_single_shot(self, act_package_path): - """Load single_shot policy.""" - model = InferenceModel(act_package_path) - assert isinstance(model.runner, SinglePassRunner) - - def test_load_iterative(self, pi0_package_path): - """Load iterative policy.""" - model = InferenceModel(pi0_package_path) - assert isinstance(model.runner, IterativeRunner) - assert model.runner.num_steps == 10 # from manifest - - def test_override_num_steps(self, pi0_package_path): - """Override iterative params at load time.""" - model = InferenceModel(pi0_package_path, num_steps=20) - assert model.runner.num_steps == 20 - - def test_parity_with_lerobot_runtime(self, pi0_package_path): - """Output matches LeRobot's own runtime.""" -# Load with physicalai -ik_model = InferenceModel(pi0_package_path) - - # Load with LeRobot - from lerobot.export import load as lerobot_load - lr_runtime = lerobot_load(pi0_package_path) - - # Compare outputs - obs = generate_test_observation() - 
np.random.seed(42) - ik_output = ik_model(obs) - np.random.seed(42) - lr_output = lr_runtime.predict_action_chunk(obs) - - np.testing.assert_allclose(ik_output["action"], lr_output, rtol=1e-5) +from physicalai import InferenceModel + +# Load LeRobot-exported policy (detected automatically via manifest.json) +model = InferenceModel("./act_exported") + +observation = { + "observation.image": image_array, # float32, shape (1, 3, 96, 96) + "observation.state": state_array, # float32, shape (1, 14) +} +outputs = model(observation) +action = outputs["action"] # float32, shape (1, 14) ``` ---- +### With Callbacks -## 10. Summary +```python +from physicalai import InferenceModel +from physicalai.inference.callbacks import TimingCallback -### What physicalai Adds Over LeRobot Runtime +model = InferenceModel("./pi0_exported", callbacks=[TimingCallback()]) +outputs = model(observation) +``` -| Feature | LeRobot Runtime | physicalai | -| ----------------------------------- | --------------- | ----------------------- | -| Load PolicyPackage | ✓ | ✓ | -| Single-pass inference | ✓ | ✓ | -| Iterative inference | ✓ | ✓ | -| Two-phase inference | ✓ | ✓ | -| Action queue wrapper | ✓ | ✓ | -| Callbacks (timing, logging, safety) | ✗ | ✓ | -| Multi-backend with fallback | ✗ | ✓ | -| Preprocessor/postprocessor chains | ✗ | ✓ | -| Unified manifest format | ✗ | ✓ (same format for all) | +### Override Runner Parameters -### Dependency Direction +```python +model = InferenceModel( + "./diffusion_exported", + num_steps=20, # Override manifest default of 100 + scheduler="ddim", # Override manifest default of "ddpm" +) +``` -```text -LeRobot ──────────────────────────────────────────────────┐ - │ │ - │ defines (proposed) │ - ▼ │ -manifest.json (unified format) │ - │ │ - │ consumed by │ - ▼ │ -physicalai (unified manifest loader) ◄─────────┘ - no dependency on LeRobot code +### Real-Time Control + +```python +policy = InferenceModel("./act_exported") +policy.reset() + +while not done: + action = 
policy.select_action(observation) + observation, reward, done, info = env.step(action) + +policy.reset() ``` -**LeRobot does not depend on physicalai.** -**physicalai can load LeRobot packages without importing LeRobot.** -**physicalai-train exports the same manifest.json format — no special handling needed.** +--- + +## 8. Supported Policies -> **Reminder:** This integration depends on LeRobot adopting the proposed PolicyPackage export format. If LeRobot adopts a different format, the manifest loader implementation changes but the architecture (unified loader, no runtime dependency) remains the same. +| Policy | `runner.type` | Runner Stack | Artifact Roles | +| --------- | --------------- | -------------------------------------------- | -------------------- | +| ACT | action_chunking | ActionChunking(SinglePass) | `model` | +| VQBeT | action_chunking | ActionChunking(SinglePass) | `model` | +| Diffusion | iterative | ActionChunking(Iterative(SinglePass)) | `model` | +| TDMPC | iterative | Iterative(SinglePass) with MPC | `model` | +| PI0 | two_phase | ActionChunking(TwoPhase(encoder, Iterative)) | `encoder`, `denoise` | +| SmolVLA | two_phase | ActionChunking(TwoPhase(encoder, Iterative)) | `encoder`, `denoise` | --- ## Related Documents -- **[Strategy](../../architecture/strategy.md)** - Big-picture architecture -- **[Inference Core Design](./inferencekit.md)** - Domain-agnostic inference layer -- **[LeRobot Export Suggestions](../internal/lerobot-export-suggestions.md)** - Our proposed improvements to LeRobot's export API +- **[Inference Core Design](../components/inferencekit.md)** --- Domain-agnostic inference layer +- **[Strategy](../architecture/strategy.md)** --- Big-picture architecture and layering decisions +- **[Architecture](../architecture/architecture.md)** --- PhysicalAI runtime CLI and packaging --- -_Document version: 3.0_ -_Last updated: 2026-02-16_ +_Document version: 6.0_ +_Last updated: 2026-03-31_