
Work in progress: the API is unstable.

triton-ng

Rust SDK for NVIDIA Triton Inference Server.

Provides two things:

  • A safe Rust API for writing custom Triton backends (compiled as .so and loaded by Triton)
  • A high-level async gRPC client for sending inference requests to a running Triton server

Crates

| Crate | Description |
| --- | --- |
| `triton-ng-sys` | Raw FFI bindings generated by `bindgen` from `tritonbackend.h` |
| `triton-ng` | Safe Rust wrapper over `triton-ng-sys` |
| `triton-ng-macros` | Proc-macros for `triton-ng` |
| `triton-ng-client` | High-level async gRPC client |
| `example/custom-backend` | Example custom backend (MNIST, proxies to an ONNX model) |
| `example/app` | Example client application |

Writing a custom backend

Implement the Backend trait and register it with declare_backend!:

use triton_ng::backend::Backend;
use triton_ng::{BackendHandle, DataType, Error, Response};

struct MyBackend;

impl Backend for MyBackend {
    fn initialize(_backend: &BackendHandle) -> Result<(), Error> {
        Ok(())
    }

    fn model_instance_execute(
        _model: triton_ng::Model,
        requests: &[triton_ng::Request],
    ) -> Result<(), Error> {
        for request in requests {
            let input = request.get_input("INPUT")?;
            let data = input.as_fp32_vec()?;

            // ... run inference on `data` to produce `result` ...
            let result = vec![0.0f32; 10];

            let mut response = Response::new(request)?;
            response
                .create_output("OUTPUT", DataType::Fp32, &[1, 10])?
                .write_fp32_vec(&result)?;
            response.send()?;
        }
        Ok(())
    }
}

triton_ng::declare_backend!(MyBackend);

Build as a cdylib:

# Cargo.toml
[lib]
crate-type = ["cdylib"]
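A fuller Cargo.toml sketch for a backend crate might look like the following (the package name and version are illustrative; only the `crate-type` line comes from above). Note that Triton's convention is to discover backend shared libraries named `libtriton_<backend>.so`, so the crate name should produce a matching library name.

```toml
[package]
name = "my-triton-backend"
version = "0.1.0"
edition = "2021"

[lib]
# Triton loads backends as shared libraries (.so)
crate-type = ["cdylib"]

[dependencies]
triton-ng = "0.1"
```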

Using the gRPC client

use triton_ng_client::{InferInput, InferOptions, TritonClient, TritonClientConfig};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = TritonClient::new(TritonClientConfig::new("http://localhost:8001")).await?;

    let meta = client.model_metadata("my_model", None, None).await?;
    // Assumes a fully static shape (no -1 dynamic dimensions).
    let n: usize = meta.inputs[0].shape.iter().map(|&d| d as usize).product();

    let response = client
        .infer(
            "my_model",
            None,
            [InferInput::fp32("INPUT", meta.inputs[0].shape.clone(), vec![0.0f32; n])],
            ["OUTPUT"],
            InferOptions::default(),
        )
        .await?;

    println!("{:?}", response.outputs[0].data);
    Ok(())
}

TLS:

use triton_ng_client::{ClientTlsConfig, TritonClientConfig};

let config = TritonClientConfig::new("https://triton.example.com:8001")
    .with_tls(ClientTlsConfig::new()); // uses system roots
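For a server signed by a private CA, a sketch assuming `ClientTlsConfig` re-exports tonic's `tonic::transport::ClientTlsConfig` (the `Certificate` import and the certificate path are illustrative, not part of this crate's documented API):

```rust
use tonic::transport::Certificate;
use triton_ng_client::{ClientTlsConfig, TritonClientConfig};

// Load the CA bundle that signed the server certificate.
let pem = std::fs::read_to_string("certs/ca.pem")?;

let config = TritonClientConfig::new("https://triton.example.com:8001")
    .with_tls(
        ClientTlsConfig::new()
            .ca_certificate(Certificate::from_pem(pem))
            // Must match a SAN on the server certificate.
            .domain_name("triton.example.com"),
    );
```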

Getting started

Prerequisites

First run

git submodule update --init --recursive
make build           # compile custom backend → target/release/libtriton_custom_backend.so
make download-model  # download mnist_onnx + create model version dirs
make docker-env-up   # start Triton (mounts .so and models/)

Run the example app

cargo run --manifest-path=example/app/Cargo.toml --release

Triton must be running with both models in READY state.

Run integration tests

make tests           # cargo nextest run --workspace

Tests require a running Triton instance (make docker-env-up).
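To confirm the server is up before running tests, Triton's standard KServe v2 HTTP endpoints can be queried (port 8000 is Triton's default HTTP port; adjust if the compose file maps it differently, and substitute your model name for `mnist_onnx`):

```shell
# HTTP 200 means the server is ready to accept inference requests.
curl -sf http://localhost:8000/v2/health/ready && echo "server ready"
# Per-model readiness is exposed separately.
curl -sf http://localhost:8000/v2/models/mnist_onnx/ready && echo "model ready"
```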

Rebuild after backend changes

make build
make docker-env-down && make docker-env-up

Features

| Feature | Description |
| --- | --- |
| `cuda` | Enable GPU and pinned-memory allocation in `ResponseAllocator` |

Enable it in Cargo.toml:

triton-ng = { version = "0.1", features = ["cuda"] }

License

MIT
