
helios

Distributed model inference server with a router and multiple workers.

Build

On macOS, make sure the protobuf version CMake picks up is ABI-compatible with the grpc and onnxruntime packages installed via Homebrew.

cmake -S . -B build
cmake --build build

Run router + workers

Use the helper script after build:

chmod +x scripts/start_cluster.sh
scripts/start_cluster.sh 2 resnet50 models/resnet50-v1-7.onnx

This starts:

  • workers on 127.0.0.1:50052, 127.0.0.1:50053, ...
  • router on 127.0.0.1:50051
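The address layout above follows a simple pattern (router on 50051, workers counting up from 50052). A minimal sketch, assuming the helper script assigns ports this way:

```python
def cluster_addresses(num_workers, host="127.0.0.1", router_port=50051):
    """Return (router_address, worker_addresses) for a local cluster.

    Workers take consecutive ports starting just above the router,
    matching the layout started by scripts/start_cluster.sh.
    """
    router = f"{host}:{router_port}"
    workers = [f"{host}:{router_port + 1 + i}" for i in range(num_workers)]
    return router, workers

router, workers = cluster_addresses(2)
print(router)   # 127.0.0.1:50051
print(workers)  # ['127.0.0.1:50052', '127.0.0.1:50053']
```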

Client test

Install Python dependencies:

python -m pip install -r requirements.txt

Generate Python gRPC stubs if needed (this writes inference_pb2.py and inference_pb2_grpc.py into the current directory):

python -m grpc_tools.protoc -I proto --python_out=. --grpc_python_out=. proto/inference.proto

Run the client:

python client.py --router localhost:50051 --model_id resnet50 --tokens 1.0,2.0,3.0,4.0
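The flags in the command above can be parsed as in this minimal sketch. The flag names mirror the command line; the real client.py may structure this differently:

```python
import argparse

def parse_client_args(argv):
    """Parse the router address, model id, and comma-separated token list."""
    parser = argparse.ArgumentParser(description="helios inference client")
    parser.add_argument("--router", default="localhost:50051",
                        help="router address as host:port")
    parser.add_argument("--model_id", required=True,
                        help="model identifier served by the workers")
    parser.add_argument("--tokens", required=True,
                        help="comma-separated float inputs, e.g. 1.0,2.0")
    args = parser.parse_args(argv)
    # Convert "1.0,2.0,3.0,4.0" into a list of floats for the request payload.
    args.tokens = [float(t) for t in args.tokens.split(",")]
    return args

args = parse_client_args(
    ["--router", "localhost:50051", "--model_id", "resnet50",
     "--tokens", "1.0,2.0,3.0,4.0"])
print(args.tokens)  # [1.0, 2.0, 3.0, 4.0]
```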

Throughput Under Concurrent Requests

Workload used for comparison:

  • model: resnet50
  • input shape: 1 x 3 x 224 x 224
  • total requests: 200
  • concurrency: 50 threads
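The workload pattern above (a fixed request count fanned out over a thread pool) can be sketched like this; fake_request is a stand-in for the real gRPC call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(send_request, total_requests=200, concurrency=50):
    """Fire total_requests calls across a thread pool and measure throughput."""
    start = time.perf_counter()
    successes = errors = 0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for ok in pool.map(lambda _: send_request(), range(total_requests)):
            if ok:
                successes += 1
            else:
                errors += 1
    elapsed = time.perf_counter() - start
    return {"throughput_rps": total_requests / elapsed,
            "success": successes, "errors": errors}

# Stand-in for an inference RPC; replace with the real client call.
def fake_request():
    time.sleep(0.001)
    return True

stats = run_benchmark(fake_request, total_requests=20, concurrency=5)
print(stats["success"], stats["errors"])  # 20 0
```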

Measured results from the latest run:

Workers   Throughput (req/s)   Success   Errors
1         10.512               200       0
3         22.259               200       0
5         36.883               200       0
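One way to read the table is speedup and per-worker efficiency relative to the single-worker baseline, recomputed here from the measured numbers:

```python
# Measured throughput (req/s) per worker count, from the table above.
results = {1: 10.512, 3: 22.259, 5: 36.883}
baseline = results[1]

for workers, rps in results.items():
    speedup = rps / baseline
    efficiency = speedup / workers  # 1.0 would be perfect linear scaling
    print(f"{workers} workers: {speedup:.2f}x speedup, "
          f"{efficiency:.0%} per-worker efficiency")
```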

Highlights:

  • Request handling is stable (0 errors in every scenario).
  • Throughput scales with worker count in this environment.
  • The router balances dispatch across workers in multi-worker runs.
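The balanced dispatch in the last bullet can be sketched as round-robin over the worker list (the actual policy lives in the C++ router; this is only an illustration):

```python
import itertools

class RoundRobinRouter:
    """Cycle requests over a fixed worker list so load stays balanced."""

    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)

    def pick_worker(self):
        return next(self._cycle)

router = RoundRobinRouter(["127.0.0.1:50052", "127.0.0.1:50053"])
picks = [router.pick_worker() for _ in range(4)]
print(picks)  # alternates between the two workers
```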

To reproduce this benchmark:

python scripts/throughput_benchmark.py

To evaluate scaling under a p95 latency SLO:

python scripts/scaling_benchmark.py --workers 1,3,5 --concurrency-levels 20,40,60,80 --total-requests 180 --warmup 6 --sla-p95-ms 1500
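The --sla-p95-ms gate boils down to a 95th-percentile check like this sketch (the real scaling_benchmark.py may compute percentiles differently):

```python
def p95_ms(latencies_ms):
    """Return the 95th-percentile latency using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    # Nearest-rank: ceil(0.95 * n) as a 1-based index.
    rank = max(1, -(-len(ordered) * 95 // 100))
    return ordered[rank - 1]

def meets_slo(latencies_ms, sla_p95_ms=1500):
    """True when p95 latency is within the SLO budget."""
    return p95_ms(latencies_ms) <= sla_p95_ms

latencies = [100, 120, 130, 90, 2000, 110, 115, 105, 95, 125]
print(p95_ms(latencies), meets_slo(latencies))  # one outlier blows the budget
```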
