This repository details a high-performance face recognition pipeline deployed on the NVIDIA Triton Inference Server using an ensemble strategy.
This service implements a complete face recognition workflow by chaining three distinct models:
- UltraFace: For fast face detection.
- face_preprocess (Python Backend): Acts as the orchestrator, performing detection post-processing, face cropping, image format conversion, and synchronous internal calls to the ArcFace model.
- ArcFace: For generating L2-normalized 512-D face embeddings.
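The entry point is the `face_recognition_ensemble` model. Its configuration chains `ultraface` and `face_preprocess`. `arcface` is not an ensemble step; `face_preprocess` calls it internally.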
| I/O | Name | Data Type | Dimensions | Description |
|---|---|---|---|---|
| Input | `image_input` | `TYPE_FP32` | `[1, 3, 480, 640]` | Normalized input image (NCHW). |
| Output | `face_tokens` | `TYPE_FP32` | `[-1, 512]` | L2-normalized 512-D embeddings for N faces. |
| Output | `face_images` | `TYPE_FP32` | `[-1, 3, 112, 112]` | Cropped face images (N faces, NCHW, normalized). |
| Model Name | Type | Input (Name/Dims) | Output (Name/Dims) | Key Role |
|---|---|---|---|---|
| `ultraface` | Detector | `input` / `[1, 3, 480, 640]` | `scores` / `[1, 17640, 2]`, `boxes` / `[1, 17640, 4]` | Face localization. |
| `arcface` | Embedder | `input_1` / `[1, 112, 112, 3]` | `embedding` / `[1, 512]` | Feature extraction. |
| `face_preprocess` | Python Backend | `scores`, `boxes`, `image` | `face_tokens`, `face_images` | NMS, cropping, ArcFace orchestration. |
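The `17640` in UltraFace's output shapes is the number of anchor boxes (priors) the detector scores for a $640 \times 480$ input. The sketch below reproduces that count. It assumes the prior configuration of the upstream Ultra-Light-Fast-Generic-Face-Detector RFB-640 model: strides 8/16/32/64, with 3, 2, 2 and 3 box sizes per feature-map cell. `count_priors` is a hypothetical helper and is not part of this repository.

```python
import math

# Assumed UltraFace prior configuration (upstream RFB models); verify against your export.
STRIDES = [8, 16, 32, 64]
MIN_BOXES = [[10, 16, 24], [32, 48], [64, 96], [128, 192, 256]]


def count_priors(width: int, height: int) -> int:
    """Number of anchor boxes (priors) UltraFace evaluates for one image."""
    total = 0
    for stride, sizes in zip(STRIDES, MIN_BOXES):
        fm_w = math.ceil(width / stride)   # feature-map width at this stride
        fm_h = math.ceil(height / stride)  # feature-map height at this stride
        total += fm_w * fm_h * len(sizes)  # one prior per cell per box size
    return total


print(count_priors(640, 480))  # 17640
```

Per stride, this gives 14400 + 2400 + 600 + 240 = 17640. The same function returns 4420 for a $320 \times 240$ input, so the anchor dimension is a quick way to confirm which UltraFace variant is deployed.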
- NVIDIA Triton Inference Server: Ensure the server is installed and running with Python Backend support.
- Model files: You must acquire the `ultraface` and `arcface` ONNX/TensorRT model files and place them in their respective version directories (e.g., `ultraface/1/model.onnx`, or `model.plan` for TensorRT engines).
- Python dependencies (for `face_preprocess`): The Triton Python Backend container must have the following packages installed:

```bash
pip install numpy opencv-python tritonclient
```
Organize your model repository as follows:
```text
<model_repository>
├── ultraface/
│   └── 1/
│       └── model.onnx      # UltraFace model file
├── arcface/
│   └── 1/
│       └── model.onnx      # ArcFace model file
├── face_preprocess/
│   ├── config.pbtxt
│   └── 1/
│       └── model.py        # The Python backend script
└── face_recognition_ensemble/
    ├── config.pbtxt        # Ensemble definition
    └── 1/                  # Empty version directory (Triton still expects one)
```

Start the Triton server, pointing it to your configured model repository.
```bash
tritonserver --model-repository=/path/to/model_repository
```

The client must prepare the input image to match the ensemble's expectations:
- Shape: `[1, 3, 480, 640]` (NCHW format).
- Data type: `TYPE_FP32` (32-bit float).
- Normalization: The image must be pre-normalized the same way the UltraFace model was trained. For instance, the widely distributed UltraFace ONNX models expect RGB input scaled as `(pixel - 127) / 128`. Check the requirements of the model you deploy.
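A minimal client-side preprocessing sketch follows. It assumes the frame has already been resized to $640 \times 480$ and converted to RGB, and it uses the `(pixel - 127) / 128` scaling of the common UltraFace ONNX release. Change `MEAN`/`SCALE` if your model differs. `preprocess_for_ultraface` is a hypothetical name.

```python
import numpy as np


def preprocess_for_ultraface(rgb: np.ndarray) -> np.ndarray:
    """Convert an RGB uint8 image of shape (480, 640, 3) into `image_input`.

    Assumes (pixel - 127) / 128 scaling; adjust MEAN/SCALE for other models.
    """
    MEAN, SCALE = 127.0, 128.0
    if rgb.shape != (480, 640, 3):
        raise ValueError(f"expected (480, 640, 3), got {rgb.shape}")
    x = (rgb.astype(np.float32) - MEAN) / SCALE    # normalize pixel values
    x = np.transpose(x, (2, 0, 1))                 # HWC -> CHW
    x = np.expand_dims(x, axis=0)                  # add batch dim -> [1, 3, 480, 640]
    return np.ascontiguousarray(x, dtype=np.float32)
```

OpenCV loads frames as BGR, so a typical caller would first apply `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` and `cv2.resize(frame, (640, 480))`. Note that `cv2.resize` takes `(width, height)`.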
Use the Triton client (e.g., Python tritonclient.grpc) to request inference on the ensemble model:
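```python
# Client request against the ensemble over gRPC (Triton's default gRPC port is 8001)
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Placeholder: replace with a real [1, 3, 480, 640] float32 image normalized for UltraFace
image_data = np.zeros((1, 3, 480, 640), dtype=np.float32)

infer_input = grpcclient.InferInput("image_input", list(image_data.shape), "FP32")
infer_input.set_data_from_numpy(image_data)

result = client.infer(
    model_name="face_recognition_ensemble",
    inputs=[infer_input],
    outputs=[
        grpcclient.InferRequestedOutput("face_tokens"),
        grpcclient.InferRequestedOutput("face_images"),
    ],
)
face_tokens = result.as_numpy("face_tokens")  # [N, 512]
face_images = result.as_numpy("face_images")  # [N, 3, 112, 112]
```

The service returns two outputs whose first dimension, N, is the number of detected faces and varies from request to request: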
- `face_tokens`: Embeddings for the N detected faces. Use these 512-D vectors for similarity calculation.
- `face_images`: The cropped and normalized $112 \times 112$ face images, useful for debugging or visualization.
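The `face_tokens` vectors are L2-normalized, so cosine similarity reduces to a plain dot product. The sketch below matches each detected face against a gallery of enrolled embeddings. It assumes the gallery embeddings were produced by the same pipeline. The 0.5 threshold and the `best_matches` name are hypothetical and should be tuned on your own data.

```python
import numpy as np


def best_matches(face_tokens: np.ndarray, gallery: np.ndarray, threshold: float = 0.5):
    """Match each detected face against enrolled embeddings.

    face_tokens: [N, 512] L2-normalized embeddings returned by the ensemble.
    gallery:     [M, 512] L2-normalized reference embeddings (M >= 1).
    Returns one (gallery_index or None, similarity) pair per detected face.
    """
    if len(gallery) == 0:
        raise ValueError("gallery must contain at least one embedding")
    sims = face_tokens @ gallery.T        # cosine similarity for unit vectors, [N, M]
    results = []
    for row in sims:
        j = int(np.argmax(row))           # closest enrolled identity
        score = float(row[j])
        results.append((j if score >= threshold else None, score))
    return results
```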
- UltraFace `boxes` are normalized coordinates in `[0, 1]`, relative to the $640 \times 480$ input. `scores` holds per-anchor confidences for two classes (background and face).
- The `face_preprocess` script scales the boxes back to pixel values for cropping and performs the necessary image format conversions.
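The detection post-processing (thresholding, scaling to pixels and NMS) could look like the following sketch. It is a minimal greedy NMS written for illustration and may differ from what `face_preprocess` actually does. The thresholds (0.7 score, 0.3 IoU), the `(x1, y1, x2, y2)` box layout and the function names are assumptions.

```python
import numpy as np


def _area(b: np.ndarray) -> np.ndarray:
    """Area of (x1, y1, x2, y2) boxes; works on one box or a [K, 4] array."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def decode_detections(scores, boxes, width=640, height=480,
                      score_threshold=0.7, iou_threshold=0.3):
    """Turn raw UltraFace outputs into pixel-space face boxes, best first.

    scores: [1, K, 2] per-anchor (background, face) confidences.
    boxes:  [1, K, 4] normalized (x1, y1, x2, y2) corners in [0, 1] (assumed layout).
    Returns an int32 array [N, 4] of pixel boxes after greedy NMS.
    """
    face_prob = scores[0, :, 1]
    keep = face_prob > score_threshold
    probs, cand = face_prob[keep], boxes[0][keep]
    # Scale normalized corners to pixels and clip them to the image bounds.
    cand = cand * np.array([width, height, width, height], dtype=np.float32)
    cand = np.clip(cand, 0, [width - 1, height - 1, width - 1, height - 1])

    order = np.argsort(-probs)              # highest face confidence first
    selected = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        selected.append(i)
        # Intersection of box i with every remaining candidate.
        xx1 = np.maximum(cand[i, 0], cand[rest, 0])
        yy1 = np.maximum(cand[i, 1], cand[rest, 1])
        xx2 = np.minimum(cand[i, 2], cand[rest, 2])
        yy2 = np.minimum(cand[i, 3], cand[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (_area(cand[i]) + _area(cand[rest]) - inter + 1e-9)
        order = rest[iou <= iou_threshold]  # suppress heavy overlaps with box i
    return cand[np.array(selected, dtype=np.int64)].astype(np.int32)
```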
- The `face_preprocess` script internally handles ArcFace's required input format: each crop is produced in NCHW (`[1, 3, 112, 112]`) and must be transposed to NHWC (`[1, 112, 112, 3]`) to match the `arcface` model's configuration.
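The layout conversion is a single axis permutation. Here is a minimal sketch, where `to_arcface_layout` is a hypothetical helper. The result is made contiguous because the transposed NumPy array is only a view until it is copied.

```python
import numpy as np


def to_arcface_layout(face_nchw: np.ndarray) -> np.ndarray:
    """[N, 3, 112, 112] (NCHW) -> [N, 112, 112, 3] (NHWC) for ArcFace's input_1."""
    if face_nchw.ndim != 4 or face_nchw.shape[1] != 3:
        raise ValueError(f"expected [N, 3, H, W], got {face_nchw.shape}")
    # Move the channel axis last, then copy into C-contiguous memory.
    return np.ascontiguousarray(np.transpose(face_nchw, (0, 2, 3, 1)))
```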
- The current implementation of `_crop_and_resize_face` simply crops the bounding box and resizes it to $112 \times 112$. It does NOT perform landmark-based affine alignment. ArcFace models are normally trained on landmark-aligned crops, so this simple crop may lower embedding quality compared with a pipeline that aligns faces first.
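A bounding-box crop and resize of this kind is sketched below. This is not the repository's `_crop_and_resize_face`: `crop_and_resize_face` is a hypothetical stand-in, and it uses NumPy nearest-neighbour sampling instead of whatever interpolation the real script uses (likely OpenCV, given the dependency list). Any per-face normalization applied before ArcFace is left out because it is not specified here.

```python
import numpy as np


def crop_and_resize_face(image_chw: np.ndarray, box, size: int = 112) -> np.ndarray:
    """Crop a pixel box from a [3, H, W] image and resize it to [3, size, size].

    box: (x1, y1, x2, y2) pixel corners, e.g. from detection post-processing.
    Nearest-neighbour sampling keeps this NumPy-only; no landmark alignment is done.
    """
    _, h, w = image_chw.shape
    x1, y1, x2, y2 = box
    x1, y1 = max(0, int(x1)), max(0, int(y1))     # clamp the box to the image
    x2, y2 = min(w, int(x2)), min(h, int(y2))
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"empty crop for box {box}")
    crop = image_chw[:, y1:y2, x1:x2]
    # Map each output row/column back to the nearest source pixel in the crop.
    rows = (np.arange(size) * crop.shape[1] / size).astype(int)
    cols = (np.arange(size) * crop.shape[2] / size).astype(int)
    return crop[:, rows[:, None], cols[None, :]]
```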
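- `face_preprocess` includes error handling (`_call_arcface`) that returns random normalized features as a fallback if the internal ArcFace call fails, so a failed call does not crash the entire ensemble request. These random vectors look like valid embeddings to the client, so a failed ArcFace call can silently produce meaningless similarity scores.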
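The fallback behaviour described above can be sketched as follows. `call_arcface` is a hypothetical callable standing in for the internal Triton request, and `embed_with_fallback` is a hypothetical name. Returning an explicit `used_fallback` flag is an addition in this sketch, not something the source describes. It would let the caller log or discard the random vectors.

```python
import numpy as np


def embed_with_fallback(call_arcface, faces_nhwc, dim=512, rng=None):
    """Run the embedder; on failure, return random L2-normalized vectors.

    call_arcface: hypothetical callable mapping [N, 112, 112, 3] -> [N, dim].
    Returns (embeddings [N, dim], used_fallback flag).
    """
    try:
        return call_arcface(faces_nhwc), False
    except Exception:
        if rng is None:
            rng = np.random.default_rng()
        vecs = rng.standard_normal((len(faces_nhwc), dim)).astype(np.float32)
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit length, like real tokens
        return vecs, True
```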
Ensemble definition (`face_recognition_ensemble/config.pbtxt`):

```protobuf
name: "face_recognition_ensemble"
platform: "ensemble"
input [
  {
    name: "image_input"
    data_type: TYPE_FP32
    dims: [1, 3, 480, 640]
  }
]
output [
  {
    name: "face_tokens"
    data_type: TYPE_FP32
    dims: [-1, 512]
  },
  {
    name: "face_images"
    data_type: TYPE_FP32
    dims: [-1, 3, 112, 112]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "ultraface"
      model_version: -1
      input_map { key: "input", value: "image_input" }
      output_map { key: "scores", value: "ultraface_scores" }
      output_map { key: "boxes", value: "ultraface_boxes" }
    },
    {
      model_name: "face_preprocess"
      model_version: -1
      input_map { key: "scores", value: "ultraface_scores" }
      input_map { key: "boxes", value: "ultraface_boxes" }
      input_map { key: "image", value: "image_input" }
      output_map { key: "face_tokens", value: "face_tokens" }
      output_map { key: "face_images", value: "face_images" }
    }
  ]
}
```