CUDA backend branch on my fork

I have a full CUDA backend implementation on my fork:

- https://github.com/peters/sam3.cpp/tree/cuda-backend (`bf83144`)
- https://github.com/peters/ggml/tree/cuda-backend (`04623eff`)

The branches include CUDA backend support, demos, benchmark and test tooling, and the ggml CUDA fix needed for PCS/text-prompt mode.

Perf notes:
https://github.com/peters/sam3.cpp/blob/bf831442965c4918b566086b9e3aaa8ad1ab40ff/docs/cuda_performance.md#cuda-performance-optimization-log

RTX 4080:
- SAM3 f16 ~1100 ms/frame
- SAM3 q4_0 1030 ms/frame
- SAM2.1 tiny f16 118 ms/frame
- SAM2.1 tiny q8_0 111 ms/frame
- SAM2.1 tiny q4_0 125 ms/frame
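
For easier comparison, the ms/frame figures above can be converted to frames per second and to relative speedups. This is only a small arithmetic sketch: the numbers are copied from the list, the SAM3 f16 value is approximate, and the helper names `fps` and `speedup` are made up for illustration and are not part of sam3.cpp or ggml.

```python
# ms/frame figures copied from the RTX 4080 list above.
timings_ms = {
    "SAM3 f16": 1100.0,  # listed as approximate (~1100)
    "SAM3 q4_0": 1030.0,
    "SAM2.1 tiny f16": 118.0,
    "SAM2.1 tiny q8_0": 111.0,
    "SAM2.1 tiny q4_0": 125.0,
}


def fps(ms_per_frame: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline.

    Values above 1.0 mean the candidate is faster, below 1.0 slower.
    """
    return baseline_ms / candidate_ms


if __name__ == "__main__":
    for name, ms in timings_ms.items():
        print(f"{name}: {ms:.0f} ms/frame = {fps(ms):.2f} frames/s")

    # Compare the quantized SAM2.1 tiny variants against the f16 baseline.
    base = timings_ms["SAM2.1 tiny f16"]
    for name in ("SAM2.1 tiny q8_0", "SAM2.1 tiny q4_0"):
        print(f"{name} vs f16: {speedup(base, timings_ms[name]):.3f}x")
```

Read this way, q8_0 is slightly faster than f16 for SAM2.1 tiny, while q4_0 is slightly slower.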

