
upload for Multi-graphics inference #10

Open

qqtang-code wants to merge 1 commit into ByteDance-Seed:main from qqtang-code:main

Conversation

@qqtang-code

On a 3090 or 4090 GPU, the maximum supported block_size is 64. To support multi-GPU inference, you also need to wrap the kernel launch in `with torch.cuda.device(x.device):` and move the mask onto the same device.


2 participants