feat: FA3-FP8-extension#552

Open
Marius-Graml wants to merge 4 commits into main from feat/fa3-fp8-extension
Conversation


@Marius-Graml Marius-Graml commented Feb 24, 2026

Description

Adds FP8 quantization to the flashattn3 algorithm as an optional boolean hyperparameter. It also adds optional target modules for FA3 as well as for FP8 quantization. The logic is as follows:

  1. Target modules set, FP8 disabled: FA3 is applied only to the target modules.
  2. Target modules set, FP8 enabled: FP8 quantization is applied only to the target modules; FA3 is applied to all modules.
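A minimal sketch of this selection logic, with hypothetical helper and module names (the actual hyperparameter plumbing lives in this PR's apply logic and may differ):

```python
def plan_modules(all_modules, target_modules, fp8):
    """Decide which modules get FA3 and which get FP8 quantization.

    Hypothetical sketch of the two rules described above; `all_modules`
    and `target_modules` are lists of module names.
    """
    targets = set(target_modules) if target_modules else set(all_modules)
    if not fp8:
        # Rule 1: target modules + no fp8 -> FA3 only on target modules.
        return {"fa3": sorted(targets), "fp8": []}
    # Rule 2: target modules + fp8 -> FP8 only on target modules,
    # FA3 on all modules.
    return {"fa3": sorted(all_modules), "fp8": sorted(targets)}
```

For example, with three attention modules and one target module, disabling FP8 restricts FA3 to the target, while enabling FP8 restricts only the quantization.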

The following speed-ups and results are for Wan-AI/Wan2.2-TI2V-5B-Diffusers.

Settings

  • num_frames = 241
  • height = 1280
  • width = 704
  • guidance_scale = 6.0
  • num_inference_steps = 30
  • seed = 20
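For reference, the benchmark settings above can be collected as keyword arguments for the generation call (a sketch; the actual pipeline loading and invocation via diffusers for Wan-AI/Wan2.2-TI2V-5B-Diffusers is elided here, and the seed would normally feed a `torch.Generator`):

```python
# Settings shared by all benchmark runs below (sketch, not the
# verbatim benchmark script from this PR).
GENERATION_KWARGS = {
    "num_frames": 241,
    "height": 1280,
    "width": 704,
    "guidance_scale": 6.0,
    "num_inference_steps": 30,
    "seed": 20,
}
```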

1) Original

Inference speed: 483s
Result:

original.mp4

2) FA3-FP16

Inference speed diffusers version >= 0.35.0.dev0: 362s
Inference speed diffusers version < 0.35.0.dev0: 360s
Result (identical to the original, as FA3 is lossless):

fa3.mp4

3) FA3-FP8

Inference speed diffusers version >= 0.35.0.dev0: 321s
Inference speed diffusers version < 0.35.0.dev0: 360s
Result:

fa3_fp8.mp4

4) FA3-FP8 (first and last transformer block excluded from quantization)

Inference speed diffusers version >= 0.35.0.dev0: 324s
Result:

fa3_fp8_excluded.mp4

5) FA3-FP8 (first and last two transformer blocks excluded from quantization)

Inference speed diffusers version >= 0.35.0.dev0: 327s
Result:

fa3_fp8_excluded2.mp4
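Excluding the boundary blocks from quantization (variants 4 and 5) amounts to listing all transformer blocks except the first and last n as FP8 target modules. A hypothetical sketch (the real module names depend on the Wan transformer's naming):

```python
def fp8_targets_excluding_edges(num_blocks, n_excluded):
    """Return block-module names to quantize, skipping the first and
    last `n_excluded` transformer blocks.

    Hypothetical naming scheme ("blocks.<i>"); adapt to the actual
    model's module paths.
    """
    return [f"blocks.{i}" for i in range(n_excluded, num_blocks - n_excluded)]
```

With 30 blocks, `n_excluded=1` yields variant 4 and `n_excluded=2` yields variant 5.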

Related Issue

Fixes #373

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Manual test runs; see the benchmark results above.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

/

@Marius-Graml Marius-Graml changed the title Feat: FA3-FP8-extension feat: FA3-FP8-extension Feb 24, 2026

ParagEkbote commented Feb 24, 2026

I think it fixes #373, feel free to rectify if needed.

Marius-Graml replied:

I think it fixes #373, feel free to rectify if needed.

Yes you're right. Thanks for pointing that out :)

@Marius-Graml Marius-Graml requested review from gsprochette and removed request for gsprochette and johannaSommer February 25, 2026 09:18