Record: 11L EMA + GPTQ-lite + LeakyReLU^2 + QAT@0.15 #926

Open
NandhuRajRK wants to merge 3 commits into openai:main from NandhuRajRK:draft/nandh-11l-gptqlite-leakyrelu

@NandhuRajRK

Summary

This PR adds a new record attempt built on the public 2026-03-22_11L_EMA_GPTQ-lite_warmdown3500_QAT015_1.1233 record family, using LeakyReLU(0.5)^2 as the MLP activation.
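A minimal sketch of the activation, assuming LeakyReLU(0.5)^2 means "apply LeakyReLU with negative slope 0.5, then square the output" and an MLP of the form down(act(up(x))). Plain Python lists stand in for tensors, and the names leaky_relu_sq, matvec and mlp_forward are hypothetical, not taken from the record folder.

```python
def leaky_relu_sq(x: float, negative_slope: float = 0.5) -> float:
    """LeakyReLU with the given negative slope, then squared (assumed reading of LeakyReLU(0.5)^2)."""
    y = x if x >= 0.0 else negative_slope * x
    return y * y


def matvec(weight: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a weight matrix stored as a list of rows by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def mlp_forward(x: list[float], w_up: list[list[float]], w_down: list[list[float]]) -> list[float]:
    """Hypothetical MLP block: expand (3x in this PR), apply leaky_relu_sq, project back."""
    hidden = [leaky_relu_sq(h) for h in matvec(w_up, x)]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    # Toy sizes: model dim 2, hidden dim 6 (3x expansion).
    x = [1.0, -1.0]
    w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [-1.0, 0.0], [0.0, 0.0]]
    w_down = [[1.0] * 6, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
    print(mlp_forward(x, w_up, w_down))  # prints [5.5, 1.0]
```

Under this reading, negative pre-activations are not zeroed as they would be with a plain squared ReLU: they contribute 0.25 * x^2 and therefore still receive gradient.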

Architecture and Training

  • 11 transformer layers
  • model dim 512
  • 8 attention heads
  • 4 KV heads (grouped-query attention)
  • 3x MLP expansion
  • EMA of model weights
  • late QAT with threshold 0.15
  • warmdown 3500
  • LeakyReLU(0.5)^2
  • int6 GPTQ-lite-style quantized export
  • portability fixes so the folder also runs in environments without FlashAttention 3 (FA3)
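The listed hyperparameters and the schedule-related items can be sketched as below. Field and function names are hypothetical; the EMA decay is a placeholder because the PR does not state it; and the warmdown and late-QAT rules are assumptions (a step-based linear warmdown over the last 3500 steps, with QAT switched on once the LR scale falls below 0.15), not a transcription of the record's training script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordConfig:
    # Values listed in this PR; the field names themselves are hypothetical.
    num_layers: int = 11
    model_dim: int = 512
    num_heads: int = 8
    num_kv_heads: int = 4
    mlp_mult: int = 3
    warmdown_steps: int = 3500
    late_qat_threshold: float = 0.15
    ema_decay: float = 0.997  # placeholder: the PR does not state the EMA decay

    @property
    def head_dim(self) -> int:
        return self.model_dim // self.num_heads  # 512 // 8 = 64

    @property
    def kv_dim(self) -> int:
        # Grouped-query attention: K and V are projected to fewer heads than Q.
        return self.num_kv_heads * self.head_dim  # 4 * 64 = 256

    @property
    def mlp_hidden(self) -> int:
        return self.mlp_mult * self.model_dim  # 3 * 512 = 1536


def lr_scale(step: int, total_steps: int, cfg: RecordConfig) -> float:
    """Assumed linear warmdown: 1.0 until the last warmdown_steps, then linearly down to 0."""
    remaining = total_steps - step
    if remaining >= cfg.warmdown_steps:
        return 1.0
    return max(0.0, remaining / cfg.warmdown_steps)


def qat_enabled(scale: float, cfg: RecordConfig) -> bool:
    """Assumed late-QAT rule: enable fake quantization once the LR scale drops below the threshold."""
    return scale < cfg.late_qat_threshold


def ema_update(ema: list[float], params: list[float], decay: float) -> list[float]:
    """One EMA step over a flat list of weights: ema <- decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]
```

Under these assumptions, QAT is active for roughly the last 0.15 * 3500 = 525 steps of training, and each of the 4 KV heads is shared by 2 query heads.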

Result

On an 8xH100 run, this folder produced:

  • step:4260/20000 val_bpb:0.8705
  • DIAGNOSTIC post_ema val_bpb:0.8705
  • final_int6_roundtrip_exact val_bpb:0.87762377
  • Total submission size int6+zstd: 15825448 bytes
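The final_int6_roundtrip_exact line presumably measures val_bpb after the weights are quantized to int6 and dequantized again; relative to the post-EMA value, the reported numbers imply a cost of about 0.0071 bpb (0.87762377 vs. 0.8705). The sketch below shows one possible per-row int6 roundtrip: symmetric round-to-nearest with codes in [-31, 31], plus a small search over clip ratios that keeps the lowest-error scale for each row. Treating this as what "GPTQ-lite" means here is an assumption; it is not full GPTQ (there is no Hessian-based error compensation), and all names and clip ratios are hypothetical.

```python
def quantize_row_int6(row: list[float], clip_ratio: float = 1.0) -> tuple[list[int], float]:
    """Symmetric round-to-nearest int6 for one weight row; codes are clamped to [-31, 31]."""
    clip = max(abs(v) for v in row) * clip_ratio
    scale = clip / 31.0 if clip > 0.0 else 1.0  # an all-zero row gets a dummy scale
    codes = [max(-31, min(31, round(v / scale))) for v in row]
    return codes, scale


def dequantize_row(codes: list[int], scale: float) -> list[float]:
    """Map int6 codes back to floats (the second half of the quantize -> dequantize roundtrip)."""
    return [c * scale for c in codes]


def squared_error(row: list[float], codes: list[int], scale: float) -> float:
    """Sum of squared differences between the original row and its roundtrip."""
    return sum((a - b) ** 2 for a, b in zip(row, dequantize_row(codes, scale)))


def quantize_row_clip_search(
    row: list[float],
    clip_ratios: tuple[float, ...] = (1.0, 0.999, 0.995, 0.99, 0.98),  # illustrative candidates
) -> tuple[list[int], float]:
    """Try each clip ratio and keep the codes/scale with the lowest reconstruction error."""
    best_err, best_codes, best_scale = float("inf"), [], 1.0
    for ratio in clip_ratios:
        codes, scale = quantize_row_int6(row, ratio)
        err = squared_error(row, codes, scale)
        if err < best_err:
            best_err, best_codes, best_scale = err, codes, scale
    return best_codes, best_scale
```

Clipping below 1.0 saturates the largest weights but gives every other weight in the row a finer step, so the search picks whichever trade-off reconstructs that row better.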

Notes

  • record folder is self-contained
  • artifact (15,825,448 bytes) is under the 16 MB cap
  • a matched control run and follow-up verification runs are still in progress
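A quick headroom check for the size note. Whether the 16 MB cap means 16,000,000 bytes or 16 MiB (16,777,216 bytes) is an assumption here, so both readings are shown; the constant and function names are hypothetical.

```python
# Reported in this PR: "Total submission size int6+zstd: 15825448 bytes".
REPORTED_SIZE_BYTES = 15_825_448

# Two readings of "16 MB"; which one the challenge uses is an assumption here.
CAP_DECIMAL_BYTES = 16_000_000       # 16 * 10**6
CAP_BINARY_BYTES = 16 * 1024 * 1024  # 16 MiB = 16,777,216


def headroom(size_bytes: int, cap_bytes: int) -> int:
    """Bytes remaining under the cap; a negative result means the artifact is over it."""
    return cap_bytes - size_bytes


if __name__ == "__main__":
    for name, cap in (("16,000,000 B", CAP_DECIMAL_BYTES), ("16 MiB", CAP_BINARY_BYTES)):
        print(f"{name}: {headroom(REPORTED_SIZE_BYTES, cap):,} bytes of headroom")
```

With the decimal reading the artifact has 174,552 bytes to spare; with the binary reading, 951,768.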
