WIP: 1.1xxx BPB - MDL-T Stack (LeakyReLU² + EMA + GPTQ-lite + LateQAT + warmdown3500 + int6+zstd-22)#934

Open
tuanaqeelbohoran wants to merge 1 commit into openai:main from tuanaqeelbohoran:mdlt-stack-submission
Summary

Novel MDL-T (Minimum Description Length Training) regularizer that directly optimises weight compressibility as part of the training objective, stacked with several complementary techniques:

  • LeakyReLU(0.5)² MLP activation (reported to improve BPB by about 0.003 on the leaderboard)
  • MDL-T regularizer during warmdown: pulls weights toward int6 quantisation gridpoints by minimising mean[Var(W - Q(W)) / Var(W)], where Q(W) is the int6-quantised copy of W. This ratio is a scale-invariant measure of the fraction of weight variance that is quantisation noise
  • EMA decay=0.997 (CPU shadow copy, swapped in at serialisation)
  • warmdown_iters=3500 (extended from 1200 to give MDL-T more clustering time)
  • GPTQ-lite per-tensor clip search (5 percentile candidates, min MSE)
  • Late QAT STE int6 (last 15% of training, triggers single recompile)
  • int6 per-row for all blocks.* weights (31 levels), int8 for embeddings
  • zstd-22 compression
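
To make the MDL-T quantity concrete, here is a minimal, framework-free sketch of the noise-fraction term Var(W - Q(W)) / Var(W) averaged over rows. It is an illustration under stated assumptions, not the code in this PR: the function names are hypothetical, the grid is assumed to be symmetric integers in [-31, 31] with a per-row max-abs scale, and plain Python lists stand in for tensors.

```python
import statistics

INT6_MAX = 31  # assumption: symmetric integer grid [-31, 31]; the PR's exact grid is not shown


def quantize_row(row, clip=None):
    """Round one row onto the assumed int6 grid and map it back to floats."""
    clip = max(abs(w) for w in row) if clip is None else clip
    if clip == 0.0:
        return list(row)  # an all-zero row is already on the grid
    scale = clip / INT6_MAX
    out = []
    for w in row:
        q = round(w / scale)                   # nearest integer level
        q = max(-INT6_MAX, min(INT6_MAX, q))   # clamp to the grid range
        out.append(q * scale)
    return out


def noise_fraction(row):
    """Var(W - Q(W)) / Var(W) for one row; 0.0 means the row sits on the grid."""
    var_w = statistics.pvariance(row)
    if var_w == 0.0:
        return 0.0
    residual = [w - q for w, q in zip(row, quantize_row(row))]
    return statistics.pvariance(residual) / var_w


def mdl_t_penalty(matrix):
    """Mean noise fraction over the rows of a weight matrix (list of rows)."""
    return statistics.fmean([noise_fraction(row) for row in matrix])
```

Because the scale is derived from the row itself, multiplying a row by a constant leaves `noise_fraction` unchanged, which is the scale invariance the bullet describes. In a training framework, rounding has zero gradient almost everywhere, so a real implementation would likely treat Q(W) as a constant target when differentiating; that detail is an assumption here, not something the PR states.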

Status

WIP: final BPB is pending an H100 run. Local smoke tests on a 3060 (2000 steps) are complete. Full 20k-step results will be added once H100 access is confirmed.

Test plan

  • Local 2000-step smoke test passes, int6+zstd roundtrip verified
  • Full 20k-step run on 8×H100 (~10 min)
  • Update submission.json with final val_bpb and bytes_total
  • Verify compressed artifact ≤ 16,000,000 bytes
  • Convert from draft to ready
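
The roundtrip and size checks in the plan above can be sketched as a single helper. This is a hedged illustration only: `compress_and_check` is a hypothetical name, and `zlib` from the standard library is used as a stand-in compressor because zstd is not assumed to be installed; the PR itself targets zstd at level 22.

```python
import zlib

SIZE_LIMIT_BYTES = 16_000_000  # artifact budget quoted in the test plan


def compress_and_check(payload: bytes, limit: int = SIZE_LIMIT_BYTES) -> int:
    """Compress payload, verify the roundtrip, and return the compressed size."""
    blob = zlib.compress(payload, 9)  # stand-in for zstd level 22
    if zlib.decompress(blob) != payload:
        raise ValueError("roundtrip mismatch: decompressed bytes differ from input")
    if len(blob) > limit:
        raise ValueError(f"artifact is {len(blob)} bytes, over the {limit}-byte limit")
    return len(blob)
```

Calling `compress_and_check(serialized_weights)` on the packed int6/int8 bytes would give the number to record as `bytes_total`, assuming that field counts the compressed artifact.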

…T + int6+zstd-22)

Novel MDL-T regularizer pulls weights toward int6 gridpoints during warmdown,
stacked with LeakyReLU(0.5)^2, EMA(0.997), GPTQ-lite clip search, Late QAT STE,
and warmdown=3500. BPB pending H100 run — submitting as WIP/draft for visibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>