[Notable Non-Record Submission] 1.1090 BPB - 74.3M Ternary U-Net Transformer (100k steps/3h)#923
…Record Leaderboard.
Really interesting scaling data — seeing the ternary architecture go from 1.1535 (10 min) to 1.1090 (100K steps) is exactly the kind of research this competition needs more of. The BF16 vs FP16 scale storage finding is a great catch too — that 0.039 BPB roundtrip gap at 150K steps with FP16 would've been brutal to debug without this data. The zero_frac drop (0.236 → 0.181) with extended training is fascinating — the model actively learning to use more of its ternary capacity over time. Curious whether you've looked at whether that trend continues past 100K or if it plateaus. One thing worth noting for others reading: the 1.1090 number is from a 3-hour unconstrained run, not the 10-minute track. The valid 10-min submission is #920 at 1.1539. Still, the architecture itself is one of the more creative entries in the competition. Ternary weights with U-Net skips is a direction nobody else is exploring. Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.
@MatoTeziTanka yep, the 100k steps/3h is part of the title, and the "notable non-record" at the beginning of the title indicates that this is for the other leaderboard, not the main one. Thanks for your reply!
…_8192BPE_YaRN_NeoMuon_v2 directory as it is part of another branch/PR.
…10L_UNet_INT4FP8QAT_Brotli directory as it is part of another branch/PR.
Notable: 1.1090 BPB - 74.3M Ternary U-Net Transformer (100k steps, unconstrained)
Extended training of #640 / #641 / #920 config with SmearGate enabled
val_bpb: 1.1090 (sliding, stride=16, T=0.90) | 15.95 MB artifact | 8xH100 SXM, ~3h
Results
Extended training reduces zero_frac (0.236 -> 0.181) as the model utilises more of its ternary weight capacity. RT gap grows slightly (0.0006 -> 0.0022) due to the shrinkage correction amplification at longer training, but remains well-controlled with BF16 scale storage.
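The zero_frac metric can be made concrete with a small sketch (illustrative, not the submission's code): it is the fraction of ternary weights quantized to exactly zero. The helper name is my own.

```python
import numpy as np

def zero_frac(w_ternary: np.ndarray) -> float:
    """Fraction of ternary {-1, 0, +1} weights that are exactly zero."""
    return float(np.mean(w_ternary == 0))

# Toy 8-element ternary tensor with two zeros -> zero_frac = 0.25
w = np.array([-1, 0, 1, 1, 0, -1, 1, -1])
print(zero_frac(w))  # 0.25
```

A falling zero_frac means fewer weights are snapped to zero, i.e. more of the {-1, +1} capacity is actually in use.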
Why BF16 scales matter for extended training
Ternary dequantization applies a shrinkage correction `1/(1 - zero_frac)` to compensate for zeros reducing the group mean. FP16 scale storage introduces rounding error that gets multiplied by this factor. As training progresses and zero_frac changes, the amplification grows. The practical impact of FP16 vs BF16 scale storage at different training lengths:
Without the changes applied, this extended run would have produced a 0.03+ BPB roundtrip gap, making the artifact unusable. The changes cost zero bytes and keep the gap at 0.0022 even at 100k steps.
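A minimal numeric sketch of why the storage format matters, under one assumed mechanism (the PR doesn't spell it out): per-group scales can be tiny, and FP16's narrow 5-bit exponent pushes them into its subnormal range, where relative rounding error blows up, while BF16 keeps FP32's 8-bit exponent. The shrinkage correction then multiplies whatever error storage introduced. The `1e-7` scale value and the `to_bf16` helper are illustrative assumptions.

```python
import numpy as np

def to_bf16(x) -> np.ndarray:
    # Emulate BF16 storage by truncating the low 16 mantissa bits of FP32.
    bits = np.atleast_1d(np.asarray(x, dtype=np.float32)).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

scale = np.float32(1e-7)           # a hypothetical tiny per-group scale
amp = 1.0 / (1.0 - 0.181)          # shrinkage factor at zero_frac = 0.181

# Roundtrip relative error of each storage format, amplified by the
# shrinkage correction applied at dequantization time.
err_fp16 = abs(float(np.float16(scale)) - float(scale)) / float(scale) * amp
err_bf16 = abs(float(to_bf16(scale)[0]) - float(scale)) / float(scale) * amp
print(f"amplified rel. error  fp16={err_fp16:.3f}  bf16={err_bf16:.5f}")
# FP16 lands in its subnormal range here, so its error is orders of
# magnitude larger than BF16's.
```

For mid-range scales FP16's extra mantissa bits would actually win; the dynamic-range failure mode sketched here is one plausible reading of why BF16 is the safer choice for scale storage.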
Changes from #940
- SmearGate enabled (`SMEAR=1`): learnable per-block gating for residual smoothing. Adds minimal params, provides small quality benefit at extended training.
- Unconstrained runtime (`MAX_WALLCLOCK_SECONDS=0`)

Architecture, quantisation, compression, and all other hyperparameters identical to #940.
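Since the SmearGate code isn't shown in this PR, here is a hypothetical sketch of what "learnable per-block gating for residual smoothing" could look like: one learnable scalar `g` per block, squashed through a sigmoid and used to scale the residual branch. The function names and signature are my assumptions, not the submission's implementation.

```python
import numpy as np

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))

def smear_gate_residual(x: np.ndarray, block_out: np.ndarray, g: float) -> np.ndarray:
    # Hypothetical gated residual: g is a learned per-block scalar;
    # sigmoid(g) smooths how much of the block's output enters the stream.
    return x + sigmoid(g) * block_out

x = np.ones(4)            # residual stream
h = np.full(4, 2.0)       # block output
print(smear_gate_residual(x, h, 0.0))  # g=0 -> gate 0.5 -> x + 0.5*h
```

A scheme like this adds only one parameter per block, consistent with the "minimal params" claim above.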
Setup and Run