Recursive Transformer 4B/7L + VE + QAT + TTT — val_bpb 1.1696 (3-seed mean) #927
Open
Tonyy1977 wants to merge 2 commits into openai:main
… mean) True Universal Transformer: 4 shared blocks x 7 loops (7x weight reuse), dim=1024, int6 QAT from step 0, score-first TTT+sliding window eval. 3-seed mean: 1.1696 BPB, 15.85MB artifact, 600s training on 8xH100.
Required for zstd-22 compression of the int8 quantized model artifact. Without it, the script falls back to zlib, which produces a 17.5MB artifact (over the 16MB budget).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
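The fallback described in this commit message could look roughly like the sketch below. It is not taken from the submission's script: the function names `compress_artifact` and `decompress_artifact` are hypothetical, and zlib level 9 is an assumed setting. The only library calls used are `zstandard.ZstdCompressor(level=22)` / `ZstdDecompressor()` from the third-party `zstandard` package and the standard-library `zlib`.

```python
import zlib

try:
    import zstandard  # third-party package; optional in this sketch
except ImportError:
    zstandard = None


def compress_artifact(blob: bytes) -> tuple[str, bytes]:
    """Compress with zstd level 22 when available, otherwise fall back to zlib.

    Hypothetical helper: returns the codec name so the loader knows
    which decompressor to use.
    """
    if zstandard is not None:
        return "zstd", zstandard.ZstdCompressor(level=22).compress(blob)
    # Assumed fallback setting: maximum zlib compression level.
    return "zlib", zlib.compress(blob, 9)


def decompress_artifact(codec: str, payload: bytes) -> bytes:
    """Invert compress_artifact for either codec."""
    if codec == "zstd":
        if zstandard is None:
            raise RuntimeError("zstandard is required to read a zstd artifact")
        return zstandard.ZstdDecompressor().decompress(payload)
    return zlib.decompress(payload)
```

Returning the codec name alongside the payload makes the silent fallback visible, so a run that ends up on zlib (and over budget) can be detected rather than discovered from the file size.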
Summary
Recursive transformer: 4 shared blocks × 7 loops (7× weight reuse) at dim=1024, with ValueEmbedding, int6 QAT from step 0, and score-first TTT+sliding window eval.
3-seed mean: 1.1696 BPB | ~15.85MB artifact | 600s on 8xH100 SXM
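As a rough illustration of the weight-sharing loop summarized above, here is a minimal pure-Python sketch. It assumes the 4 shared blocks are applied in sequence on each of the 7 loops (the summary does not state the ordering), and the toy blocks are stand-ins that only count calls; they are not the submission's attention/MLP layers. `recursive_forward` and `make_counting_block` are hypothetical names.

```python
from typing import Callable, List, Sequence

Block = Callable[[List[float]], List[float]]


def recursive_forward(x: List[float], blocks: Sequence[Block], n_loops: int) -> List[float]:
    """Apply the same list of blocks n_loops times (assumed ordering)."""
    for _ in range(n_loops):
        for block in blocks:
            x = block(x)
    return x


def make_counting_block(block_id: int, calls: List[int]) -> Block:
    """Toy block: records each call and applies a small additive update."""
    def block(x: List[float]) -> List[float]:
        calls[block_id] += 1
        return [v + 0.01 * (block_id + 1) for v in x]
    return block


calls = [0] * 4
blocks = [make_counting_block(i, calls) for i in range(4)]
out = recursive_forward([0.0, 0.0], blocks, n_loops=7)
print(calls)       # each of the 4 blocks is applied 7 times
print(sum(calls))  # 28 layer applications from 4 sets of weights
```

Only 4 sets of block parameters exist, but the input passes through 28 layer applications, which is where the 7× weight reuse comes from.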
Key novelty
Unlike other depth recurrence submissions that repeat 1-2 layers on top of 10-11 unique blocks (~1.2× reuse), this uses 4 shared blocks looped 7 times (7× reuse). This enables dim=1024 (2× wider than standard 512) while staying under 16MB.
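The reuse figures can be checked with a small calculation. This sketch assumes "reuse" means total layer applications divided by the number of unique blocks, and that the other submissions apply each of their unique blocks once plus 1-2 repeated applications; both readings are assumptions, and `reuse_factor` is a hypothetical helper.

```python
def reuse_factor(unique_blocks: int, layer_applications: int) -> float:
    """Layer applications per unique block (assumed definition of reuse)."""
    return layer_applications / unique_blocks


# This submission: 4 shared blocks, each applied on all 7 loops.
print(reuse_factor(4, 4 * 7))  # 7.0

# Assumed shape of other depth-recurrence submissions:
# 10-11 unique blocks plus 1-2 repeated applications.
for unique in (10, 11):
    for repeats in (1, 2):
        print(unique, repeats, round(reuse_factor(unique, unique + repeats), 2))
```

Under these assumptions the comparison configurations land between about 1.09× and 1.2×, consistent with the "~1.2× reuse" figure quoted above.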
Architecture highlights
See README.md in the submission folder for full details and negative results.