Minimal GPT

UPDATE-1, Aug 6

Added support for Rotary Positional Embeddings (RoPE). Performance improved for poorly tokenized languages.
Replaced LayerNorm with RMSNorm, although the difference in both compute and model performance seems minimal.
Added a dataset preprocessor/generator. It tokenizes a dataset, shuffles it, and writes the train and val splits to pickle binaries.
Added a better dataloader that reshuffles the dataset after each epoch.*
Added an inference script, which currently supports Top-P and Top-K generation strategies.
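
To make the two generation strategies concrete, here is a minimal pure-Python sketch of Top-K and Top-P filtering over a next-token probability list. It is not taken from generate.py; the function names (`top_k_filter`, `top_p_filter`, `sample`) and the example probabilities are made up for illustration.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

def sample(filtered, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(filtered)
    weights = [filtered[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]

# Hypothetical next-token distribution over a 4-token vocabulary.
probs = [0.5, 0.25, 0.15, 0.1]
top_k = top_k_filter(probs, k=2)    # keeps token ids 0 and 1
top_p = top_p_filter(probs, p=0.8)  # keeps token ids 0, 1 and 2
next_token = sample(top_k, random.Random(0))
```

Top-K always keeps a fixed number of candidates, while Top-P keeps more or fewer depending on how peaked the distribution is.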

*Epoch: here, an epoch means one full pass through the dataset. For example, if your dataset has 131072 tokens and each training step processes 32768 tokens, then 131072 / 32768 = 4, so 4 training steps equal 1 epoch.
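
The epoch arithmetic and the per-epoch reshuffle can be sketched as follows. This is only an illustration, not the dataloader from train_gpt2.py; `steps_per_epoch` and `epoch_order` are hypothetical helper names, and the dataset is represented by chunk indices only.

```python
import random

def steps_per_epoch(dataset_tokens, tokens_per_step):
    """Number of full training steps needed to see the dataset once."""
    return dataset_tokens // tokens_per_step

def epoch_order(num_chunks, rng):
    """Return the chunk indices in a fresh random order for one epoch."""
    order = list(range(num_chunks))
    rng.shuffle(order)
    return order

steps = steps_per_epoch(131072, 32768)  # numbers from the example above

rng = random.Random(0)
epoch_1 = epoch_order(steps, rng)  # every chunk is visited exactly once
epoch_2 = epoch_order(steps, rng)  # reshuffled again for the next epoch
```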

Features

Keeps most of the original hyperparameters from the paper.
Does not tie the weights of the LM head and the token embeddings, hence the larger model size of 163M parameters.
Uses the toy Shakespeare dataset.
Gradient accumulation targets approximately 32k tokens per step instead of the 0.5M stated in the paper.
Optimizations done:
- Flash Attention instead of the usual attention calculation (10x speedup)
- Autocasting and reduced matmul precision; my GPU (RTX 3060) supports bfloat16
- Rounding parameters such as vocab size and batch size to a multiple of 2
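
As a small illustration of the last optimization, a size can be rounded up to the next multiple with integer arithmetic. The values here are assumptions, not read from the repo: 50257 is the vocabulary size of the GPT-2 tokenizer, and 64 is just an example multiple (which is itself a multiple of 2). `round_up` is a hypothetical helper.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple

# Assumed values: GPT-2 tokenizer vocab size, padded to a multiple of 64.
padded_vocab = round_up(50257, 64)
```

The extra embedding rows are never produced by the tokenizer, so they only add a little unused capacity in exchange for friendlier tensor shapes.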

Also includes code for a bigram language model and an attention-based variant of it (gpt).
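
The 163M figure in the feature list can be sanity-checked with a rough parameter count. This sketch assumes the standard GPT-2 small configuration (vocab size 50257, context length 1024, 12 layers, embedding width 768), learned positional embeddings, LayerNorm with bias, and a bias-free LM head. None of these values are read from gpt2.py, and `gpt2_param_count` is a hypothetical helper.

```python
def gpt2_param_count(vocab_size=50257, block_size=1024, n_layer=12,
                     n_embd=768, tie_weights=True):
    """Approximate parameter count of a GPT-2 style decoder (assumed layout)."""
    wte = vocab_size * n_embd   # token embedding table
    wpe = block_size * n_embd   # learned positional embedding table
    ln = 2 * n_embd             # LayerNorm weight + bias
    # Attention: fused QKV projection plus output projection, both with bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd and project back, both with bias.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * ln + attn + mlp
    # An untied LM head is a separate vocab_size x n_embd matrix.
    lm_head = 0 if tie_weights else vocab_size * n_embd
    return wte + wpe + n_layer * block + ln + lm_head

tied = gpt2_param_count(tie_weights=True)     # 124,439,808 (about 124M)
untied = gpt2_param_count(tie_weights=False)  # 163,037,184 (about 163M)
```

Under these assumptions, the untied LM head accounts for the roughly 39M extra parameters.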

Folders and files

ReadME.md
bigram.py
dataset.py
generate.py
gpt.py
gpt2.py
playground.ipynb
train_gpt2.py
