In this folder we implement a set of basic components for tokenizers and transformer layers. We also include an experimental folder containing models that we are currently working on.
We provide the basic interface that all models must implement in model_shell.py. The shell is assumed to contain three components:
- Embedder: This component handles both tokenizing the input and embedding the tokens into a dense, continuous representation that the transformer layers can process. The embedder interface is given in embedding_models.py.
- Transformer Core: This component is the core of the model and typically consists of a stack of transformer layers. We don't assume any particular interface here, but we do implement a "generic transformer" that is intended to subsume most use cases.
- LM Head: This component takes the output of the transformer core and maps it to the output space of the model. We define the interface in model_heads.py.
Beyond these three shell components, we also provide:

- Tokenizer: The tokenization interface can be found in tokenizers/base_class.py. We additionally provide a BPE tokenizer and a GPT-2 style tokenizer.
- Generator: In generator.py we implement a simple generator that supports only top-k sampling. Feel free to extend this, but if you do, add an interface from which generators can inherit. N.B. we previously implemented kv-caching, but it is not worth it for the tiny models we are working with, so it was removed since it just adds complexity.
- Normalization: In normalization.py we implement RMSNorm, LayerNorm, and a pass-through layer.
- Positional Encodings: In positional_encodings.py we implement a variety of positional encodings, including the standard sinusoidal positional encodings, and the relative positional encodings from Shaw et al. (2018).
- Attention: Our attention layer supports causal masking, grouped heads, multi-head attention, and rotary embeddings.
- Feed Forward: In feed_forward.py we implement the standard feed-forward layer with a configurable activation, as well as the SwiGLU variant from Shazeer et al. (2020).
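To illustrate how the three shell components fit together, here is a minimal sketch of the embedder → core → LM head composition. The class and argument names are hypothetical, not the actual interface defined in model_shell.py:

```python
import torch
import torch.nn as nn

class ModelShell(nn.Module):
    """Illustrative sketch only: the shell composes an embedder,
    a transformer core, and an LM head (hypothetical names)."""

    def __init__(self, embedder, core, lm_head):
        super().__init__()
        self.embedder = embedder
        self.core = core
        self.lm_head = lm_head

    def forward(self, token_ids):
        x = self.embedder(token_ids)  # (batch, seq) -> (batch, seq, d_model)
        x = self.core(x)              # (batch, seq, d_model)
        return self.lm_head(x)        # (batch, seq, vocab_size)
```

Any modules matching these shapes can be slotted in, which is what makes the shell a convenient testing harness.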
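The generator's top-k sampling step can be sketched as follows; this is an illustration of the technique, and the function name and signature are assumptions rather than the actual code in generator.py:

```python
import torch

def top_k_sample(logits, k=50, temperature=1.0, generator=None):
    """Sample one token id from the k highest-probability logits."""
    topk_logits, topk_idx = torch.topk(logits, k)             # keep k largest logits
    probs = torch.softmax(topk_logits / temperature, dim=-1)  # renormalize over the top k
    choice = torch.multinomial(probs, 1, generator=generator) # sample within the top k
    return topk_idx.gather(-1, choice)                        # map back to vocab indices
```

With k=1 this reduces to greedy decoding, which makes it easy to test deterministically.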
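For reference, RMSNorm rescales by the root-mean-square of the features without subtracting the mean (unlike LayerNorm). A minimal sketch, not the exact implementation in normalization.py:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Sketch of RMSNorm: x / rms(x), with a learned per-feature gain."""

    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain

    def forward(self, x):
        # root-mean-square over the feature dimension
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return self.weight * x / rms
```

After normalization, the mean squared activation is (approximately) 1, which is the invariant RMSNorm maintains.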
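The standard sinusoidal positional encodings interleave sines and cosines at geometrically spaced frequencies. A self-contained sketch (the function name is an assumption, not necessarily what positional_encodings.py exposes):

```python
import math
import torch

def sinusoidal_encoding(seq_len, dim):
    """Sketch of sinusoidal positional encodings (Vaswani et al., 2017)."""
    pos = torch.arange(seq_len).unsqueeze(1).float()            # (seq_len, 1)
    div = torch.exp(torch.arange(0, dim, 2).float()
                    * (-math.log(10000.0) / dim))               # (dim / 2,)
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(pos * div)  # even feature indices get sine
    pe[:, 1::2] = torch.cos(pos * div)  # odd feature indices get cosine
    return pe
```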
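The SwiGLU feed-forward variant gates one linear projection with the SiLU of another before projecting back down. A sketch of the general pattern, assuming hypothetical layer names rather than those in feed_forward.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """Sketch of a SwiGLU feed-forward block: w2(silu(w1 x) * w3 x)."""

    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

Because SwiGLU uses two input projections, implementations often shrink hidden_dim relative to a plain FFN to keep the parameter count comparable.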
Our experimental folder includes a number of models that we are currently working on. These are highly subject to change; in general we expect most components to be implemented here first before being promoted to the main folders, if at all.
- Byte Level: Components for building models that operate at the byte level.
- Next Thought: Components intended to operate on a latent-to-latent basis, rather than a token-to-token basis.
- Huggingface Interface: Wraps Huggingface models in our interface so that we can test those models with our code and compare them fairly against our own.