This project implements a lightweight library for LLM management and monitoring, from training to inference. It also includes a chat interface for interacting with the model, and with models from the 🤗 API, locally or remotely.
- Tokenization
  - BPE implementation in Python
  - Rust implementation
- Positional embedding
  - Absolute
  - Rotary
- DenseTransformer
  - Attention mechanism
    - Multi-head attention
    - Flash attention
  - FFN, RMSNorm layers
- Training
  - Pre-training
  - Fine-tuning
  - Instruction tuning
  - RLHF, DPO
  - DDP, FSDP methods
- Sampling
  - Temperature
  - Top-k, top-p
  - Beam search
- To move beyond
  - KV-cache
  - Sliding window
  - Memory layers?
  - MoE
  - Quantization
- Training on Synthetic Data
  - Generate data
  - Teacher model
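To make the sampling strategies above concrete, here is a minimal pure-Python sketch of temperature scaling combined with top-k and top-p (nucleus) filtering. This is an illustrative example, not the library's actual API; `sample_next_token` and its parameters are hypothetical names.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample a token id from raw logits (hypothetical helper, not the library API)."""
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token ids by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        # Top-k: keep only the k most likely tokens.
        order = order[:top_k]
    if top_p is not None:
        # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
        cum, kept = 0.0, []
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    # Renormalize over the surviving tokens and draw one.
    mass = sum(probs[i] for i in order)
    r = rng.random() * mass
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]
```

With a very low temperature or `top_k=1` this reduces to greedy decoding, which is a handy sanity check when wiring sampling into an inference loop.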
This project has been developed and tested with Python 3.12. To manage dependencies, I recommend using uv.
- Clone the repo
  ```sh
  git clone git@github.com:art-test-stack/gpt-lib.git
  ```
- Install dependencies
  ```sh
  uv sync
  ```
  If running on Linux with CUDA available, you can install the GPU version of PyTorch by running:
  ```sh
  uv sync --extra cuda
  ```

> [!NOTE]
> Make sure to adjust the CUDA version in uv.toml if needed. This extra is only available for Linux systems with compatible NVIDIA GPUs. It enables flash_attention for faster attention computation.
The tokenizer training script is located in scripts/train_tokenizer.py. It allows you to train a BPE tokenizer on a custom corpus, using different implementations (tiktoken, HuggingFace, or custom BPE implementations). You can also choose to write the corpus from sources (e.g., Wikipedia, OpenWebText) or load an existing corpus.
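For intuition about what the BPE training does, here is a minimal pure-Python sketch of the merge loop: repeatedly count adjacent symbol pairs across the corpus and merge the most frequent one. This is a pedagogical sketch only; the actual script delegates to the tiktoken, HuggingFace, or custom backends, and `train_bpe` is a hypothetical name.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merge rules on a whitespace-split corpus (illustrative sketch)."""
    # Represent each word as a tuple of symbols, counted by frequency.
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair
        # with the concatenated symbol.
        rewritten = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            rewritten[tuple(out)] += freq
        words = rewritten
    return merges
```

On the classic toy corpus `"low low low lower lowest"`, the first two learned merges are `('l', 'o')` and `('lo', 'w')`, since those pairs appear in every word.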
Training time benchmarks for different implementations and configurations. All tokenizers were trained on a corpus generated by gpt_lib.tokenizer.corpus.TokenizerCorpus() with default settings, varying only vocab_size.
| Implementation | Vocabulary size | Num proc | Corpus size | Training time |
|---|---|---|---|---|
| huggingface | 32,000 | 7 | 112.58 MB | 11.45 seconds |

More benchmarks coming soon...
In this section, you will find instructions to run the chat interface with different models.
In the development environment (ENV='development' in .env), you can run the chat interface with auto-reloading using the following command:
```sh
uv run gradio scripts/chat_app.py --demo-name=app
```
Otherwise, if you don't want auto-reloading, use:
```sh
uv run python -m scripts.chat_app
```
Then, open your browser and go to http://127.0.0.1:7860/. The interface is straightforward: you can select different models (local or remote), choose inference hyperparameters, and chat with the model.
- Attention is all you need
- Building a text generation model from scratch by Vincent Bons
- nanoGPT by Andrej Karpathy
- Training Compute-Optimal Large Language Models
- Training language models to follow instructions with human feedback
Distributed under the MIT License. See LICENSE.txt for more information.
Arthur Testard - arthur.testard.pro@gmail.com
Project Link: https://github.com/art-test-stack/gpt-lib