Generative Pre-trained DenseTransformer Library

This project implements a lightweight library for LLM management and monitoring, from training to inference. It also includes an interface to chat with the model, and with models from the 🤗 API, locally or remotely.
Explore the docs »

Request Feature

About The Project

Built With

  • PyTorch
  • Hugging Face (datasets, transformers, tokenizers, hub)
  • Gradio (web interface)
  • tiktoken (tokenizer)

Roadmap

  • Tokenization
    • BPE implementation in Python
    • Rust implementation
  • Positional embedding
    • Absolute
    • Rotary
  • DenseTransformer
    • Attention mechanism
    • Multihead attention
    • Flash attention
    • FFN, RMSNorm layers
  • Training
    • Pre-training
    • Fine-tuning
    • Instruction tuning
    • RLHF, DPO
    • DDP, FSDP
  • Sampling
    • Temperature
    • Top-k, top-p
    • Beam search
  • To move beyond
    • KV-cache
    • Sliding window
    • Memory layers?
    • MoE
    • Quantization
  • Training on Synthetic Data
    • Generate data
    • Teacher model
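To illustrate the sampling items above, a temperature plus top-k sampler can be sketched in pure Python. This is an illustration only, not the library's implementation (which would operate on torch tensors); the function name and signature are hypothetical:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None):
    # Temperature scaling: lower values sharpen the distribution,
    # higher values flatten it.
    scaled = [l / temperature for l in logits]
    # Optional top-k filtering: keep only the k largest logits,
    # mask the rest to -inf so they get zero probability.
    if top_k is not None:
        kth = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= kth else float("-inf") for l in scaled]
    # Numerically stable softmax over the surviving logits.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Multinomial draw from the resulting distribution.
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

With `top_k=1` this degenerates to greedy decoding, and as `temperature` approaches 0 the distribution concentrates on the argmax.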

Get Started

This project has been developed and tested with Python 3.12. To manage dependencies, I recommend using uv.

  1. Clone the repo
     git clone git@github.com:art-test-stack/gpt-lib.git
  2. Install dependencies
     uv sync
     If running on Linux with CUDA available, you can install the GPU version of PyTorch by running:
     uv sync --extra cuda

Note

Make sure to adjust the CUDA version in uv.toml if needed. This extra is only available on Linux systems with compatible NVIDIA GPUs; it enables flash_attention for faster attention computation.

Usage

Tokenizer

The tokenizer training script is located in scripts/train_tokenizer.py. It lets you train a BPE tokenizer on a custom corpus using different implementations (tiktoken, HuggingFace, or a custom BPE implementation). You can also choose to build the corpus from sources (e.g., Wikipedia, OpenWebText) or load an existing corpus.
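The core BPE training loop behind such a tokenizer can be sketched as follows. This is a minimal pure-Python illustration, not the script's actual code; `train_bpe` and its signature are hypothetical, and the real implementation would add byte-level handling, special tokens, and multiprocessing:

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn `num_merges` BPE merge rules from raw text (sketch only)."""
    # Start with each word represented as a tuple of single characters.
    words = Counter()
    for text in corpus:
        for word in text.split():
            words[tuple(word)] += 1

    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with a merged symbol.
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges
```

For example, `train_bpe(["low low low lower", "newest newest"], 2)` first merges `('l', 'o')` (the most frequent pair), then `('lo', 'w')`.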

Training time benchmarks for different implementations and configurations. All tokenizers were trained on a corpus generated from gpt_lib.tokenizer.corpus.TokenizerCorpus() with default settings, varying only vocab_size.

| Implementation | Vocabulary size | Num proc | Corpus size | Training time |
| --- | --- | --- | --- | --- |
| huggingface | 32,000 | 7 | 112.58 MB | 11.45 seconds |

Training a model

Coming soon...

Chat with the model

In this section, you will find instructions to run the chat interface with different models.

In a development environment (ENV='development' in .env), you can run the chat interface with auto-reloading using the following command:

uv run gradio scripts/chat_app.py --demo-name=app

Otherwise, if you don't want auto-reloading, use:

uv run python -m scripts.chat_app

Then, open your browser and go to http://127.0.0.1:7860/. It is quite straightforward to use. You can select different models (local or remote), choose some hyperparameters for inference, and chat with the model.

Data

Pre-training Data Summary

Sources

  1. Attention is all you need
  2. Building a text generation model from scratch by Vincent Bons
  3. nanoGPT by Andrej Karpathy
  4. Training Compute-Optimal Large Language Models
  5. Training language models to follow instructions with human feedback

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Arthur Testard - arthur.testard.pro@gmail.com

Project Link: https://github.com/art-test-stack/gpt-lib

(back to top)
